A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.
One-Minute Overview
ChatTTS is a generative speech model designed specifically for dialogue scenarios like AI assistants and role-playing, with particular strength in handling mixed Chinese and English inputs.
Core Value: It produces speech that is more natural and conversational than most open-source TTS models, offering fine-grained control over prosodic elements like laughter and pauses.
Quick Start
Installation Difficulty: Moderate. Requires Python 3.11+ and a PyTorch environment.
# Clone the repository
git clone https://github.com/2noise/ChatTTS
cd ChatTTS
# Install dependencies
pip install --upgrade -r requirements.txt
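Before installing, it can help to confirm that the interpreter meets the stated requirements. The following is a minimal sketch, not part of ChatTTS: `check_environment` and `MIN_PYTHON` are hypothetical names introduced here, and the only assumptions are the ones stated above (Python 3.11+ and an importable `torch` module).

```python
import importlib.util
import sys

# Minimum version taken from the requirement stated above (Python 3.11+).
MIN_PYTHON = (3, 11)


def check_environment(version_info=None, required_modules=("torch",)):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration; it is not shipped with ChatTTS.
    """
    version = tuple(version_info or sys.version_info)[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {version[0]}.{version[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required"
        )
    for name in required_modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module '{name}' is not installed")
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter; passing an explicit version tuple makes the check easy to exercise without switching interpreters.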
Is this suitable for me?
- ✅ AI Development: Adding voice to virtual assistants or NPCs where natural dialogue is key.
- ✅ Research: Exploring prosody control or multi-speaker synthesis.
- ❌ Commercial Products: The model is licensed under CC BY-NC 4.0, strictly prohibiting commercial use.
- ❌ High-Fidelity Audio: To prevent misuse, the audio quality of the open-source model was deliberately reduced using MP3 compression.
Core Capabilities
1. Conversational TTS - Solving the "Robotic Voice" Problem
- Optimized for dialogue scenarios, generating fluent speech with natural intonation and rhythm. User Benefit: AI interactions sound more human, which reduces listener fatigue.
2. Fine-Grained Prosody Control - Solving Monotony
- Supports special tokens (e.g., [laugh], [break]) to control laughter, pauses, and interjections within the text. User Benefit: Developers can precisely orchestrate the emotional timing of the speech, adding dramatic flair.
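As a rough illustration of placing control tokens in the input text, the sketch below assembles a prompt from text fragments and the two tokens named above. `build_prompt`, `LAUGH`, `BREAK`, and `KNOWN_TOKENS` are hypothetical names for this example only. The exact token set accepted by a given ChatTTS release is an assumption here and should be checked against the repository.

```python
import re

# Tokens named in the text above. Whether a given release accepts exactly
# these spellings is an assumption of this sketch.
LAUGH = "[laugh]"
BREAK = "[break]"
KNOWN_TOKENS = {LAUGH, BREAK}

# Matches any square-bracketed marker, e.g. "[laugh]".
_TOKEN_PATTERN = re.compile(r"\[[^\[\]]*\]")


def build_prompt(parts):
    """Join text fragments and control tokens into one input string."""
    prompt = " ".join(part.strip() for part in parts if part.strip())
    # Reject bracketed markers outside the known set, so a typo such as
    # "[lagh]" fails loudly instead of being passed through silently.
    unknown = [t for t in _TOKEN_PATTERN.findall(prompt) if t not in KNOWN_TOKENS]
    if unknown:
        raise ValueError(f"unknown control tokens: {unknown}")
    return prompt


prompt = build_prompt(
    ["That is hilarious", LAUGH, "give me a second", BREAK, "okay, let's continue."]
)
```

The resulting string is what would be passed to the model as text. Validating tokens up front keeps prosody markup errors out of the synthesis step.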
3. Multi-Speaker Support - Solving Limited Variety
- Supports multi-speaker synthesis and allows sampling random speaker embeddings from a Gaussian distribution. User Benefit: Ideal for scenarios requiring multiple characters, enabling voice switching without training new models.
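The idea of drawing a speaker embedding from a Gaussian distribution can be sketched with the standard library alone. This example does not call ChatTTS itself. `sample_speaker_embedding` is a hypothetical helper, and `EMBEDDING_DIM` is an arbitrary illustrative size, not the model's real embedding dimension.

```python
import random

# Hypothetical embedding size for illustration only; the real dimension is
# defined by the model, not by this sketch.
EMBEDDING_DIM = 8


def sample_speaker_embedding(seed, dim=EMBEDDING_DIM, mean=0.0, std=1.0):
    """Draw one speaker embedding from an isotropic Gaussian.

    The same seed always reproduces the same vector, so a character can
    keep its voice across sessions.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(dim)]


# One fixed seed per character gives each a distinct, reproducible voice.
cast = {
    name: sample_speaker_embedding(seed)
    for name, seed in [("narrator", 1), ("guard", 2)]
}
```

Storing the seed (or the sampled vector) per character is the design choice that makes voice switching cheap: no retraining is involved, only a different embedding at inference time.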
Maintenance Status
- Development Activity: Active, with a rich ecosystem of community-driven extensions.
- Recent Updates: Development is ongoing; the roadmap includes streaming generation and a DVAE encoder.
- Community Response: Active community via QQ and Discord with responsive issue handling.
Commercial & Licensing
License: AGPLv3+ (Code) / CC BY-NC 4.0 (Model)
- ❌ Commercial Use: Prohibited (the model is for academic/educational use only)
- ✅ Modification: Allowed (with attribution)
- ⚠️ Restrictions: Use for illegal or malicious purposes is strictly prohibited