ChatTTS

A text-to-speech model optimized for dialogue scenarios like LLM assistants, supporting mixed Chinese and English input. It generates natural and expressive speech with fine-grained control over prosodic features like laughter and pauses.

One-Minute Overview

ChatTTS is a generative speech model designed specifically for dialogue scenarios like AI assistants and role-playing, with particular strength in handling mixed Chinese and English inputs.

Core Value: According to the project, it produces speech that is more natural and conversational than most open-source TTS models, and it offers fine-grained control over prosodic elements such as laughter and pauses.

Quick Start

Installation Difficulty: Moderate. Requires Python 3.11+ and a working PyTorch environment.

# Clone the repository
git clone https://github.com/2noise/ChatTTS
cd ChatTTS

# Install dependencies
pip install --upgrade -r requirements.txt
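
Once the dependencies are installed, a first synthesis run might look like the sketch below. It assumes the `ChatTTS.Chat` class with `load()` and `infer()` methods as shown in the project README, and a 24 kHz mono output; both are assumptions to verify against the version you install. `save_wav`, `synthesize_to_files`, and `load_chat` are hypothetical helper names, not part of the library. The ChatTTS import is deferred so the file-writing helpers can be tried without the package.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # assumption: the model emits 24 kHz mono audio


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples in [-1.0, 1.0] as a 16-bit mono WAV file."""
    # Flatten NumPy-style arrays (e.g. shape (1, N)); plain lists pass through.
    flat = samples.ravel() if hasattr(samples, "ravel") else samples
    pcm = bytearray()
    for s in flat:
        clipped = max(-1.0, min(1.0, float(s)))
        pcm += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(bytes(pcm))


def synthesize_to_files(chat, texts, prefix="output"):
    """Run chat.infer(texts) and save one WAV file per input text."""
    paths = []
    for i, wav in enumerate(chat.infer(texts)):
        path = f"{prefix}_{i}.wav"
        save_wav(path, wav)
        paths.append(path)
    return paths


def load_chat():
    """Load the model; requires the ChatTTS package and model weights."""
    import ChatTTS  # deferred so the helpers above work without the package

    chat = ChatTTS.Chat()
    chat.load(compile=False)  # as in the README; treat the flag as an assumption
    return chat
```

A typical call would be `synthesize_to_files(load_chat(), ["Hello, this is a test."])`, which writes `output_0.wav` to the current directory.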

Is this suitable for me?

✅ AI Development: Adding voice to virtual assistants or NPCs where natural dialogue is key.

✅ Research: Exploring prosody control or multi-speaker synthesis.

❌ Commercial Products: The model is licensed under CC BY-NC 4.0, which prohibits commercial use.

❌ High-Fidelity Audio: To deter misuse, the audio quality of the open-source version is deliberately reduced through MP3 compression.

Core Capabilities

1. Conversational TTS - Solving the "Robotic Voice" Problem

The model is optimized for dialogue scenarios and generates fluent speech with natural intonation and rhythm. User Benefit: AI interactions sound more human-like, which reduces listener fatigue.

2. Fine-Grained Prosody Control - Solving Monotony

Special tokens embedded in the input text (e.g., [laugh], [break]) control where laughter, pauses, and interjections occur. User Benefit: Developers can place these cues exactly where they want them and so control the emotional timing of the speech.
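
Because the control tokens are ordinary bracketed strings inside the input text, they can be inserted and inspected with plain string handling. The sketch below uses the `[laugh]` and `[break]` spellings from the text above; the project README also shows variants such as `[uv_break]`, so the exact tokens the model accepts should be checked there. All function names are hypothetical helpers, not library API.

```python
import re

LAUGH = "[laugh]"  # token spelling taken from the text above
BREAK = "[break]"  # assumption: the model may expect a variant such as [uv_break]

# Matches bracketed control tokens such as [laugh] or [break_6].
TOKEN_PATTERN = re.compile(r"\[[a-z_0-9]+\]")


def join_with_pauses(phrases, pause=BREAK):
    """Join phrases with a pause token between each pair."""
    return f" {pause} ".join(p.strip() for p in phrases)


def add_laugh(sentence, laugh=LAUGH):
    """Append a laughter token to the end of a sentence."""
    return f"{sentence.rstrip()} {laugh}"


def control_tokens(text):
    """List the control tokens in a text, in order of appearance."""
    return TOKEN_PATTERN.findall(text)


def strip_control_tokens(text):
    """Remove control tokens and collapse the leftover whitespace."""
    return " ".join(TOKEN_PATTERN.sub(" ", text).split())
```

For example, `add_laugh(join_with_pauses(["Well", "that was unexpected."]))` produces a single annotated line for the model, and `strip_control_tokens` recovers the plain text for display or logging.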

3. Multi-Speaker Support - Solving Limited Variety

The model supports multi-speaker synthesis, and new voices can be obtained by sampling random speaker embeddings from a Gaussian distribution. User Benefit: Scenarios with multiple characters can switch voices without training new models.
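
The idea of drawing a voice from a Gaussian distribution can be illustrated without the model itself. The sketch below is a conceptual stand-in, not the library's implementation: the embedding dimension, mean, and standard deviation are placeholder parameters, and `sample_speaker_embedding` and `cast_voices` are hypothetical names. Fixing the seed makes a sampled voice reproducible, which is what lets a character keep the same voice across sessions.

```python
import random


def sample_speaker_embedding(dim, mean=0.0, std=1.0, seed=None):
    """Draw a speaker vector with each component sampled from N(mean, std^2)."""
    rng = random.Random(seed)  # a fixed seed reproduces the same voice
    return [rng.gauss(mean, std) for _ in range(dim)]


def cast_voices(characters, sampler):
    """Assign one freshly sampled voice to each character name."""
    return {name: sampler() for name in characters}
```

With the real library, the README's `chat.sample_random_speaker` could plausibly be passed as `sampler`, with each result supplied as the speaker embedding at inference time; treat that wiring as an assumption to confirm against the installed version.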

Maintenance Status

Development Activity: Active, with a rich ecosystem of community-driven extensions.
Recent Updates: Development is ongoing; the roadmap includes streaming generation and a DVAE encoder.
Community Response: Active communities on QQ and Discord, with responsive issue handling.

Commercial & Licensing

License: AGPLv3+ (Code) / CC BY-NC 4.0 (Model)

❌ Commercial Use: Prohibited (the model is for academic and educational use only)
✅ Modification: Allowed (with attribution)
⚠️ Restrictions: Use for illegal or malicious purposes is strictly prohibited
