Text-to-Speech (TTS) & Voice Cloning

Boundless Flow not only "understands" your words but also "speaks" your text. Through the built-in Python Bridge and local models, it provides Text-to-Speech (TTS) and voice cloning. Two core engines are currently integrated, Qwen3-TTS and Index-TTS2, to cover different scenarios.

Open "Model Configuration"
Select a model and configure the model path or API
Upload reference audio
Enter the text to synthesize and set parameters
Generate audio files

Qwen3-TTS Engine

Qwen3-TTS is a powerful, multi-functional voice synthesis engine that supports three different working modes:

🎙️ Base Model

Provides high-quality, natural, and fluent standard voice synthesis. Suitable for general text reading and audiobook production, generating clear speech without any reference audio.

👥 CustomVoice (Cloning)

Provide a clear reference audio clip of 5-15 seconds to clone a voice that closely matches it. Well suited to personalized dubbing or digital avatars.

✨ VoiceDesign

No reference audio needed. Directly "design" the voice you want using text prompts. For example, input "A young female voice, speaking happily", and the model will generate a brand new voice matching the description.

Index-TTS2 Engine

Building upon voice cloning, the Index-TTS2 engine adds finer-grained control over emotion and style.

💡

Tip: Voice cloning quality depends largely on the quality of the reference audio. Use reference audio with clear pronunciation, no background noise, and a moderate speaking rate.

Local Model Download & Directory Setup (Recommended)

If you use local TTS (Qwen3-TTS / Index-TTS2), download the model files first, then fill in the TTS Model Directory in settings (see ModelScope docs; beginner steps: Appendix A).

Qwen3-TTS (Base / CustomVoice / VoiceDesign)

Download model folders for each mode:

modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-Base
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign

In Boundless Flow settings, set TTS Model Directory to the model folder you want to use (e.g., Base mode uses ./Qwen/Qwen3-TTS-12Hz-1.7B-Base).

Index-TTS2 (Voice Cloning)

modelscope download --model IndexTeam/IndexTTS-2 --local_dir ./IndexTeam/IndexTTS-2

In Boundless Flow settings, set TTS Model Directory to the IndexTTS-2 model folder.

⚠️

Note: In offline mode, models are not auto-downloaded. If the directory does not exist or is not usable, TTS will fail.
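Because a missing or incomplete folder only surfaces as a TTS failure, it can help to check the directory before pointing Boundless Flow at it. The following is a minimal sketch, not part of Boundless Flow: the function name check_tts_model_dir is hypothetical, and "usable" is assumed here to mean only that the folder exists and is not empty. It does not validate the model files themselves.

```shell
# Hypothetical pre-flight check (an assumption for illustration, not a Boundless Flow command).
# Prints a one-line status and returns non-zero when the folder is missing or empty.
check_tts_model_dir() {
  local dir="$1"

  # The folder must exist; in offline mode nothing will download it for you.
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi

  # An empty folder usually means the download never ran or was interrupted.
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "empty: $dir"
    return 1
  fi

  echo "ok: $dir"
}
```

For example, running check_tts_model_dir ./IndexTeam/IndexTTS-2 after the download above should print a line starting with "ok:" if the folder was populated; a "missing:" or "empty:" line suggests re-running the modelscope download command.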

Cloud API Service Configuration

In addition to powerful local models, Boundless Flow also supports integrating various mainstream cloud TTS API services, providing you with more diverse voice options and a more stable synthesis experience. Currently supported cloud APIs include:

Volcengine Configuration

After selecting Volcengine TTS in the settings panel, the minimum required fields are:

Optional fields include UID, Encoding, Sample Rate, Speed/Volume/Pitch ratios, Emotion, etc. For signup and VoiceType selection, see Appendix C.

⚙️

Configuration Method: Please go to the "Settings" -> "API Configuration" panel in Boundless Flow, select your desired service provider, and enter the corresponding API Key and related parameters. Once configured, you can directly select the corresponding cloud voices in the TTS interface.

TTS Runtime Environment Configuration

To use the TTS and voice cloning features, you need to ensure that your local environment is correctly configured: