Text-to-Speech (TTS) & Voice Cloning

Boundless Flow not only "understands" your words but also "speaks" your text. Through its built-in Python Bridge and local models, it provides Text-to-Speech (TTS) and voice cloning. Two core engines are currently integrated, Qwen3-TTS and Index-TTS2, each suited to different use cases.

Open "Model Configuration".

Select a model and configure the model path or API.

Qwen3-TTS Engine

Qwen3-TTS is a versatile speech synthesis engine that supports three modes:

🎙️ Base Model

Provides high-quality, natural, and fluent standard voice synthesis. Suitable for general text reading and audiobook production, generating clear speech without any reference audio.

👥 CustomVoice (Cloning)

Provide a clear reference recording of 5 to 15 seconds, and the model clones a voice closely matching it. Well suited to personalized dubbing or digital avatars.
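Clip length is the easiest part of that requirement to check before cloning. The sketch below measures a reference recording with Python's standard `wave` module; it assumes the reference is an uncompressed PCM WAV file (MP3 or other formats would need a different reader), and the function names are hypothetical, not part of Boundless Flow.

```python
import wave

# Recommended reference length for CustomVoice cloning, per this guide.
MIN_SECONDS = 5.0
MAX_SECONDS = 15.0


def reference_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(path: str) -> bool:
    """Hypothetical pre-check: True if the clip lasts between 5 and 15 seconds."""
    return MIN_SECONDS <= reference_duration_seconds(path) <= MAX_SECONDS


if __name__ == "__main__":
    import sys

    # Usage: python check_reference.py my_voice.wav [other.wav ...]
    for clip in sys.argv[1:]:
        seconds = reference_duration_seconds(clip)
        verdict = "ok" if is_good_reference_length(clip) else "outside 5-15 s"
        print(f"{clip}: {seconds:.1f} s ({verdict})")
```

Length is only one factor; a clip inside the window can still clone poorly if it is noisy or unclear.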

✨ VoiceDesign

No reference audio is needed: describe the voice you want in a text prompt. For example, enter "A young female voice, speaking happily", and the model generates a new voice matching the description.

Index-TTS2 Engine

Building on voice cloning, the Index-TTS2 engine adds finer control over emotion and speaking style:

Emotional Vector Control: Lets you inject specific emotion vectors during synthesis, giving the generated speech emotional coloring such as joy, sadness, or anger.
Prompt Guidance: Text prompts give more precise control over tone, intonation, and rhythm, so the cloned voice not only "sounds alike" but also "feels real".
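Conceptually, an emotion vector is an ordered list of per-emotion weights. As we understand the upstream IndexTTS-2 README, its `emo_vector` has 8 elements in the order happy, angry, sad, afraid, disgusted, melancholic, surprised, calm; treat that ordering as an assumption and verify it against your installed version. The helper below is hypothetical (not part of Boundless Flow or IndexTTS-2) and only builds the list; it does not call the model.

```python
# Assumed order of IndexTTS-2's 8-dimensional emotion vector (from the upstream
# README as we understand it); verify against the version you have installed.
EMOTIONS = ("happy", "angry", "sad", "afraid", "disgusted", "melancholic", "surprised", "calm")


def build_emotion_vector(**weights: float) -> list[float]:
    """Hypothetical helper: map named emotion weights onto the ordered vector.

    Emotions that are not mentioned get a weight of 0.0.
    """
    unknown = sorted(set(weights) - set(EMOTIONS))
    if unknown:
        raise ValueError(f"unknown emotion(s): {', '.join(unknown)}")
    return [float(weights.get(name, 0.0)) for name in EMOTIONS]


print(build_emotion_vector(sad=0.6, melancholic=0.3))
# → [0.0, 0.0, 0.6, 0.0, 0.0, 0.3, 0.0, 0.0]
```

Naming emotions instead of writing raw positions makes it harder to put a weight in the wrong slot.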

💡 Tip: Voice cloning quality depends heavily on the reference audio. For best results, use a recording with clear pronunciation, no background noise, and a moderate speaking rate.

Local Model Download & Directory Setup (Recommended)

If you use local TTS (Qwen3-TTS / Index-TTS2), first download the model files, then set TTS Model Directory in the settings (see the ModelScope docs; step-by-step instructions for beginners are in Appendix A).

Qwen3-TTS (Base / CustomVoice / VoiceDesign)

Download the model folder for each mode you plan to use:

modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-Base
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign

In Boundless Flow settings, set TTS Model Directory to the model folder you want to use (e.g., Base mode uses ./Qwen/Qwen3-TTS-12Hz-1.7B-Base).
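The mode-to-folder mapping can be sketched as below. The folder names come from the download commands above; the dictionary and function are hypothetical illustrations, not a Boundless Flow API. Returning an absolute path is a design choice: it avoids any ambiguity about which working directory a relative `./` path is resolved against.

```python
from pathlib import Path

# Folder names created by the modelscope download commands above, relative to
# the directory in which you ran them. The mapping itself is illustrative.
QWEN3_TTS_FOLDERS = {
    "Base": "Qwen/Qwen3-TTS-12Hz-1.7B-Base",
    "CustomVoice": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "VoiceDesign": "Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign",
}


def qwen3_tts_model_dir(mode: str, download_root: str = ".") -> Path:
    """Return the absolute folder to enter as TTS Model Directory for a mode."""
    if mode not in QWEN3_TTS_FOLDERS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(QWEN3_TTS_FOLDERS)}")
    return (Path(download_root) / QWEN3_TTS_FOLDERS[mode]).resolve()


# Prints the Base-mode folder under the current directory.
print(qwen3_tts_model_dir("Base"))
```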

Index-TTS2 (Voice Cloning)

modelscope download --model IndexTeam/IndexTTS-2 --local_dir ./IndexTeam/IndexTTS-2

In Boundless Flow settings, set TTS Model Directory to the IndexTTS-2 model folder.

⚠️ Note: In offline mode, models are not downloaded automatically. If the TTS Model Directory does not exist or its contents cannot be loaded, TTS will fail.
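Because offline mode cannot fall back to downloading, it can help to sanity-check the folder before starting TTS. The sketch below is a hypothetical pre-flight check, not Boundless Flow's own validation: it confirms the path exists, is a directory, and is not empty, and optionally looks for file names you supply yourself (this guide does not list which files each model must contain).

```python
from pathlib import Path


def check_tts_model_dir(model_dir: str, required_files: tuple[str, ...] = ()) -> list[str]:
    """Hypothetical offline pre-flight check for a TTS Model Directory.

    Returns a list of problems; an empty list means the folder passed these
    basic checks (it does not prove the model itself will load).
    """
    path = Path(model_dir)
    if not path.exists():
        return [f"{path} does not exist"]
    if not path.is_dir():
        return [f"{path} is not a directory"]
    if not any(path.iterdir()):
        return [f"{path} is empty"]
    # Which files to require is the caller's assumption, not a documented list.
    return [f"missing {name}" for name in required_files if not (path / name).exists()]


if __name__ == "__main__":
    import sys

    # Usage: python check_model_dir.py ./IndexTeam/IndexTTS-2
    problems = check_tts_model_dir(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(problems) or "looks usable")
```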

Cloud API Service Configuration

In addition to local models, Boundless Flow supports several mainstream cloud TTS APIs, which offer a wider range of voices and more stable synthesis. Currently supported cloud APIs:

Volcengine: A large catalog of high-quality voices, including multiple dialects and foreign languages.
OpenAI (TTS): Natural, realistic synthesis with built-in voices such as alloy, echo, fable, onyx, nova, and shimmer.
MiniMax: A leading Chinese large speech model, supporting highly expressive, emotional voice generation.

Volcengine Configuration

After selecting Volcengine TTS in the settings panel, fill in at least these required fields:

AppId: Volcengine application ID
Token: access token
Cluster: cluster ID (e.g., volcano_tts / volcengine_tts)
VoiceType: voice ID

Optional fields include UID, Encoding, Sample Rate, Speed/Volume/Pitch ratios, Emotion, and more. For account signup and choosing a VoiceType, see Appendix C.
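To show how these fields fit together, the sketch below assembles a request body following the JSON layout of Volcengine's public HTTP TTS API as we understand it (`app`, `user`, `audio`, and `request` sections). The layout, the `operation` value, and the default `uid` are assumptions to verify against the official Volcengine API reference; the function is hypothetical and does not describe how Boundless Flow builds its own requests. It sends nothing over the network.

```python
import uuid


def build_volcengine_tts_body(
    app_id: str,
    token: str,
    cluster: str,
    voice_type: str,
    text: str,
    uid: str = "demo-user",  # hypothetical placeholder user ID
    encoding: str = "mp3",
    speed_ratio: float = 1.0,
    volume_ratio: float = 1.0,
    pitch_ratio: float = 1.0,
) -> dict:
    """Map the settings-panel fields onto an (assumed) Volcengine request body."""
    required = {"AppId": app_id, "Token": token, "Cluster": cluster, "VoiceType": voice_type}
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return {
        # The official docs may also require the token in an Authorization
        # header; check the exact format there.
        "app": {"appid": app_id, "token": token, "cluster": cluster},
        "user": {"uid": uid},
        "audio": {
            "voice_type": voice_type,
            "encoding": encoding,
            "speed_ratio": speed_ratio,
            "volume_ratio": volume_ratio,
            "pitch_ratio": pitch_ratio,
        },
        "request": {
            "reqid": str(uuid.uuid4()),  # unique ID for each request
            "text": text,
            "operation": "query",  # assumed: non-streaming synthesis
        },
    }
```

For example, `build_volcengine_tts_body("your-app-id", "your-token", "volcano_tts", "your-voice-type", "Hello")` returns a dictionary that can be serialized with `json.dumps`. Optional settings such as Emotion or Sample Rate would be extra keys in the `audio` section; their exact names are likewise left to the official reference.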

⚙️ Configuration: In Boundless Flow, go to "Settings" -> "API Configuration", select your service provider, and enter its API key or credentials and related parameters. Once configured, that provider's cloud voices can be selected directly in the TTS interface.

TTS Runtime Environment Configuration

To use TTS and voice cloning, your local environment must be set up correctly:

Full Installation Package: Includes the TTS runtime, so no additional configuration is required.
Lite Package + Runtime Download: The first time you use TTS, the application prompts you to download and extract the TTS runtime (a Python environment and its dependencies). Follow the prompts, or see INSTALL.md for detailed instructions.

齐码蓝智能（大理市）有限责任公司