Real-time Speech-to-Text (STT) & Model Selection

The core feature of Boundless Flow is real-time Speech-to-Text (STT) powered by local models. It converts your speech into text quickly and accurately, and supports multiple output methods.

Mini Mode — floating real-time subtitle overlay

Real-time STT Introduction

Boundless Flow runs inference with a local SenseVoice ONNX model and supports both real-time (partial) output and final result output. As you speak, the recognized text appears on screen right away.
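To make the difference between real-time output and final output concrete, here is a minimal Python sketch of how a subtitle display might handle the two kinds of result. The `SubtitleBuffer` class and its `on_partial` / `on_final` methods are hypothetical names chosen for illustration; they are not Boundless Flow's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class SubtitleBuffer:
    """Hypothetical display buffer; not part of Boundless Flow's actual API."""

    committed: list[str] = field(default_factory=list)  # finalized segments
    pending: str = ""  # latest partial hypothesis for the current utterance

    def on_partial(self, text: str) -> None:
        # A partial result replaces the previous partial; it is never appended.
        self.pending = text

    def on_final(self, text: str) -> None:
        # A final result is committed and clears the in-progress partial.
        self.committed.append(text)
        self.pending = ""

    def render(self) -> str:
        # Committed text followed by whatever is still being recognized.
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

In this sketch, partial results may be revised freely while you are still speaking, and only the final result becomes permanent text.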

ℹ️

There are now two STT paths: microphone real-time recognition still uses local SenseVoice ONNX inference, while the upgraded native-stt path is designed for offline audio-file transcription with native Whisper / SenseVoice backends.

Output Methods

In the settings, you can choose different output methods:

How to Start Real-time STT

1. Configure Model

Ensure the model directory is configured correctly (see Configure Model Directory below).

2. Start Recording

Method 1: Click the "Start Recording" button on the main interface.
Method 2: Press the Right Alt key (RightAlt) on your keyboard from any screen.

3. Speak and Stop

Start speaking. The recognized text appears in real time. Press the shortcut again or click the stop button to end the recording.

Selecting and Configuring Different Models

For the speech recognition feature to work, a simple model configuration is required:

Configure Model Directory

  1. Open the main application interface and go to Settings.
  2. Locate the Model Directory configuration item.
  3. Select or enter the folder path where your SenseVoice model is located.

Recommended Model Download (ModelScope)

Boundless Flow uses a local SenseVoice ONNX model by default. We recommend downloading it via ModelScope (see the ModelScope docs; beginner steps are in Appendix A):

modelscope download --model iic/SenseVoiceSmall --local_dir ./SenseVoiceSmall

After downloading, set Model Directory to the download folder (e.g., ./SenseVoiceSmall or an absolute Windows path).

⚠️

Note: Please ensure this folder contains the model.onnx and tokens.json files. If you haven't downloaded the model yet, please refer to the installation guide to get the model files.
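If you want to verify the folder before pointing Boundless Flow at it, a small script along these lines can help. This is a sketch only: the function name `missing_model_files` is hypothetical, and the two file names are simply the ones listed in the note above.

```python
from pathlib import Path

# File names taken from the note above; adjust if your model package differs.
REQUIRED_FILES = ("model.onnx", "tokens.json")


def missing_model_files(model_dir: str) -> list[str]:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./SenseVoiceSmall")
    if missing:
        print("Model directory is incomplete, missing:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```

Replace `./SenseVoiceSmall` with the path you entered as Model Directory.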

Advanced STT Settings

native-stt Offline File Transcription

The upgraded native-stt path adds an offline transcription workflow alongside real-time microphone STT. It is intended for longer recordings, archived audio, meeting replays, and local file-based transcription.

native-stt configuration: choose backend, model, and audio file to transcribe

Best-fit Scenarios and Behavior

How to Configure native-stt

  1. In the STT panel, switch the backend to Whisper or SenseVoice.
  2. Select the corresponding model file or model path.
  3. Select the audio file you want to transcribe.
  4. Click Transcribe File; partial and final results will be appended to the Raw Result area.
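The three choices made in steps 1 to 3 can be thought of as a single job description. The Python sketch below models that idea and checks the inputs before transcription starts. `TranscriptionJob`, its field names, and the lowercase backend labels are all assumptions made for illustration; the application does not necessarily expose a structure like this.

```python
from dataclasses import dataclass
from pathlib import Path

# The two backends named in step 1 (labels are illustrative).
SUPPORTED_BACKENDS = {"whisper", "sensevoice"}


@dataclass(frozen=True)
class TranscriptionJob:
    """Hypothetical container mirroring the three choices made in the STT panel."""

    backend: str     # step 1: Whisper or SenseVoice
    model_path: str  # step 2: model file or model directory
    audio_path: str  # step 3: audio file to transcribe

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means ready to transcribe."""
        issues = []
        if self.backend.lower() not in SUPPORTED_BACKENDS:
            issues.append(f"unsupported backend: {self.backend}")
        if not Path(self.model_path).exists():
            issues.append(f"model not found: {self.model_path}")
        if not Path(self.audio_path).is_file():
            issues.append(f"audio file not found: {self.audio_path}")
        return issues
```

Checking all three inputs up front gives a clear error before a long transcription run begins, rather than a failure partway through.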

Model and Format Notes

⚠️

Note: native-stt is currently positioned as an offline file-transcription feature, not a live microphone subtitle path. For speak-and-see workflows, continue using the default ONNX real-time STT pipeline.
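As a quick summary of which path applies when, the rule can be written as a small lookup. This Python sketch is illustrative only: the `Source` enum and the returned labels are hypothetical and are not identifiers used by the application.

```python
from enum import Enum


class Source(Enum):
    MICROPHONE = "microphone"
    AUDIO_FILE = "audio_file"


def choose_stt_path(source: Source) -> str:
    """Map an input source to the STT path described in this section.

    The returned labels are illustrative, not identifiers used by the app.
    """
    if source is Source.MICROPHONE:
        return "realtime-onnx"  # local SenseVoice ONNX, live subtitles
    if source is Source.AUDIO_FILE:
        return "native-stt"  # offline Whisper / SenseVoice file transcription
    raise ValueError(f"unknown source: {source!r}")
```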