Real-time Speech-to-Text (STT) & Model Selection

The core feature of Boundless Flow is real-time Speech-to-Text (STT) based on local models. It accurately and rapidly converts your speech into text and supports multiple output methods.

Mini Mode — floating real-time subtitle overlay

Real-time STT Introduction

Boundless Flow supports both SenseVoice ONNX and FunASR for local real-time STT. Both paths provide interim streaming output and final sentence-level output, so text can appear while you are speaking.
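The interim/final split can be sketched as a small display buffer. This is a hypothetical illustration, not the app's actual internals: interim results overwrite the current pending line while you speak, and a final result commits it as a finished sentence.

```python
# Sketch of an interim/final streaming transcript buffer (hypothetical
# names; not Boundless Flow's actual code). Interim output replaces the
# pending line; final sentence-level output commits it.
class StreamingTranscript:
    def __init__(self):
        self.committed = []   # finalized sentences
        self.pending = ""     # current interim hypothesis

    def on_interim(self, text):
        # Interim result: overwrite the pending line while the user speaks.
        self.pending = text

    def on_final(self, text):
        # Final result: commit the sentence and clear the interim line.
        self.committed.append(text)
        self.pending = ""

    def render(self):
        lines = self.committed + ([self.pending] if self.pending else [])
        return "\n".join(lines)

t = StreamingTranscript()
t.on_interim("hel")
t.on_interim("hello wor")
t.on_final("Hello world.")
print(t.render())  # Hello world.
```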

ℹ️

STT paths: real-time microphone STT supports SenseVoice ONNX and FunASR. The native-stt path is focused on offline file transcription with native Whisper / SenseVoice backends.

Output Methods

In Settings, you can choose among several output methods for the recognized text.

How to Start Real-time STT

  1. Configure Model: ensure you have correctly configured the model directory (see below).
  2. Start Recording: click the "Start Recording" button on the main interface, or press the Right Alt key (RightAlt) on your keyboard from any screen.
  3. Speak and Stop: start speaking! The recognized text appears in real-time. Press the shortcut again or click the stop button to end the recording.
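The start/stop flow above is a simple toggle: the same shortcut (or button) flips the recording state. A minimal sketch with hypothetical names, not the app's actual code:

```python
# Toggle sketch for the RightAlt / Start-Stop flow (hypothetical names).
# One input both starts and stops recording.
class RecordingToggle:
    def __init__(self):
        self.recording = False

    def on_trigger(self):
        """Called on a RightAlt press or a Start/Stop button click."""
        self.recording = not self.recording
        return "recording started" if self.recording else "recording stopped"

toggle = RecordingToggle()
print(toggle.on_trigger())  # recording started
print(toggle.on_trigger())  # recording stopped
```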

Selecting and Configuring Different Models

For the speech recognition feature to work, a simple model configuration is required:

Select Backend and Configure Model Directory

  1. Open the main application interface and go to Settings.
  2. Select the STT backend you want to use (ONNX or FunASR).
  3. Locate Model Directory and point it to the corresponding model folder.

Recommended Model Download (ModelScope)

Boundless Flow uses a local SenseVoice ONNX model by default. The recommended way to download it is via ModelScope (see the ModelScope docs; beginner steps in Appendix A):

modelscope download --model iic/SenseVoiceSmall --local_dir ./SenseVoiceSmall

After downloading, set Model Directory to the download folder (e.g., ./SenseVoiceSmall or an absolute Windows path).

If you choose the FunASR backend, download the FunASR-Nano model and set Model Directory to that folder:

modelscope download --model FunAudioLLM/Fun-ASR-Nano-2512 --local_dir ./Fun-ASR-Nano-2512

⚠️

Note: The ONNX backend requires files such as model.onnx and tokens.json. The FunASR backend requires the complete Fun-ASR-Nano-2512 model directory (for example: model.pt, config.yaml, and tokenizer assets).
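A quick pre-flight check can catch a wrong Model Directory before starting a recording. This is a hypothetical helper; the required-file lists follow the note above, but verify them against your actual model download:

```python
# Pre-flight check sketch for the model directory (hypothetical helper).
# Required-file lists mirror the note above; adjust to your download.
from pathlib import Path

REQUIRED = {
    "onnx": ["model.onnx", "tokens.json"],
    "funasr": ["model.pt", "config.yaml"],
}

def check_model_dir(backend, model_dir):
    """Return the required files missing from model_dir for this backend."""
    root = Path(model_dir)
    return [name for name in REQUIRED[backend] if not (root / name).is_file()]

missing = check_model_dir("onnx", "./SenseVoiceSmall")
if missing:
    print("missing files:", ", ".join(missing))
```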

Advanced STT Settings

native-stt Offline File Transcription

The upgraded native-stt path adds an offline transcription workflow alongside real-time microphone STT. It is intended for longer recordings, archived audio, meeting replays, and local file-based transcription.

native-stt configuration: choose backend, model, and audio file to transcribe

Best-fit Scenarios and Behavior

How to Configure native-stt

  1. In the STT panel, switch the backend to Whisper or SenseVoice.
  2. Select the corresponding model file or model path.
  3. Select the audio file you want to transcribe.
  4. Click Transcribe File; partial and final results will be appended to the Raw Result area.
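The append-only Raw Result behavior in step 4 can be sketched as follows. The backend here is a stub standing in for the real Whisper/SenseVoice calls, which are not shown; all names are hypothetical:

```python
# Sketch of the offline file-transcription flow (hypothetical names; a
# stub stands in for the real Whisper/SenseVoice backend).
def transcribe_file(path, backend, raw_result):
    """Append partial and final results to raw_result, mirroring the UI."""
    for kind, text in backend(path):  # backend yields (kind, text) pairs
        raw_result.append(f"[{kind}] {text}")
    return raw_result

def stub_backend(path):
    # Stub: emits two partial hypotheses, then one final result.
    yield "partial", "meeting starts at"
    yield "partial", "meeting starts at ten"
    yield "final", "The meeting starts at ten."

raw = []
transcribe_file("meeting.wav", stub_backend, raw)
print(raw[-1])  # [final] The meeting starts at ten.
```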

Model and Format Notes

⚠️

Note: native-stt is currently positioned as an offline file-transcription feature, not a live microphone subtitle path. For live speak-and-see workflows, use the real-time ONNX or FunASR pipeline.