Real-time Streaming
VieNeu-TTS supports ultra-low-latency streaming: audio playback starts before the entire sentence has been generated.
Performance
- Latency: under 300 ms to the first audio chunk on modern i3/i5 CPUs
- Optimized for: GGUF backend on CPU
- Sample rate: 24 kHz
Web Demo
uv run vieneu-stream
Open http://localhost:8001 in your browser.
SDK Streaming
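from vieneu import Vieneu

tts = Vieneu()
# The sample text is Vietnamese for "A very long passage..."
for chunk in tts.infer_stream(text="Một đoạn văn rất dài..."):
    # play_audio is a placeholder for your own playback function
    play_audio(chunk)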
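If no playback function is at hand, the streamed chunks can be written to a WAV file as they arrive. The following is a minimal sketch using only the Python standard library. It assumes each chunk is a 1-D sequence of float samples in the range [-1.0, 1.0], which the text above does not confirm; `write_chunks_to_wav` is a hypothetical helper, not part of the SDK.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as listed under Performance


def write_chunks_to_wav(chunks, path, sample_rate=SAMPLE_RATE):
    """Write streamed chunks to a 16-bit mono WAV file as they arrive.

    Assumption: every chunk is a 1-D sequence of floats in [-1.0, 1.0].
    Returns the total number of samples written.
    """
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Clip to the valid range, then scale to signed 16-bit integers.
            pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in chunk]
            wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            total += len(pcm)
    return total
```

Under that assumption, passing the generator directly, as in write_chunks_to_wav(tts.infer_stream(text="..."), "out.wav"), would save the audio incrementally instead of holding the whole utterance in memory.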
Parameters
tts.infer_stream(
    text="Your text here",
    max_chars=256,
    temperature=1.0,
    top_k=50,
    voice=voice_data,
    ref_audio="ref.wav",
    ref_text="...",
)
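To give a feel for what a character limit such as max_chars might do, here is a rough sketch of sentence-aware text splitting. It assumes max_chars caps the length of each text chunk, which this section does not state explicitly, and `split_text` is a hypothetical helper rather than the library's own splitter.

```python
import re


def split_text(text, max_chars=256):
    """Greedily pack sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Hard-wrap a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The hard-wrap fallback can cut through the middle of a word, so a production splitter would likely prefer commas or spaces as secondary break points.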
How It Works
- Text is split into chunks
- Each chunk is phonemized
- GGUF model generates tokens via streaming
- Every N tokens, a partial decode produces an audio chunk
- Overlap-add smooths chunk boundaries
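The last step, overlap-add, can be illustrated with a linear cross-fade. This is a minimal sketch in plain Python and not the library's implementation: the window shape VieNeu-TTS actually uses is not given here, and `crossfade_concat` is a hypothetical name.

```python
def crossfade_concat(chunks, overlap):
    """Join audio chunks, blending `overlap` samples at each boundary.

    Each chunk is a sequence of float samples. The tail of the audio
    produced so far is faded out while the head of the next chunk is
    faded in, and the two are summed (a linear overlap-add).
    """
    out = []
    for chunk in chunks:
        chunk = list(chunk)
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(chunk))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight, strictly between 0 and 1
            j = len(out) - n + i
            out[j] = out[j] * (1.0 - w) + chunk[i] * w
        out.extend(chunk[n:])
    return out
```

Because the two weights always sum to 1, a constant signal passes through the boundary unchanged, while a mismatch between chunks is spread over the overlap region instead of producing an audible click.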
The streaming uses configurable parameters:
- streaming_frames_per_chunk: 25 frames per audio chunk
- streaming_overlap_frames: 1 frame overlap for smooth transitions
- streaming_lookforward: 10 frames lookahead for quality
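One way to picture how these three values could interact is to compute, for each audio chunk, which frames are emitted and which extra frames are decoded only as context. The sketch below is an interpretation, not the library's code: it assumes the overlap is taken from before each chunk and the lookahead from after it, and `decode_windows` is a hypothetical helper.

```python
def decode_windows(total_frames, frames_per_chunk=25, overlap=1, lookforward=10):
    """Return (context_start, emit_start, emit_end, context_end) per chunk.

    Frames in [emit_start, emit_end) become audio for this chunk. Frames in
    [context_start, emit_start) and [emit_end, context_end) are decoded only
    to give the decoder context at the chunk boundaries (assumed semantics).
    """
    windows = []
    for emit_start in range(0, total_frames, frames_per_chunk):
        emit_end = min(emit_start + frames_per_chunk, total_frames)
        context_start = max(0, emit_start - overlap)
        context_end = min(total_frames, emit_end + lookforward)
        windows.append((context_start, emit_start, emit_end, context_end))
    return windows
```

Under this reading, a larger lookahead means each chunk has to wait for more frames before it can be decoded, trading a little latency for smoother boundaries.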