SDK Overview
The VieNeu-TTS SDK provides a Python interface for text-to-speech synthesis.
Architecture
Vieneu(mode=...) # Factory function
├── "standard" → VieNeuTTS (GGUF / PyTorch)
├── "fast" → FastVieNeuTTS (LMDeploy GPU)
├── "remote" → RemoteVieNeuTTS (API client)
└── "xpu" → XPUVieNeuTTS (Intel Arc GPU)
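The tree above describes a factory that maps a mode string to an engine class. A minimal sketch of that dispatch pattern follows. It is not the SDK's actual implementation: `make_engine` and the four `*Engine` stub classes are hypothetical stand-ins, and the `"standard"` default is an assumption based on the mode table describing it as the default.

```python
from typing import Dict, Type


class _Engine:
    """Hypothetical stand-in for a real engine class; it only records its options."""

    def __init__(self, **options):
        self.options = options


class StandardEngine(_Engine):
    """Stub for the "standard" branch."""


class FastEngine(_Engine):
    """Stub for the "fast" branch."""


class RemoteEngine(_Engine):
    """Stub for the "remote" branch."""


class XPUEngine(_Engine):
    """Stub for the "xpu" branch."""


# Mode string -> engine class, mirroring the four branches of the tree.
_ENGINES: Dict[str, Type[_Engine]] = {
    "standard": StandardEngine,
    "fast": FastEngine,
    "remote": RemoteEngine,
    "xpu": XPUEngine,
}


def make_engine(mode: str = "standard", **options) -> _Engine:
    """Return an engine instance for `mode`, or raise ValueError if it is unknown."""
    try:
        engine_cls = _ENGINES[mode]
    except KeyError:
        valid = ", ".join(sorted(_ENGINES))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None
    return engine_cls(**options)
```

A lookup table keeps the set of valid modes in one place, so an unsupported mode fails early with a message that lists the alternatives.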
Choosing a Mode
| Mode | Backend | Best For |
|---|---|---|
| standard | GGUF (CPU) or PyTorch | Default, works everywhere |
| fast | LMDeploy | Maximum GPU throughput |
| remote | HTTP API | Lightweight client, heavy server |
| xpu | Intel XPU | Intel Arc GPUs |
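The table can be turned into a small selection helper. The sketch below is illustrative only: `choose_mode` and its parameters are hypothetical, hardware detection is left to the caller, and the priority order (remote, then fast, then xpu, then standard) is an assumption rather than an SDK rule.

```python
from typing import Optional


def choose_mode(
    server_url: Optional[str] = None,
    has_lmdeploy_gpu: bool = False,
    has_intel_xpu: bool = False,
) -> str:
    """Pick a mode string from the table, given what the caller knows about the host."""
    if server_url:
        # A server does the heavy work, so the client can stay lightweight.
        return "remote"
    if has_lmdeploy_gpu:
        # A GPU usable by LMDeploy is available: favor throughput.
        return "fast"
    if has_intel_xpu:
        # An Intel Arc GPU is available.
        return "xpu"
    # Fallback that the table describes as working everywhere.
    return "standard"
```

The returned string would then be passed as the `mode` argument, for example `Vieneu(mode=choose_mode(has_intel_xpu=True))`.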
Two Components
Every mode loads two components:
- Backbone: the transformer LLM that generates speech tokens
- Codec: NeuCodec, which decodes the speech tokens into an audio waveform
Models are downloaded automatically from Hugging Face on first use and cached at ~/.cache/huggingface/hub/.
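To check what has already been downloaded, the cache directory can be inspected directly. The sketch below relies on two Hugging Face Hub conventions: the `HF_HUB_CACHE` and `HF_HOME` environment variables override the default location, and model folders are named `models--<org>--<name>`. The helper names are hypothetical, other overrides are ignored for brevity, and no specific repository name is assumed.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the hub cache directory, honoring HF_HUB_CACHE and HF_HOME if set."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


def repo_id_from_dirname(name: str) -> Optional[str]:
    """Map a cache folder name such as 'models--org--name' to 'org/name'."""
    prefix = "models--"
    if not name.startswith(prefix):
        return None  # datasets, spaces, lock folders, and so on
    return name[len(prefix):].replace("--", "/")


def list_cached_models(cache_dir: Path) -> List[str]:
    """Return the sorted repo ids of all model folders found in `cache_dir`."""
    if not cache_dir.is_dir():
        return []
    candidates = (
        repo_id_from_dirname(entry.name)
        for entry in cache_dir.iterdir()
        if entry.is_dir()
    )
    return sorted(repo_id for repo_id in candidates if repo_id is not None)
```

Calling `list_cached_models(hub_cache_dir())` before the first run should return a list without the TTS models, and the same call afterwards should include them.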
Basic Pattern
from vieneu import Vieneu
tts = Vieneu(mode="standard")
audio = tts.infer(text="Xin chào bạn")
tts.save(audio, "output.wav")
tts.close()
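The same infer-then-save pattern extends naturally to several texts. The helper below is a sketch that uses only the `infer(text=...)` and `save(audio, path)` calls shown above; `synthesize_all`, the `out_dir` parameter, and the `utterance_NNN.wav` naming scheme are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def synthesize_all(tts, texts: Iterable[str], out_dir: str = "out") -> List[Path]:
    """Synthesize each text with `tts` and save it as a numbered WAV file.

    `tts` is any object exposing infer(text=...) and save(audio, path).
    The caller remains responsible for calling tts.close().
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    for index, text in enumerate(texts, start=1):
        audio = tts.infer(text=text)
        path = target / f"utterance_{index:03d}.wav"
        tts.save(audio, str(path))
        paths.append(path)
    return paths
```

Reusing one engine instance for the whole batch avoids reloading the backbone and codec for every sentence.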
Context Manager
with Vieneu() as tts:
    audio = tts.infer(text="Xin chào")
    tts.save(audio, "output.wav")
# Resources auto-released
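For readers unfamiliar with how a `with` block releases resources, the sketch below shows the standard Python context manager protocol wired to a `close()` method. `ManagedEngine` is a hypothetical stand-in, not the SDK's class; it only illustrates why cleanup runs even when the block raises.

```python
class ManagedEngine:
    """Hypothetical stand-in showing how close() can back the `with` statement."""

    def __init__(self):
        self.closed = False

    def close(self):
        """Release resources; safe to call more than once."""
        self.closed = True

    def __enter__(self):
        # The value returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and when the block raises.
        self.close()
        return False  # do not suppress exceptions raised inside the block
```

With this protocol, the explicit `close()` call from the basic pattern becomes unnecessary, and an exception during synthesis no longer leaves the models loaded.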