SDK Overview

The VieNeu-TTS SDK provides a Python interface for text-to-speech synthesis.

Architecture

Vieneu(mode=...)          # Factory function
  ├── "standard" → VieNeuTTS        (GGUF / PyTorch)
  ├── "fast"     → FastVieNeuTTS    (LMDeploy GPU)
  ├── "remote"   → RemoteVieNeuTTS  (API client)
  └── "xpu"      → XPUVieNeuTTS     (Intel Arc GPU)
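
The dispatch in the diagram can be pictured as a lookup from mode string to backend class. The sketch below is illustrative only: the four classes are empty stand-ins that reuse the names from the diagram, and `make_tts`, `_StubBackend`, and `_BACKENDS` are hypothetical names, not the SDK's actual implementation.

```python
class _StubBackend:
    """Placeholder that only records its constructor arguments."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Stand-ins named after the classes in the diagram (not the real ones).
class VieNeuTTS(_StubBackend): pass
class FastVieNeuTTS(_StubBackend): pass
class RemoteVieNeuTTS(_StubBackend): pass
class XPUVieNeuTTS(_StubBackend): pass


# Mode string -> backend class, mirroring the tree above.
_BACKENDS = {
    "standard": VieNeuTTS,
    "fast": FastVieNeuTTS,
    "remote": RemoteVieNeuTTS,
    "xpu": XPUVieNeuTTS,
}


def make_tts(mode="standard", **kwargs):
    """Hypothetical factory: return an instance of the backend for `mode`."""
    try:
        backend_cls = _BACKENDS[mode]
    except KeyError:
        valid = ", ".join(sorted(_BACKENDS))
        raise ValueError(f"Unknown mode {mode!r}; expected one of: {valid}") from None
    return backend_cls(**kwargs)
```

A lookup table keeps the set of valid modes in one place and lets an unknown mode fail early with a readable error.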

Choosing a Mode

Mode        Backend                Best For
`standard`  GGUF (CPU) or PyTorch  Default, works everywhere
`fast`      LMDeploy               Maximum GPU throughput
`remote`    HTTP API               Lightweight client, heavy server
`xpu`       Intel XPU              Intel Arc GPUs
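
The table can be turned into a small selection helper. This is a sketch under stated assumptions: `pick_mode` and its parameters are hypothetical, the caller is expected to detect the hardware itself, and the priority order (remote first, then LMDeploy, then Intel XPU) is one reasonable choice rather than anything the SDK prescribes.

```python
from typing import Optional


def pick_mode(
    has_lmdeploy_gpu: bool = False,
    has_intel_arc: bool = False,
    remote_url: Optional[str] = None,
) -> str:
    """Hypothetical helper: map a description of the environment to a mode string."""
    if remote_url:
        # A server does the heavy lifting; the client stays lightweight.
        return "remote"
    if has_lmdeploy_gpu:
        # A GPU usable by LMDeploy is available: favor throughput.
        return "fast"
    if has_intel_arc:
        return "xpu"
    # Fallback that the table describes as working everywhere.
    return "standard"
```

The returned string is meant to be passed as the `mode` argument of `Vieneu(...)`.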

Two Components

Every mode loads two components:

Backbone: the transformer LLM that generates speech tokens from the input text
Codec: NeuCodec, which decodes those tokens into an audio waveform
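
The two-stage flow (text to speech tokens, then tokens to waveform) can be sketched with toy components. Everything here is hypothetical: the `Backbone` and `Codec` protocols, the method names `generate` and `decode`, and the toy classes are placeholders that only show how the stages connect. They do not reflect the real model or NeuCodec interfaces.

```python
from typing import List, Protocol


class Backbone(Protocol):
    def generate(self, text: str) -> List[int]: ...


class Codec(Protocol):
    def decode(self, tokens: List[int]) -> List[float]: ...


class ToyBackbone:
    """Toy stand-in: emits one fake 'speech token' per character."""

    def generate(self, text: str) -> List[int]:
        return [ord(ch) % 256 for ch in text]


class ToyCodec:
    """Toy stand-in: maps each token to a sample in [-1.0, 1.0]."""

    def decode(self, tokens: List[int]) -> List[float]:
        return [t / 255.0 * 2.0 - 1.0 for t in tokens]


def synthesize(text: str, backbone: Backbone, codec: Codec) -> List[float]:
    tokens = backbone.generate(text)  # stage 1: text -> speech tokens
    return codec.decode(tokens)       # stage 2: tokens -> waveform samples


waveform = synthesize("Xin chào", ToyBackbone(), ToyCodec())
```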

Models are downloaded automatically from Hugging Face on first use and cached at ~/.cache/huggingface/hub/.
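
The cache path above is the default location used by the `huggingface_hub` library, which also honors the `HF_HUB_CACHE`, `HF_HOME`, and `XDG_CACHE_HOME` environment variables. Assuming the SDK downloads through that library (the text above does not say so explicitly), the effective cache directory could be resolved roughly as follows. `hf_hub_cache_dir` is a hypothetical helper, not part of the SDK.

```python
import os
from pathlib import Path


def hf_hub_cache_dir(env=os.environ) -> Path:
    """Approximate the huggingface_hub cache lookup order (sketch only)."""
    if env.get("HF_HUB_CACHE"):
        # Most specific setting: the hub cache directory itself.
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        # Root of all Hugging Face data; the hub cache lives in "hub".
        return Path(env["HF_HOME"]).expanduser() / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]).expanduser() / "huggingface" / "hub"
    # Default: ~/.cache/huggingface/hub
    return Path.home() / ".cache" / "huggingface" / "hub"
```

Setting `HF_HOME` before the first run is a common way to move large model files to a different disk.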

Basic Pattern

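from vieneu import Vieneu

# Create the engine for the chosen mode (loads the backbone and codec)
tts = Vieneu(mode="standard")
# Synthesize speech from Vietnamese text
audio = tts.infer(text="Xin chào bạn")
# Write the result to a WAV file
tts.save(audio, "output.wav")
# Release model resources when done
tts.close()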
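
If `infer` or `save` raises, the explicit `close()` call in the pattern above is never reached. A minimal sketch of a wrapper that always releases the engine is shown below. `synthesize_to_file` is a hypothetical helper that relies only on the `infer`, `save`, and `close` methods used in this section.

```python
def synthesize_to_file(tts, text, path):
    """Run one synthesis and always close the engine afterwards.

    `tts` is any object exposing infer(text=...), save(audio, path) and close().
    """
    try:
        audio = tts.infer(text=text)
        tts.save(audio, path)
    finally:
        # Runs on success and on error, so resources are not leaked.
        tts.close()
    return path
```

For example, `synthesize_to_file(Vieneu(mode="standard"), "Xin chào bạn", "output.wav")` would mirror the basic pattern while guarding against errors.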

Context Manager

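from vieneu import Vieneu

# No mode is given, so the default is used
with Vieneu() as tts:
    audio = tts.infer(text="Xin chào")
    tts.save(audio, "output.wav")
# Resources are released automatically when the block exits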
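
For readers unfamiliar with how a `with` block releases resources, the sketch below shows the standard Python context-manager protocol on a stand-in class. `ManagedTTS` is hypothetical. The SDK's own implementation may differ, but any object usable in a `with` statement defines `__enter__` and `__exit__`.

```python
class ManagedTTS:
    """Hypothetical stand-in illustrating the context-manager protocol."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real engine would free model memory or connections here.
        self.closed = True

    def __enter__(self):
        # The value returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Called when the block exits, whether normally or via an exception.
        self.close()
        return False  # do not suppress exceptions raised inside the block


with ManagedTTS() as tts:
    pass  # synthesis calls would go here
```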
