
SDK Overview

The VieNeu-TTS SDK provides a Python interface for text-to-speech synthesis.

Architecture

Vieneu(mode=...)          # Factory function
├── "standard" → VieNeuTTS (GGUF / PyTorch)
├── "fast" → FastVieNeuTTS (LMDeploy GPU)
├── "remote" → RemoteVieNeuTTS (API client)
└── "xpu" → XPUVieNeuTTS (Intel Arc GPU)
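
The mode-to-class mapping above can be read as a simple dispatch table. The following is only a sketch of that idea, not the SDK's actual implementation: the stub classes merely borrow the names from the diagram, and `make_engine` and `_MODES` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class _StubEngine:
    """Placeholder standing in for a real engine class (illustration only)."""

    def __init__(self, **kwargs):
        # Keep the keyword arguments so the caller can inspect them.
        self.kwargs = kwargs


# Hypothetical stand-ins named after the classes in the diagram.
class VieNeuTTS(_StubEngine): ...
class FastVieNeuTTS(_StubEngine): ...
class RemoteVieNeuTTS(_StubEngine): ...
class XPUVieNeuTTS(_StubEngine): ...


# Dispatch table: mode string -> engine class.
_MODES: Dict[str, Callable[..., _StubEngine]] = {
    "standard": VieNeuTTS,
    "fast": FastVieNeuTTS,
    "remote": RemoteVieNeuTTS,
    "xpu": XPUVieNeuTTS,
}


def make_engine(mode: str = "standard", **kwargs) -> _StubEngine:
    """Return an engine instance for `mode`, or raise ValueError if unknown."""
    try:
        engine_cls = _MODES[mode]
    except KeyError:
        raise ValueError(
            f"Unknown mode {mode!r}; expected one of {sorted(_MODES)}"
        ) from None
    return engine_cls(**kwargs)
```

A table-driven factory like this keeps the caller's code identical across backends: only the `mode` string changes.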

Choosing a Mode

| Mode | Backend | Best For |
|------|---------|----------|
| standard | GGUF (CPU) or PyTorch | Default, works everywhere |
| fast | LMDeploy | Maximum GPU throughput |
| remote | HTTP API | Lightweight client, heavy server |
| xpu | Intel XPU | Intel Arc GPUs |
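
As a rough illustration of how the table might translate into a selection rule, the sketch below picks a mode string from a few facts about the environment. The function `choose_mode`, its parameters, and the priority order are assumptions made for this example; the SDK does not necessarily choose a mode this way, and detecting the hardware itself is left to the caller.

```python
from typing import Optional


def choose_mode(
    has_lmdeploy_gpu: bool = False,
    has_intel_arc: bool = False,
    server_url: Optional[str] = None,
) -> str:
    """Pick a mode string from the table (hypothetical helper).

    Assumed priority: an explicit server wins, then an LMDeploy-capable
    GPU, then an Intel Arc GPU, and finally the default that works
    everywhere.
    """
    if server_url:
        return "remote"    # heavy lifting happens on the server
    if has_lmdeploy_gpu:
        return "fast"      # maximum GPU throughput
    if has_intel_arc:
        return "xpu"       # Intel Arc GPUs
    return "standard"      # GGUF (CPU) or PyTorch
```

The returned string would then be passed as `mode=...` to the factory.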

Two Components

Every mode loads two components:

  1. Backbone: the transformer LLM that generates speech tokens
  2. Codec: NeuCodec, which decodes those tokens into an audio waveform

Models are auto-downloaded from HuggingFace on first use and cached at ~/.cache/huggingface/hub/.
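
To check what has already been downloaded, the cache directory can be inspected directly. The sketch below assumes the standard Hugging Face Hub conventions: the `HF_HUB_CACHE` and `HF_HOME` environment variables override the default location, and each cached model lives in a folder named `models--<org>--<name>`. The helper names `hub_cache_dir` and `cached_models` are hypothetical and not part of the SDK.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the Hub cache directory from environment variables.

    Simplified sketch: XDG_CACHE_HOME is not taken into account.
    """
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    hf_home = env.get("HF_HOME") or "~/.cache/huggingface"
    return Path(hf_home).expanduser() / "hub"


def cached_models(cache_dir: Path) -> List[str]:
    """List cached model repos as 'org/name' strings, sorted."""
    if not cache_dir.is_dir():
        return []
    prefix = "models--"
    return sorted(
        # Turn 'models--org--name' into 'org/name'.
        entry.name[len(prefix):].replace("--", "/", 1)
        for entry in cache_dir.iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    )


if __name__ == "__main__":
    for repo in cached_models(hub_cache_dir()):
        print(repo)
```

Deleting a model's folder from this directory should force a fresh download the next time it is needed.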

Basic Pattern

from vieneu import Vieneu

tts = Vieneu(mode="standard")
audio = tts.infer(text="Xin chào bạn")
tts.save(audio, "output.wav")
tts.close()

Context Manager

from vieneu import Vieneu

with Vieneu() as tts:
    audio = tts.infer(text="Xin chào")
    tts.save(audio, "output.wav")
# Resources are released automatically when the block exits
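
For readers unfamiliar with the protocol, the sketch below shows how a `with` block typically maps onto an explicit `close()` call. `ManagedEngine` is a hypothetical stand-in written for this example; it is assumed, not confirmed, that the SDK's classes follow the same pattern internally.

```python
class ManagedEngine:
    """Hypothetical stand-in for a TTS engine; not the SDK's real class."""

    def __init__(self):
        self.closed = False

    def close(self):
        # A real engine would release models, sessions, or sockets here.
        self.closed = True

    def __enter__(self):
        # The value returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception is raised in the block.
        self.close()
        return False  # do not suppress exceptions


with ManagedEngine() as engine:
    assert not engine.closed  # still open inside the block

print(engine.closed)  # True
```

Because `__exit__` also runs when the block raises, the `with` form is equivalent to wrapping the basic pattern in `try` / `finally` with `close()` in the `finally` clause.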