# Inference Methods

## infer()
```python
audio = tts.infer(
    text: str,
    ref_audio: str = None,
    ref_codes: Tensor = None,
    ref_text: str = None,
    voice: dict = None,
    max_chars: int = 256,
    silence_p: float = 0.15,
    crossfade_p: float = 0.0,
    temperature: float = 1.0,
    top_k: int = 50,
    skip_normalize: bool = False,
)
```
### Parameters

| Parameter | Type | Description |
|---|---|---|
| `text` | `str` | Text to synthesize |
| `ref_audio` | `str` | Path to reference audio for voice cloning |
| `ref_codes` | `Tensor` | Pre-encoded reference codes from `encode_reference()` |
| `ref_text` | `str` | Transcript of the reference audio |
| `voice` | `dict` | Preset voice dict from `get_preset_voice()` |
| `max_chars` | `int` | Max characters per chunk (default 256) |
| `silence_p` | `float` | Silence inserted between chunks, in seconds (default 0.15) |
| `crossfade_p` | `float` | Crossfade between chunks, in seconds (default 0.0) |
| `temperature` | `float` | Sampling temperature (default 1.0) |
| `top_k` | `int` | Top-k sampling cutoff (default 50) |
| `skip_normalize` | `bool` | Skip text normalization (default False) |
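The `temperature` and `top_k` parameters shape token sampling in the usual way: logits are divided by the temperature, then sampling is restricted to the k most likely tokens. A generic numpy sketch of that scheme (illustrative only, not this library's internals):

```python
import numpy as np

def sample_top_k(logits, temperature=1.0, top_k=50, rng=None):
    """Sample one token id using temperature scaling and top-k filtering."""
    rng = rng or np.random.default_rng(0)
    scaled = logits / max(temperature, 1e-6)      # higher temperature -> flatter distribution
    k = min(top_k, scaled.shape[-1])
    top_idx = np.argpartition(scaled, -k)[-k:]    # indices of the k largest logits
    top = scaled[top_idx]
    probs = np.exp(top - top.max())               # softmax over the surviving logits
    probs /= probs.sum()
    return int(top_idx[rng.choice(k, p=probs)])

# With top_k=2, only the two largest logits (indices 1 and 3) can be sampled.
token = sample_top_k(np.array([0.1, 2.0, -1.0, 3.0]), temperature=0.8, top_k=2)
```

Lower temperatures concentrate probability mass on the top logits; `top_k=1` degenerates to greedy decoding.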
### Returns

`numpy.ndarray` — Audio waveform at 24 kHz.

### Voice Priority

1. `voice` dict (from a preset)
2. `ref_audio` + `ref_text`
3. `ref_codes` + `ref_text`
4. Default preset voice
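Text longer than `max_chars` is synthesized in chunks whose waveforms are then joined; `silence_p` and `crossfade_p` govern the joins. A rough numpy sketch of how such joining could work at 24 kHz (assumed behavior, not the library's code):

```python
import numpy as np

SR = 24_000  # output sample rate

def join_chunks(waves, silence_p=0.15, crossfade_p=0.0, sr=SR):
    """Concatenate chunk waveforms, inserting silence or linearly crossfading at joins."""
    out = waves[0].astype(np.float32)
    fade = int(crossfade_p * sr)
    gap = np.zeros(int(silence_p * sr), dtype=np.float32)
    for w in waves[1:]:
        w = w.astype(np.float32)
        if fade > 0 and len(out) >= fade and len(w) >= fade:
            # Overlap-add: fade the tail of `out` out while fading the head of `w` in.
            ramp = np.linspace(0.0, 1.0, fade, dtype=np.float32)
            out[-fade:] = out[-fade:] * (1.0 - ramp) + w[:fade] * ramp
            out = np.concatenate([out, w[fade:]])
        else:
            # No crossfade: insert silence_p seconds of silence between chunks.
            out = np.concatenate([out, gap, w])
    return out

a, b = np.ones(2400), np.ones(2400)              # two 0.1 s chunks
joined = join_chunks([a, b], silence_p=0.15)     # 2400 + 3600 + 2400 = 8400 samples
```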
## infer_batch()

```python
audios = tts.infer_batch(texts: List[str], ...)
```

Returns `List[numpy.ndarray]`, one waveform per input text. PyTorch mode uses true batch generation; GGUF mode processes texts sequentially.
## infer_stream()

```python
for chunk in tts.infer_stream(text: str, ...):
    play_audio(chunk)
```

Yields `numpy.ndarray` chunks as they are generated (GGUF mode only).
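A typical stream consumer plays each chunk as it arrives and optionally keeps the chunks to assemble the full waveform afterwards. The sketch below uses a stand-in generator in place of `tts.infer_stream()`, and `play_audio` is assumed to be your own playback routine:

```python
import numpy as np

def fake_stream(n_chunks=4, chunk_len=1200):
    """Stand-in for tts.infer_stream(); yields fixed-size numpy chunks."""
    for i in range(n_chunks):
        yield np.full(chunk_len, 0.1 * i, dtype=np.float32)

chunks = []
for chunk in fake_stream():          # replace with: tts.infer_stream(text=...)
    # play_audio(chunk)              # low-latency playback would happen here
    chunks.append(chunk)             # keep chunks to rebuild the full waveform

full = np.concatenate(chunks)        # 4 chunks of 1200 samples -> 4800 samples
```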
## save()

```python
tts.save(audio: numpy.ndarray, output_path: str)
```
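If you need to write a waveform without `save()`, a float array at 24 kHz can be stored as mono 16-bit PCM using the standard-library `wave` module (a generic sketch, not the library's implementation):

```python
import wave
import numpy as np

def write_wav(audio, path, sr=24_000):
    """Write a float waveform in [-1, 1] as a mono 16-bit PCM WAV file."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 16-bit samples
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())

write_wav(np.zeros(24_000, dtype=np.float32), "silence.wav")  # 1 s of silence
```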
## encode_reference()

```python
codes = tts.encode_reference(ref_audio_path: str)
# Returns: torch.Tensor
```
## close()

```python
tts.close()

# Or use the context manager:
with Vieneu() as tts:
    audio = tts.infer(text="...")
```
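The `with` form works because the class implements the context-manager protocol, calling `close()` on exit. A minimal sketch of that pattern (illustrative, not the library's source):

```python
class ResourceLike:
    """Minimal __enter__/__exit__ pattern behind `with Vieneu() as tts:`."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True   # free model weights, file handles, etc.

    def __enter__(self):
        return self          # the object bound by `as tts`

    def __exit__(self, exc_type, exc, tb):
        self.close()         # runs even if the body raised
        return False         # do not suppress exceptions

with ResourceLike() as r:
    assert not r.closed      # still open inside the block
assert r.closed              # closed automatically on exit
```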