Turn text into natural-sounding speech with a neural voice, locally.
First run downloads the voice (~90\xA0MB) and caches it — later runs are fast.
1.0×
Expressiveness
0.67
0.80
Higher values add more intonation and timing variation. The voice's defaults usually sound best.
WAV is produced directly; MP3 and OGG are transcoded with FFmpeg.