Turn text into natural-sounding speech with a neural voice, locally.

First run downloads the voice (~90 MB) and caches it, so later runs are fast.
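As an illustration of the download-once behavior, here is a minimal sketch of a cache check. It is not this tool's actual code: `ensure_cached`, the cache path, and the `fetch` callback are hypothetical names chosen for the example.

```python
from pathlib import Path
from typing import Callable


def ensure_cached(cache_path: Path, fetch: Callable[[], bytes]) -> Path:
    """Return cache_path, calling fetch() only if the file is not cached yet.

    `fetch` is a hypothetical callback that returns the voice file's bytes
    (for example, the body of an HTTP download).
    """
    if cache_path.exists():
        # Later runs: the voice is already on disk, so nothing is downloaded.
        return cache_path

    cache_path.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary name first, then rename, so an interrupted
    # download does not leave a truncated file under the final name.
    tmp_path = cache_path.with_name(cache_path.name + ".part")
    tmp_path.write_bytes(fetch())
    tmp_path.replace(cache_path)
    return cache_path
```

Calling `ensure_cached` a second time with the same path returns immediately without invoking `fetch`, which is why only the first run pays the download cost.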

Voice

Speed: 1.0×

Expressiveness

Variation: 0.67

Cadence: 0.80

Higher values add more intonation and timing variation. The voice's defaults usually sound best.

Download format

WAV is produced directly; MP3 and OGG are transcoded with FFmpeg.
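The export step could look roughly like the sketch below, assuming an `ffmpeg` binary on the PATH that was built with the `libmp3lame` and `libvorbis` encoders. The function names `ffmpeg_command` and `export` are hypothetical, and the tool's real FFmpeg options may differ.

```python
import subprocess
from pathlib import Path

# Assumed encoder choice per format; the tool may use different encoders.
ENCODERS = {"mp3": "libmp3lame", "ogg": "libvorbis"}


def ffmpeg_command(wav_path: str, fmt: str) -> list[str]:
    """Build the FFmpeg argument list that transcodes a WAV file to fmt."""
    fmt = fmt.lower()
    if fmt not in ENCODERS:
        raise ValueError(f"unsupported format: {fmt}")
    out_path = Path(wav_path).with_suffix("." + fmt)
    # -y overwrites an existing output file; -codec:a selects the audio encoder.
    return ["ffmpeg", "-y", "-i", str(wav_path),
            "-codec:a", ENCODERS[fmt], str(out_path)]


def export(wav_path: str, fmt: str) -> Path:
    """Return the path of the file in the requested format."""
    if fmt.lower() == "wav":
        # WAV is already the synthesizer's output, so no transcoding is needed.
        return Path(wav_path)
    cmd = ffmpeg_command(wav_path, fmt)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

For example, `export("speech.wav", "ogg")` would run FFmpeg once and return the path `speech.ogg`, while `export("speech.wav", "wav")` returns the original file untouched.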

Text to Speech