Quickstart
A guide for those who just want to see it run.
Starting from installation, you can produce your first synthesis with the built-in speakers in about 10 minutes and finish benchmarking in about 15 minutes.
Reading Order
- Installation (CPU vs GPU) — Choose the installation profile that fits your environment
- First Voice — Generate and save a WAV file using the built-in speakers
- Speed & Prosody Controls — Learn how the speed, pitch, and prosody parameters work
- Custom Word Registration — Manually fix mispronounced words
- Sentence-level Streaming — Deliver the first audio chunk of a long text quickly
- Benchmark on Your Machine — Measure actual performance on your hardware
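The idea behind sentence-level streaming can be illustrated with a small sketch: split the text at sentence-ending punctuation and synthesize one sentence at a time, so the first chunk is ready long before the whole text is done. This is not the library's API. `split_sentences`, `stream_by_sentence`, and the `synthesize` callback are hypothetical names, and the split rule (after 。！？ and ASCII ! ?) is an assumption; the Sentence-level Streaming section describes the real interface.

```python
import re
from typing import Callable, Iterator, List

# Zero-width split point right after a sentence-ending mark
# (Japanese 。！？ and ASCII ! ?); the mark stays with its sentence.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty or whitespace-only pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def stream_by_sentence(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence, in order.

    `synthesize` is a hypothetical stand-in for the library's synthesis call.
    Because this is a generator, the first chunk is available as soon as the
    first sentence is synthesized, not after the whole text.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    """Placeholder synthesizer: encodes the text instead of producing audio."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    text = "こんにちは。今日はいい天気ですね！散歩に行きましょうか？"
    for chunk in stream_by_sentence(text, fake_synthesize):
        # In a real application, play or send each chunk as soon as it arrives.
        print(len(chunk))
```

Splitting on punctuation keeps each unit short enough for fast first output while still giving the synthesizer a complete sentence to shape prosody over.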
What You Can Do After This Section
- Use any of the 11 built-in speakers
- Adjust speed, pitch, and prosody parameters
- Measure on your own hardware how many seconds it takes to generate 1 second of audio (the real-time factor, RTF)
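The metric in the last item is simply compute time divided by audio duration; below 1.0, synthesis runs faster than real time. The sketch below shows the arithmetic and one way to time a call. `real_time_factor`, `measure_rtf`, and the `synthesize` callable (assumed to return `(samples, sample_rate)`) are hypothetical, and the Benchmark on Your Machine section may measure things differently.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute per second of generated audio.

    Below 1.0, synthesis is faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Tuple[Sequence[float], int]],
    text: str,
    warmup: bool = True,
) -> float:
    """Time one synthesis call and return its RTF.

    `synthesize` is hypothetical: it is assumed to return (samples, sample_rate).
    An optional warm-up call runs first so one-time costs such as model
    loading are not counted in the timed run.
    """
    if warmup:
        synthesize(text)
    start = time.perf_counter()
    samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    # 0.5 s of compute for 2.0 s of audio
    print(real_time_factor(0.5, 2.0))  # 0.25
```

A single timed call is noisy; averaging several runs over texts of different lengths typically gives a more representative number.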
Voices You Can Create
After completing the quickstart, you will have the following speakers at your disposal.
Here are samples of each speaker saying the same sentence, "こんにちは、はじめまして。" ("Hello, nice to meet you.").
- JVNV jvnv-F1-jp — Female Speaker 1
- JVNV jvnv-F2-jp — Female Speaker 2
- JVNV jvnv-M1-jp — Male Speaker 1
- JVNV jvnv-M2-jp — Male Speaker 2
- つくよみちゃん tsukuyomi_chan — Anime-style
- あみたろ amitaro_normal — Normal
- あみたろ amitaro_runrun — Excited
- あみたろ amitaro_yofukashi — Calm
- あみたろ amitaro_punsuka — Angry
- あみたろ amitaro_sasayaki_a — Whisper A
- あみたろ amitaro_sasayaki_b — Whisper B
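To catch typos before a speaker ID reaches the synthesizer, it can help to keep the IDs listed above in one place. This is a minimal sketch: `BUILT_IN_SPEAKERS` and `check_speaker` are hypothetical helpers, not part of the library, and how the ID is actually passed to the synthesizer is covered in First Voice.

```python
from typing import Dict

# Speaker IDs exactly as listed on this page, mapped to their descriptions.
BUILT_IN_SPEAKERS: Dict[str, str] = {
    "jvnv-F1-jp": "JVNV, Female Speaker 1",
    "jvnv-F2-jp": "JVNV, Female Speaker 2",
    "jvnv-M1-jp": "JVNV, Male Speaker 1",
    "jvnv-M2-jp": "JVNV, Male Speaker 2",
    "tsukuyomi_chan": "つくよみちゃん, Anime-style",
    "amitaro_normal": "あみたろ, Normal",
    "amitaro_runrun": "あみたろ, Excited",
    "amitaro_yofukashi": "あみたろ, Calm",
    "amitaro_punsuka": "あみたろ, Angry",
    "amitaro_sasayaki_a": "あみたろ, Whisper A",
    "amitaro_sasayaki_b": "あみたろ, Whisper B",
}


def check_speaker(speaker_id: str) -> str:
    """Return speaker_id unchanged if it is a listed built-in speaker, else raise."""
    if speaker_id not in BUILT_IN_SPEAKERS:
        raise ValueError(
            f"unknown speaker {speaker_id!r}; choose one of: {', '.join(BUILT_IN_SPEAKERS)}"
        )
    return speaker_id


if __name__ == "__main__":
    print(check_speaker("amitaro_runrun"))
```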
Custom speaker training: guide in progress
A guide for preparing recordings and training your own speaker is being written.
It will be available in the Custom Speaker Training section once ready.
