Quickstart

A guide for those who just want to see it run.

You can reach your first synthesis with a built-in speaker in about 10 minutes and finish benchmarking in about 15.

Reading Order

  1. Installation — CPU vs GPU — Choose the right installation profile for your environment
  2. First Voice — Generate and save a wav file with built-in speakers
  3. Speed & Prosody Controls — Understand speed/pitch/prosody parameters
  4. Custom Word Registration — Fix mispronounced words manually
  5. Sentence-level Streaming — Send the first audio chunk of long texts quickly
  6. Benchmark on Your Machine — Measure actual performance on your hardware
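To make the idea behind step 5 concrete, here is a minimal sketch of sentence-level streaming: split the text at sentence-ending punctuation and synthesize one sentence at a time, so the first chunk is ready before the rest of the text has been processed. The names `split_sentences`, `stream_chunks`, and the `synthesize` callable are hypothetical placeholders for illustration, not this library's API.

```python
import re
from typing import Callable, Iterator, List

# Zero-width split point right after Japanese or ASCII sentence-ending punctuation.
# Splitting on an empty match requires Python 3.7 or later.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the punctuation and dropping empty pieces."""
    return [piece.strip() for piece in _SENTENCE_END.split(text) if piece.strip()]


def stream_chunks(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio for one sentence at a time.

    `synthesize` is a hypothetical stand-in for whatever function turns a
    sentence into audio bytes. Because this is a generator, the first chunk
    is produced after only the first sentence has been synthesized.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

A caller can start playback as soon as `next(stream_chunks(...))` returns, instead of waiting for the whole text to finish.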

What You Can Do After This Section

  • Freely use the 11 pre-built speakers
  • Adjust speed, pitch, and prosody parameters
  • Measure "how many seconds it takes to generate 1 second of audio" on your hardware
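The "seconds to generate 1 second of audio" figure is commonly called the real-time factor (RTF): synthesis time divided by the duration of the generated audio. The sketch below shows one way to compute it. The `synthesize` callable, `measure_rtf`, and the choice of taking the median over a few runs are assumptions for illustration, not the benchmark procedure this guide uses.

```python
import statistics
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return seconds of compute spent per second of generated audio."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
    runs: int = 3,
) -> float:
    """Time a hypothetical `synthesize(text)` several times and return the median RTF."""
    factors = []
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        factors.append(real_time_factor(elapsed, len(samples), sample_rate))
    return statistics.median(factors)
```

An RTF below 1.0 means audio is generated faster than it plays back. For example, 0.5 seconds of compute for 1 second of audio gives an RTF of 0.5.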

Voices You Can Create

After completing the quickstart, you will have these speakers at your fingertips.

Each speaker below says the same sample sentence: "こんにちは、はじめまして。" ("Hello, nice to meet you.").

JVNV jvnv-F1-jp — Female Speaker 1
JVNV jvnv-F2-jp — Female Speaker 2
JVNV jvnv-M1-jp — Male Speaker 1
JVNV jvnv-M2-jp — Male Speaker 2
つくよみちゃん tsukuyomi_chan — Anime-style
あみたろ amitaro_normal — Normal
あみたろ amitaro_runrun — Excited
あみたろ amitaro_yofukashi — Calm
あみたろ amitaro_punsuka — Angry
あみたろ amitaro_sasayaki_a — Whisper A
あみたろ amitaro_sasayaki_b — Whisper B

Custom speaker training guide in progress

A guide for preparing recordings and training your own speaker is being written.

It will be available in the Custom Speaker Training section once ready.