Quickstart

A guide for those who just want to see it run.

You can reach your first synthesis with a built-in speaker in about 10 minutes, and finish benchmarking in about 15.

Reading Order

Installation — CPU vs GPU — Choose the right installation profile for your environment
First Voice — Generate and save a wav file with built-in speakers
Speed & Prosody Controls — Understand speed/pitch/prosody parameters
Custom Word Registration — Fix mispronounced words manually
Sentence-level Streaming — Send the first audio chunk of long texts quickly
Benchmark on Your Machine — Measure actual performance on your hardware
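
As a preview of the idea behind sentence-level streaming, here is a minimal sketch: split the text at sentence-ending punctuation and synthesize one sentence at a time, so the first chunk is ready before the rest of the text has been processed. The `synthesize` callable is a hypothetical stand-in, not this library's API; the Sentence-level Streaming page describes the actual interface.

```python
import re
from typing import Callable, Iterator, List

# One sentence = a run of non-terminator characters followed by any
# trailing Japanese or ASCII sentence-ending punctuation.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def stream_sentences(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield audio one sentence at a time.

    `synthesize` is an assumed placeholder for whatever function turns
    a sentence into audio. Because this is a generator, the first chunk
    is produced before later sentences are synthesized.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

A caller would iterate over `stream_sentences(long_text, synthesize)` and play or send each chunk as it arrives instead of waiting for the whole text.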

What You Can Do After This Section

Use any of the 11 built-in speakers
Adjust speed, pitch, and prosody parameters
Measure "how many seconds it takes to generate 1 second of audio" on your hardware
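
The last measurement is commonly called the real-time factor: synthesis time divided by the duration of the generated audio, where a value below 1.0 means generation is faster than playback. A minimal sketch of the arithmetic follows; `synthesize` and `sample_rate` are hypothetical placeholders, not this library's API, and the Benchmark on Your Machine page covers the actual procedure.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
    runs: int = 3,
) -> float:
    """Average seconds of compute per second of generated audio.

    `synthesize` is an assumed stand-in that returns the audio samples
    for `text`; swap in the real synthesis call when benchmarking.
    """
    synthesize(text)  # warm-up run, excluded from the timing
    ratios = []
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        if len(samples) == 0:
            raise ValueError("synthesize returned no audio samples")
        audio_seconds = len(samples) / sample_rate
        ratios.append(elapsed / audio_seconds)
    return sum(ratios) / len(ratios)
```

For example, a result of 0.25 would mean that one second of audio takes about a quarter of a second to generate on that machine.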

Voices You Can Create

After completing the quickstart, you will have these speakers at your fingertips.

Each speaker below has a sample of the same sentence: "こんにちは、はじめまして。" ("Hello, nice to meet you.").

JVNV jvnv-F1-jp — Female Speaker 1

JVNV jvnv-F2-jp — Female Speaker 2

JVNV jvnv-M1-jp — Male Speaker 1

JVNV jvnv-M2-jp — Male Speaker 2

つくよみちゃん tsukuyomi_chan — Anime-style

あみたろ amitaro_normal — Normal

あみたろ amitaro_runrun — Excited

あみたろ amitaro_yofukashi — Calm

あみたろ amitaro_punsuka — Angry

あみたろ amitaro_sasayaki_a — Whisper A

あみたろ amitaro_sasayaki_b — Whisper B

Custom speaker training guide is in progress

A guide for preparing recordings and training your own speaker is being written.

It will be available in the Custom Speaker Training section once ready.