Real-time CPU Inference
With ONNX optimization, inference runs 1.5x faster on short texts and 3.3x faster on long texts than Style-Bert-VITS2, using the CPU alone.
On GPU, torch.compile makes it even faster.
How we did it
Just bring a video or recording, and we handle data preparation, training, benchmarking, and deployment for you.
Here are samples of the built-in speakers saying the same sentence ("こんにちは、はじめまして。", "Hello, nice to meet you.").
Want to generate the above samples yourself on your laptop, using only the CPU? Head to Get Started in 10 Minutes.
CPU profile:

```shell
pip install hayakoe
```

GPU profile:

```shell
pip install torch --index-url https://download.pytorch.org/whl/cu126
pip install hayakoe[gpu]
```

The CPU profile does not require PyTorch, keeping installation fast and image sizes small. The GPU profile installs additional dependencies but provides faster inference.
```python
from hayakoe import TTS

text = "こんにちは、はじめまして。"
tts = TTS().load("jvnv-F1-jp").prepare()
tts.speakers["jvnv-F1-jp"].generate(text).save("hello.wav")
```

Go ahead and listen to hello.wav!
There are 11 built-in speakers.
- jvnv-F1-jp / jvnv-F2-jp / jvnv-M1-jp / jvnv-M2-jp: based on the JVNV corpus
- tsukuyomi_chan: based on the Tsukuyomi-chan Corpus
- amitaro_normal / amitaro_runrun / amitaro_yofukashi / amitaro_punsuka / amitaro_sasayaki_a / amitaro_sasayaki_b: based on the Amitaro ITA Corpus

Simply replace "jvnv-F1-jp" in the code above to try a different voice.
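The speaker list above is easy to script over. Here is a sketch that generates the same sentence with every built-in voice, assuming only the TTS API shown in the quickstart (the per-speaker output filenames are my own choice):

```python
# All 11 built-in speaker names, as listed above.
SPEAKERS = [
    "jvnv-F1-jp", "jvnv-F2-jp", "jvnv-M1-jp", "jvnv-M2-jp",
    "tsukuyomi_chan",
    "amitaro_normal", "amitaro_runrun", "amitaro_yofukashi",
    "amitaro_punsuka", "amitaro_sasayaki_a", "amitaro_sasayaki_b",
]

def synthesize_all(text: str) -> list[str]:
    """Generate `text` with every built-in speaker; return the output paths."""
    from hayakoe import TTS  # the CPU profile is enough for this
    paths = []
    for name in SPEAKERS:
        # Load and prepare one speaker at a time, as in the quickstart.
        tts = TTS().load(name).prepare()
        path = f"{name}.wav"
        tts.speakers[name].generate(text).save(path)
        paths.append(path)
    return paths
```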
If you installed the GPU profile, pass TTS(device="cuda") to run inference on the GPU.
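Which device string to pass can also be decided at runtime. The helper below is a hypothetical sketch (not part of hayakoe) that falls back to the CPU when the GPU profile's PyTorch dependency is missing or no CUDA device is visible:

```python
import importlib.util

def pick_device() -> str:
    """Return "cuda" when a GPU is usable, else "cpu"."""
    # PyTorch is only installed with the GPU profile.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"
```

You could then write TTS(device=pick_device()) so the same script runs under either install profile.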
This project uses the following voice data for speech synthesis.