
HayaKoe: Near-real-time TTS with your favorite voice, on CPU alone.

Just bring a video or recording — we handle data prep, training, benchmarking, and deployment for you.

Sample Voices

Here are samples of the built-in speakers saying the same sentence ("こんにちは、はじめまして。", "Hello, nice to meet you.").

  • JVNV jvnv-F1-jp — Female Speaker 1
  • JVNV jvnv-F2-jp — Female Speaker 2
  • JVNV jvnv-M1-jp — Male Speaker 1
  • JVNV jvnv-M2-jp — Male Speaker 2
  • つくよみちゃん tsukuyomi_chan — Anime-style
  • あみたろ amitaro_normal — Normal
  • あみたろ amitaro_runrun — Excited
  • あみたろ amitaro_yofukashi — Calm
  • あみたろ amitaro_punsuka — Angry
  • あみたろ amitaro_sasayaki_a — Whisper A
  • あみたろ amitaro_sasayaki_b — Whisper B

Want to generate the above samples yourself on your laptop, using only the CPU? Head to Get Started in 10 Minutes.

Quick Overview

Installation

CPU profile:

```bash
pip install hayakoe
```

GPU profile:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu126
pip install "hayakoe[gpu]"
```

The CPU profile does not require PyTorch, keeping installation fast and image size small.

The GPU profile installs additional dependencies but provides faster inference.

Inference

```python
from hayakoe import TTS

text = "こんにちは、はじめまして。"

tts = TTS().load("jvnv-F1-jp").prepare()
tts.speakers["jvnv-F1-jp"].generate(text).save("hello.wav")
```

Go ahead and listen to hello.wav!

There are 11 built-in speakers.

  • jvnv-F1-jp / jvnv-F2-jp / jvnv-M1-jp / jvnv-M2-jp — Based on the JVNV corpus
  • tsukuyomi_chan — Based on the Tsukuyomi-chan Corpus
  • amitaro_normal / amitaro_runrun / amitaro_yofukashi / amitaro_punsuka / amitaro_sasayaki_a / amitaro_sasayaki_b — Based on the Amitaro ITA Corpus

Simply replace "jvnv-F1-jp" in the code above to try a different voice.
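For example, you can synthesize the same sentence with every built-in speaker in a loop. This is a minimal sketch that assumes the chained load() call shown above can be repeated to register several speakers on one TTS instance; the SPEAKERS list and the output_path helper are illustrative, not part of the library:

```python
# All 11 built-in speaker names listed above.
SPEAKERS = [
    "jvnv-F1-jp", "jvnv-F2-jp", "jvnv-M1-jp", "jvnv-M2-jp",
    "tsukuyomi_chan",
    "amitaro_normal", "amitaro_runrun", "amitaro_yofukashi",
    "amitaro_punsuka", "amitaro_sasayaki_a", "amitaro_sasayaki_b",
]


def output_path(speaker: str) -> str:
    """Illustrative helper: one WAV file per speaker."""
    return f"{speaker}.wav"


def synthesize_all(text: str) -> None:
    # Assumption: load() can be chained once per speaker before prepare().
    from hayakoe import TTS

    tts = TTS()
    for name in SPEAKERS:
        tts.load(name)
    tts.prepare()
    for name in SPEAKERS:
        tts.speakers[name].generate(text).save(output_path(name))


if __name__ == "__main__":
    synthesize_all("こんにちは、はじめまして。")
```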

If you installed the GPU profile, pass device="cuda" to the constructor, i.e. TTS(device="cuda"), to run inference on the GPU.
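If you want one script that runs under either profile, you can select the device based on whether PyTorch is installed. The pick_device helper below is a hypothetical convenience, not part of the library, and the usage comment assumes TTS accepts device="cpu" as well as device="cuda":

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when the GPU profile (PyTorch) is installed, else "cpu"."""
    return "cuda" if importlib.util.find_spec("torch") is not None else "cpu"


# Usage sketch (assumes a device="cpu" argument selects the default CPU path):
# from hayakoe import TTS
# tts = TTS(device=pick_device()).load("jvnv-F1-jp").prepare()
```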

Which Docs Should I Read?

  1. Start with the Quickstart. Follow along from installation through first synthesis and benchmarking to see how fast the TTS is and how good it sounds.
  2. Ready for more? Try Custom Speaker Training. Use a single video with the voice you like to go from data preparation to deployment.
  3. Want to share it? Head to Server Deployment. Learn how to expose an API with FastAPI and Docker.
  4. Want to dig deeper? Read the Deep Dive. A detailed walkthrough of every optimization behind these speed and memory improvements.
  5. Stuck on something? Check the FAQ. Advanced settings such as cache paths, private HF repos, S3, and multi-speaker memory are covered there.

Voice Data Credits

This project uses the following voice data for speech synthesis.