
HayaKoe: Near-real-time TTS with your favorite voice, on CPU alone.

Just bring a video or recording — we handle data prep, training, benchmarking, and deployment for you.

Sample Voices

Here are samples of the built-in speakers saying the same sentence ("こんにちは、はじめまして。", "Hello, nice to meet you.").

  • JVNV jvnv-F1-jp — Female Speaker 1
  • JVNV jvnv-F2-jp — Female Speaker 2
  • JVNV jvnv-M1-jp — Male Speaker 1
  • JVNV jvnv-M2-jp — Male Speaker 2
  • つくよみちゃん tsukuyomi_chan — Anime-style
  • あみたろ amitaro_normal — Normal
  • あみたろ amitaro_runrun — Excited
  • あみたろ amitaro_yofukashi — Calm
  • あみたろ amitaro_punsuka — Angry
  • あみたろ amitaro_sasayaki_a — Whisper A
  • あみたろ amitaro_sasayaki_b — Whisper B

Want to generate the above samples yourself on your laptop, using only the CPU? Head to Get Started in 10 Minutes.

Quick Overview

Installation

CPU profile:

```bash
pip install hayakoe
```

GPU profile:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cu126
pip install "hayakoe[gpu]"
```

The CPU profile does not require PyTorch, keeping installation fast and image size small.

The GPU profile installs additional dependencies but provides faster inference.

Inference

```python
from hayakoe import TTS

text = "こんにちは、はじめまして。"

tts = TTS().load("jvnv-F1-jp").prepare()
tts.speakers["jvnv-F1-jp"].generate(text).save("hello.wav")
```

Go ahead and listen to hello.wav!

There are 11 built-in speakers.

  • jvnv-F1-jp / jvnv-F2-jp / jvnv-M1-jp / jvnv-M2-jp — Based on the JVNV corpus
  • tsukuyomi_chan — Based on the Tsukuyomi-chan Corpus
  • amitaro_normal / amitaro_runrun / amitaro_yofukashi / amitaro_punsuka / amitaro_sasayaki_a / amitaro_sasayaki_b — Based on the Amitaro ITA Corpus

Simply replace "jvnv-F1-jp" in the code above to try a different voice.
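For example, you can synthesize the same sentence with every built-in speaker in a loop. This is a minimal sketch that assumes the chained load() call shown above can be repeated to register several speakers on one TTS instance; the SPEAKERS list and the output_path helper are illustrative, not part of the library:

```python
# All 11 built-in speaker names listed above.
SPEAKERS = [
    "jvnv-F1-jp", "jvnv-F2-jp", "jvnv-M1-jp", "jvnv-M2-jp",
    "tsukuyomi_chan",
    "amitaro_normal", "amitaro_runrun", "amitaro_yofukashi",
    "amitaro_punsuka", "amitaro_sasayaki_a", "amitaro_sasayaki_b",
]


def output_path(speaker: str) -> str:
    """Illustrative helper: one WAV file per speaker."""
    return f"{speaker}.wav"


def synthesize_all(text: str) -> None:
    # Assumption: load() can be chained once per speaker before prepare().
    from hayakoe import TTS

    tts = TTS()
    for name in SPEAKERS:
        tts.load(name)
    tts.prepare()
    for name in SPEAKERS:
        tts.speakers[name].generate(text).save(output_path(name))


if __name__ == "__main__":
    synthesize_all("こんにちは、はじめまして。")
```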

If you installed the GPU profile, pass device="cuda" to the constructor, i.e. TTS(device="cuda"), to run inference on the GPU.
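If you want one script that runs under either profile, you can select the device based on whether PyTorch is installed. The pick_device helper below is a hypothetical convenience, not part of the library, and the usage comment assumes TTS accepts device="cpu" as well as device="cuda":

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when the GPU profile (PyTorch) is installed, else "cpu"."""
    return "cuda" if importlib.util.find_spec("torch") is not None else "cpu"


# Usage sketch (assumes a device="cpu" argument selects the default CPU path):
# from hayakoe import TTS
# tts = TTS(device=pick_device()).load("jvnv-F1-jp").prepare()
```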

Which Docs Should I Read?

  1. Start with the Quickstart. Follow along from installation through first synthesis and benchmarking to see how fast the TTS is and how good it sounds.
  2. Ready for more? Try Custom Speaker Training. Use a single video with the voice you like to go from data preparation to deployment.
  3. Want to share it? Head to Server Deployment. Learn how to expose an API with FastAPI and Docker.
  4. Want to dig deeper? Read the Deep Dive. A detailed walkthrough of every optimization behind these speed and memory improvements.
  5. Stuck on something? Check the FAQ. Advanced settings such as cache paths, private HF repos, S3, and multi-speaker memory are covered there.

Voice Data Credits

This project uses the following voice data for speech synthesis.