
Deep Dive

HayaKoe is a TTS engine built by trimming Style-Bert-VITS2 (SBV2) down to Japanese only and restructuring it for practical CPU inference and server operation.

This section documents what was changed, how it was done, and how much each change improved results, backed by actual measurements.

You can start from any topic that interests you.

Summary at a Glance

Here are the measured improvements HayaKoe achieved over the original SBV2 (see individual pages for details).

| Category | Original SBV2 | HayaKoe | Difference |
|---|---|---|---|
| CPU inference time (short text, ~2 s) | 1.13 s | 0.68 s | 1.67x faster |
| CPU inference time (medium text, ~8 s) | 3.35 s | 2.44 s | 1.37x faster |
| CPU inference time (long text, ~38 s) | 35.33 s | 10.43 s | 3.39x faster |
| CPU memory | 5,122 MB | 2,346 MB | 54% reduction |
| GPU VRAM | 3,712 MB | 1,661 MB | 55% reduction |
| Supported architectures | x86_64 | x86_64 · aarch64 Linux | ARM board support |

Page Structure

Each page generally follows a "why it matters → implementation → results" flow, adapting the structure as the topic requires.

Table of Contents

Big Picture

Making CPU Inference Real-time

Further GPU Inference Optimization

Operational Convenience

Other

Recommended reading order

If this is your first visit, we recommend skimming the Architecture Overview first and then reading whichever topics interest you.