Speed & Prosody Controls
Quickstart recommendation: just use speed
In most cases, leaving parameters other than speed at their defaults sounds most natural.
pitch_scale and intonation_scale can introduce slight quality degradation when moved away from 1.0.
If you just want to make things a bit faster or slower, try speed alone, and come back to the Advanced Parameters section when you need more.
generate() accepts six keyword arguments for adjusting speech speed, pitch, intonation, and variability.
All are optional, and if none are passed, the trained defaults are used for synthesis.
The samples below all use the same sentence ("今日はどんな国に辿り着くのでしょうか。新しい出会いが楽しみです。") with the tsukuyomi_chan speaker, varying only the parameter in question.
Parameter behavior is identical to that of the infer() function in Style-Bert-VITS2, the project HayaKoe was forked from.
speed — Speech Speed
Relative to the default of 1.0, smaller values make speech faster and larger values make it slower.
Internally, this multiplies the phoneme durations predicted by the Duration Predictor directly by speed, so pronunciation itself is well preserved.

Below 0.8, pronunciation starts to blur, and above 1.3, it sounds more "dragged out" than simply "slow".
In practice, 0.9 to 1.1 sounds most natural.
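As a rough illustration of the scaling described above (not HayaKoe's actual code), the sketch below multiplies made-up per-phoneme durations by speed and rounds each one up to whole frames. The function name scale_durations, the frame unit, and the round-up step are assumptions made for the example.

```python
import math

def scale_durations(durations, speed=1.0):
    """Multiply each predicted phoneme duration (in frames) by speed, rounding up."""
    return [math.ceil(d * speed) for d in durations]

# Hypothetical predicted durations, in frames, for five phonemes.
predicted = [4.0, 7.5, 3.2, 9.3, 6.0]

baseline = scale_durations(predicted)            # speed=1.0
faster = scale_durations(predicted, speed=0.9)   # shorter phonemes
slower = scale_durations(predicted, speed=1.1)   # longer phonemes
print(sum(faster), sum(baseline), sum(slower))   # total length in frames
```

Because only the durations change and the phoneme sequence stays the same, the total length shrinks or grows while the content of the utterance is untouched.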
speaker.generate(text, speed=0.9)  # slightly faster
speaker.generate(text, speed=1.1)  # slightly slower
Advanced Parameters
The settings below are already natural at their defaults, but can be adjusted when fine-tuning is needed.
Summary
| Parameter | Default | Recommended Range | Effect |
|---|---|---|---|
| pitch_scale | 1.0 | 0.95 ~ 1.05 | Pitch multiplier. Slight quality loss away from 1.0 |
| intonation_scale | 1.0 | 0.8 ~ 1.3 | Intonation range. Slight quality loss away from 1.0 |
| sdp_ratio | 0.2 | 0.0 ~ 0.5 | Blend ratio of deterministic DP and stochastic SDP |
| noise | 0.6 | 0.3 ~ 0.9 | Voice variability (tonal randomness) |
| noise_w | 0.8 | 0.5 ~ 1.2 | Rhythm variability (SDP noise) |
We recommend moving one parameter at a time.
In the samples below, we intentionally pushed values beyond the recommended range so you can hear the differences.
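If you want a quick guard against drifting outside the recommended ranges, something like the following could work. It is a minimal sketch: the RECOMMENDED dictionary simply copies the summary table, and out_of_range is a hypothetical helper, not part of HayaKoe.

```python
# Recommended ranges copied from the summary table (inclusive bounds).
RECOMMENDED = {
    "pitch_scale": (0.95, 1.05),
    "intonation_scale": (0.8, 1.3),
    "sdp_ratio": (0.0, 0.5),
    "noise": (0.3, 0.9),
    "noise_w": (0.5, 1.2),
}

def out_of_range(**params):
    """Return the names of parameters that fall outside their recommended range."""
    flagged = []
    for name, value in params.items():
        if name not in RECOMMENDED:
            continue  # e.g. speed, which is not listed in the table
        low, high = RECOMMENDED[name]
        if not low <= value <= high:
            flagged.append(name)
    return flagged

# Hypothetical settings: noise is above its recommended upper bound of 0.9.
print(out_of_range(pitch_scale=1.05, noise=1.0))
```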
pitch_scale — Pitch
A simple multiplier that raises or lowers the overall pitch.
Moving away from 1.0 introduces slight quality degradation, so it is recommended to adjust this more narrowly than other parameters.

In the 0.95 to 1.05 range, speaker identity is mostly preserved, but at extreme values the voice sounds like a different person or quality noticeably drops.
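To put the recommended range in musical terms, a frequency multiplier can be converted to semitones as 12 times its base-2 logarithm. The snippet assumes pitch_scale acts as a plain multiplier on fundamental frequency, as described above; scale_to_semitones is a hypothetical helper name.

```python
import math

def scale_to_semitones(pitch_scale):
    """Convert a frequency multiplier to a pitch shift in semitones (12 per octave)."""
    return 12 * math.log2(pitch_scale)

for scale in (0.95, 1.05):
    print(f"pitch_scale={scale}: {scale_to_semitones(scale):+.2f} semitones")
```

Both ends of the 0.95 to 1.05 range stay within one semitone of the original pitch.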
speaker.generate(text, pitch_scale=1.05)
intonation_scale — Intonation Range
Controls the "width" of intonation variation.
A value of 0.0 yields an almost completely monotone, robotic tone, while 2.0 yields an exaggerated reading tone.
Like pitch_scale, moving away from 1.0 introduces slight quality degradation.

In practice, 0.85 to 1.3 sounds natural.
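One way to picture the "width" control is as a stretch of the pitch contour around its average. The sketch below is an assumption for illustration, not a confirmed description of HayaKoe's internals: it scales each voiced F0 value's distance from the voiced mean and leaves unvoiced frames (F0 of 0) alone. The function name and the sample contour are made up.

```python
def scale_intonation(f0, intonation_scale=1.0):
    """Stretch or flatten an F0 contour around the mean of its voiced frames.

    Frames with f0 == 0 are treated as unvoiced and left untouched.
    """
    voiced = [f for f in f0 if f > 0]
    if not voiced:
        return list(f0)
    mean = sum(voiced) / len(voiced)
    return [mean + intonation_scale * (f - mean) if f > 0 else 0.0 for f in f0]

contour = [0.0, 180.0, 220.0, 260.0, 0.0, 200.0]  # made-up F0 values in Hz

flat = scale_intonation(contour, 0.0)  # every voiced frame sits at the mean
wide = scale_intonation(contour, 2.0)  # deviations from the mean are doubled
print(flat)
print(wide)
```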
speaker.generate(text, intonation_scale=1.2)
sdp_ratio — Deterministic/Stochastic Duration Blend
HayaKoe (and Style-Bert-VITS2) uses two types of duration predictors together.
- DP (Deterministic Duration Predictor) — Always produces the same duration for the same text
- SDP (Stochastic Duration Predictor) — Produces slightly different durations each time
sdp_ratio is the blend ratio between the two, where 0.0 uses DP only and 1.0 uses SDP only.
Higher values increase rhythm variation within sentences, and results differ with each run for the same text.

For services where reproducibility matters (e.g., fixed subtitle timing), set it to 0.0; for one-off generation, 0.2 ~ 0.4 sounds natural.
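The blend can be pictured as a simple linear mix of the two predictors' outputs. The sketch below uses made-up numbers and a stand-in fake_sdp function; the names are hypothetical, and the exact domain in which the real model blends (for example, log-durations) is not specified here.

```python
import random

def blend_durations(dp_out, sdp_out, sdp_ratio=0.2):
    """Linear blend: 0.0 keeps only the DP output, 1.0 only the SDP output."""
    return [s * sdp_ratio + d * (1.0 - sdp_ratio) for d, s in zip(dp_out, sdp_out)]

dp_out = [5.0, 8.0, 3.0]  # made-up deterministic predictions

def fake_sdp(rng):
    """Stand-in for a stochastic predictor: the DP values plus random jitter."""
    return [d + rng.gauss(0.0, 1.0) for d in dp_out]

rng = random.Random()
run_a = blend_durations(dp_out, fake_sdp(rng), sdp_ratio=0.0)
run_b = blend_durations(dp_out, fake_sdp(rng), sdp_ratio=0.0)
print(run_a == run_b)  # the stochastic term is multiplied by zero

run_c = blend_durations(dp_out, fake_sdp(rng), sdp_ratio=0.4)
print(run_c)  # differs from run to run
```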
speaker.generate(text, sdp_ratio=0.0)  # same durations on every run
noise / noise_w — Voice & Rhythm Variability
These two parameters control noise injected at different stages of synthesis (not noise in the phoneme audio itself).
- noise — Voice variability. Controls overall tonal randomness in the Flow stage. Always has an effect regardless of sdp_ratio.
- noise_w — Rhythm variability. Noise fed into the SDP (stochastic predictor). Has no effect when sdp_ratio is 0.
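In VITS-family models, a noise scale of this kind typically multiplies a Gaussian sample before it is passed through the flow. The sketch below illustrates that idea with plain Python lists; sample_latent and the prior statistics are hypothetical, and whether HayaKoe wires noise in exactly this way is an assumption.

```python
import random

def sample_latent(means, stds, noise=0.6, rng=None):
    """Draw mean + eps * std * noise per dimension, with eps ~ N(0, 1)."""
    rng = rng or random.Random()
    return [m + rng.gauss(0.0, 1.0) * s * noise for m, s in zip(means, stds)]

means, stds = [0.2, -1.0, 0.5], [1.0, 0.5, 2.0]  # made-up prior statistics

print(sample_latent(means, stds, noise=0.0))  # collapses onto the means
print(sample_latent(means, stds, noise=0.6))  # varies from run to run
```

Raising noise widens the spread around the means, which matches the description of noise as a variability control rather than a change to the underlying prediction.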
The samples below were generated with all other parameters at their defaults, changing only the respective noise value.


In most cases, leaving the defaults (0.6, 0.8) sounds most natural.
If you feel the output is "wobbling too much", try lowering the corresponding noise slightly; if it sounds "too mechanical", try raising it a bit.
