Having processed the data from one of the two speakers, we’ve experimented with training synthetic voices using the Tacotron 2 and FastPitch neural text-to-speech (TTS) frameworks, with good results. As mentioned in the previous entry, the FinSyn corpus contains a wide variety of speaking styles, which poses a challenge for synthesis models: if the variation is unaccounted for, the resulting synthetic voices will be unpredictable, either producing speech in an averaged style or selecting more or less randomly among the styles present in the training corpus.
Publicly available Finnish speech resources are becoming plentiful, with, for example, a large parliamentary corpus and the recent Lahjoita puhetta (“donate speech”) campaign. These are eminently suitable for many speech technology applications, such as speech recognition, but of limited value for speech synthesis training, where large, high-quality single-speaker corpora are ideal. So far, such corpora have been absent for Finnish.
To fill this gap, we at the Helsinki phonetics group designed and recorded a new speech corpus in autumn 2021, intended for Finnish speech synthesis research and applications. The dataset consists of roughly 60 hours of speech, collected by recording two voice talents almost daily over a one-month period.