FinSyn – Synthesis experiments

Having processed the data from one of the two speakers, we have experimented with training synthetic voices using the Tacotron 2 and FastPitch neural text-to-speech (TTS) frameworks, with good results. As mentioned in the previous entry, the FinSyn corpus contains a wide variety of speaking styles, which poses a challenge for synthesis models. If this variation is not accounted for, the resulting synthetic voices will be unpredictable: they will either produce speech in an averaged style or select more or less at random among the styles present in the training corpus.

FinSyn – a New Speech Synthesis Corpus for Finnish

Publicly available Finnish speech resources are becoming plentiful; examples include a large parliamentary corpus and the recent Lahjoita puhetta (“Donate Speech”) campaign. These are well suited to many speech technology applications, such as speech recognition, but are of limited value for training speech synthesis, where large, high-quality single-speaker corpora are ideal. So far, no such corpora have been available for Finnish.

To fill this gap, we at the Helsinki phonetics group designed and recorded a new speech corpus in autumn 2021, intended for Finnish speech synthesis research and applications. A dataset of approximately 60 hours of speech was collected by recording two voice talents almost daily over a one-month period.