CorCoDial – Corpus-based computational dialectology

The project

Dialectology is concerned with the study of language variation across space. Current dialectological datasets typically consist of
interviews with informants. These interviews cannot easily be compared with each other as they differ considerably in length and
content. If informant A does not use word x, this does not necessarily mean that the word does not exist in A’s dialect. It may just be
that A chose to talk about topics that did not require the use of word x. This project aims to make dialect corpora comparable.
In particular, we will use machine translation techniques to normalize the dialect texts, i.e., to convert them to a standardized spelling.
However, we are not only interested in the result of this normalization process, but also in the transformation operations that the
normalization model learns. These model parameters will allow us to provide new visualizations of dialect landscapes and to confirm or
challenge traditional dialect classifications.
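
As a rough illustration of what "transformation operations" could look like, the sketch below extracts character-level rewrites from pairs of dialect and normalized word forms and counts them per region. It is not the project's method: the project relies on machine translation models, whereas this sketch uses a plain string alignment from Python's `difflib` as a simple stand-in. The function names, the region labels and the small word list are assumptions made up for this example.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_operations(dialect_form, normalized_form):
    """Return the non-matching (dialect, normalized) substring pairs of an alignment."""
    matcher = SequenceMatcher(None, dialect_form, normalized_form, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side marks an insertion or a deletion.
            ops.append((dialect_form[i1:i2], normalized_form[j1:j2]))
    return ops


def operation_profile(pairs):
    """Count how often each rewrite occurs in a list of (dialect, normalized) pairs."""
    counts = Counter()
    for dialect_form, normalized_form in pairs:
        counts.update(edit_operations(dialect_form, normalized_form))
    return counts


# Hypothetical toy data: region labels and word pairs are illustrative only.
toy_corpus = {
    "region_A": [("mie", "minä"), ("sie", "sinä")],
    "region_B": [("mää", "minä"), ("sää", "sinä")],
}

# One operation profile per region; differing profiles hint at differing dialects.
profiles = {region: operation_profile(pairs) for region, pairs in toy_corpus.items()}

for region, profile in profiles.items():
    print(region, profile.most_common())
```

Because the rewrites are counted relative to a shared normalized form, the profiles of two regions can be compared directly even when the underlying interviews cover different topics.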

The CorCoDial project is funded by the Academy of Finland during the period 2021-2025.

The full research plan is available here.

The UH research portal provides a summary of the project activities.

People

News