The project
Dialectology is concerned with the study of language variation across space. Current dialectological datasets typically consist of
interviews with informants. These interviews cannot easily be compared with each other as they differ considerably in length and
content. If informant A does not use word x, this does not necessarily mean that the word does not exist in A’s dialect. It may just be
that A chose to talk about topics that did not require the use of word x. This project aims to introduce comparability in dialect corpora.
In particular, we will use machine translation techniques to normalize the dialect texts, i.e., to convert them into a standardized spelling.
However, we are not only interested in the result of this normalization process, but also in the transformation operations that the
normalization model learns. These model parameters will allow us to provide new visualisations of dialect landscapes and to confirm or
challenge traditional dialect classifications.
The CorCoDial project is funded by the Academy of Finland for the period 2021-2025.
The full research plan is available here.
The UH research portal provides a summary of the project activities.
People
- Principal investigator: Yves Scherrer
- Postdoctoral researchers: Aleksandra Miletić, Olli Kuparinen
- PhD student: Janine Siewert
- Affiliated and visiting students: Noëmi Aepli (University of Zurich), Dana Roemling (University of Birmingham), Erofili Psaltaki (University of Crete)
- External collaborators and advisors: Jack Grieve, Nikola Ljubešić, Tanja Samardžić, Benedikt Szmrecsanyi
News
- Mar-May 2024: CorCoDial hosts two visiting students: Erofili Psaltaki (University of Crete) and Dana Roemling (University of Birmingham)
- Jan 2024: Janine is now funded by the CorCoDial project.
- Dec 2023: Three papers accepted at EMNLP 2023 and co-located workshops:
  - Changing usage of Low Saxon auxiliary and modal verbs, LChange workshop
  - The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline, ArabicNLP conference