CorCoDial – Corpus-based computational dialectology

The project

Dialectology is concerned with the study of language variation across space. Current dialectological datasets typically consist of
interviews with informants. These interviews cannot easily be compared with each other as they differ considerably in length and
content. If informant A does not use word x, this does not necessarily mean that the word does not exist in A’s dialect. It may just be
that A chose to talk about topics that did not require the use of word x. This project aims to make dialect corpora comparable.
In particular, we will use machine translation techniques to normalize the dialect texts, i.e., to convert them into standardized spelling.
However, we are not only interested in the result of this normalization process, but also in the transformation operations that the
normalization model learns. These model parameters will allow us to provide new visualisations of dialect landscapes and to confirm or
challenge traditional dialect classifications.

The CorCoDial project is funded by the Academy of Finland during the period 2021-2025.

The full research plan is available here.

The UH research portal provides a summary of the project activities.



  • Aug 2023: Yves will start a new position as an Associate Professor at the University of Oslo. He will remain affiliated with the University of Helsinki and continue leading the CorCoDial project.
  • May-July 2023: Janine is on another research visit at the University of Groningen.
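  • May 2023: Olli presents Murreviikko-korpus: murteen mukaan annotoitu ja yleiskielistetty Twitter-aineisto ("The Murreviikko corpus: a Twitter dataset annotated by dialect and normalized to standard Finnish") at the Finnish Conference of Linguistics.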
  • May 2023: Three papers accepted at VarDial 2023! Janine, Olli and Yves participate on-site at the workshop.
  • Nov-Dec 2022: Janine is on a research visit at the University of Groningen.
  • 16 Oct 2022: Aleksandra, Noëmi and Yves participate in the VarDial workshop co-located with COLING.
  • 5-6 Oct 2022: Aleksandra and Yves participate in the Estonian Digital Humanities conference, presenting joint work with Olli and Janine.
  • Sept-Oct 2022: Together with the GramAdapt project, we host HSSH Visiting Professor T. Mark Ellison from the University of Cologne.
  • Sept-Dec 2022: Olli is on a research visit in the QLVL group at KU Leuven.
  • 1-5 Aug 2022: Janine, Olli and Yves participate in Methods in Dialectology.
  • 26 May 2022: Janine presents Low Saxon dialect distances at the orthographic and syntactic level at the ACL LChange workshop.
  • 12 May 2022: Olli presents Vaihtoehto Kettuselle? Murrekorpusten laskennallinen at the Finnish Conference of Linguistics.
  • 3 May 2022: Aleksandra Miletić has joined our team as a second post-doc.
  • 29 Apr 2022: Thanks for joining our workshop and making it such a success! The final program and presentation slides are available on the workshop page.
  • 11 Feb 2022: We will organize a workshop on Corpus-based and computational approaches to linguistic variation on 27-28 April. Further information on this page.
  • 9 Dec 2021: Olli presents A topic modeling approach to dialect analysis and visualization in the language technology research seminar (Slides) (Video)