CorCoDial – Corpus-based computational dialectology

The project

Dialectology is concerned with the study of language variation across space. Current dialectological datasets typically consist of
interviews with informants. These interviews cannot easily be compared with each other as they differ considerably in length and
content. If informant A does not use word x, this does not necessarily mean that the word does not exist in A’s dialect. It may just be
that A chose to talk about topics that did not require the use of word x. This project aims to make dialect corpora comparable.
In particular, we will use machine translation techniques to normalize the dialect texts, i.e., to transform them into a standardized spelling.
However, we are not only interested in the result of this normalization process, but also in the transformation operations that the
normalization model learns. These model parameters will allow us to provide new visualisations of dialect landscapes and to confirm or
challenge traditional dialect classifications.
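
As a rough illustration of what "transformation operations" can look like, the sketch below extracts character-level rewrite operations from pairs of dialect and normalized word forms and counts them. It is only a toy under stated assumptions: it uses Python's `difflib` string alignment as a stand-in for the parameters of a trained normalization model, the function names `edit_operations` and `operation_profile` are hypothetical, and the word pairs are illustrative examples rather than project data.

```python
from collections import Counter
from difflib import SequenceMatcher


def edit_operations(dialect: str, normalized: str) -> list[tuple[str, str]]:
    """Return (dialect_substring, normalized_substring) for each differing span.

    Assumption: a plain string alignment stands in for the operations that a
    learned normalization model would capture.
    """
    matcher = SequenceMatcher(a=dialect, b=normalized, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only replacements, insertions and deletions
            ops.append((dialect[i1:i2], normalized[j1:j2]))
    return ops


def operation_profile(pairs: list[tuple[str, str]]) -> Counter:
    """Count how often each rewrite operation occurs in a set of word pairs."""
    counts: Counter = Counter()
    for dialect, normalized in pairs:
        counts.update(edit_operations(dialect, normalized))
    return counts


if __name__ == "__main__":
    # Illustrative pairs only: dialect form -> standardized spelling.
    toy_pairs = [("mie", "minä"), ("sie", "sinä")]
    for operation, count in operation_profile(toy_pairs).most_common():
        print(operation, count)
```

Computing such a profile separately for each location would yield vectors of operation frequencies that do not depend on which topics an informant happened to talk about, which is the kind of comparability the project is after.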

The CorCoDial project is funded by the Academy of Finland during the period 2021-2025.

The full research plan is available here.

People

News

  • 3 May 2022: Aleksandra Miletić has joined our team as a second post-doc.
  • 29 Apr 2022: Thanks for joining our workshop and making it such a success! The final program and presentation slides are available on the workshop page.
  • 11 Feb 2022: We will organize a workshop on "Corpus-based and computational approaches to linguistic variation" on 27-28 April. Further information is available on this page.
  • 9 Dec 2021: Olli presents "A topic modeling approach to dialect analysis and visualization" in the language technology research seminar (Slides) (Video)