University of Helsinki Language Technology group at NoDaLiDa 2019

The University of Helsinki Language Technology group has several papers at the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19). Here’s where you can find us. See you in Turku!

Monday 30 September:

NLP4CALL workshop:

  • 13:15. Mathias Creutz, Eetu Sjöblom: “Toward automatic improvement of language produced by non-native language learners”

Constraint Grammar – Methods, Tools, and Applications workshop:

  • 14:15. Anssi Yli-Jyrä: “Constraint Grammar Is a Hand-Crafted Transformer”

Tuesday 01 October:

Oral presentations:
Parallel session B: Morphology and Syntax.

  • 13:45. Ilmari Kylliäinen and Miikka Silfverberg: “Ensembles of Neural Morphological Inflection Models”


  • 16:45. Jeff Ens, Mika Hämäläinen, Jack Rueter and Philippe Pasquier: “Morphosyntactic Disambiguation in an Endangered Language Setting”


  • 16:45. Mikko Aulamo and Jörg Tiedemann: “The OPUS Resource Repository: An Open Package for Creating Parallel Corpora and Machine Translation Services”

Wednesday 02 October:

Oral presentations:
Parallel session B: Speech.

  • 14:50. Aarne Talman, Antti Suni, Hande Celikkanat, Sofoklis Kakouros, Jörg Tiedemann and Martti Vainio: “Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations”

University of Helsinki Language Technology group successful at WMT18 machine translation tasks

Our research group took part in this year’s WMT machine translation tasks and shared first place in both the English-Finnish and Finnish-English news translation tasks.

Shared Task: Machine Translation of News

We also participated in the Multimodal Machine Translation task as part of the MeMaD project, taking first place in both English-German and English-French translation.

Shared Task: Multimodal Machine Translation En-De

In both the news and multimodal translation tasks, the best systems from the Language Technology group utilised state-of-the-art neural machine translation models.

System papers describing the models will be presented at the EMNLP 2018 Third Conference on Machine Translation (WMT18) later this year.

Shared Task: Multimodal Machine Translation En-Fr

Congratulations to our team!

Found in Translation project presentation at Charles University’s Fred Jelinek Seminar Series

Professor Jörg Tiedemann gave a talk at Charles University in the Czech Republic on 18 June as part of its Fred Jelinek Seminar Series.


Found in Translation – Learning to understand languages with cross-lingual grounding

Translated texts are semantic mirrors of the original text, and the significant variations that we can observe across languages can be used to disambiguate the meaning of a given expression using the linguistic signal that is grounded in translation. We are interested in massively parallel corpora covering hundreds, up to a thousand, different languages and in how they can be applied as implicit supervision to learn abstractions that could lead to significant improvements in natural language understanding. As a side effect, we can also see how multilingual models pick up essential relationships between languages, building a continuous space with reasonable language clusters. I will talk about some initial results and plans for the future, and I would like to get your feedback on those ideas.

NLUxG project presentation at the Academy of Finland AIPSE seminar

The research project Natural Language Understanding with Cross-Lingual Grounding (NLUxG), funded by the Academy of Finland, was presented by Dr. Alessandro Raganato and Dr. Hande Celikkanat on 18 June at the Academy of Finland opening seminar Novel Applications of Artificial Intelligence in Physical Sciences and Engineering Research (AIPSE).

Our presentation and the poster attracted a lot of interest from the seminar participants.

Language Technology group visible at DHN 2018

The Helsinki Language Technology group had five papers at the recent Digital Humanities in the Nordic Countries (DHN 2018) conference, held on 7–9 March 2018 in Helsinki.

Jörg Tiedemann presenting his paper “Emerging Language Spaces Learned From Massively Multilingual Corpora” at DHN 2018

Distinguished Short Paper

  • Jörg Tiedemann: Emerging Language Spaces Learned From Massively Multilingual Corpora [pdf]

Long Paper

  • Emily Öhman, Kaisla Kajava: Sentimentator: Gamifying Fine-grained Sentiment Annotation [pdf]


  • Yves Scherrer, Tanja Samardžić: ArchiMob: A multidialectal corpus of Swiss German oral history interviews [pdf]
  • Seppo Nyrkkö: An approach to unsupervised ontology term tagging of dependency-parsed text using a Self-Organizing Map (SOM) [pdf]
  • Mika Hämäläinen, Tanja Säily, Eetu Mäkelä: Normalizing Early English Letters for Neologism Retrieval [pdf]