HPLT and LumiNMT

Our new project on High-Performance Language Technologies (HPLT) has started, and with it we will scale data sets, language models and neural MT to a new level. In connection with this, the language technology group in Helsinki has also been selected for one of the first Finnish extreme-scale projects on the LUMI supercomputer.

The project there is called LumiNMT, and its goal is to train neural machine translation models at a large scale using state-of-the-art transformer models and novel modular multilingual setups. The project will focus on increasing language coverage and on making efficient use of massively parallel data sets. Our research group wants to use LUMI’s extensive parallel computing capabilities to reduce training time and scale up model size.

Language tools for Ukrainian

In response to the ongoing crisis in Ukraine, we have started to collect language tools and resources that support the Ukrainian language. At Helsinki-NLP, we have focused especially on developing open translation models and tools, and we are currently working on improved models for more language pairs. We hope that some of them can help communication and interaction with people in need.

Detect, normalize and generate Finnish dialects

Finnish dialects cause a lot of trouble when people interact with computers, since it is impossible to speak a language without speaking some dialect of it. Mika Hämäläinen, Niko Partanen, Khalid Alnajjar and Jack Rueter from our language technology team have created software that can automatically detect, normalize and generate Finnish dialects. Their research was featured in the news on our university website.

Donate spoken language data in Finnish and Swedish

The Language Bank of Finland and the Swedish Literary Society in Finland are collecting Finland-Swedish speech data at https://doneraprat.fi (see the YLE article with video: Vill du att röststyrning ska fungera på finlandssvenska? Kom med och donera prat [Do you want voice control to work in Finland-Swedish? Come and donate speech]). Note that the campaign for collecting spoken Finnish also continues at https://lahjoitapuhetta.fi/.

If you would like to know how the speech database may affect everyday life and how it can be used in research, listen to the YLE podcast “Second Last Word” (YLE pod: Så här lär sig din dammsugare finlandssvensk dialekt [How your vacuum cleaner learns Finland-Swedish dialect]) and read the YLE article Pratande kylskåp och smarta glasögon hjälper dig handla mat – det här är den röststyrda framtiden [Talking refrigerators and smart glasses help you buy food: this is the voice-controlled future].

Meet the LT industry 2021

  • Place: Metsätalo (Unioninkatu 40), Sali I
  • Date: Friday 26 November 2021
  • Time: 15:15 – 17:45

Update 26 November: Thanks for attending! You can find the presentation slides here (UH account required).

The purpose of this event is to bring together students and representatives of companies that work with language technology in one way or another. The event is open to anyone interested in learning about career opportunities. The participating companies will give short presentations about their business, and there will be time for questions and discussion. There will also be an opportunity to speak informally with the industry representatives face to face.

We have invited various language service providers and LT businesses; the preliminary list of confirmed participants is below:

  • Kielikone (unmanned booth)
  • Lingsoft
  • Semantix
  • Silo.AI (virtual participation)
  • Speechly
  • Utopia Analytics

Please sign up here by Friday 19 November if you intend to participate. (Registration is not binding; it just helps us with the organization.)