Introducing Elaine Zosa

Hello there! I’m Elaine, a new postdoctoral researcher in the HelsinkiNLP Research Group. To start off, I have not always worked in NLP. I worked in the financial technology sector before I decided to study for a master’s degree. I obtained my MSc in Computer Science at the University of Helsinki where my concentration was on algorithmic bioinformatics. After that, I was a research assistant in computational genomics at the Technical University of Munich. Then in late 2018, I started my doctoral research at the University of Helsinki, in the Discovery Research Group led by Prof. Hannu Toivonen.

During my PhD, I worked on two EU Horizon 2020 projects: NewsEye (https://www.newseye.eu/) and EMBEDDIA (http://embeddia.eu/). Both of these projects involved building tools to help analyse large-scale news collections. In the former, we focused on historical news collections from Finland, France, and Austria, and in the latter, on news media from less-represented European languages such as Finnish, Estonian, and Croatian. I worked on various tasks in the projects and helped develop new methods in topic modeling, lexical semantic change, news headline generation, and multilingual news matching. Methodological innovations aside, these projects exposed me to the inherently interdisciplinary nature of NLP and language technology, and that, I think, is the most exciting thing about this field. I enjoy building tools that could be useful to researchers in the humanities, the social sciences, and beyond.

Now I am investigating methods to quantify and model uncertainty in various linguistic tasks. You can find out more about my work on my homepage: https://ezosa.github.io/

New project accepted: Green NLP

The Academy of Finland decided to fund our project proposal on “Green NLP – controlling the carbon footprint in sustainable language technology” from the call on sustainable and energy-efficient ICT solutions. We are looking forward to three years of exciting research and work together with our colleagues from TurkuNLP and CSC.

GreenNLP addresses the problem of increasing energy consumption caused by modern solutions in natural language processing (NLP). Neural language models and machine translation systems require heavy computation to train, and their size is constantly growing, which makes them expensive to deploy and run. In our project, we will reduce training costs and model sizes by optimizing the underlying machine learning algorithms with techniques based on knowledge transfer and compression. Furthermore, we will focus on multilingual solutions that can serve many languages in a single model, reducing the number of actively running systems. Finally, we will openly document and freely distribute all our results to enable efficient reuse of ready-made components and further decrease the carbon footprint of modern language technology.

2022 Steven Krauwer Award for OPUS-MT for Ukrainian

Helsinki-NLP received the 2022 Steven Krauwer award for CLARIN achievements for its work on open machine translation for Ukrainian. Thank you very much for this award, and special thanks to everyone who contributed data, software, and help with putting this all together! Let us continue to help people in need, recognizing the importance of open and transparent language technology and the responsibilities we have in society. Thank you!

HPLT and LumiNMT

Our new project on High-Performance Language Technologies (HPLT) has started, and we will scale data sets, language models, and neural MT to a new level. In relation to that, the language technology group in Helsinki has also been selected for one of the first Finnish extreme-scale projects on the supercomputer LUMI.

Our project there is called LumiNMT, and its goal is to train neural machine translation models on a large scale using state-of-the-art transformer models and novel modular multilingual setups. The project will focus on increasing language coverage and on the efficient use of massively parallel data sets. Our research group wants to use LUMI’s extensive parallel computing capabilities to reduce training time and scale up model size.

Language tools for Ukrainian

In response to the ongoing crisis in Ukraine, we have started to collect language tools and resources that support the Ukrainian language. At Helsinki-NLP, we have focused especially on the development of open translation models and tools, and we are currently working on improved models for more language pairs. Hopefully some of them can help communication and interaction with people in need.