Public examination of Aarne Talman’s PhD thesis

Another FoTran team member, Aarne Talman, will defend the doctoral dissertation entitled “Towards Natural Language Understanding: Developing and Assessing Approaches and Benchmarks”.

The public examination will take place in the Faculty of Arts, University of Helsinki, on 23 February 2024 at 13:00 in the lecture hall at Unioninkatu 33 (room 303). Assistant Professor Vered Shwartz, University of British Columbia, will serve as the opponent, and Jörg Tiedemann as the custos.

The defense will also be streamed on Unitube.

The dissertation will be published in the series Dissertationes Universitatis Helsingiensis.

[press release]

Public examination of Raúl Vázquez’ PhD thesis

Our FoTran team member Raúl Vázquez will defend the doctoral dissertation entitled “Representation Learning in Multilingual Neural Machine Translation”.

The public examination will take place in the Faculty of Arts, University of Helsinki, on 17 November 2023 at 10:00 in the lecture hall at Unioninkatu 33 (room 303). Senior Researcher Dr. Cristina España i Bonet, DFKI, will serve as the opponent, and Jörg Tiedemann as the custos.

The defense will also be streamed on Unitube.

The dissertation will be published in the series Dissertationes Universitatis Helsingiensis.

Helsinki-NLP is busy at the FCAI AI day

The language technology group from Helsinki has a substantial presence at the AI day organized by the Finnish Center for AI (FCAI). Check out the program to find our presentations.



Come and see us at the presentations and posters!

Anna Dmitrieva is the Language Bank of Finland researcher of the month

Our doctoral student, Anna Dmitrieva, has been highlighted as the researcher of the month at the Language Bank of Finland.

Anna is working on automatic text simplification with a focus on Russian and Finnish, and has created freely available resources and tools that can be retrieved from public repositories. One of the recent resources is a Parallel Corpus of Finnish and Easy-to-read Finnish compiled from public news at YLE.

New team member: Joseph Attieh

Hello everyone!

My name is Joseph Attieh, and I’m thrilled to introduce myself as the newest member joining the Language Technology research group led by Prof. Jörg Tiedemann. As I embark on my PhD journey, my primary focus will be on Green NLP, particularly in the areas of knowledge distillation and modularization.

My academic journey started in Lebanon, where I received my bachelor’s degree in Computer Engineering. Afterwards, I pursued a double master’s degree at Aalto University in Finland and KTH Royal Institute of Technology in Sweden. During the course of my studies, I’ve had the privilege of conducting research at EPFL, where I delved into decentralized and distributed systems. My journey then took me to Munich, where I worked with BMW, navigating the intricate world of data analytics. More recently, I’ve worked at Huawei Technologies in Helsinki, starting as an Automatic Speech Recognition researcher and later transitioning to a full-time NLP researcher role. The challenges and discoveries during these experiences solidified my decision to further my studies and research, leading me to pursue this exciting PhD opportunity.

Outside the world of algorithms and research papers, I cherish my time swimming, a love I owe to the Mediterranean vibes of my Lebanese homeland. Roller and ice skating are my go-to escapes. And when I’m not in the research lab or on the rink, you might find me binge watching a TV show or reading a book, guilty pleasures I wholeheartedly embrace.

I am excited to join the research team and am looking forward to the collaborations, challenges, and innovations ahead. I am eager to contribute to the team’s vision and to further the advancements in the field of Green NLP. The research in Green NLP has the potential to revolutionize the way NLP models are built and used. I am eager to contribute to this vision and collaborate with the brilliant minds here.

Thank you for taking a moment to get to know me. I am excited to embark on this shared journey of discovery, challenges, and innovation with all of you.

Best regards,

Joseph Attieh

Notes from EACL 2023

Hello! I’m Timothee Mickus, one of the postdocs in the FoTran project. I was at this year’s EACL to present a paper we submitted a year ago to TACL. The piece was about throwing linear algebra at Transformers and seeing what comes out. It was my first live conference since the pandemic, so that was a nice change of pace from Zoom-based and Underline-based conferences. It’s always easier to get feedback with live audiences.

I left the conference with a few tentative opportunities to collaborate with people in other labs, some new ideas for experiments, and, of course, numerous papers to read, with topics ranging from biases in annotation practices to the interpretability of contextualization in Transformer embeddings and from unsupervised machine translation evaluation to knowing when to expect failure and success in multitask scenarios.

The keynotes covered a wide variety of topics, and special attention had been paid to inviting speakers from different backgrounds. Ed Grefenstette talked about large language models and why instruction-based LMs seemed especially promising from a very NLP-centric standpoint. Kevin Munger provided a more media-studies oriented take on the same topic, and questioned how we should change our habits (from how we teach to how we write about AI to what new social policies we need) given the rise of the chatbots in today’s and tomorrow’s information landscape. Joyce Chai brought an overview of how advancements in NLP impact other AI research fields, and in particular embodied AI.

How do we keep ACL events affordable?

Coming into EACL, I expected that most of this blog post would be about the science. It turned out that EACL 2023 was also out of the ordinary in terms of organization.

One contentious point that was thoroughly discussed during the conference breaks was the price. It turned out to be one of the major points addressed during the business meeting (which was the only session where questions could be directed to the conference organizers). I tend to agree that $800 for a five-day event (without counting the ACL membership dues) is a hefty sum; it contributed to the decisions of some of my colleagues and co-authors not to attend the conference. Next year’s venue has yet to be decided, since next year’s organizers want to explore more affordable options than what they initially had in mind. I would personally appreciate some transparency as to how the conference is funded (how much comes from sponsors? from the ACL dues? from the conference attendees?) and where that funding goes (how much are we paying for the conference venue and coffee break catering? for the conference handbooks? for our EACL-branded tote bags?).

The challenge of organizing EACL 2023

But maybe the most important thing to mention is the repeated rescheduling. The conference was initially supposed to happen in Kiev, and did not happen in Kiev for reasons. The backup venue in Dubrovnik was something of a last-minute change of plans: the original plan once Kiev was ruled out was that the conference would just go online. I suppose this rushed relocation also explains some of the mishaps and communication issues during the final weeks before the conference. To take some concrete examples: the handbook we were provided contained duplicate entries for some of the papers presented at the conference; conversely, some poster sessions were not included. On a more personal note, I had no clear indication as to whether my talk would be presented in an oral or poster session: instead, I was sent a link to the Underline page while it was still under construction, and directed to search for my name on this semi-functional website.

Presentations, posters, findings and all the confusion

This leads me to another unconventional organizational decision: every paper that was presented orally also had a poster presentation slot. This happened to be my case as well. I’m still not fully certain what I think of it: I’m not complaining about the extra opportunity to advertise my work, but it is extra work (both ahead of and during the conference). It’s also somewhat unfair that some of the presenters didn’t get the opportunity–and all the more jarring when it comes to Findings papers: officially, Findings are works that are good enough but for which there’s no extra space in the conference itself, and yet organizers did manage to double-book quite a number of papers in the main conference. This means we have a tiered acceptance system: some of us got two presentation slots, some of us only got a poster, and some of us (Findings authors) only got virtual presentations. In and of itself, that’s not necessarily a bad thing, but I’m uncomfortable with this tiered system being left implicit. Let’s hope that future editions of EACL and *ACL conferences will be more transparent on that front.

Where is NLP as a field and EACL as a conference?

The last point I’d like to mention comes from the opening session. As it turns out, most submissions to the European Chapter of the Association for Computational Linguistics did not come from Europe (with a good third of submissions stemming from the USA), and very few actually focused on linguistics. Most submitted works focused on engineering, processing and dataset description rather than on syntax, semantics or phonology modeling. While this state of affairs raises the question of what EACL is precisely, it also highlights how things have changed over the last few years: NLP has become a more international and engineering-focused field of research. Despite these changes, it’s good to see that the community remains very directly involved in deciding where it is going–be it local volunteers in Dubrovnik picking up the ball at the last minute to ensure that EACL would not be an online-only event, or attendees openly discussing whether our current conference practices are a good fit for the community.

Tommi Nieminen joins Helsinki-NLP

My name is Tommi Nieminen, and I recently joined the Helsinki-NLP research group as a new PhD student. For the past two decades, I have worked in the translation industry, starting as a translator and gradually drifting to more technical roles, such as CAT tool support, localization engineering, translation process automation, and machine translation development. Due to my work history, my research focuses mostly on the use of language technology in professional translation.

I have a long history with the University of Helsinki. I enrolled on an MA Philosophy course in the university in 2001, and after a long period of academic absence and part-time study (and a change of disciplines) I finally graduated with an MA in language technology in 2018. Since then I have participated in two academic projects involving the university, Fiskmö and OPUS-MT: Open Translation Models, Tools and Services. In the course of these projects I developed the OPUS-CAT tool, which enables translators to use machine translation models from the OPUS-MT project in their normal working environments. The motivation behind OPUS-CAT is to make open-source machine translation technology and resources directly available to the individual translators, so that they may have more control over how machine translation is integrated into their profession.

I am thrilled to be part of the team in Helsinki and the GreenNLP project, and to work on issues that have recently become more significant than ever before. I live far from Helsinki, so I am usually at the university only one day a week. I look forward to meeting all of you that I have not met yet.

Introducing Elaine Zosa

Hello there! I’m Elaine, a new postdoctoral researcher in the Helsinki-NLP Research Group. To start off, I have not always worked in NLP. I worked in the financial technology sector before I decided to study for a master’s degree. I obtained my MSc in Computer Science at the University of Helsinki where my concentration was on algorithmic bioinformatics. After that, I was a research assistant in computational genomics at the Technical University of Munich. Then in late 2018, I started my doctoral research at the University of Helsinki, in the Discovery Research Group led by Prof. Hannu Toivonen.

During my PhD, I worked on two EU Horizon 2020 projects: NewsEye and EMBEDDIA. Both of these projects involved building tools to help analyse large-scale news collections. In the former, we focused on historical news collections from Finland, France, and Austria, and in the latter, on news media from less-represented European languages such as Finnish, Estonian, and Croatian. I worked on various tasks in the projects and helped develop new methods in topic modeling, lexical semantic change, news headline generation, and multilingual news matching. Methodological innovations aside, these projects exposed me to the inherently interdisciplinary nature of NLP and language technology and that, I think, is the most exciting thing about this field. I enjoy building tools that could be useful to researchers in the humanities and social sciences, and beyond.

Now I am investigating methods to quantify and model uncertainty in various linguistic tasks. You can also find out more about my work on my homepage!