New team member: Joseph Attieh

Hello everyone!

My name is Joseph Attieh, and I’m thrilled to introduce myself as the newest member of the Language Technology research group led by Prof. Jörg Tiedemann. As I embark on my PhD journey, my primary focus will be on Green NLP, particularly knowledge distillation and modularization.

My academic journey started in Lebanon, where I received my bachelor’s degree in Computer Engineering. Afterwards, I pursued a double master’s degree at Aalto University in Finland and KTH Royal Institute of Technology in Sweden. During my studies, I had the privilege of conducting research at EPFL, where I delved into decentralized and distributed systems. My journey then took me to Munich, where I worked with BMW, navigating the intricate world of data analytics. More recently, I worked at Huawei Technologies in Helsinki, starting as an Automatic Speech Recognition researcher and later transitioning to a full-time NLP researcher role. The challenges and discoveries of these experiences solidified my decision to further my studies and research, leading me to pursue this exciting PhD opportunity.

Outside the world of algorithms and research papers, I cherish my time swimming, a love I owe to the Mediterranean vibes of my Lebanese homeland. Roller and ice skating are my go-to escapes. And when I’m not in the research lab or on the rink, you might find me binge-watching a TV show or reading a book, guilty pleasures I wholeheartedly embrace.

I am excited to join the research team and look forward to the collaborations, challenges, and innovations ahead. Research in Green NLP has the potential to revolutionize the way NLP models are built and used, and I am eager to contribute to the team’s vision and to collaborate with the brilliant minds here.

Thank you for taking a moment to get to know me. I am excited to embark on this shared journey of discovery, challenges, and innovation with all of you.

Best regards,

Joseph Attieh

Notes from EACL 2023

Hello! I’m Timothee Mickus, one of the postdocs in the FoTran project. I was at this year’s EACL to present a paper we submitted a year ago to TACL. The piece was about throwing linear algebra at Transformers and seeing what comes out. It was my first live conference since the pandemic, so that was a nice change of pace from Zoom-based and Underline-based conferences. It’s always easier to get feedback with live audiences.

I left the conference with a few tentative opportunities to collaborate with people in other labs, some new ideas for experiments, and, of course, numerous papers to read, with topics ranging from biases in annotation practices to the interpretability of contextualization in Transformer embeddings and from unsupervised machine translation evaluation to knowing when to expect failure and success in multitask scenarios.

The keynotes covered a wide variety of topics, and care had clearly been taken to invite speakers from different backgrounds. Ed Grefenstette talked about large language models and why instruction-based LMs seemed especially promising from a very NLP-centric standpoint. Kevin Munger offered a more media-studies-oriented take on the same topic, and asked how we should change our habits (from how we teach, to how we write about AI, to what new social policies we need) given the rise of chatbots in today’s and tomorrow’s information landscape. Joyce Chai gave an overview of how advancements in NLP impact other AI research fields, in particular embodied AI.

How do we keep ACL events affordable?

Coming into EACL, I expected that most of this blog post would be about the science. It turned out that EACL 2023 was also out of the ordinary in terms of organization.

One contentious point that was thoroughly discussed during the conference breaks was the price. It turned out to be one of the major points addressed during the business meeting (the only session where questions could be directed to the conference organizers). I tend to agree that $800 for a five-day event (not counting ACL membership dues) is a hefty sum; it contributed to the decision of some of my colleagues and co-authors not to attend the conference. Next year’s venue has yet to be decided, since next year’s organizers want to explore more affordable options than those they initially had in mind. I would personally appreciate some transparency as to how the conference is funded (how much comes from sponsors? from the ACL dues? from the conference attendees?) and where that funding goes (how much are we paying for the conference venue and coffee-break catering? for the conference handbooks? for our EACL-branded tote bags?).

The challenge of organizing EACL 2023

But maybe the most important thing to mention is the repeated rescheduling. The conference was initially supposed to happen in Kiev, and did not happen in Kiev for reasons. The backup venue in Dubrovnik was something of a last-minute change of plans: once Kiev was ruled out, the original plan was for the conference to simply go online. I suppose this rushed relocation also explains some of the mishaps and communication issues during the final weeks before the conference. To take some concrete examples: the handbook we were provided contained duplicate entries for some of the papers presented at the conference; conversely, some poster sessions were not included at all. On a more personal note, I had no clear indication as to whether my talk would be presented in an oral or a poster session: instead, I was sent a link to the Underline page while it was still under construction, and directed to search for my name in this semi-functional website.

Presentations, posters, findings and all the confusion

This leads me to another unconventional organizational decision: every paper that was presented orally also had a poster presentation slot. This happened to be my case as well. I’m still not fully certain what I think of it: I’m not complaining about the extra opportunity to advertise my work, but it is extra work (both ahead of and during the conference). It’s also somewhat unfair that some presenters didn’t get that opportunity, and all the more jarring when it comes to Findings papers. Officially, Findings are works that are good enough but for which there is no space in the conference itself, and yet the organizers did manage to double-book quite a number of papers in the main conference. This means we have a tiered acceptance system: some of us got two presentation slots, some of us only got a poster, and some of us (Findings authors) only got virtual presentations. In and of itself, that’s not necessarily a bad thing, but I’m uncomfortable with this tiered system being left implicit. Let’s hope that future editions of EACL and *ACL conferences will be more transparent on that front.

Where is NLP as a field and EACL as a conference?

The last point I’d like to mention comes from the opening session. As it turns out, most submissions to the European Chapter of the Association for Computational Linguistics did not come from Europe (a good third of submissions came from the USA), and very few actually focused on linguistics. Most submitted works focused on engineering, processing and dataset description rather than on modeling syntax, semantics or phonology. While this state of affairs raises the question of what exactly EACL is, it also highlights how things have changed over the last few years: NLP has become a more international and engineering-focused field of research. Despite these changes, it’s good to see that the community remains directly involved in deciding where it is going, be it local volunteers in Dubrovnik picking up the ball at the last minute to ensure that EACL would not be an online-only event, or attendees openly discussing whether our current conference practices are a good fit for the community.

Tommi Nieminen joins Helsinki-NLP

My name is Tommi Nieminen, and I recently joined the Helsinki-NLP research group as a new PhD student. For the past two decades, I have worked in the translation industry, starting as a translator and gradually drifting to more technical roles, such as CAT tool support, localization engineering, translation process automation, and machine translation development. Due to my work history, my research focuses mostly on the use of language technology in professional translation.

I have a long history with the University of Helsinki. I first enrolled at the university in 2001 to study for an MA in philosophy, and after a long period of academic absence and part-time study (and a change of discipline) I finally graduated with an MA in language technology in 2018. Since then I have participated in two academic projects involving the university, Fiskmö and OPUS-MT: Open Translation Models, Tools and Services. In the course of these projects I developed the OPUS-CAT tool, which enables translators to use machine translation models from the OPUS-MT project in their normal working environments. The motivation behind OPUS-CAT is to make open-source machine translation technology and resources directly available to individual translators, so that they have more control over how machine translation is integrated into their profession.

I am thrilled to be part of the team in Helsinki and the GreenNLP project, and to work on issues that have recently become more significant than ever before. I live far from Helsinki, so I am usually at the university only one day a week. I look forward to meeting those of you I have not met yet.

Introducing Elaine Zosa

Hello there! I’m Elaine, a new postdoctoral researcher in the Helsinki-NLP research group. To start off, I have not always worked in NLP. I worked in the financial technology sector before I decided to study for a master’s degree. I obtained my MSc in Computer Science at the University of Helsinki, where my concentration was algorithmic bioinformatics. After that, I was a research assistant in computational genomics at the Technical University of Munich. Then, in late 2018, I started my doctoral research at the University of Helsinki, in the Discovery Research Group led by Prof. Hannu Toivonen.

During my PhD, I worked on two EU Horizon 2020 projects: NewsEye (https://www.newseye.eu/) and EMBEDDIA (http://embeddia.eu/). Both projects involved building tools to help analyse large-scale news collections. In the former, we focused on historical news collections from Finland, France, and Austria, and in the latter, on news media in less-represented European languages such as Finnish, Estonian, and Croatian. I worked on various tasks in the projects and helped develop new methods in topic modeling, lexical semantic change, news headline generation, and multilingual news matching. Methodological innovations aside, these projects exposed me to the inherently interdisciplinary nature of NLP and language technology, and that, I think, is the most exciting thing about this field. I enjoy building tools that could be useful to researchers in the humanities and social sciences, and beyond.

Now I am investigating methods to quantify and model uncertainty in various linguistic tasks. You can also find out more about my work on my homepage, https://ezosa.github.io/!

New project accepted: Green NLP

The Academy of Finland decided to fund our project proposal on “Green NLP – controlling the carbon footprint in sustainable language technology” from the call on sustainable and energy-efficient ICT solutions. We are looking forward to three years of exciting research and work together with our colleagues from TurkuNLP and CSC.

GreenNLP addresses the problem of increasing energy consumption caused by modern solutions in natural language processing (NLP). Neural language models and machine translation systems require heavy computation to train, and their size is constantly growing, which makes them expensive to deploy and run. In our project we will reduce training costs and model sizes through clever optimizations of the underlying machine learning algorithms, using techniques based on knowledge transfer and compression. Furthermore, we will focus on multilingual solutions that can serve many languages in a single model, reducing the number of actively running systems. Finally, we will openly document and freely distribute all our results to enable efficient reuse of ready-made components and further decrease the carbon footprint of modern language technology.

2022 Steven Krauwer Award for OPUS-MT for Ukrainian

Helsinki-NLP received the 2022 Steven Krauwer Award for CLARIN Achievements for our work on open machine translation for Ukrainian. Thank you very much for this award, and special thanks to everyone who contributed data, software and help in putting it all together! Let us continue to help people in need, recognizing the importance of open and transparent language technology and the responsibilities we have in society. Thank you!