In August 2015 I started working as Professor of Language Technology at the Department of Digital Humanities at the University of Helsinki. My main research interests are cross-lingual NLP and machine translation. More information about my research can be found
- in the research portal
- at Google Scholar and ORCID
Short CV:
- Since August 2015: Professor of Language Technology at the Department of Digital Humanities (formerly Department of Modern Languages), University of Helsinki
- September 2014 – July 2015: Senior Researcher at the Department of Linguistics and Philology, Uppsala University
- September 2009 – August 2014: Visiting Professor at the Department of Linguistics and Philology, Uppsala University
- September 2004 – August 2009: Postdoctoral researcher at the Department of Information Science/Humanities Computing (Informatiekunde), University of Groningen
- January 2004 – August 2004: Lecturer in computational linguistics and coordinator for the language technology programme, Department of Linguistics and Philology, Uppsala University
- 2000 – 2003: Ph.D. research at the Department of Linguistics, Uppsala University
- 2001 – 2002: Visiting Ph.D. student, Division of Informatics, Edinburgh University, UK
- 1997 – 1999: Research assistant, Department of Linguistics, Uppsala University
- 1991 – 1997: Master's degree in Computer Science (Diplom für Informatik), “Otto-von-Guericke” University, Magdeburg, Germany
Recent Projects
- HPLT – High-Performance Language Technologies (EU H2020)
- Uncertainty-aware neural language models (AoF)
- OPUS-MT: Open Translation Models, Tools and Services (ELG)
- Behind the words: Deep neural models of language meaning for industry-grade applications (AoF)
- Found in Translation: Natural Language Understanding with Cross-lingual Grounding (ERC)
- EOSC-Nordic (EU H2020)
- NLUxG – NLU with Cross-Lingual Grounding (AoF)
- MeMAD: Methods for Managing Audiovisual Data (EU H2020)
- fiskmö: Parallel corpora and machine translation for Finnish and Swedish (SKF)
- Nordic Language Processing Laboratory (NordForsk)
- Cross-lingual NLP for low-resource languages (UH)
- Discourse Oriented Statistical Machine Translation (VR)
- Efficient Algorithms for Natural Language Processing Beyond Sentence Boundaries – a project within the e-science collaboration eSSENCE
- LetsMT! – Building a Platform for Online Sharing of Training Data and Building User Tailored MT (EU ICT)
Resources and Tools
- OPUS – a collection of freely available parallel corpora and tools
- fiskmö translator – a translation demo for the Nordic languages
- efmaral and eflomal – tools for efficient word alignment
- WMT en-fi 2016, 2017: official MT test sets for Finnish-English
- HNMT – the Helsinki Neural Machine Translation system
- Lingua::Align – a toolbox for tree-to-tree alignment
- Uplug – a toolbox for processing parallel corpora
- Lingua::Ident::Blacklists – language identifier for related languages
- Docent – a document-level SMT decoder
- pdf2xml – a converter for PDF documents
- subalign – tools for converting and aligning movie subtitles
- Helsinki-NLP at GitHub and Bitbucket