Cross-Lingual NLP

Parallel multilingual resources capture valuable linguistic information that can be used in various fields of computational linguistics. The most obvious application is machine translation, where systems are trained on large collections of texts and their human translations. Aligned parallel corpora can also serve as knowledge sources for discovering implicit linguistic patterns, treating translations as a natural form of semantic annotation. Parallel data sets can also be used to port linguistic tools and resources to other languages, which is especially useful for supporting low-density languages. The language technology group in Helsinki works on several of these topics, including statistical machine translation and annotation projection.

Projects:

  • HPLT – High-Performance Language Technologies, 2022-2025
  • FoTran – Found in Translation (ERC), 2018-2022
  • OPUS-MT – Open Translation Models, Tools and Services (ELG pilot project)
  • fiskmö – Finnish-Swedish Parallel Corpus and Machine Translation (SKF)
  • MeMAD – Method for Managing Audiovisual Data (EU H2020), 2018-2020
  • NLUxG – Natural Language Understanding with Cross-Lingual Grounding (AoF), 2018-2019
  • Cross-Linguistic and Multilingual NLP with the Focus on Low-Resource Languages and Language Variants (University of Helsinki), 2015-2019
  • NLPL – the Nordic Language Processing Laboratory
  • EOSC-Nordic – the NLPL use case inside the Nordic chapter of the European Open Science Cloud
  • OPUS (supported by NLPL), including OPUS-MT

Multilingual Resources:

  • OPUS: The Open Parallel Corpus (search, lexicon, NMT)
  • MuCoW: Multilingual contrastive WSD test sets for MT
  • ParCor: A Parallel Corpus with Annotated Pronoun Coreference
  • DiscoMT2015: Data from a Shared Task on Pronoun-Focused MT
  • FinnWordNet: A bilingual lexical database
  • PIE: The Proto-Indo-European Lexicon
  • SemFi and SemUr: Semantic databases for Finnish, Skolt Sami, Erzya, Moksha and Komi-Zyrian

Tools:

Events:

Cross-lingual NLP became a research topic of the Helsinki group in 2015. Current and past group members working on related research are:

Related Publications: