Research

My scientific contributions can be divided into three areas of applied computational linguistics:

Cross-lingual and multilingual NLP, including annotation projection and machine translation:

  • High Performance Language Technologies (HPLT)
  • Natural language understanding with cross-lingual grounding (FoTran)
  • Multimodal machine translation (MeMAD)
  • Parallel corpora and machine translation for Finnish and Swedish (fiskmö)
  • Annotation projection and cross-lingual transfer models
  • Discourse-oriented machine translation (DiscoMT)
  • Feature-rich discriminative word alignment (the clue alignment approach)
  • Public software for processing parallel corpora and treebanks (Uplug, Lingua-Align)
  • Extensive collection of freely available parallel corpora (OPUS)
  • Work on rule-based and hybrid machine translation (PLUG, MATS, KOMA), example-based MT (PaCoMT) and statistical MT (LetsMT!), including research on phrase-based, syntax-based and character-level SMT

Induction of Lexical Information:

  • Word alignment for extracting multilingual lexical knowledge (PLUG, MATS, KOMA)
  • Extraction of domain-specific terminology (Scania)
  • Identification of idiomatic expressions from aligned parallel corpora
  • Extraction of lexico-semantic relations in various languages and domains (IMIX)

Question Answering and Information Retrieval (IMIX):

  • Linguistic features in passage retrieval for question answering
  • Genetic algorithms for the optimization of feature-rich information retrieval
  • Dutch and cross-lingual question answering using dependency relations

For more information, see my list of publications and the projects I have been involved in.