My scientific contributions can be divided into three areas of applied computational linguistics:
Cross-Lingual NLP including Annotation Projection and Machine Translation:
- Natural language understanding with cross-lingual grounding (FoTran)
- Multimodal machine translation (MeMAD)
- Parallel corpora and machine translation for Finnish and Swedish (fiskmö)
- Annotation projection and cross-lingual transfer models
- Discourse-oriented machine translation (DiscoMT)
- Feature-rich discriminative word alignment (the clue alignment approach)
- Public software for processing parallel corpora and treebanks (Uplug, Lingua-Align)
- Extensive collection of freely available parallel corpora (OPUS)
- Work on rule-based and hybrid machine translation (PLUG, MATS, KOMA), example-based MT (PaCoMT) and statistical MT (LetsMT!), including research on phrase-based, syntax-based and character-level SMT
Induction of Lexical Information:
- Word alignment for extracting multilingual lexical knowledge (PLUG, MATS, KOMA)
- Extraction of domain-specific terminology (Scania)
- Identification of idiomatic expressions in aligned parallel corpora
- Extraction of lexico-semantic relations in various languages and domains (IMIX)
Question Answering and Information Retrieval (IMIX):
- Linguistic features in passage retrieval for question answering
- Genetic algorithms for the optimization of feature-rich information retrieval
- Dutch and cross-lingual question answering using dependency relations
For more information, see my list of publications and the projects I have been involved in.