Helsinki has a long tradition of working with computational models for the morphological analysis and generation of natural languages. Helsinki is the birthplace of two-level morphology (TWOL), a general computational model for word-form recognition and production. The language technology group now provides efficient tools for morphological analysis based on finite-state technology and statistical NLP. Additional tools and linguistic resources that emphasise NLP for highly inflecting languages (Finnish in particular) are collected under the umbrella of FIN-CLARIN. For more information, follow the links below and read the news and posts related to our work on NLP for morphologically rich languages.
Tools and Software Libraries:
- HFST: Helsinki Finite-State Technology
- Omorfi: Open Morphology for Finnish
- finer-data: Named Entity Recognition for Finnish
- HFST Part-of-Speech Tagger Demo
- HFST Morphological Analysis Demo
- UralicNLP: A Python library for processing Uralic languages
- Morfessor: Unsupervised morphological segmentation
- FinPos: Morphological Tagging and Lemmatization
- AKU: Open-source Language Technology for Uralic Minority Languages
Linguistic Resources:
- FIN-CLARIN: Integration of the Language Resources in Finland
- Kielipankki: The Language Bank of Finland
- FinnTreeBank: A Treebank for Finnish
- FinnWordNet: A Lexico-Semantic Database for Finnish
Many people have contributed to the development of these resources and tools. Below are the people currently working on related research and development, as well as selected relevant references.
People:
- Krister Lindén (FIN-CLARIN)
- Jack Rueter (AKU)
- Mika Hämäläinen (UralicNLP, AKU)
- Heidi Jauhiainen (FIN-CLARIN)
- Tommi Jauhiainen (FIN-CLARIN)
- Mathias Creutz (Morfessor)
- Miikka Silfverberg (FinPos)
- Atro Voutilainen (FinnTreeBank)
- Anssi Yli-Jyrä (FST)
- Senka Drobac (HFST)
- Sam Hardwick (HFST)
- Erik Axelson (HFST)