Helsinki has a long tradition of working with computational models for the morphological analysis and generation of natural languages. Helsinki is the birthplace of two-level morphology (TWOL), a general computational model for word-form recognition and production. The language technology group now provides efficient tools for morphological analysis based on finite-state technology and statistical NLP. Additional tools and linguistic resources that emphasise NLP for highly inflecting languages (Finnish in particular) are collected under the umbrella of FIN-CLARIN. For more information, see the links below and read the news and posts related to our work on NLP for morphologically rich languages.
Tools and Software Libraries:
- HFST: Helsinki Finite-State Technology
- Omorfi: Open Morphology for Finnish
- finer-data: Named Entity Recognition for Finnish
- HFST PoS Tagger Demo
- HFST Morphological Analysis Demo
- UralicNLP: A Python library for processing Uralic languages
- FinMeter: A Python library for analyzing Finnish poetry, including rhyme, meter and metaphoricity
- Murre: A Python library for processing and generating dialectal Finnish (puhekieli)
- Syntax Maker: NLG tools for Finnish surface realization in Python
- Morfessor: Unsupervised morphological segmentation
- FinPos: Morphological Tagging and Lemmatization
- Veʹrdd: An online interface for dictionary editing
Linguistic Resources:
- FIN-CLARIN: Integration of the Language Resources in Finland
- Kielipankki: The Language Bank of Finland
- FinnTreeBank: A Treebank for Finnish
- Moksha Treebank
- Skolt Sami Treebank
- Komi-Permyak Treebank
- Erzya Treebank
- SemFi and SemUr: Semantic Databases for Finnish, Skolt Sami, Erzya, Moksha and Komi-Zyrian (with an online demo)
- Akusanat: A MediaWiki-based online dictionary of Uralic languages
- FinnWordNet: A Lexico-Semantic Database for Finnish
Many people have contributed to the development of these resources and tools. Below are the people currently working on related research and development, as well as selected relevant references.
People:
- Krister Lindén (FIN-CLARIN)
- Jack Rueter (AKU)
- Mika Hämäläinen (UralicNLP, AKU)
- Khalid Alnajjar
- Heidi Jauhiainen (FIN-CLARIN)
- Tommi Jauhiainen (FIN-CLARIN)
- Mathias Creutz (Morfessor)
- Miikka Silfverberg (FinPos)
- Atro Voutilainen (FinnTreeBank)
- Anssi Yli-Jyrä (FST)
- Senka Drobac (HFST)
- Sam Hardwick (HFST)
- Erik Axelson (HFST)
Projects:
- Behind the Words: Deep neural models of language meaning for industry-grade applications
- Permic Morpho-Lexical Resources and Implementation
- HFST – Helsinki Finite-State Technology
- Finno-Ugric Languages and the Internet