The DigiSami project was part of a wider joint research project funded by the Academy of Finland and the Hungarian Academy of Sciences, intended to increase the visibility and use of small Fenno-Ugric languages in the digital world by applying language technology to support the development of digital materials. The goal of the Finnish part was to support digital content generation for the Sami languages, focusing mainly on North Sami. The DigiSami project at the University of Helsinki investigated how modern language technology and corpus-based linguistic research can contribute to handling the digitalisation challenges faced by small Fenno-Ugric language communities.
The project collected the DigiSami Corpus of Spoken North Sami language in several North Sami-speaking areas of both Finland and Norway. The spoken dialogues were transcribed and carefully annotated using modern corpus linguistics techniques. The speech materials were made available to colleagues at Aalto University working on speech technology, who began to develop a speech synthesizer and a speech recognizer for North Sami.
The project worked towards SamiTalk, a Sami-speaking robot linked to Sami Wikipedia. The aim was to demonstrate to young Sami speakers that the Sami languages of their grandparents will also play an active part in the digital future. The WikiTalk project had shown that robots can talk about many topics using English Wikipedia. As a step towards SamiTalk, the DigiSami project worked on localisation of spoken dialogue systems and made a Finnish localisation of WikiTalk using Finnish Wikipedia. A SamiTalk demonstration prototype shows a robot talking about the situation of Sami languages in Finland using information from Sami Wikipedia. The robot speaks North Sami, but its speech recognition works only for Finnish, because the North Sami speech recognizer was not available.
The project organised the northernmost International Workshop on Spoken Dialogue Systems in Saariselkä, Finland, in 2016. The workshop attracted over 50 researchers from Europe, Japan and the USA. Revised versions of the workshop papers were collected in Dialogues with Social Robots, edited by Kristiina Jokinen and Graham Wilcock and published by Springer in 2017.
Best Paper Award
The project also worked on multimodal analysis of the videos and annotations in the DigiSami Corpus using machine learning techniques, in collaboration with the University of Eastern Finland. This work was recognised by a Best Paper Award at IWSDS 2018 for "Enabling Spoken Dialogue Systems for Low-resourced Languages: End-to-end Dialect Recognition for North Sami" by Trung Ngo Trong, Kristiina Jokinen and Ville Hautamäki.