Releasing the Komi newspapers at Fenno-Ugrica

Last year, we released a large number of monographs in the Komi languages in our online collection, Fenno-Ugrica. In addition to the monographs, we are also publishing newspapers in both Komi-Permyak and Komi-Zyrian. All in all, there will be 23 titles and around 40 000 pages of Komi newspapers in our collection by the end of June 2015.

Brief technical overview of Revizor, the editor for correcting OCR text material

OCR’ed text is often of poor quality because of errors in the optical recognition process, especially when the source material is old or otherwise in bad condition. These errors make it hard to rely on the text for building a corpus or word lists, and they make the source material harder to use for study or to incorporate into other tools for language researchers. This is the problem that our OCR editor tries to solve, or at least to mitigate.

Material from the Komi National Library Accessible through Uralica

With funding received from the Finnish Ministry of Education and Culture, we are developing the Uralica service, which aims to bring together libraries and other institutions that hold digital content in Uralic languages. The latest addition was made last week, when we linked some digitized material from the Komi National Library to Uralica. The linked material in Komi can be accessed here.

Finno-Ugrian Researcher Discovers Linguistic Treasures Every Day

We recently published the first material produced in the continued Digitisation Project of Kindred Languages in the Fenno-Ugrica collection: a total of 75 monographs in the Mari languages. To discuss this material, we met with Finno-Ugrian researcher Mrs Julia Kuprina, a project researcher at the Morphological Analyzers for Minority Finno-Ugrian Languages project. We spoke with her about the material in the collections, her own research in language technology, and, naturally, the Hill Mari language.
