Stage One Completed, Second Round to Begin

The continuation phase of the Digitization Project of Kindred Languages (2014-2015) took off in January 2014. Since then, we have cleared the copyright for all material that will be digitized and published by the end of this year. Naturally, we have also signed the necessary agreements with the National Library of Russia on the digitization of the material. A great deal of work has also been done behind the scenes: the development of our OCR editor has taken a step forward, and plenty of time has been spent on post-production of the material here in Helsinki. By mid-July, we had published exactly 400 new monographs in Khanty, Mansi, Hill and Meadow Mari, Nenets, Selkup, Komi-Permyak, Komi-Zyrian and Udmurt in our Fenno-Ugrica collection. We are more than glad to have completed the first round of releases.

The next stage is just around the corner, and more action is coming: at the beginning of August we will start releasing our biggest set of material so far. This set is colloquially called Parallel Titles, and it consists of more than 128 books in various translations into Uralic languages. All in all, the set has more than 620 items and 50 000 pages covering 25 different subjects, ranging from the natural sciences to politics and from technology to children's literature. Most of the material was originally published in the 1920s and 1930s, and it offers a great view of the history of the Kindred Peoples, education, society, linguistic development and more. The full reference list is available here. Most of the digitization work in Saint Petersburg has already been completed, but post-production and cataloguing will take some time. We anticipate that the first Parallel Titles will be available in Fenno-Ugrica in early August and that the work will be completed by the end of September.

We are also taking some new steps in our services. In addition to searchable PDFs, we have started to offer page-level XML files for each item to those who are interested. See the example here. The post-correction of the material has also begun: the Ingrian material comes first, and we are using crowdsourcing, or perhaps even nichesourcing, to improve the OCR'd text for the benefit of linguistic research. For more details on our approach to crowdsourcing, see my presentation on our crowdsourcing/nichesourcing methods here, or read my article (in Finnish) in the latest issue of Kansalliskirjasto magazine here (pp. 38-40).

If you have any questions related to our project or to the materials that have been or will be published, don't hesitate to contact us by e-mail. Write to for consultation. Also keep an eye on our VKontakte site for news flashes in Russian and on the project manager's Twitter account for news in English.

Wishing you all a pleasant and sunny summer,
