Post-production of our digital content

Anis Moubarik, an information system specialist at the National Library and a member of the DPKL team, will introduce you to what happens to a digitized book in our post-production processes. During the project, Anis has been in charge of creating both the OCR’ed PDFs that are available in our Fenno-Ugrica collection and the Alto XML files per book, which are made available for editing in Revizor, the text editor for enhancing the data.

Brief technical overview of Revizor, the editor for correcting OCR text material

OCR’ed text is often of poor quality because of errors in the optical recognition process, especially when the source material is old or otherwise in bad condition. These errors make it hard to rely on the text for building a corpus or word lists, and they make the source material less accessible for study or for incorporation into other tools for language researchers. Our OCR editor tries to eradicate this problem, or at least to contribute towards a possible solution.

OCR Webinar 5 March 2014: Recap

Dear all,

Thanks for participating in our OCR Webinar on 5 March 2014. Here you will find all four presentations that were given during the webinar, the transcript of the chat comments, and a link to the webinar recording.

If you have further enquiries, please don’t hesitate to leave a comment on our KIWI page or contact us by e-mail: kk-fennougrica@helsinki.fi

Yours &c.,

Jussi-Pekka

What is the point of an online interactive OCR text editor?

The Digitization Project of Kindred Languages is not only about publishing Fenno-Ugric material online; it also aims to support linguistic research by developing purpose-built tools to assist it. In this blog entry, Wouter Van Hemel of the National Library of Finland sheds light on the OCR editor, which enables the editing of machine-encoded text through crowdsourcing for the benefit of linguistic research.
