Post-production of our digital content

Posted on 13.5.2015 by Jussi-Pekka Hakkarainen


Anis Moubarik, an information systems specialist at the National Library and a member of the DPKL team, introduces what happens to a digitized book in our post-production process. During the project, Anis has been in charge of creating both the OCR’ed PDFs that are available in our Fenno-Ugrica collection and the ALTO XML files for each book, which are made available for editing in Revizor, the text editor for enhancing the data.


Brief technical overview of Revizor, the editor for correcting OCR text material

Posted on 30.4.2015 by Jussi-Pekka Hakkarainen


OCR’ed text is often of poor quality because of errors made during optical recognition, especially when the source material is old or otherwise in bad condition. These errors make it hard to rely on the text for building a corpus or word lists, and they make the source material less accessible for study or for incorporation into other tooling for language researchers. This is the problem our OCR editor tries to eradicate, or at least to help solve.


OCR Webinar 5 March 2014: Recap

Posted on 6.3.2014 by Jussi-Pekka Hakkarainen


Dear all,

Thanks for participating in our OCR Webinar on 5 March 2014. Below you will find all four presentations held during the webinar, the transcript of the chat comments, and a link to the webinar recording.

Hakkarainen: An Introduction to the OCR Webinar: Making the Impact on Research and Society
Van Hemel: OCRUI: Interface for the correction of OCR text material
Rueter: Subversioned OCR Editing, an Opening for Community Involvement
Vanhasalo: Kotus Experience: Crowdsourcing the Old Literary Finnish for the Research’s Benefit
Chat Transcript from OCR Webinar
Webinar recording (Opens in Adobe Connect)

If you have further enquiries, please don’t hesitate to leave a comment on our KIWI page or contact us by e-mail: kk-fennougrica@helsinki.fi

Yours &c.,

Jussi-Pekka

What is the point of an online interactive OCR text editor?

Posted on 21.2.2014 by Jussi-Pekka Hakkarainen


The Digitization Project of Kindred Languages is not only about publishing Fenno-Ugric material online; it also aims to support linguistic research by developing purpose-built tools. In this blog entry, Wouter Van Hemel of the National Library of Finland sheds light on the OCR editor, which enables crowdsourced editing of machine-encoded text for the benefit of linguistic research.


Fenno-Ugrica

The Blog of the Minority Languages Project – National Library of Finland

Category Archives: OCR editor
