CIFU XII, Day 2

What did I learn from the second day of CIFU XII? Two things at least. As a layman in linguistics, I found it interesting to see how differently language documentation can be defined. As a librarian, I was thrilled to see that people in this field take archiving seriously. These are the topics I want to cover in this blog entry.

CIFU XII, Day 1

So, the 12th International Congress of Finno-Ugric Studies has finally begun. Although Mr. Harri Mantila implied that the congress has become somewhat smaller than before, we are pleased to enjoy about 111 long papers and 195 presentations in 19 symposia. CIFU XII has around 380 participants from 21 countries, so I wouldn’t consider this event a small rendezvous at all.

DH2015. Recap, Day 5

I had spent four days at DH2015 without really choosing the sessions the way the historian or philologist in me would have wanted. Nobody in my organization had prompted me to attend any particular session, but when I go to conferences, I tend to attend the sessions that could provide some new information for my home organization in return. For the last day, I deliberately chose the sessions according to my own interests, so at last I was cherry-picking too.

What Did I Learn from the DH2015 Workshops? Recap, Days 1-2

DH2015 is taking place this week in Sydney, Australia. The Digitization Project of Kindred Languages is present here, as I was given the opportunity to present a long paper on Nichesourcing of Uralic Languages later this week. Yesterday and today, I attended the pre-conference workshops. This is a brief summary of my experiences in three workshops.

Post-production of our digital content

Anis Moubarik, an information system specialist at the National Library and a member of the DPKL team, will introduce you to what happens to a digitized book in our post-production processes. During the project, Anis has been in charge of creating both the OCR’ed PDFs that are available in our Fenno-Ugrica collection and the ALTO XML files for each book, which are made available for editing in Revizor, the text editor for enhancing the data.

Brief technical overview of Revizor, the editor for correcting OCR text material

OCR’ed text is often lacking in quality because of errors during the optical recognition process, especially when the source material is old or otherwise in a bad state. These errors make it hard to rely on the text for building a corpus or word lists, and they make the source material less accessible for study or for incorporation into other tools for language researchers. This is a problem that our OCR editor tries to eradicate, or at least to help solve.
