OCR’ed text is often of poor quality because of errors introduced during optical recognition, especially when the source material is old or otherwise in poor condition. These errors make it hard to rely on the text for building a corpus or word lists, and they make the source material less accessible for study or for incorporation into other tools for language researchers. Our OCR editor aims to address this problem, or at least to contribute toward a solution.
During the Digitization Project of Kindred Languages, we have paid special attention to materials published in the Mordvinic languages: Erzya, Moksha, and Shoksha. Erzya was converted into a medium of popular education, enlightenment, and dissemination of information pertinent to the developing political agenda of the Soviet state.