OCR’ed text is often of poor quality because of errors in the optical recognition process, especially when the source material is old or otherwise in bad condition. These errors make it hard to rely on the text for building a corpus or word lists, and they make the source material less accessible for study or for incorporation into other tooling for language researchers. This is the problem that our OCR editor tries to solve, or at least to contribute a possible solution towards.
Brief technical overview of Revizor, the editor for correcting OCR text material