Post OCR correction of Swedish patent text: The difference between reading tongue ‘lästunga’ and security tab ‘låstunga’

Abstract

This paper compares two basic post-processing algorithms for correcting optical character recognition (OCR) errors in Swedish text. The first, the lexical filter, relies on language knowledge and manual correction; the second, the generic filter, uses a generic algorithm with limited language knowledge to perform corrections. Both methods aim to improve the quality of OCR-generated Swedish patent text. Tests are conducted on 7,721 randomly selected patent claims generated by different OCR software tools. The OCR-generated texts and the automatically corrected texts (produced by the lexical or generic filter) are compared with a manually corrected ground truth. The preliminary results indicate that the OCR tools are biased toward different characters when generating text, and that the language knowledge used in post-correction influences the final results.
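
The general idea of a lexicon-based post-correction step, and of scoring its output against a ground truth, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`levenshtein`, `char_error_rate`, `lexical_filter`), the toy lexicon, the distance threshold, and the use of character error rate as the metric are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance to the ground truth, normalised by ground-truth length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(hypothesis, reference) / len(reference)


def lexical_filter(text: str, lexicon: set, max_distance: int = 1) -> str:
    """Hypothetical lexicon-based filter (an assumption, not the paper's).

    A token that is not in the lexicon is replaced by the closest lexicon
    entry, but only if that entry is within max_distance edits and is the
    unique closest candidate. Otherwise the token is left unchanged.
    """
    if not lexicon:
        return text
    corrected = []
    for token in text.split():
        if token in lexicon:
            corrected.append(token)
            continue
        scored = sorted((levenshtein(token, word), word) for word in lexicon)
        best_distance, best_word = scored[0]
        ambiguous = len(scored) > 1 and scored[1][0] == best_distance
        if best_distance <= max_distance and not ambiguous:
            corrected.append(best_word)
        else:
            corrected.append(token)
    return " ".join(corrected)


# Toy data: the OCR tool dropped the diacritic in 'låstunga'.
ocr_output = "en lastunga"
ground_truth = "en låstunga"

small_lexicon = {"en", "låstunga"}
print(lexical_filter(ocr_output, small_lexicon))   # en låstunga

# With both spellings in the lexicon the correction is ambiguous,
# so the token is left as it was.
larger_lexicon = {"en", "låstunga", "lästunga"}
print(lexical_filter(ocr_output, larger_lexicon))  # en lastunga
```

The second call shows why the pair in the title is hard: 'lästunga' and 'låstunga' are each one substitution away from the OCR output, so a filter working from string distance alone cannot choose between them without further language knowledge.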

Cite

Andersson, L., Rastas, H., & Rauber, A. (2014). Post OCR correction of Swedish patent text: The difference between reading tongue ‘lästunga’ and security tab ‘låstunga.’ Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8849, 1–9. https://doi.org/10.1007/978-3-319-12979-2_1
