Adapting OCR with limited supervision

Deepayan Das; C. V. Jawahar

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12116 LNCS 30-44

DOI: 10.1007/978-3-030-57058-3_3

4 Citations

13 Readers

Abstract

Today's text recognition systems (OCRs) are mostly based on supervised learning of deep neural networks. Their performance is limited by the type of data used for training. When document images are diverse in style (e.g., fonts, print, writer, imaging process), creating a large amount of training data is impossible. In this paper, we explore the problem of adapting an existing OCR, already trained for a specific collection, to a new collection with minimal supervision or human effort. We explore three popular strategies for this: (i) Fine Tuning, (ii) Self Training, and (iii) Fine Tuning + Self Training. We discuss in detail how these popular Machine Learning approaches can be adapted to the text recognition problem of interest. We hope our empirical observations on two different languages will be relevant to wider use cases in text recognition.
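
As a rough illustration of the self-training idea named in the abstract, the sketch below shows a generic pseudo-labeling loop: train on the small labeled set, transcribe the unlabeled samples, keep only the confident predictions as new training pairs, and retrain. This is not the authors' procedure. The names `self_train`, `train`, `threshold`, and `rounds` are hypothetical, strings stand in for word images, and the confidence threshold is an assumed selection rule.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces: a predictor maps a sample to (transcription, confidence),
# and a trainer builds a predictor from (sample, transcription) pairs.
Predictor = Callable[[str], Tuple[str, float]]
Trainer = Callable[[List[Tuple[str, str]]], Predictor]


def self_train(
    train: Trainer,
    labeled: Sequence[Tuple[str, str]],
    unlabeled: Sequence[str],
    threshold: float = 0.9,  # assumed confidence cut-off for pseudo-labels
    rounds: int = 3,         # assumed maximum number of retraining rounds
):
    """Generic self-training loop; returns (predictor, training pairs, leftover pool)."""
    data = list(labeled)
    pool = list(unlabeled)
    predict = train(data)  # initial training on the small labeled set
    for _ in range(rounds):
        confident, remaining = [], []
        for sample in pool:
            text, score = predict(sample)
            if score >= threshold:
                confident.append((sample, text))  # accept as a pseudo-label
            else:
                remaining.append(sample)          # keep for a later round
        if not confident:
            break  # nothing new to learn from; stop early
        data.extend(confident)
        pool = remaining
        predict = train(data)  # retrain with the pseudo-labels added
    return predict, data, pool
```

In practice `train` would wrap fine-tuning of an existing recognizer on the new collection, and the confidence score would come from the recognizer itself; both are left abstract here because the abstract does not specify them.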

Cite

APA

Das, D., & Jawahar, C. V. (2020). Adapting OCR with limited supervision. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12116 LNCS, pp. 30–44). Springer. https://doi.org/10.1007/978-3-030-57058-3_3
