Self-Supervised Representation Learning for Document Image Classification

Abstract

Supervised learning, despite being extremely effective, relies on expensive, time-consuming, and error-prone annotations. Self-supervised learning has recently emerged as a strong alternative to supervised learning in a range of domains, since collecting large amounts of unlabeled data can be achieved by simply crawling the internet. These self-supervised methods automatically discover features relevant for representing an input example by solving self-defined proxy tasks. In this paper, we question the commonly employed purely supervised training regime (starting either from ImageNet-pretrained networks or from random initialization) in contrast to representations learned directly with self-supervised representation learning methods on large document image datasets. For this purpose, we leverage a large-scale document image collection (RVL-CDIP) to train a ResNet-50 image encoder using two different self-supervision methods (SimCLR and Barlow Twins). Employing a linear classifier on top of the self-supervised embeddings from ResNet-50 results in an accuracy of 86.75%, compared to 71.43% with the corresponding ImageNet-pretrained embeddings. Similarly, evaluating on the Tobacco-3482 dataset using self-supervised embeddings from ResNet-50 yields an accuracy of 88.52%, in contrast to 74.16% with the corresponding ImageNet-pretrained embeddings. We show that in the case of limited labeled data, this wide gap in performance between self-supervised and fully supervised models persists even after fine-tuning the pretrained models. However, the gap shrinks significantly as the amount of labeled data increases, including the case where the model is trained from scratch. Our results show that representations learned with self-supervised representation learning techniques are a viable option for document image classification, specifically in the context of limited labeled data, a common restriction in industrial use cases.
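The linear-evaluation numbers above come from training a classifier on frozen self-supervised embeddings. To illustrate one of the two pretraining objectives mentioned, below is a minimal NumPy sketch of the Barlow Twins loss (the function name and dimensions are illustrative, not taken from the paper; the default trade-off weight follows the λ ≈ 5e-3 commonly used for Barlow Twins):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3, eps=1e-9):
    """Barlow Twins objective on two batches of embeddings.

    z1, z2: (N, D) embeddings of two augmented views of the same images.
    The loss pushes the cross-correlation matrix of the batch-normalized
    embeddings toward the identity: invariance on the diagonal,
    redundancy reduction off the diagonal.
    """
    n = z1.shape[0]
    # Standardize each embedding dimension across the batch.
    z1n = (z1 - z1.mean(axis=0)) / (z1.std(axis=0) + eps)
    z2n = (z2 - z2.mean(axis=0)) / (z2.std(axis=0) + eps)
    # Cross-correlation matrix, shape (D, D).
    c = z1n.T @ z2n / n
    # Invariance term: diagonal entries should be 1.
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return on_diag + lam * off_diag
```

Two identical batches whose dimensions are already standardized and decorrelated yield a loss near zero, which is the fixed point the objective optimizes toward.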

Citation (APA)

Siddiqui, S. A., Dengel, A., & Ahmed, S. (2021). Self-Supervised Representation Learning for Document Image Classification. IEEE Access, 9, 164358–164367. https://doi.org/10.1109/ACCESS.2021.3133200
