This paper introduces a self-supervised approach for writer retrieval based on knowledge distillation with vision transformers. We propose morphological operations as a general data augmentation method for handwriting images, enabling the network to learn discriminative features that are independent of the pen. Our method operates on binarized 224 × 224 patches extracted from the documents’ writing region; we generate two different views using randomly sampled kernels for erosion and dilation to learn a representative embedding space that is invariant to different pens. Our evaluation shows that morphological operations outperform the data augmentations commonly used in retrieval tasks, e.g., flipping, rotation, and translation, by up to 8%. Additionally, we compare our data augmentation strategy with existing approaches, such as networks trained with a triplet loss. We achieve a mean average precision of 66.4% on the Historical-WI dataset, competitive with methods that rely on algorithms like SIFT for patch extraction or on computationally expensive encodings, e.g., mVLAD, NetVLAD, or E-SVM. Finally, by visualizing the attention mechanism, we show that the heads of the vision transformer focus on different parts of the handwriting, e.g., loops or specific characters, enhancing the explainability of our writer retrieval.
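The augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the kernel-size range, the uniform choice between erosion and dilation, and the function names are assumptions made for the example.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def random_morph_view(patch, rng, max_kernel=3):
    """Apply erosion or dilation with a randomly sampled square kernel.

    `patch` is a binarized 2-D array (True = ink). The kernel-size range
    and the 50/50 choice of operation are hypothetical details chosen
    for this sketch, not taken from the paper.
    """
    k = int(rng.integers(1, max_kernel + 1))      # random kernel size
    kernel = np.ones((k, k), dtype=bool)
    op = binary_erosion if rng.random() < 0.5 else binary_dilation
    return op(patch, structure=kernel)

def two_views(patch, seed=0):
    """Generate two independently augmented views of the same patch,
    as used for the two branches of a distillation-style objective."""
    rng = np.random.default_rng(seed)
    return random_morph_view(patch, rng), random_morph_view(patch, rng)
```

In a self-distillation setup, the two views would be fed to the student and teacher branches, so the embedding learns to ignore stroke-width changes that erosion and dilation simulate.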
Peer, M., Kleber, F., & Sablatnig, R. (2022). Self-supervised Vision Transformers with Data Augmentation Strategies Using Morphological Operations for Writer Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13639 LNCS, pp. 122–136). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21648-0_9