Automatic anonymization of printed-text document images

Ángel Sánchez; José F. Vélez; Javier Sánchez; A. Belén Moreno

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10884 LNCS 145-152

DOI: 10.1007/978-3-319-94211-7_17

2 Citations

9 Readers

Abstract

Nowadays, the storage and transmission of some types of documents require the removal of personal information about the users involved. Automatic text anonymization, or de-identification, is a solution for hiding all sensitive information contained in the documents. Although the problem has been studied mainly for plain printed-text documents, there are no works in which the de-identification task also produces anonymized document images with the same text fonts as those in the original documents. This data augmentation process could be applied to train a system for document image classification. In this paper, we describe an implementation of an automated, modular anonymization system for printed-text document images written in Spanish. An evaluation of the system on a dataset of invoice images shows the viability of our proposal.

Citation (APA)

Sánchez, Á., Vélez, J. F., Sánchez, J., & Moreno, A. B. (2018). Automatic anonymization of printed-text document images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10884 LNCS, pp. 145–152). Springer Verlag. https://doi.org/10.1007/978-3-319-94211-7_17
