Nowadays, storing and transmitting some types of documents requires removing the personal information of the users involved. Automatic text anonymization, or de-identification, is a solution for hiding all sensitive information contained in such documents. Although the problem has mainly been studied for plain printed-text documents, no previous work has addressed a de-identification task that also produces anonymized document images rendered in the same text fonts as the original documents. This data augmentation process could be used to train a document image classification system. In this paper, we describe an implementation of a modular system for the automatic anonymization of printed-text document images written in Spanish. An evaluation of the system on a dataset of invoice images shows the viability of our proposal.
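The abstract does not say how sensitive fields are detected or what replaces them. The following is only a hypothetical sketch of a text-level de-identification step that such a pipeline might include, and it is not the authors' method. It assumes regex detection of a few Spanish identifiers (DNI, 9-digit phone numbers, e-mail addresses) and length-preserving random surrogates, so the replacement text could occupy roughly the same space when re-rendered in the original font. `PATTERNS`, `surrogate` and `anonymize` are illustrative names. Person names are not covered, because they would typically need a named-entity recognizer rather than a regex.

```python
import random
import re
import string

# Hypothetical detection patterns (assumption: regexes only; a real system
# would need many more entity types, e.g. names and addresses via NER).
PATTERNS = {
    "DNI": re.compile(r"\b\d{8}[A-Z]\b"),                  # 8 digits + control letter
    "PHONE": re.compile(r"\b[6789]\d{8}\b"),               # 9-digit Spanish numbers
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def surrogate(value: str, rng: random.Random) -> str:
    """Replace each character with a random one of the same class.

    Length and punctuation are kept (assumption) so the fake value fits
    the layout of the original text line when rendered again.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '.', '@', '-'
    return "".join(out)


def anonymize(text: str, seed: int = 0) -> str:
    """Replace every detected sensitive span with a surrogate value."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: surrogate(m.group(0), rng), text)
    return text


if __name__ == "__main__":
    line = "Cliente: Ana, DNI 12345678Z, tel. 612345678, ana.garcia@ejemplo.es"
    print(anonymize(line))
```

Keeping each surrogate the same length as the original value is one simple way to preserve layout. A production system would more likely use realistic fake values (for example, DNIs with a valid control letter) and measure rendered widths rather than character counts.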
Sánchez, Á., Vélez, J. F., Sánchez, J., & Moreno, A. B. (2018). Automatic anonymization of printed-text document images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10884 LNCS, pp. 145–152). Springer Verlag. https://doi.org/10.1007/978-3-319-94211-7_17