The São Paulo Court of Justice (TJSP) has the highest number of lawsuits of any court. Each lawsuit consists of raster-scanned documents bundled into unstructured volumes, and some of these document images are unreadable. Natural Language Processing techniques fail to extract information from some of these documents because of the low image quality. This article proposes a methodology to automate the retrieval of document images from lawsuit databases based on the contents of the images. We developed a hybrid algorithm for feature extraction from document images and used a distance metric to retrieve similar images. We validated the proposal on the TJSP database; the resulting system finds similar images with an accuracy above eighty percent.
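To illustrate the retrieval step only, the following is a minimal sketch of distance-based lookup over precomputed feature vectors. The abstract names neither the hybrid feature extractor nor the distance metric, so Euclidean distance is an assumption here, and `euclidean`, `retrieve_similar`, and the toy vectors are hypothetical names and values chosen for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_similar(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k indexed images closest to the query, nearest first."""
    ranked = sorted(
        ((doc_id, euclidean(query, vec)) for doc_id, vec in index.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Toy feature vectors standing in for the output of a feature extractor
# (assumed values; a real extractor would produce far longer vectors).
toy_index = {
    "page_a": [0.9, 0.1, 0.0],
    "page_b": [0.8, 0.2, 0.1],
    "page_c": [0.0, 0.9, 0.7],
}

results = retrieve_similar([0.9, 0.1, 0.05], toy_index, k=2)
for doc_id, distance in results:
    print(f"{doc_id}: {distance:.3f}")
```

In this sketch a smaller distance means a more similar image, so the ranking places the pages whose features lie closest to the query first. Any other metric could be swapped in by replacing `euclidean`.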
CITATION STYLE
Freire, D. L., Ponce de Leon Ferreira de Carvalho, A. C., Carneiro Feltran, L., Ayumi Nagamatsu, L., Ramos da Silva, K. C., Firmino, C., … Mendes Portela, R. (2022). Content-Based Lawsuits Document Image Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13566 LNAI, pp. 29–40). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16474-3_3