Segmentation-free keyword retrieval in historical document images

Irina Rabaev; Itshak Dinstein; Jihad El-Sana; Klara Kedem

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8814 369-378

DOI: 10.1007/978-3-319-11758-4_40

Abstract

We present a segmentation-free method for retrieving keywords from degraded historical documents. The proposed method works directly on the grayscale representation and does not require any pre-processing to enhance the document images. The document images are subdivided into overlapping patches of varying sizes, and each patch is described by a bag-of-visual-words descriptor. The resulting patch descriptors are hashed into several hash tables using a kernelized locality-sensitive hashing scheme for efficient retrieval. In such a scheme, the search for a keyword is reduced to a small fraction of the patches, namely those stored in the matching entries of the hash tables. Because we need to capture handwriting variations and the availability of historical documents is limited, we synthesize a small number of samples from the given query to improve the retrieval results. We have tested our approach on historical document images in Hebrew from the Cairo Genizah collection, and obtained impressive results.
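The pipeline described in the abstract (overlapping patches, bag-of-visual-words descriptors, hash tables for candidate lookup) can be illustrated with a minimal sketch. This is not the authors' implementation: the function and class names below are hypothetical, the visual-word ids are assumed to come from a local-feature quantizer that is not shown, and plain random-hyperplane hashing stands in for the kernelized locality-sensitive hashing used in the paper.

```python
import random
from collections import Counter, defaultdict


def overlapping_patches(height, width, sizes, overlap=0.5):
    """Return (top, left, h, w) boxes covering the page for each patch size.

    `overlap` is the assumed fraction shared by neighbouring patches.
    """
    boxes = []
    for h, w in sizes:
        step_y = max(1, int(h * (1 - overlap)))
        step_x = max(1, int(w * (1 - overlap)))
        for top in range(0, height - h + 1, step_y):
            for left in range(0, width - w + 1, step_x):
                boxes.append((top, left, h, w))
    return boxes


def bovw_histogram(word_ids, vocab_size):
    """Normalized histogram of visual-word ids found inside one patch."""
    total = len(word_ids)
    if total == 0:
        return [0.0] * vocab_size
    counts = Counter(word_ids)
    return [counts.get(i, 0) / total for i in range(vocab_size)]


class HyperplaneLSH:
    """Several hash tables keyed by sign patterns of random projections.

    Assumption: a simplified stand-in for kernelized LSH.
    """

    def __init__(self, dim, n_tables=4, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [
            [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _keys(self, vec):
        # One key per table: the sign of the projection on each hyperplane.
        return [
            tuple(sum(p * v for p, v in zip(plane, vec)) > 0 for plane in planes)
            for planes in self.planes
        ]

    def add(self, item_id, vec):
        for table, key in zip(self.tables, self._keys(vec)):
            table[key].append(item_id)

    def candidates(self, vec):
        # Only patches sharing a bucket with the query are returned.
        found = set()
        for table, key in zip(self.tables, self._keys(vec)):
            found.update(table.get(key, []))
        return found
```

In use, each patch box would be described with `bovw_histogram` and stored with `add`; a query descriptor passed to `candidates` then returns only the patches that share a bucket in at least one table, which would be ranked afterwards by a full descriptor comparison.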

Cite

Rabaev, I., Dinstein, I., El-Sana, J., & Kedem, K. (2014). Segmentation-free keyword retrieval in historical document images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8814, pp. 369–378). Springer Verlag. https://doi.org/10.1007/978-3-319-11758-4_40
