Efficient search in document image collections

Anand Kumar; C. V. Jawahar; R. Manmatha

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4843 LNCS(PART 1) 586-595

DOI: 10.1007/978-3-540-76386-4_55

Abstract

This paper presents an efficient indexing and retrieval scheme for searching in document image databases. In many non-European languages, optical character recognizers are not very accurate. Word spotting - word image matching - may instead be used to retrieve word images in response to a word image query. The approaches used for word spotting so far, dynamic time warping and/or nearest neighbor search, tend to be slow. Here indexing is done using locality sensitive hashing (LSH) - a technique which computes multiple hashes - using word image features computed at word level. Efficiency and scalability are achieved by content-sensitive hashing implemented through approximate nearest neighbor computation. We demonstrate that the technique achieves high precision and recall (in the 90% range), using a large image corpus consisting of seven books in the Telugu language by Kalidasa (a well-known Indian poet of antiquity). The accuracy is comparable to that of dynamic time warping and nearest neighbor search, while the speed is orders of magnitude better - 20000 word images can be searched in milliseconds. © Springer-Verlag Berlin Heidelberg 2007.
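The indexing idea in the abstract (several hashes per word image, then an approximate nearest neighbor lookup among colliding entries) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state which hash family, how many tables, or which word-level features the authors use. The sketch uses random-hyperplane hashing over generic fixed-length feature vectors, and the names `LSHIndex`, `num_tables`, and `bits_per_table` are hypothetical.

```python
import random
from collections import defaultdict


class LSHIndex:
    """Toy multi-table LSH index over fixed-length feature vectors.

    Assumption: random-hyperplane hashing stands in for whatever hash
    family the paper actually uses; all parameters are illustrative.
    """

    def __init__(self, dim, num_tables=8, bits_per_table=12, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    def _key(self, table_idx, vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0
            for plane in self.planes[table_idx]
        )

    def add(self, item_id, vec):
        # Store the vector and file its id under one bucket per table.
        self.vectors[item_id] = vec
        for t, table in enumerate(self.tables):
            table[self._key(t, vec)].append(item_id)

    def query(self, vec, top_k=5):
        # Candidates are the union of the buckets the query falls into.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, vec), []))

        # Rank only the candidates by exact squared Euclidean distance.
        def dist(item_id):
            return sum((a - b) ** 2
                       for a, b in zip(self.vectors[item_id], vec))

        return sorted(candidates, key=dist)[:top_k]


# Usage with made-up 4-dimensional "word image" features.
index = LSHIndex(dim=4, num_tables=4, bits_per_table=6, seed=1)
index.add("word_a", [1.0, 2.0, 3.0, 4.0])
index.add("word_b", [-1.0, -2.0, -3.0, -4.0])
matches = index.query([1.0, 2.0, 3.0, 4.0])
```

The point of the design is that a query only computes exact distances against the few entries sharing a bucket with it, rather than against the whole collection, which is where the speed advantage over exhaustive matching would come from in a scheme of this kind.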

Citation (APA)

Kumar, A., Jawahar, C. V., & Manmatha, R. (2007). Efficient search in document image collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4843 LNCS, pp. 586–595). Springer Verlag. https://doi.org/10.1007/978-3-540-76386-4_55
