Image based retrieval and keyword spotting in documents

Chew Lim Tan; X. Zhang; Linlin Li

Book Chapter

Springer London, (2014), 805-842

DOI: 10.1007/978-0-85729-859-1_27

9 Citations

8 Readers

Abstract

The move towards paperless offices has led to the digitization of large quantities of printed documents for storage in image databases. Thanks to advances in computer and network technology, it is possible to generate and transmit huge amounts of document images efficiently. A pressing issue that follows is how to provide highly reliable and efficient retrieval over these document images from a wide variety of information sources. Optical Character Recognition (OCR) is one powerful tool for retrieval tasks, but there is an ongoing debate over the trade-off between OCR-based and OCR-free retrieval, because of OCR errors and the time needed to convert an entire collection into text. Image-based retrieval using document image similarity measures is a much more economical alternative. Many methods have been proposed for the different sub-tasks, all of which contribute to the final retrieval performance. This chapter presents different methods for representing word images and the preprocessing steps applied before similarity measurement or training and testing, and discusses different algorithms and models for keyword spotting and document image retrieval.

Cite

Tan, C. L., Zhang, X., & Li, L. (2014). Image based retrieval and keyword spotting in documents. In Handbook of Document Image Processing and Recognition (pp. 805–842). Springer London. https://doi.org/10.1007/978-0-85729-859-1_27
