Imaged document text retrieval without OCR

Abstract

We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the Vertical Traverse Density (VTD) and Horizontal Traverse Density (HTD), are extracted. An n-gram based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese, as well as images from the UW1 database, confirms the validity of the proposed method.
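
The pipeline in the abstract (character features, n-gram document vector, dot-product similarity) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not define VTD and HTD, so the sketch assumes that a traverse density is the number of ink runs met along each vertical or horizontal scan line of a binary character bitmap. It also assumes that characters with identical feature tuples are treated as the same token, which ignores the size normalization and noise tolerance a real system would need. The function names and the choice of n are hypothetical.

```python
from collections import Counter
from math import sqrt


def traverse_density(lines):
    """Count ink runs (0 -> 1 transitions) along each scan line.

    Assumed definition: the abstract names the feature but does not define it.
    """
    return tuple(
        sum(1 for prev, cur in zip((0,) + tuple(line), line) if cur and not prev)
        for line in lines
    )


def char_token(bitmap):
    """Map a binary character bitmap (rows of 0/1) to a hashable (VTD, HTD) token."""
    rows = [tuple(r) for r in bitmap]
    cols = list(zip(*rows))
    vtd = traverse_density(cols)  # one count per column (vertical scans)
    htd = traverse_density(rows)  # one count per row (horizontal scans)
    return (vtd, htd)


def document_vector(tokens, n=3):
    """Build a unit-length n-gram count vector from a sequence of character tokens."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    norm = sqrt(sum(c * c for c in grams.values()))
    if norm == 0:
        return {}
    return {g: c / norm for g, c in grams.items()}


def similarity(vec_a, vec_b):
    """Dot product of two sparse document vectors."""
    return sum(w * vec_b[g] for g, w in vec_a.items() if g in vec_b)


# Two toy 3x3 glyphs standing in for segmented character objects.
BAR = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
RAILS = [[1, 1, 1], [0, 0, 0], [1, 1, 1]]

doc_a = [char_token(g) for g in (BAR, RAILS, BAR, RAILS)]
doc_b = [char_token(g) for g in (BAR, BAR, BAR)]

vec_a = document_vector(doc_a, n=2)
vec_b = document_vector(doc_b, n=2)
```

Because each vector is scaled to unit length, the dot product here behaves like a cosine similarity: a document compared with itself scores 1, and documents that share no n-grams score 0.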

Cite

Tan, C. L., Huang, W., Yu, Z., & Xu, Y. (2002). Imaged document text retrieval without OCR. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6), 838–844. https://doi.org/10.1109/TPAMI.2002.1008389
