This paper presents a system for retrieving relevant documents from large document image collections. We achieve effective search and retrieval over a large collection of printed document images by matching image features at the word level. Words are represented using profile-based and shape-based features. A novel DTW-based partial matching scheme handles morphological variants of a word, which is useful for grouping similar words together during indexing. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. System-level issues for retrieval (e.g., scalability and effective delivery) are also addressed. © Springer-Verlag Berlin Heidelberg 2006.
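To illustrate the kind of word-level matching the abstract refers to, the following is a minimal sketch of standard dynamic time warping (DTW) between two sequences of per-column feature vectors. It is not the paper's partial matching scheme, which the abstract does not specify. The function name `dtw_distance`, the Euclidean local cost, and the toy one-dimensional "profiles" are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Standard DTW cost between two sequences of equal-dimension feature vectors.

    Hypothetical helper: `a` and `b` stand in for per-column profile features
    of two word images; the local cost is the Euclidean distance (an assumption).
    """
    n, m = len(a), len(b)
    # D[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Toy one-dimensional profiles (illustrative values, not data from the paper).
word = [(0.0,), (1.0,), (2.0,)]
stretched = [(0.0,), (1.0,), (1.0,), (2.0,)]  # same shape, one column repeated
shifted = [(1.0,), (2.0,), (3.0,)]

same_shape_cost = dtw_distance(word, stretched)
shifted_cost = dtw_distance(word, shifted)
```

Because DTW allows one column to align with several, the stretched sequence matches the original at zero cost, whereas the shifted sequence does not. A partial matching variant would additionally need to tolerate unmatched prefixes or suffixes, which this sketch does not attempt.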
CITATION STYLE
Balasubramanian, A., Meshesha, M., & Jawahar, C. V. (2006). Retrieval from document image collections. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3872 LNCS, pp. 1–12). https://doi.org/10.1007/11669487_1