Spotting where to read on pages – Retrieval of relevant parts from page images

Koichi Kise; Masaaki Tsujino; Keinosuke Matsumoto

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2423 388-399

DOI: 10.1007/3-540-45869-7_43

6 Citations

2 Readers

Abstract

This paper presents a new method of document image retrieval that can spot the parts of page images relevant to a user’s query. This improves the usability of retrieval, since a user can find where to read on retrieved pages. It can also improve the effectiveness of retrieval, because the method is largely unaffected by irrelevant parts of pages. The method is based on the assumption that parts of page images that densely contain the keywords of a query are relevant to it. The characteristics of the proposed method are as follows: (1) two-dimensional density distributions of keywords are calculated for ranking parts of page images; (2) the method relies only on the distribution of characters, so it is not affected by errors in layout analysis. Experimental results on retrieving Japanese newspaper articles show that the proposed method is superior to a method that cannot deal with parts of pages, and is sometimes equivalent to electronic document retrieval that works on error-free text.
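The idea of ranking parts of a page by a two-dimensional keyword density can be illustrated with a minimal sketch. This is not the authors' implementation: the grid cell size, the box-shaped smoothing window, and the function names `keyword_density` and `best_part` are assumptions made for illustration, and the input is assumed to be the integer pixel positions of recognized characters that belong to query keywords.

```python
from typing import List, Tuple


def keyword_density(hits: List[Tuple[int, int]], width: int, height: int,
                    cell: int = 10, radius: int = 1) -> List[List[float]]:
    """Return a smoothed 2D density grid of keyword character positions.

    hits   -- (x, y) positions of keyword characters, assumed inside the page
    cell   -- side of one grid cell in pixels (illustrative choice)
    radius -- half-width, in cells, of a box smoothing window (an assumption;
              the paper's actual window function is not given in the abstract)
    """
    cols = (width + cell - 1) // cell
    rows = (height + cell - 1) // cell

    # Count keyword characters per grid cell.
    counts = [[0] * cols for _ in range(rows)]
    for x, y in hits:
        counts[y // cell][x // cell] += 1

    # Average the counts over a (2*radius+1)^2 neighborhood of each cell.
    area = (2 * radius + 1) ** 2
    density = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += counts[rr][cc]
            density[r][c] = total / area
    return density


def best_part(density: List[List[float]], cell: int = 10) -> Tuple[float, int, int]:
    """Return (score, x, y) for the densest cell; x, y is its top-left corner.

    Ties are resolved in favor of the first cell in row-major order.
    """
    best = (-1.0, 0, 0)
    for r, row in enumerate(density):
        for c, value in enumerate(row):
            if value > best[0]:
                best = (value, c * cell, r * cell)
    return best
```

Because the sketch uses only character positions, it needs no layout analysis, which mirrors characteristic (2) of the abstract. Pages or parts could then be ranked by the returned score.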

Cite (APA)

Kise, K., Tsujino, M., & Matsumoto, K. (2002). Spotting where to read on pages – Retrieval of relevant parts from page images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2423, pp. 388–399). Springer Verlag. https://doi.org/10.1007/3-540-45869-7_43
