Abstract
Document retrieval is a basic but important tool for text mining that can take a user’s information need into account. However, it becomes a hard task when long, multi-topic documents have to be retrieved from a very short description (a few keywords) of the information need. In this paper, we focus on this problem, which is typical in real-world applications. We experimentally validate that passage-based document retrieval is advantageous in such circumstances compared to conventional document retrieval. Passage-based document retrieval judges the relevance of a document to the information need from only small fractions (passages) of the document. As the passage-based method, we employ a method based on density distributions of keywords. It is compared with three conventional methods for document retrieval: the vector space model, pseudo-feedback, and latent semantic indexing. Experimental results show that the passage-based method is superior to the conventional methods when long documents have to be retrieved by short queries.
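The idea of scoring a document by its densest passage of query keywords can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the window function, window width, or term weighting, so the Hann-shaped window, the default width of 5 tokens, the unweighted keyword hits, and the function names `hann_window`, `tokenize`, `density_score`, and `rank` are all assumptions made for illustration.

```python
import math
import re


def hann_window(width):
    """Return `width` Hann-shaped weights that peak at the centre (assumed window)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * (i + 1) / (width + 1)))
            for i in range(width)]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def density_score(doc_tokens, query_terms, width=5):
    """Score a document by the peak of its keyword density distribution.

    Each token position gets 1.0 if the token is a query term, else 0.0
    (no term weighting, which is a simplification). The density at a
    position is the window-weighted sum of the hits around it.
    """
    window = hann_window(width)
    half = width // 2
    hits = [1.0 if token in query_terms else 0.0 for token in doc_tokens]
    best = 0.0
    for centre in range(len(hits)):
        density = 0.0
        for k, weight in enumerate(window):
            pos = centre + k - half
            if 0 <= pos < len(hits):
                density += weight * hits[pos]
        best = max(best, density)
    return best


def rank(documents, query, width=5):
    """Return document indices ordered by decreasing peak keyword density."""
    terms = set(tokenize(query))
    scored = [(density_score(tokenize(doc), terms, width), index)
              for index, doc in enumerate(documents)]
    return [index for _, index in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Under these assumptions, a long document in which the query keywords occur close together receives a higher score than one in which the same keywords are scattered far apart, regardless of how much unrelated text surrounds them.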
Cite
Kise, K., Junker, M., Dengel, A., & Matsumoto, K. (2001). Passage-based document retrieval as a tool for text mining with user’s information needs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2226, pp. 155–169). Springer Verlag. https://doi.org/10.1007/3-540-45650-3_16