In the ad hoc retrieval setting, a document relevant to a query may contain only a few short passages with query-related information. To cope with this, researchers have proposed passage-based document ranking approaches. We show that several of these retrieval methods can be understood, and new ones derived, using the same probabilistic model. We use language-model estimates to instantiate specific retrieval algorithms, and in doing so present a novel passage language model that integrates information from the containing document to an extent controlled by the estimated document homogeneity. Several of the document-homogeneity measures that we present yield passage language models that are more effective than the standard passage model, both for basic document retrieval and for constructing and utilizing passage-based relevance models; these relevance models also outperform a document-based relevance model. Finally, we demonstrate the merits of using the document-homogeneity measures for integrating document-query and passage-query similarity information for document retrieval. © 2009 Springer Science+Business Media, LLC.
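The idea of a passage language model that leans on its containing document in proportion to document homogeneity can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the three-way linear interpolation, the function names (`mle`, `entropy_homogeneity`, `passage_lm`), the parameter `lambda_c`, and the entropy-based homogeneity score are hypothetical choices, not necessarily the estimates or measures used in the paper.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram estimate: relative term frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def entropy_homogeneity(doc_tokens):
    """Hypothetical homogeneity score in [0, 1].

    Assumption: a document whose term distribution has low entropy is
    treated as more homogeneous. Returns 1 - normalized entropy.
    """
    probs = mle(doc_tokens)
    if len(probs) <= 1:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs.values())
    return 1.0 - entropy / math.log(len(probs))


def passage_lm(passage_tokens, doc_tokens, collection_lm, h, lambda_c=0.1):
    """Return a function w -> p(w | passage).

    Assumed form: interpolate passage, document, and collection models.
    The document weight grows with the homogeneity score h in [0, 1];
    the passage weight shrinks correspondingly, so the weights sum to 1.
    """
    p_psg = mle(passage_tokens)
    p_doc = mle(doc_tokens)
    lam_doc = (1.0 - lambda_c) * h
    lam_psg = 1.0 - lambda_c - lam_doc

    def prob(w):
        return (lam_psg * p_psg.get(w, 0.0)
                + lam_doc * p_doc.get(w, 0.0)
                + lambda_c * collection_lm.get(w, 0.0))

    return prob


# Toy data, invented for the example.
doc = ["a", "b", "a", "c"]
passage = ["a", "b"]
collection = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

lm = passage_lm(passage, doc, collection, h=entropy_homogeneity(doc))
```

Under these assumptions, `h = 0` reduces to a passage model smoothed only with the collection, while `h = 1` ignores the passage-specific estimate and scores every passage with the document model. Intermediate values let a term that appears elsewhere in the document, but not in the passage, still receive document-level probability mass.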
CITATION
Bendersky, M., & Kurland, O. (2010). Utilizing passage-based language models for ad hoc document retrieval. Information Retrieval, 13(2), 157–187. https://doi.org/10.1007/s10791-009-9118-8