Context-Aware Document Term Weighting for Ad-Hoc Search

Abstract

Bag-of-words document representations play a fundamental role in modern search engines, but their power is limited by shallow, frequency-based term weighting. This paper proposes HDCT, a context-aware document term weighting framework for document indexing and retrieval. HDCT first estimates the semantic importance of each term in the context of each passage. These fine-grained term weights are then aggregated into a document-level bag-of-words representation, which can be stored in a standard inverted index for efficient retrieval. The paper also proposes two approaches for training HDCT without relevance labels. Experiments show that an index using HDCT weights significantly improves retrieval accuracy over typical term-frequency and state-of-the-art embedding-based indexes.
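The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `aggregate_passage_weights`, the `passage_importance` and `scale` parameters, the weighted-average combination rule, and the sample weights are all assumptions made for illustration. The per-passage weights are supplied by hand here, whereas in HDCT they would come from a learned context-aware model.

```python
from collections import defaultdict


def aggregate_passage_weights(passage_weights, passage_importance=None, scale=100):
    """Combine per-passage term weights into integer document-level weights.

    passage_weights: list of dicts mapping term -> weight in [0, 1], one dict
        per passage (hypothetical model output, supplied by hand here).
    passage_importance: optional list of non-negative passage weights;
        defaults to treating every passage equally (an assumption).
    scale: multiplier that turns fractional weights into integers that an
        inverted index could store like term frequencies (an assumption).
    """
    if passage_importance is None:
        passage_importance = [1.0] * len(passage_weights)
    if len(passage_importance) != len(passage_weights):
        raise ValueError("need exactly one importance value per passage")
    total = sum(passage_importance)
    if total <= 0:
        raise ValueError("passage importance values must sum to a positive number")

    # Importance-weighted sum of each term's weight across all passages.
    summed = defaultdict(float)
    for weights, importance in zip(passage_weights, passage_importance):
        for term, weight in weights.items():
            summed[term] += importance * weight

    # Normalize, scale, and round; drop terms whose weight rounds to zero.
    document = {}
    for term, value in summed.items():
        pseudo_tf = round(scale * value / total)
        if pseudo_tf > 0:
            document[term] = pseudo_tf
    return document


if __name__ == "__main__":
    # Two passages of one hypothetical document, with made-up term weights.
    passages = [
        {"bert": 0.8, "index": 0.2},
        {"bert": 0.4, "search": 0.6},
    ]
    print(aggregate_passage_weights(passages))
```

The resulting integers play the role that raw term frequencies play in a conventional index, which is why such a representation could be stored in a standard inverted index without changing the retrieval engine.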

Dai, Z., & Callan, J. (2020). Context-Aware Document Term Weighting for Ad-Hoc Search. In The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020 (pp. 1897–1907). Association for Computing Machinery, Inc. https://doi.org/10.1145/3366423.3380258
