Topic-Grained Text Representation-Based Model for Document Retrieval

Abstract

Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by storing document representations offline in advance. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, while significantly reducing storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results on TREC CAR and MS MARCO show that TGTR is consistently competitive with word-grained baselines in retrieval accuracy while requiring less than 1/10 of their storage space. Moreover, TGTR far surpasses global-grained baselines in retrieval accuracy.
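
The storage trade-off described in the abstract can be illustrated with a rough back-of-the-envelope sketch. All numbers below (corpus size, embedding dimension, vectors per document) are hypothetical and are not taken from the paper; the sketch also assumes that each granularity stores a fixed number of dense float32 vectors per document, with "global-grained" taken to mean a single vector per document.

```python
def storage_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=4):
    """Bytes needed to pre-store `vectors_per_doc` dense vectors per document."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Hypothetical corpus and model sizes, chosen only for illustration.
NUM_DOCS = 1_000_000
DIM = 128
WORDS_PER_DOC = 100   # word-grained: assume one vector per token
TOPICS_PER_DOC = 8    # topic-grained: assume a handful of vectors per document

word_grained = storage_bytes(NUM_DOCS, WORDS_PER_DOC, DIM)
topic_grained = storage_bytes(NUM_DOCS, TOPICS_PER_DOC, DIM)
global_grained = storage_bytes(NUM_DOCS, 1, DIM)  # assumed: one vector per document

# Fraction of the word-grained footprint used by the topic-grained index.
ratio = topic_grained / word_grained

print(f"word-grained:   {word_grained / 1e9:.1f} GB")
print(f"topic-grained:  {topic_grained / 1e9:.1f} GB")
print(f"global-grained: {global_grained / 1e9:.1f} GB")
print(f"topic / word:   {ratio:.2f}")
```

Under these assumed sizes, the topic-grained index falls below one tenth of the word-grained footprint simply because far fewer vectors are stored per document. The actual ratio reported in the paper depends on its own model and corpus settings.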

Cite

Du, M., Li, S., Yu, J., Ma, J., Ji, B., Liu, H., … Yi, Z. (2022). Topic-Grained Text Representation-Based Model for Document Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13531 LNCS, pp. 776–788). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15934-3_64
