Topic-Grained Text Representation-Based Model for Document Retrieval

Mengxue Du; Shasha Li; Jie Yu; Jun Ma; Bin Ji; Huijun Liu; Wuhang Lin; Zibo Yi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13531 LNCS 776-788

DOI: 10.1007/978-3-031-15934-3_64

Abstract

Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval-efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by storing precomputed document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results on TREC CAR and MS MARCO demonstrate that TGTR is consistently competitive with word-grained baselines in retrieval accuracy while requiring less than 1/10 of their storage space. Moreover, TGTR surpasses global-grained baselines in retrieval accuracy by a wide margin.
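
To make the storage argument concrete, the following back-of-the-envelope sketch compares index sizes when each document is stored as one vector per word, one vector per topic, or a single global vector. It is an illustration only: the collection size, embedding dimension, vectors-per-document counts, and the helper `index_size_bytes` are hypothetical and are not taken from the paper.

```python
def index_size_bytes(num_docs: int, vectors_per_doc: int, dim: int,
                     bytes_per_value: int = 4) -> int:
    """Bytes needed to store a fixed number of dense vectors per document."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Hypothetical settings, chosen only for illustration (not the paper's values).
NUM_DOCS = 1_000_000
DIM = 128
VECTORS_PER_DOC = {
    "word": 200,   # assumed: one vector per token
    "topic": 10,   # assumed: one vector per topic
    "global": 1,   # one vector for the whole document
}

sizes = {
    grain: index_size_bytes(NUM_DOCS, count, DIM)
    for grain, count in VECTORS_PER_DOC.items()
}

for grain, size in sizes.items():
    ratio = size / sizes["word"]
    print(f"{grain:>6}: {size / 1e9:8.2f} GB  ({ratio:.3f} of word-grained)")
```

Under these assumed numbers, the index size scales linearly with the number of vectors kept per document, which is why replacing per-word vectors with a much smaller set of per-topic vectors shrinks the offline index while still keeping more than one vector per document.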

Citation (APA)

Du, M., Li, S., Yu, J., Ma, J., Ji, B., Liu, H., … Yi, Z. (2022). Topic-Grained Text Representation-Based Model for Document Retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13531 LNCS, pp. 776–788). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-15934-3_64
