Embedding representations of text are useful for downstream natural language processing tasks. Several universal sentence representation methods have been proposed, with a particular focus on self-supervised pre-training approaches that leverage the vast quantities of unlabelled data. However, there are two challenges in generating rich embedding representations for a new document: 1) the latest rich embedding generators are based on very large, costly transformer-based architectures; 2) the rich embedding representation of a new document is limited to the information the document itself provides, without access to any explicit contextual and temporal information that could further enrich the representation. We propose efficient retrieval-augmented text embeddings (ERATE), an approach that tackles the first issue and offers a method to tackle the second. To the best of our knowledge, we are the first to incorporate retrieval into general-purpose embeddings as a new paradigm, which we apply to the semantic similarity tasks of SentEval. Despite not reaching state-of-the-art performance, ERATE offers key insights that encourage future work investigating the potential of retrieval-based embeddings.
Raina, V., Kassner, N., Popat, K., Lewis, P., Cancedda, N., & Martin, L. (2023). ERATE: Efficient Retrieval Augmented Text Embeddings. In ACL 2023 - 4th Workshop on Insights from Negative Results in NLP, Proceedings (pp. 11–18). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.insights-1.2