Self-supervised Information Retrieval Trained from Self-generated Sets of Queries and Relevant Documents

Abstract

Large corpora of textual data, such as scientific papers, patents, legal documents, and reviews, hold valuable unstructured knowledge that can only be extracted with semantic information retrieval engines. The best current information retrieval solutions use supervised deep learning, which requires large labelled training sets of queries and corresponding relevant documents; such sets are often unavailable, or too costly for most organizations to prepare. In this work, we present a new self-supervised method to train a neural solution that models large corpora of documents and searches them efficiently against arbitrary queries, without requiring a labelled dataset of queries and associated relevant documents. The core points of our self-supervised approach are (i) a method to self-generate the training set of queries and their relevant documents from the corpus itself, without any kind of human supervision, (ii) a deep metric learning approach to model their semantic space of relationships, and (iii) the incorporation of a multi-dimensional index for this neural semantic space, over which queries can be run efficiently. To stress-test the approach, we applied it to an entirely unlabelled corpus of over half a million Italian legal documents with complex contents.
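
The three components listed in the abstract can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration, since the abstract does not give these details: the query generator (a random span of words taken from each document), the bag-of-words `embed` function standing in for a neural encoder, the triplet margin loss as one common deep metric learning objective, and a brute-force `search` standing in for the multi-dimensional index. It is not the authors' implementation.

```python
import math
import random
from collections import Counter


def make_pairs(corpus, query_len=4, seed=0):
    """Self-generate (query, doc_index) pairs without labels.

    Assumed heuristic: each query is a random contiguous span of words
    taken from the document it is expected to retrieve.
    """
    rng = random.Random(seed)
    pairs = []
    for idx, doc in enumerate(corpus):
        words = doc.split()
        start = rng.randrange(max(1, len(words) - query_len + 1))
        pairs.append((" ".join(words[start:start + query_len]), idx))
    return pairs


def embed(text):
    """Toy stand-in for a neural encoder: L2-normalised bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: pull the positive closer than the negative."""
    gap = distance(anchor, positive) - distance(anchor, negative)
    return max(0.0, gap + margin)


def search(query, doc_vectors, k=1):
    """Brute-force nearest-neighbour search (a real system would index)."""
    q = embed(query)
    ranked = sorted(range(len(doc_vectors)),
                    key=lambda i: distance(q, doc_vectors[i]))
    return ranked[:k]


corpus = [
    "the tenant must pay the rent before the first day of each month",
    "the court rejected the appeal because the deadline had expired",
    "the patent covers a method for cooling lithium battery cells",
]
pairs = make_pairs(corpus)
doc_vectors = [embed(doc) for doc in corpus]

for query, idx in pairs:
    negative = doc_vectors[(idx + 1) % len(corpus)]
    loss = triplet_loss(embed(query), doc_vectors[idx], negative)
    print(repr(query), "->", search(query, doc_vectors), "loss:", round(loss, 3))
```

In a trained system the loss would drive updates to the encoder's parameters; here the encoder is fixed, so the loss only shows how a self-generated triplet would be scored.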

Cite

Moro, G., Valgimigli, L., Rossi, A., Casadei, C., & Montefiori, A. (2022). Self-supervised Information Retrieval Trained from Self-generated Sets of Queries and Relevant Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13590 LNCS, pp. 283–290). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-17849-8_23