Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers

Abstract

While BERT has been shown to be effective for passage retrieval, its maximum input length limitation poses a challenge when applying the model to document retrieval. In this work, we reproduce three passage score aggregation approaches proposed by Dai and Callan [5] for overcoming this limitation. After reproducing their results, we generalize their findings through experiments with a new dataset and experiment with other pretrained transformers that share similarities with BERT. We find that these BERT variants are not more effective for document retrieval in isolation, but can lead to increased effectiveness when combined with “pre–fine-tuning” on the MS MARCO passage dataset. Finally, we investigate whether there is a difference between fine-tuning models on “deep” judgments (i.e., fewer queries with many judgments each) vs. fine-tuning on “shallow” judgments (i.e., many queries with fewer judgments each). Based on available data from two different datasets, we find that the two approaches perform similarly.
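The core idea behind the aggregation approaches reproduced here is to split a long document into passages that fit within BERT's input length limit, score each passage against the query, and then combine the passage scores into a single document score. The sketch below illustrates the FirstP / MaxP / SumP aggregation strategies of Dai and Callan [5] in simplified form; the passage-splitting parameters and the `score_passage` function are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of passage score aggregation for document ranking.
# The scoring model and helper names are hypothetical placeholders.

from typing import Callable, List


def split_into_passages(document: str, passage_len: int = 150, stride: int = 75) -> List[str]:
    """Split a long document into overlapping word windows so each
    passage fits within the transformer's maximum input length."""
    words = document.split()
    passages = []
    for start in range(0, max(len(words) - passage_len, 0) + 1, stride):
        passages.append(" ".join(words[start:start + passage_len]))
    return passages or [document]


def aggregate_document_score(
    query: str,
    document: str,
    score_passage: Callable[[str, str], float],  # e.g., a fine-tuned BERT reranker
    method: str = "max",
) -> float:
    """Combine per-passage relevance scores into one document score,
    following the FirstP / MaxP / SumP idea of Dai and Callan [5]."""
    scores = [score_passage(query, p) for p in split_into_passages(document)]
    if method == "first":   # FirstP: score of the first passage only
        return scores[0]
    if method == "max":     # MaxP: best-scoring passage represents the document
        return max(scores)
    if method == "sum":     # SumP: sum of all passage scores
        return sum(scores)
    raise ValueError(f"unknown aggregation method: {method}")
```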

Citation (APA)

Zhang, X., Yates, A., & Lin, J. (2021). Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers. In Lecture Notes in Computer Science (Vol. 12657 LNCS, pp. 150–163). Springer. https://doi.org/10.1007/978-3-030-72240-1_11
