Tevatron: An Efficient and Flexible Toolkit for Neural Retrieval

Luyu Gao; Xueguang Ma; Jimmy Lin; Jamie Callan

Conference Proceedings (Open Access)

SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (2023) 3120-3124

DOI: 10.1145/3539618.3591805

Abstract

Recent rapid advances in deep pre-trained language models and the introduction of large datasets have powered research in embedding-based neural retrieval. While many excellent research papers have emerged, most of them come with their own implementations, which are typically optimized for some particular research goals instead of efficiency or code organization. In this paper, we introduce Tevatron, a neural retrieval toolkit that is optimized for efficiency, flexibility, and code simplicity. Tevatron enables model training and evaluation for a variety of ranking components such as dense retrievers, sparse retrievers, and rerankers. It also provides a standardized pipeline that includes text processing, model training, corpus/query encoding, and search. In addition, Tevatron incorporates well-studied methods for improving retriever effectiveness such as hard negative mining and knowledge distillation. We provide an overview of Tevatron in this paper, demonstrating its effectiveness and efficiency on multiple IR and QA datasets. We highlight Tevatron's flexible design, which enables easy generalization across datasets, model architectures, and accelerator platforms (GPUs and TPUs). Overall, we believe that Tevatron can serve as a solid software foundation for research on neural retrieval systems, including their design, modeling, and optimization.

Citation (APA)

Gao, L., Ma, X., Lin, J., & Callan, J. (2023). Tevatron: An Efficient and Flexible Toolkit for Neural Retrieval. In SIGIR 2023 - Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 3120–3124). Association for Computing Machinery, Inc. https://doi.org/10.1145/3539618.3591805