Evaluating Embedding APIs for Information Retrieval

Ehsan Kamalloo; Xinyu Zhang; Odunayo Ogundepo; Nandan Thakur; David Alfonso-Hermelo; Mehdi Rezagholizadeh; Jimmy Lin

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023), Vol. 5, pp. 518–526

DOI: 10.18653/v1/2023.acl-industry.50

Abstract

The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs. One particular type, suitable for dense retrieval, is a semantic embedding service that builds vector representations of input text. With a growing number of publicly available APIs, our goal in this paper is to analyze existing offerings in realistic retrieval scenarios, to assist practitioners and researchers in finding suitable services according to their needs. Specifically, we investigate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. For this purpose, we evaluate these services on two standard benchmarks, BEIR and MIRACL. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English, in contrast to the standard practice of employing them as first-stage retrievers. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost. We hope our work lays the groundwork for evaluating semantic embedding APIs that are critical in search and, more broadly, for information access.
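
The two strategies contrasted above, re-ranking BM25 candidates with embeddings and fusing BM25 with dense scores, can be illustrated with a minimal sketch. It assumes query and document embeddings have already been obtained from some embedding API (no real API is called here) and that BM25 has already produced a ranked candidate list with scores. The function names `rerank`, `hybrid` and `min_max` are hypothetical, and the fusion rule (min-max normalization followed by linear interpolation with weight `alpha`) is an assumed choice for illustration, not necessarily the one used in the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(query_vec: Vector, bm25_hits: List[str],
           doc_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Re-rank the top-k BM25 candidates by embedding similarity to the query.

    Only the k candidates need embeddings. Python's sort is stable, so
    documents with equal similarity keep their original BM25 order.
    """
    candidates = bm25_hits[:k]
    return sorted(candidates, key=lambda d: -cosine(query_vec, doc_vecs[d]))


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so BM25 and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def hybrid(bm25_scores: Dict[str, float], dense_scores: Dict[str, float],
           alpha: float = 0.5) -> List[str]:
    """Fuse BM25 and dense scores by linear interpolation (assumed fusion rule).

    A document missing from one score list gets 0 for that component.
    """
    sparse = min_max(bm25_scores)
    dense = min_max(dense_scores)
    fused = {d: alpha * sparse.get(d, 0.0) + (1.0 - alpha) * dense.get(d, 0.0)
             for d in set(sparse) | set(dense)}
    # Highest fused score first; ties broken by document id for a deterministic order.
    return sorted(fused, key=lambda d: (-fused[d], d))


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" standing in for API output (made-up values).
    doc_vecs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
    query_vec = [0.0, 1.0]
    print(rerank(query_vec, ["d1", "d3", "d2"], doc_vecs, k=3))  # ['d2', 'd3', 'd1']
    dense = {d: cosine(query_vec, v) for d, v in doc_vecs.items()}
    print(hybrid({"d1": 12.0, "d2": 4.0, "d3": 8.0}, dense))  # ['d3', 'd1', 'd2']
```

The sketch also hints at why the abstract describes re-ranking as budget-friendly: `rerank` only touches embeddings for the k documents BM25 returns, whereas a hybrid that uses the embedding API as a first-stage retriever needs dense scores, and therefore embeddings, for the whole corpus.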

Cite (APA)

Kamalloo, E., Zhang, X., Ogundepo, O., Thakur, N., Alfonso-Hermelo, D., Rezagholizadeh, M., & Lin, J. (2023). Evaluating Embedding APIs for Information Retrieval. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 5, pp. 518–526). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-industry.50