Reduce Human Labor On Evaluating Conversational Information Retrieval System: A Human-Machine Collaboration Approach


Abstract

Evaluating conversational information retrieval (CIR) systems is challenging because it requires a substantial amount of human annotation, making research into more labor-efficient evaluation methods imperative. To address this challenge, we take the first step in bringing active testing to CIR evaluation and propose a novel method called HumCoE. It strategically selects a small set of examples for human annotation and then calibrates the evaluation results to eliminate evaluation bias, yielding an accurate assessment of the CIR system at low human cost. Our experiments show that HumCoE consumes less than 1% of the human labor of full annotation while achieving a 95%-99% consistency rate with human evaluation results, underscoring the effectiveness of our method.
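The abstract only outlines the idea; HumCoE's exact selection strategy and calibration procedure are detailed in the paper itself. As a rough illustration of the general active-testing pattern the abstract describes (select a few examples for human annotation, then correct the automatic estimate), here is a minimal Python sketch. It is not the authors' implementation: the names (`evaluate_with_budget`, `human_label`, `proxy_scores`), the uncertainty-based acquisition heuristic, and the importance-weighted correction are all assumptions made for this sketch.

```python
"""Minimal active-testing sketch (illustrative, not the HumCoE algorithm).

Assumes an automatic proxy metric scores every example in [0, 1], and a
small human-annotation budget is spent to debias the proxy's estimate.
"""
import numpy as np

rng = np.random.default_rng(0)

def evaluate_with_budget(proxy_scores, human_label, budget):
    """Estimate the true mean metric from proxy scores plus a few human labels.

    proxy_scores : np.ndarray of per-example proxy metric values in [0, 1]
    human_label  : callable(index) -> true metric value (the costly oracle)
    budget       : number of human annotations allowed
    """
    n = len(proxy_scores)
    # Acquisition heuristic: prefer examples the proxy is least certain
    # about, i.e. scores far from a confident 0 or 1.
    uncertainty = 1.0 - 2.0 * np.abs(proxy_scores - 0.5)
    q = (uncertainty + 1e-8) / np.sum(uncertainty + 1e-8)

    # Sample annotation targets with replacement so the simple
    # importance-weighted correction below is unbiased in expectation.
    picked = rng.choice(n, size=budget, replace=True, p=q)

    # Importance-weighted mean of (human - proxy) gaps estimates the
    # total proxy bias; dividing by n converts it to a mean correction.
    correction = np.mean(
        [(human_label(i) - proxy_scores[i]) / q[i] for i in picked]
    )
    return proxy_scores.mean() + correction / n
```

A tiny synthetic usage example, annotating 10 of 1,000 examples (1% of the labor, echoing the abstract's budget):

```python
true_scores = rng.binomial(1, 0.7, size=1000).astype(float)
proxy = np.clip(true_scores + rng.normal(0.0, 0.2, size=1000), 0.0, 1.0)
est = evaluate_with_budget(proxy, lambda i: true_scores[i], budget=10)
print(f"full human eval: {true_scores.mean():.3f}, estimated: {est:.3f}")
```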

Citation (APA)

Huang, C., Qin, P., Lei, W., & Lv, J. (2023). Reduce Human Labor On Evaluating Conversational Information Retrieval System: A Human-Machine Collaboration Approach. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 10876–10891). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.670
