Using question series to evaluate question answering system effectiveness


Abstract

The original motivation for using question series in the TREC 2004 question answering track was the desire to model aspects of dialogue processing in an evaluation task that included different question types. The structure introduced by the series also proved to have an important additional benefit: the series is at an appropriate level of granularity for aggregating scores for an effective evaluation. The series is small enough to be meaningful at the task level since it represents a single user interaction, yet it is large enough to avoid the highly skewed score distributions exhibited by single questions. An analysis of the reliability of the per-series evaluation shows the evaluation is stable for differences in scores seen in the track. © 2005 Association for Computational Linguistics.
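The per-series aggregation described above can be sketched as a weighted combination of per-question-type scores, averaged over all series in a run. The weights below (0.5 factoid, 0.25 list, 0.25 other) follow the combined score commonly associated with the TREC 2004 QA track, but treat both the weights and the function names as illustrative assumptions rather than the track's exact definition:

```python
# Hedged sketch of series-level score aggregation (names and
# weights are assumptions, not the official TREC definition).

def series_score(factoid_acc, list_f, other_f,
                 weights=(0.5, 0.25, 0.25)):
    """Combine per-question-type scores into one series score."""
    wf, wl, wo = weights
    return wf * factoid_acc + wl * list_f + wo * other_f

def run_score(series_scores):
    """A run's overall score: the mean over its question series."""
    return sum(series_scores) / len(series_scores)

# Two hypothetical series from one system run.
scores = [series_score(0.6, 0.3, 0.4),   # ~0.475
          series_score(0.2, 0.1, 0.3)]   # ~0.2
overall = run_score(scores)
```

Averaging at the series level, rather than over individual questions, is what gives the evaluation the granularity the abstract argues for: each series contributes one score, so a single skewed question cannot dominate the run-level comparison.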

Citation (APA)

Voorhees, E. M. (2005). Using question series to evaluate question answering system effectiveness. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 299–306). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220575.1220613
