The original motivation for using question series in the TREC 2004 question answering track was the desire to model aspects of dialogue processing in an evaluation task that included different question types. The structure introduced by the series also proved to have an important additional benefit: the series is at an appropriate level of granularity for aggregating scores into an effective evaluation. A series is small enough to be meaningful at the task level, since it represents a single user interaction, yet large enough to avoid the highly skewed score distributions exhibited by single questions. An analysis of the reliability of the per-series evaluation shows that the evaluation is stable for the score differences observed in the track.
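Since the abstract describes aggregating per-question scores at the series level and then evaluating runs over series, a minimal Python sketch of that idea follows. The question-type names, the default uniform weighting, and the function names are illustrative assumptions, not the track's official scoring procedure.

```python
# Sketch of per-series score aggregation as described in the abstract:
# each series bundles the questions from one simulated user interaction,
# and a run is evaluated over per-series scores rather than per question.
# Question types and weights below are assumed for illustration only.

def series_score(component_scores, weights=None):
    """Combine per-question-type scores within one series into a series score.

    component_scores: dict mapping a question type (e.g. "factoid", "list",
    "other") to the mean score of that type's questions in the series.
    weights: optional dict of per-type weights; defaults to a uniform average.
    """
    if weights is None:
        weights = {qtype: 1.0 / len(component_scores) for qtype in component_scores}
    return sum(weights[qtype] * score for qtype, score in component_scores.items())


def run_score(per_series_scores):
    """Score a run as the mean of its per-series scores -- the granularity
    at which the abstract's stability analysis is carried out."""
    return sum(per_series_scores) / len(per_series_scores)


# Example: one series with factoid accuracy 0.6, list score 0.4, other score 0.3.
one_series = series_score({"factoid": 0.6, "list": 0.4, "other": 0.3})
print(round(one_series, 3))          # series-level score
print(round(run_score([one_series, 0.5, 0.2]), 3))  # run-level score over series
```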
Voorhees, E. M. (2005). Using question series to evaluate question answering system effectiveness. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 299–306). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220575.1220613