The original motivation for using question series in the TREC 2004 question answering track was the desire to model aspects of dialogue processing in an evaluation task that included different question types. The structure introduced by the series also proved to have an important additional benefit: the series is at an appropriate level of granularity for aggregating scores into an effective evaluation. A series is small enough to be meaningful at the task level, since it represents a single user interaction, yet large enough to avoid the highly skewed score distributions exhibited by single questions. An analysis of the reliability of the per-series evaluation shows that the evaluation is stable for the score differences observed in the track.
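Since the abstract describes aggregating per-question scores at the series level and then evaluating runs over series, a minimal Python sketch of that idea follows. The question-type names, the default uniform weighting, and the function names are illustrative assumptions, not the track's official scoring procedure.

```python
# Sketch of per-series score aggregation as described in the abstract:
# each series bundles the questions from one simulated user interaction,
# and a run is evaluated over per-series scores rather than per question.
# Question types and weights below are assumed for illustration only.

def series_score(component_scores, weights=None):
    """Combine per-question-type scores within one series into a series score.

    component_scores: dict mapping a question type (e.g. "factoid", "list",
    "other") to the mean score of that type's questions in the series.
    weights: optional dict of per-type weights; defaults to a uniform average.
    """
    if weights is None:
        weights = {qtype: 1.0 / len(component_scores) for qtype in component_scores}
    return sum(weights[qtype] * score for qtype, score in component_scores.items())


def run_score(per_series_scores):
    """Score a run as the mean of its per-series scores -- the granularity
    at which the abstract's stability analysis is carried out."""
    return sum(per_series_scores) / len(per_series_scores)


# Example: one series with factoid accuracy 0.6, list score 0.4, other score 0.3.
one_series = series_score({"factoid": 0.6, "list": 0.4, "other": 0.3})
print(round(one_series, 3))          # series-level score
print(round(run_score([one_series, 0.5, 0.2]), 3))  # run-level score over series
```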
Voorhees, E. M. (2005). Using question series to evaluate question answering system effectiveness. In HLT/EMNLP 2005 - Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 299–306). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220575.1220613