Aggregation of multiple judgments for evaluating ordered lists

Citations: 9 · Mendeley readers: 19

Abstract

Many tasks (e.g., search and summarization) produce an ordered list of items. To evaluate such an ordered list, we need to compare it with an ideal ordered list created by a human expert for the same set of items. To reduce bias, multiple human experts are often used to create multiple ideal ordered lists. An interesting challenge in such an evaluation method is thus how to aggregate these different ideal lists to compute a single score for the ordered list under evaluation. In this paper, we propose three new methods for aggregating multiple order judgments to evaluate ordered lists: weighted correlation aggregation, rank-based aggregation, and frequent sequential pattern-based aggregation. Experimental results on ordering sentences for text summarization show that all three new methods outperform the state-of-the-art average-correlation methods in terms of discriminativeness and robustness against noise. Among the three proposed methods, the frequent sequential pattern-based method performs best, owing to its flexible modeling of agreements and disagreements among human experts at various levels of granularity. © 2010 Springer-Verlag Berlin Heidelberg.
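To make the evaluation setting concrete, the average-correlation baseline that the paper improves upon can be sketched as follows: score a candidate ordering by its mean rank correlation (here Kendall's tau) against each expert's ideal ordering. This is a minimal illustrative sketch, not the paper's proposed methods; the item names (`s1`–`s4`) and function names are invented for the example.

```python
from itertools import combinations

def kendall_tau(list_a, list_b):
    """Kendall's tau between two orderings of the same item set:
    (concordant pairs - discordant pairs) / total pairs."""
    pos_b = {item: i for i, item in enumerate(list_b)}
    concordant = discordant = 0
    for x, y in combinations(list_a, 2):
        # x precedes y in list_a; check whether list_b agrees
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(list_a) * (len(list_a) - 1) / 2
    return (concordant - discordant) / n_pairs

def average_correlation(candidate, expert_lists):
    """Baseline aggregation: mean correlation of the candidate
    against each expert's ideal ordering."""
    return sum(kendall_tau(candidate, ref) for ref in expert_lists) / len(expert_lists)

# Two hypothetical expert orderings of the same four sentences
experts = [["s1", "s2", "s3", "s4"],
           ["s1", "s3", "s2", "s4"]]
candidate = ["s1", "s2", "s3", "s4"]
print(average_correlation(candidate, experts))  # agrees fully with expert 1, partly with expert 2
```

Because this baseline averages over experts, it blurs where experts agree and where they disagree; the paper's pattern-based aggregation is designed to model that agreement structure at finer granularity.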

Citation (APA)

Kim, H. D., Zhai, C., & Han, J. (2010). Aggregation of multiple judgments for evaluating ordered lists. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5993 LNCS, pp. 166–178). Springer Verlag. https://doi.org/10.1007/978-3-642-12275-0_17
