Learning to score system summaries for better content selection evaluation

77 citations · 105 Mendeley readers

Abstract

The evaluation of summaries is a challenging but crucial task in the summarization field. In this work, we propose to learn an automatic scoring metric based on the human judgments available as part of classical summarization datasets like TAC-2008 and TAC-2009. Any existing automatic scoring metric can be included as a feature, and the model learns the combination that correlates best with human judgments. The reliability of the new metric is tested in a further manual evaluation in which we ask humans to evaluate summaries covering the whole scoring spectrum of the metric. We release the trained metric as an open-source tool.
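For illustration, here is a minimal sketch of the approach the abstract describes: treat scores from existing automatic metrics as features and fit a regressor to predict human judgments, then check rank correlation on held-out summaries. The specific features, the choice of learner, and the toy data are assumptions for the sketch, not the authors' released implementation.

```python
# Minimal sketch: learn to combine existing automatic metrics into a single
# score that correlates with human judgments. Feature set, learner choice,
# and the toy data below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVR
from scipy.stats import kendalltau

# Each row: scores from existing automatic metrics for one system summary,
# e.g. [ROUGE-1, ROUGE-2, ROUGE-L, JS-divergence]; y: a human judgment
# (e.g. a Pyramid or responsiveness score from TAC-2008/2009).
X_train = np.array([[0.42, 0.11, 0.38, 0.61],
                    [0.35, 0.08, 0.30, 0.72],
                    [0.51, 0.17, 0.45, 0.55]])
y_train = np.array([0.62, 0.40, 0.75])

model = SVR(kernel="rbf", C=1.0)  # the learner is an assumption of this sketch
model.fit(X_train, y_train)

# Score held-out summaries and measure rank correlation with human judgments.
X_test = np.array([[0.44, 0.12, 0.40, 0.60],
                   [0.30, 0.05, 0.27, 0.80]])
y_human = np.array([0.66, 0.31])
y_pred = model.predict(X_test)
tau, _ = kendalltau(y_pred, y_human)
print(f"Kendall tau with human judgments: {tau:.2f}")
```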

Citation (APA)

Peyrard, M., Botschen, T., & Gurevych, I. (2017). Learning to score system summaries for better content selection evaluation. In EMNLP 2017 - Workshop on New Frontiers in Summarization, NFiS 2017 - Workshop Proceedings (pp. 74–84). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w17-4510
