Looking for a few good metrics: Automatic summarization evaluation - how many samples are enough?

  • Lin, C.-Y.

Abstract

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-grams, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper discusses the validity of the evaluation method used in the Document Understanding Conference (DUC) and evaluates five different ROUGE metrics: ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU, included in the ROUGE summarization evaluation package, using data provided by DUC. A comprehensive study of the effects of using single or multiple references and various sample sizes on the stability of the results is also presented.
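
The overlap counting the abstract describes is straightforward to sketch. Below is a minimal, illustrative Python implementation of ROUGE-N recall, not the official ROUGE package: n-gram matches are counted with clipping, and, as one common convention for scoring against multiple references, the maximum per-reference score is returned. The function names (`ngrams`, `rouge_n`) and the plain whitespace tokenization are assumptions made for illustration, not details from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset (Counter) of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=2):
    """Recall-oriented ROUGE-N sketch: clipped n-gram overlap with each
    reference, divided by that reference's total n-gram count; the best
    (maximum) per-reference score is returned."""
    cand_counts = ngrams(candidate.lower().split(), n)
    best = 0.0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Each reference n-gram is matched at most as many times as it
        # occurs in the candidate (clipped counting).
        overlap = sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
        total = sum(ref_counts.values())
        if total:
            best = max(best, overlap / total)
    return best


# 4 of the 6 reference unigrams occur in the candidate -> 0.666...
print(rouge_n("the cat sat on the mat", ["a cat sat on a mat"], n=1))
```

With multiple references, each additional reference can only raise this maximum-based score, which is one reason the single- versus multiple-reference question the paper studies matters for stability.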

Citation (APA)

Lin, C.-Y. (2004). Looking for a few good metrics: Automatic summarization evaluation - how many samples are enough? In Proceedings of the NTCIR Workshop (pp. 1765–1776).
