ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-grams, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper discusses the validity of the evaluation method used in the Document Understanding Conference (DUC) and evaluates five different ROUGE metrics included in the ROUGE summarization evaluation package — ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU — using data provided by DUC. A comprehensive study of the effects of using single or multiple references and various sample sizes on the stability of the results is also presented.
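To make the overlap-counting idea concrete, here is a minimal sketch of ROUGE-N recall — the fraction of reference n-grams that also appear in the candidate summary. This is an illustrative simplification, not the official ROUGE package; the function name and tokenization are assumptions for the example.

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=2):
    """Illustrative ROUGE-N recall: fraction of reference n-grams
    found in the candidate. Inputs are lists of tokens."""
    def ngrams(tokens):
        # Count each n-gram (as a tuple) with its multiplicity.
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    ref_counts = ngrams(reference)
    cand_counts = ngrams(candidate)
    # Clipped overlap: an n-gram is matched at most as often
    # as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[gram])
                  for gram, count in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

# Example: 3 of the 5 reference bigrams appear in the candidate.
cand = "the cat sat on the mat".split()
ref = "the cat was on the mat".split()
print(rouge_n_recall(cand, ref, n=2))  # 0.6
```

With multiple reference summaries, ROUGE takes the per-reference scores and aggregates them (e.g., the maximum over references), which is part of what the paper's stability study examines.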
Lin, C.-Y. (2004). Looking for a few good metrics: Automatic summarization evaluation - how many samples are enough. In Proceedings of the NTCIR Workshop (pp. 1765–1776).