ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-grams, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper discusses the validity of the evaluation method used in the Document Understanding Conference (DUC) and evaluates five different metrics included in the ROUGE summarization evaluation package: ROUGE-N, ROUGE-L, ROUGE-W, ROUGE-S, and ROUGE-SU, using data provided by DUC. A comprehensive study of the effects of using single or multiple references and various sample sizes on the stability of the results is also presented.
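As a concrete illustration of the n-gram overlap counting described above, the following is a minimal sketch of ROUGE-N recall: the number of reference n-grams also found in the candidate (with clipped counts), divided by the total number of n-grams in the reference. The function name `rouge_n` and whitespace tokenization are illustrative assumptions, not the implementation in the ROUGE package itself, which also supports stemming, stopword removal, and multiple references.

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 2) -> float:
    """Sketch of ROUGE-N recall: clipped n-gram overlap / reference n-gram count.

    Tokenization is plain whitespace splitting (an assumption; the real
    package offers stemming and other preprocessing options).
    """
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Clip each candidate n-gram's count by its count in the reference,
    # so repeated n-grams are not over-credited.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

For example, `rouge_n("the cat sat on the mat", "the cat sat on a mat", n=1)` scores 5 of the 6 reference unigrams as matched (all but "a"), giving 5/6. Recall is the natural orientation here because the reference summaries are the gold standard the candidate is measured against.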