Abstract
We propose using active learning to tag extractive reference summaries of lecture speech. Training a feature-based summarization model usually requires a large amount of training data with high-quality reference summaries. Producing such summaries by hand is tedious and, because inter-labeler agreement is low, very unreliable. Active learning alleviates this problem by automatically selecting a small number of unlabeled documents for humans to correct by hand. Our method chooses the unlabeled documents according to the similarity score between the document and the comparable resource, PowerPoint slides. After manual correction, the selected documents are returned to the training pool. Summarization results show a rising learning curve for ROUGE-L F-measure, from 0.44 to 0.514, consistently higher than that obtained with randomly chosen training samples.
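The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or whether high-scoring or low-scoring documents are picked, so the bag-of-words cosine similarity, the `pick_highest` switch, and the function names `cosine_similarity` and `select_for_correction` below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over whitespace-tokenized term counts (assumed measure)."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_for_correction(unlabeled, k, pick_highest=True):
    """Return the ids of k unlabeled documents chosen by transcript/slide similarity.

    `unlabeled` is a list of (doc_id, transcript_text, slide_text) triples.
    The selection direction is an assumption, exposed via `pick_highest`.
    """
    scored = [
        (cosine_similarity(transcript, slides), doc_id)
        for doc_id, transcript, slides in unlabeled
    ]
    scored.sort(key=lambda pair: pair[0], reverse=pick_highest)
    return [doc_id for _, doc_id in scored[:k]]


# Toy pool of lecture transcripts paired with their slide text (made-up data).
pool = [
    ("lec1", "hidden markov models for speech recognition", "hidden markov models speech"),
    ("lec2", "today we talk about the weather and lunch", "gaussian mixture models"),
    ("lec3", "gaussian mixture models and em training", "gaussian mixture models em training"),
]

# Documents returned here would go to a human for correction and then be
# added to the training pool before the summarizer is retrained.
selected = select_for_correction(pool, k=2)
print(selected)
```

In a full active-learning loop, this selection would be repeated after each retraining round, with the corrected documents removed from the unlabeled pool.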
Citation
Zhang, J. J., & Fung, P. (2009). Active Learning of Extractive Reference Summaries for Lecture Speech Summarization. In BUCC 2009 - 2nd Workshop on Building and Using Comparable Corpora: From Parallel to Non-Parallel Corpora at the ACL-IJCNLP 2009 - Proceedings (pp. 23–26). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690339.1690346