A Zipf-like distant supervision approach for multi-document summarization using Wikinews articles


Abstract

This work presents a sentence ranking strategy based on distant supervision for the multi-document summarization problem. Because large training datasets of document clusters paired with human-made summaries are difficult to obtain, we propose building training and testing corpora from Wikinews. Wikinews articles are modeled as distant summaries of their cited sources, on the assumption that the first sentences of a Wikinews article tend to summarize the event covered in the news story. Sentences from the cited sources are represented as tuples of numerical features and labeled according to a relationship with the given distant summary that is based on Zipf's law. Ranking functions are trained using linear regression and ranking SVMs, and their outputs are also combined using Borda count. The top-ranked sentences are concatenated to build summaries, which are compared with the first sentences of the distant summary using ROUGE evaluation measures. Experimental results show that the proposed method is effective and that combining different ranking techniques improves the quality of the generated summaries. © 2012 Springer-Verlag Berlin Heidelberg.
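The rank-combination step mentioned in the abstract can be illustrated with a minimal Borda count sketch. This is not the authors' implementation: the function names `borda_count` and `build_summary`, the sentence ids, and the two example rankings are hypothetical, and the tie-breaking rule (alphabetical by sentence id) is an assumption made only to keep the output deterministic.

```python
from typing import Dict, List, Sequence


def borda_count(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Merge several rankings of the same items into one consensus ranking."""
    scores: Dict[str, int] = {}
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            # The top item earns n - 1 points, the last item earns 0.
            scores[item] = scores.get(item, 0) + (n - 1 - position)
    # Sort by descending score; ties are broken by item id (an assumption).
    return sorted(scores, key=lambda item: (-scores[item], item))


def build_summary(ranked_ids: Sequence[str], sentences: Dict[str, str], k: int) -> str:
    """Concatenate the k top-ranked sentences into a summary string."""
    return " ".join(sentences[sid] for sid in ranked_ids[:k])


# Hypothetical rankings produced by two different ranking functions.
regression_ranking = ["s3", "s1", "s2", "s4"]
svm_ranking = ["s1", "s2", "s3", "s4"]

combined = borda_count([regression_ranking, svm_ranking])
print(combined)  # ['s1', 's3', 's2', 's4']

# Hypothetical source sentences keyed by id.
sentences = {
    "s1": "First sentence.",
    "s2": "Second sentence.",
    "s3": "Third sentence.",
    "s4": "Fourth sentence.",
}
print(build_summary(combined, sentences, k=2))  # First sentence. Third sentence.
```

Here `s1` scores 2 + 3 = 5 points, `s3` scores 3 + 1 = 4, `s2` scores 1 + 2 = 3, and `s4` scores 0, so the combined order favors sentences that both rankers place near the top.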

Cite

Bravo-Marquez, F., & Manriquez, M. (2012). A Zipf-like distant supervision approach for multi-document summarization using Wikinews articles. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7608 LNCS, pp. 143–154). Springer Verlag. https://doi.org/10.1007/978-3-642-34109-0_15
