On-Topic Cover Stories from News Archives

Christian Schulte; Bilyana Taneva; Gerhard Weikum

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9022 37-42

DOI: 10.1007/978-3-319-16354-3_4

Abstract

While Web and newspaper archives store large numbers of articles, they also contain a great deal of near-duplicate information. Examples include articles about the same event published by multiple news agencies, and articles about evolving events that repeat paragraphs to provide background information. To support journalists who want to read all information on a given topic at once, we propose an approach that, given a topic and a text collection, extracts a set of articles with broad coverage of the topic and a minimal number of duplicates. We start by extracting articles related to the input topic and detecting duplicate paragraphs. We keep only one instance from each group of duplicates by solving a weighted quadratic optimization problem, which finds the best position for all paragraphs such that some articles consist mainly of distinct paragraphs and others consist mainly of duplicates. Finally, we present to the reader the articles with more distinct paragraphs. Our experiments show the high precision and recall of our approach.
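
The pipeline described in the abstract (detect duplicate paragraphs, then prefer articles made up mostly of distinct paragraphs) can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify the duplicate detector or the formulation of the weighted quadratic optimization, so the sketch assumes word-shingle Jaccard similarity for duplicate detection and swaps in a simple greedy coverage heuristic for the selection step. All function names, the shingle size, and the 0.7 threshold are hypothetical choices made for illustration.

```python
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a paragraph (assumed representation)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_duplicates(paragraphs, threshold=0.7):
    """Assign a group id to each (article_id, text) pair.

    Paragraphs whose similarity reaches the (assumed) threshold are
    merged into one group with a small union-find structure.
    """
    parent = list(range(len(paragraphs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [shingles(text) for _, text in paragraphs]
    for i, j in combinations(range(len(paragraphs)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(j)] = find(i)
    return [find(i) for i in range(len(paragraphs))]


def select_articles(paragraphs, threshold=0.7):
    """Greedy stand-in for the paper's optimization step.

    Repeatedly picks the article that adds the most paragraph groups
    not yet covered, and stops when no article adds anything new.
    """
    groups = group_duplicates(paragraphs, threshold)
    by_article = {}
    for (article_id, _), group in zip(paragraphs, groups):
        by_article.setdefault(article_id, set()).add(group)

    covered, chosen = set(), []
    while True:
        best = max(by_article,
                   key=lambda a: len(by_article[a] - covered),
                   default=None)
        if best is None or not (by_article[best] - covered):
            break
        chosen.append(best)
        covered |= by_article[best]
    return chosen
```

Given a list of `(article_id, paragraph_text)` pairs, `select_articles` returns article ids in the order they were picked. An article whose paragraphs all duplicate already-covered material is never selected, which mirrors the abstract's goal of presenting the articles with more distinct paragraphs. The greedy heuristic does not reproduce the weighting or the paragraph positioning that the paper's optimization performs.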

Cite

Schulte, C., Taneva, B., & Weikum, G. (2015). On-Topic Cover Stories from News Archives. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9022, pp. 37–42). Springer Verlag. https://doi.org/10.1007/978-3-319-16354-3_4