Identifying the original contribution of a document via language modeling

7Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

One major goal of text mining is to provide automatic methods to help humans grasp the key ideas in ever-increasing text corpora. To this effect, we propose a statistically well-founded method for identifying the original ideas that a document contributes to a corpus, focusing on self-referential diachronic corpora such as research publications, blogs, email, and news articles. Our statistical model of passage impact defines (interesting) original content through a combination of impact and novelty, and the model is used to identify each document's most original passages. Unlike heuristic approaches, the statistical model is extensible and open to analysis. We evaluate the approach both on synthetic data and on real data in the domains of research publications and news, showing that the passage impact model outperforms a heuristic baseline method. © 2009 Springer Berlin Heidelberg.

Cite

CITATION STYLE

APA

Shaparenko, B., & Joachims, T. (2009). Identifying the original contribution of a document via language modeling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5782 LNAI, pp. 350–365). https://doi.org/10.1007/978-3-642-04174-7_23

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free