Evaluating the Performance of Text Mining Systems on Real-world Press Archives

  • Paaß G
  • de Vries H

Abstract

We investigate the performance of text mining systems for annotating press articles in two real-world press archives. Seven commercial systems are tested which recover the categories of a document as well as named entities and catchphrases. Using cross-validation we evaluate the precision-recall characteristic. Depending on the depth of the category tree, 39–79% breakeven is achieved. For one corpus, 45% of the documents can be classified automatically, based on the systems' confidence estimates. A usability experiment confirms the formal evaluation results. It turns out that, for some features, human annotators perform worse than the text mining systems. This establishes a convincing argument for using text mining systems to support the indexing of large document collections.
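The breakeven figures reported above refer to the standard precision-recall breakeven point: rank the documents by the classifier's confidence and cut the ranking at rank R, where R is the number of truly relevant documents; at that cut, precision and recall coincide by construction. The sketch below illustrates this measure; it is not taken from the paper, and all names are illustrative.

```python
# Illustrative sketch (not from the paper): the precision-recall
# breakeven point for a single category, computed by cutting the
# confidence-ranked document list at rank R = number of relevant docs.

def breakeven(scores, labels):
    """Precision-recall breakeven for one category.

    scores: classifier confidences (higher = more likely relevant)
    labels: 1 if the document truly belongs to the category, else 0
    """
    R = sum(labels)  # number of truly relevant documents
    # Rank documents by descending confidence.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    # At cut rank R, precision = hits/R and recall = hits/R, so they
    # are equal; that shared value is the breakeven point.
    hits = sum(label for _, label in ranked[:R])
    return hits / R
```

For example, with scores `[0.9, 0.8, 0.7, 0.6]` and labels `[1, 0, 1, 0]`, the top two ranks contain one of the two relevant documents, giving a breakeven of 0.5.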

Citation (APA)

Paaß, G., & de Vries, H. (2006). Evaluating the Performance of Text Mining Systems on Real-world Press Archives (pp. 414–421). https://doi.org/10.1007/3-540-31314-1_50
