We investigate the performance of text mining systems for annotating press articles in two real-world press archives. Seven commercial systems are tested which recover the categories of a document as well as named entities and catchphrases. Using cross-validation we evaluate the precision-recall characteristics. Depending on the depth of the category tree, a precision-recall breakeven of 39–79% is achieved. For one corpus, 45% of the documents can be classified automatically, based on the systems' confidence estimates. In a usability experiment the formal evaluation results are confirmed. It turns out that, for some features, human annotators exhibit lower performance than the text mining systems. This establishes a convincing argument for using text mining systems to support the indexing of large document collections.
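The breakeven figure cited above is the point on the precision-recall curve where precision equals recall. As a rough illustration (not taken from the paper), the following minimal sketch shows how such a breakeven point could be computed for one category from ranked classifier scores; the score and label arrays are hypothetical.

import numpy as np

def precision_recall_breakeven(scores, labels):
    """Return the precision-recall breakeven point for one category.

    scores: classifier confidence per document (higher = more likely positive)
    labels: 1 if the document truly belongs to the category, else 0
    The breakeven is where precision equals recall; here we take the
    cutoff in the ranked list where the two values are closest.
    """
    order = np.argsort(-np.asarray(scores))    # rank documents by score, descending
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                     # true positives at each cutoff
    k = np.arange(1, len(labels) + 1)          # documents retrieved so far
    precision = tp / k
    recall = tp / labels.sum()
    i = np.argmin(np.abs(precision - recall))  # cutoff where precision and recall cross
    return (precision[i] + recall[i]) / 2

# Hypothetical example: 6 documents, 3 of them relevant to the category
print(precision_recall_breakeven([0.9, 0.8, 0.7, 0.6, 0.4, 0.2],
                                 [1, 0, 1, 1, 0, 0]))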
Paaß, G., & de Vries, H. (2006). Evaluating the Performance of Text Mining Systems on Real-world Press Archives (pp. 414–421). https://doi.org/10.1007/3-540-31314-1_50