Building Large Resources for Text Mining: The Leipzig Corpora Collection

Uwe Quasthoff; Dirk Goldhahn; Thomas Eckart

Book Chapter

DOI: 10.1007/978-3-319-12655-5_1

Abstract

Many text mining algorithms and applications require large text corpora together with certain statistics-based annotations. To ensure that results are comparable, a standardized corpus building process is required. All pre-processing procedures deserve particular attention, as they are crucial for the quality of the resulting data stock. This quality can be estimated both by evaluating the corpus building process and by applying statistical quality measurements to the corpus. Some of these approaches are described using the Leipzig Corpora Collection as an example.
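
As a rough illustration of "statistical quality measurements on the corpus", the sketch below computes two simple statistics over a list of sentences: a word frequency list (a typical statistics-based annotation) and a sentence-length distribution, plus the share of sentences falling outside a plausible length range. This is a minimal sketch under stated assumptions, not the Leipzig Corpora Collection's actual pipeline: the regex tokenizer, the helper names (tokenize, sentence_length_distribution, word_frequencies, share_outside) and the length bounds are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Hypothetical naive tokenizer: runs of word characters, punctuation dropped.
    return re.findall(r"\w+", sentence)

def sentence_length_distribution(sentences):
    # Maps sentence length (in tokens) to the number of sentences of that length.
    return Counter(len(tokenize(s)) for s in sentences)

def word_frequencies(sentences):
    # Case-folded word frequency list over the whole corpus.
    return Counter(tok.lower() for s in sentences for tok in tokenize(s))

def share_outside(dist, min_len, max_len):
    # Fraction of sentences whose length lies outside [min_len, max_len];
    # an unusually high value may hint at boilerplate or segmentation errors.
    total = sum(dist.values())
    outside = sum(n for length, n in dist.items() if length < min_len or length > max_len)
    return outside / total if total else 0.0

if __name__ == "__main__":
    corpus = [
        "The cat sat on the mat.",
        "Dogs bark.",
        "The dog chased the cat across the garden.",
    ]
    dist = sentence_length_distribution(corpus)
    print(sorted(dist.items()))
    print(word_frequencies(corpus).most_common(2))
    # The bounds 3..50 are illustrative assumptions, not values from the chapter.
    print(share_outside(dist, 3, 50))
```

Comparing such distributions across corpora built with the same process is one way deviations introduced by pre-processing could become visible; the thresholds would need to be tuned per language and source.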

Citation (APA)

Quasthoff, U., Goldhahn, D., & Eckart, T. (2014). Building Large Resources for Text Mining: The Leipzig Corpora Collection (pp. 3–24). https://doi.org/10.1007/978-3-319-12655-5_1
