Building Large Resources for Text Mining: The Leipzig Corpora Collection

  • Quasthoff U
  • Goldhahn D
  • Eckart T

Abstract

Many text mining algorithms and applications require large text corpora and certain statistics-based annotations. To ensure comparability of results, a standardized corpus building process is required. The pre-processing procedures are particularly noteworthy, as they are crucial for the quality of the resulting data stock. This quality can be estimated both by evaluating the corpus building process and by statistical quality measurements on the corpus. Some of these approaches are described using the example of the Leipzig Corpora Collection.

Cite

Quasthoff, U., Goldhahn, D., & Eckart, T. (2014). Building Large Resources for Text Mining: The Leipzig Corpora Collection (pp. 3–24). https://doi.org/10.1007/978-3-319-12655-5_1