Compilation of a Spanish representative corpus

Alexander Gelbukh; Grigori Sidorov; Liliana Chanona-Hernández

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002), vol. 2276, pp. 285-288

DOI: 10.1007/3-540-45715-1_27

Abstract

Because of Zipf's law, even a very large corpus contains very few occurrences (tokens) of most of its distinct words (types). Only a corpus that contains enough occurrences of even rare words can provide the statistical information needed to study the contextual usage of words. We call such a corpus representative and suggest using the Internet to compile it. We describe the corresponding algorithm and its application to Spanish, and discuss different concepts of a representative corpus.
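
The type/token distinction in the abstract can be illustrated with a minimal Python sketch. This is not the authors' compilation algorithm, which the abstract does not spell out. It only counts how many tokens each type has in a toy text and reports the share of types that fall below an occurrence threshold. The function names, the `min_occurrences` parameter, and the sample sentence are all hypothetical and chosen for illustration.

```python
from collections import Counter
import re


def type_frequencies(text):
    """Count how many tokens (occurrences) each type (distinct word) has."""
    # Lowercase and split on word characters; \w is Unicode-aware in Python 3,
    # so accented Spanish letters stay inside their words.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


def rare_type_share(freqs, min_occurrences):
    """Fraction of types that occur fewer than min_occurrences times.

    min_occurrences is an assumed, illustrative threshold; the abstract
    does not say how many occurrences count as "enough".
    """
    if not freqs:
        return 0.0
    rare = sum(1 for count in freqs.values() if count < min_occurrences)
    return rare / len(freqs)


# Toy sample (invented for this sketch, not taken from any corpus).
sample = "el gato y el perro y el pájaro ven la casa de la ciudad"
freqs = type_frequencies(sample)
share = rare_type_share(freqs, min_occurrences=2)
print(len(freqs), sum(freqs.values()), share)
```

On this toy sentence the script prints `10 14 0.7`: 10 types, 14 tokens, and 70% of the types occur only once. A real corpus would need a far larger text and a threshold chosen for the statistical task at hand.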

Cite

Gelbukh, A., Sidorov, G., & Chanona-Hernández, L. (2002). Compilation of a Spanish representative corpus. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2276, pp. 285–288). Springer Verlag. https://doi.org/10.1007/3-540-45715-1_27
