Against the background of a growing need for information about language, which used to be supplied in a rather limited way, this contribution outlines and discusses the new solution offered by language corpora and the way it has been implemented. For Czech, this solution has materialized in the 100-million-word representative Czech National Corpus (CNC, 2000). What follows is a brief tour through the various stages of its build-up. It characterizes the individual corpora within the CNC and gives some figures on the proportions of the different types of language represented. The last part of the contribution sets out a minimal programme for further research, together with desiderata to be followed generally in this important branch of the international stream of modern science.
Čermák, F. (2001). Language corpora: The Czech case. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2166, pp. 21–30). Springer Verlag. https://doi.org/10.1007/3-540-44805-5_3