Google Books Ngram: Problems of representativeness and data reliability

Abstract

The article discusses the representativeness of Google Books Ngram (GBN) as a multi-purpose corpus. Criticism of the corpus is analysed and discussed. A comparative study of GBN data and data obtained from the Russian National Corpus and the General Internet Corpus of Russian is performed to show that the Google Books Ngram corpus can be successfully used for corpus-based studies. A new concept, the “diachronically balanced corpus”, is introduced. The article also describes spelling and metadata errors present in the GBN corpus and proposes possible ways of improving the quality of GBN data.

Citation (APA)

Solovyev, V. D., Bochkarev, V. V., & Akhtyamova, S. S. (2020). Google Books Ngram: Problems of representativeness and data reliability. In Communications in Computer and Information Science (Vol. 1223 CCIS, pp. 147–162). Springer. https://doi.org/10.1007/978-3-030-51913-1_10
