Recognition of parts of speech using the vector of bigram frequencies

Abstract

This paper describes how to automatically recognize the part of speech of a word and its other grammatical categories, such as gender and number. Unlike some previous work, we use the vector of frequencies of syntactic bigrams that include the word under consideration as the source data for recognizing parts of speech and grammatical categories. Data on the frequencies of syntactic bigrams were obtained from the Russian sub-corpus of Google Books Ngram. We used the part-of-speech tags available in Google Books Ngram, as well as data on the parts of speech and grammatical categories of words obtained from the electronic dictionary Open Corpora. To train the model, we selected words from the list of the 100,000 most frequent words that have no homonyms and occur in both Google Books Ngram and Open Corpora. A multilayer perceptron with a softmax output layer was used as the recognizer. The network input was the vector of frequencies of syntactic bigrams consisting of the test word and one of the 10,000 most frequent words. The neural network was trained to minimize cross-entropy. On the test sample, the average part-of-speech recognition accuracy was 99.1%. Nouns and verbs were recognized best (with accuracies of 99.77% and 99.62%, respectively). The accuracy of recognizing grammatical number was 99.61%, whereas the accuracy for gender was substantially lower, at just 91.9%.
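The abstract does not give implementation details, so the following is only a minimal sketch, assuming NumPy, of the two steps it describes: turning bigram counts into a fixed-length frequency vector over a list of context words, and training a multilayer perceptron with a softmax output by minimizing cross-entropy. Several choices here are assumptions for illustration, not the authors' setup: pooling counts from both word orders, normalizing to relative frequencies, a single tanh hidden layer, and plain full-batch gradient descent. The helper names (`bigram_feature_vector`, `SoftmaxMLP`) and all words, counts, and tags are hypothetical. The paper used the 10,000 most frequent words as context; the toy example uses four.

```python
import numpy as np


def bigram_feature_vector(word, bigram_counts, context_vocab):
    """Frequency vector of bigrams pairing `word` with each context word.

    Assumption (not from the paper): counts of (word, c) and (c, word) are
    pooled into one slot per context word c, then normalized to sum to 1.
    """
    index = {c: i for i, c in enumerate(context_vocab)}
    vec = np.zeros(len(context_vocab))
    for (a, b), n in bigram_counts.items():
        if a == word and b in index:
            vec[index[b]] += n
        elif b == word and a in index:
            vec[index[a]] += n
    total = vec.sum()
    return vec / total if total > 0 else vec


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class SoftmaxMLP:
    """Hypothetical MLP: one tanh hidden layer and a softmax output layer."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def forward(self, X):
        H = np.tanh(X @ self.W1 + self.b1)
        return H, softmax(H @ self.W2 + self.b2)

    def cross_entropy(self, X, y):
        _, P = self.forward(X)
        return -np.mean(np.log(P[np.arange(len(y)), y] + 1e-12))

    def fit(self, X, y, lr=0.5, epochs=2000):
        Y = np.eye(self.W2.shape[1])[y]  # one-hot targets
        n = len(X)
        for _ in range(epochs):
            H, P = self.forward(X)
            # Gradient of mean cross-entropy w.r.t. the softmax logits.
            dZ2 = (P - Y) / n
            dW2 = H.T @ dZ2
            db2 = dZ2.sum(axis=0)
            dH = (dZ2 @ self.W2.T) * (1.0 - H ** 2)  # tanh derivative
            dW1 = X.T @ dH
            db1 = dH.sum(axis=0)
            self.W1 -= lr * dW1
            self.b1 -= lr * db1
            self.W2 -= lr * dW2
            self.b2 -= lr * db2
        return self

    def predict(self, X):
        return self.forward(X)[1].argmax(axis=1)


# Toy, made-up data: nouns co-occur with adjectives, verbs with adverbs.
context_vocab = ["big", "red", "quickly", "often"]
bigram_counts = {
    ("big", "cat"): 6, ("red", "cat"): 4,
    ("big", "dog"): 3, ("red", "dog"): 7,
    ("big", "house"): 5, ("red", "house"): 5,
    ("run", "quickly"): 8, ("often", "run"): 2,
    ("eat", "quickly"): 1, ("often", "eat"): 9,
    ("go", "quickly"): 5, ("often", "go"): 5,
}
words = ["cat", "dog", "house", "run", "eat", "go"]
labels = np.array([0, 0, 0, 1, 1, 1])  # 0 = noun, 1 = verb (hypothetical tags)

X = np.stack([bigram_feature_vector(w, bigram_counts, context_vocab) for w in words])
model = SoftmaxMLP(n_in=len(context_vocab), n_hidden=8, n_classes=2)
loss_before = model.cross_entropy(X, labels)
model.fit(X, labels)
print(model.predict(X))
```

In a realistic setting the context list would hold the most frequent words of the corpus and the classes would be the part-of-speech (or gender, or number) tags. The accuracies reported above are measured on a held-out test sample, not on the training words as in this toy run.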

Khristoforov, S., Bochkarev, V., & Shevlyakova, A. (2020). Recognition of parts of speech using the vector of bigram frequencies. In Communications in Computer and Information Science (Vol. 1086, pp. 132–142). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-39575-9_13