Russian-language question classification: A new typology and first results

Abstract

This paper deals with automatic classification of questions in the Russian language, a natural early step in building a question answering system. We developed a typology of Russian questions using interrogative particles, pronouns and word order as the main features. A corpus of 2008 questions was manually compiled and annotated according to our typology. We used both a fine-grained and a coarse-grained class set (23 and 14 classes, respectively). The training data were represented as character bi-/trigrams and word uni-/bi-/trigrams. We tested several widely used machine-learning methods (logistic regression, support vector machines, naïve Bayes) against a regular-expression baseline on a held-out test corpus annotated by an external expert. The best results were achieved by an SVM classifier with a linear kernel, which reached an accuracy of 65.3% (fine-grained) and 68.7% (coarse-grained), while the regular-expression baseline showed 52.7% accuracy.
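
The feature representation and the regular-expression baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the class labels (PERSON, LOCATION, TIME, YES_NO, OTHER) and the patterns are hypothetical and do not reproduce the paper's typology. The sketch only shows how character bi-/trigrams and word uni-/bi-/trigrams might be counted, and how a rule-based baseline might assign a class from an interrogative pronoun or particle. The machine-learning classifiers themselves are not shown.

```python
import re
from collections import Counter


def char_ngrams(text, n_values=(2, 3)):
    """Count character n-grams (bigrams and trigrams by default)."""
    text = text.lower()
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_values=(1, 2, 3)):
    """Count word n-grams (uni-, bi- and trigrams by default)."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical rules: the labels and patterns are illustrative assumptions,
# not the classes or expressions used in the paper.
BASELINE_RULES = [
    (re.compile(r"^\s*(кто|кого|кому)\b"), "PERSON"),
    (re.compile(r"^\s*(где|куда|откуда)\b"), "LOCATION"),
    (re.compile(r"^\s*когда\b"), "TIME"),
    (re.compile(r"\bли\b"), "YES_NO"),
]


def baseline_classify(question, default="OTHER"):
    """Return the label of the first matching rule, else the default."""
    lowered = question.lower()
    for pattern, label in BASELINE_RULES:
        if pattern.search(lowered):
            return label
    return default
```

In a full pipeline, the two count dictionaries for each question would typically be merged into one sparse feature vector and passed to a classifier such as a linear-kernel SVM, with the rule-based function serving as the point of comparison.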

Cite

Nikolaev, K., & Malafeev, A. (2018). Russian-language question classification: A new typology and first results. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10716 LNCS, pp. 72–81). Springer Verlag. https://doi.org/10.1007/978-3-319-73013-4_7
