Russian-language question classification: A new typology and first results

Kirill Nikolaev; Alexey Malafeev

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10716 LNCS 72-81

DOI: 10.1007/978-3-319-73013-4_7

2 Citations

4 Readers

Abstract

This paper addresses automatic classification of questions in Russian, a natural early step in building a question answering system. We developed a typology of Russian questions using interrogative particles, pronouns and word order as the main features. A corpus of 2008 questions was manually compiled and annotated according to our typology. We used two class sets: a fine-grained one (23 classes) and a coarse-grained one (14 classes). The training data were represented as character bi-/trigrams and word uni-/bi-/trigrams. We tested several widely used machine-learning methods (logistic regression, support vector machines, naïve Bayes) against a regular expression baseline on a held-out test corpus annotated by an external expert. The best results were obtained by an SVM classifier with a linear kernel, which reached an accuracy of 65.3% (fine-grained) and 68.7% (coarse-grained), while the regular expression baseline reached 52.7%.
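
The two ingredients named in the abstract, n-gram features and a regular expression baseline, can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the feature-key prefixes, the rules and the class labels (PERSON, LOCATION, TIME, YES_NO, OTHER) are all hypothetical and do not reproduce the paper's typology.

```python
import re
from collections import Counter


def char_ngrams(text, sizes=(2, 3)):
    """Count character n-grams (bigrams and trigrams by default)."""
    text = text.lower()
    counts = Counter()
    for n in sizes:
        for i in range(len(text) - n + 1):
            counts["c:" + text[i:i + n]] += 1
    return counts


def word_ngrams(text, sizes=(1, 2, 3)):
    """Count word n-grams (uni-, bi- and trigrams by default)."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts["w:" + " ".join(tokens[i:i + n])] += 1
    return counts


def featurize(question):
    """Merge both feature families into one sparse count mapping."""
    return char_ngrams(question) + word_ngrams(question)


# Hypothetical rules and labels for illustration only; the paper's
# actual typology and baseline patterns are not given in the abstract.
BASELINE_RULES = [
    (re.compile(r"^\s*(кто|кого|кому|кем)\b", re.IGNORECASE), "PERSON"),
    (re.compile(r"^\s*(где|куда|откуда)\b", re.IGNORECASE), "LOCATION"),
    (re.compile(r"^\s*когда\b", re.IGNORECASE), "TIME"),
    (re.compile(r"\bли\b", re.IGNORECASE), "YES_NO"),
]


def baseline_classify(question, default="OTHER"):
    """Return the label of the first matching rule, else the default."""
    for pattern, label in BASELINE_RULES:
        if pattern.search(question):
            return label
    return default


if __name__ == "__main__":
    question = "Кто написал роман?"
    print(baseline_classify(question))
    print(sorted(word_ngrams(question)))
```

The count mappings returned by `featurize` are in a form that a linear classifier can consume after vectorization, for example through scikit-learn's `DictVectorizer` followed by `LinearSVC`; that pairing is an assumption here, since the abstract does not name the toolkit used.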

Citation (APA)

Nikolaev, K., & Malafeev, A. (2018). Russian-language question classification: A new typology and first results. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10716 LNCS, pp. 72–81). Springer Verlag. https://doi.org/10.1007/978-3-319-73013-4_7
