Authorship attribution in Russian with new high-performing and fully interpretable morpho-syntactic features

Abstract

This work addresses the problem of modeling author style in Russian. Specifically, we perform authorship attribution on a dataset we collected of 1,506 texts by 30 authors, written between the 18th and 21st centuries. We apply several classifiers to the attribution problem: Random Forest, Logistic Regression, and a Support Vector Machine (SVM). For text representation, we use seven models spanning three linguistic levels: lexis, morphology, and syntax. Most importantly, we propose our own set of morpho-syntactic features that perform roughly on par with doc2vec while being fully interpretable. Our experiments show that these features are effective on their own and that combining them with the standard doc2vec-based approach improves classification quality. All code, including feature extraction, is freely available. We also analyze how well individual features perform as style markers. Finally, we study classification errors to identify patterns in the misattribution of specific authors.
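The abstract does not spell out the feature set or the pipeline, so the following is only a minimal sketch of the general idea under stated assumptions. Tokens are assumed to be already tagged (in practice a Russian morphological analyzer would supply the tags). The tag inventory `TAGS` and all function names are hypothetical. The `embedding` argument stands in for a doc2vec vector. A nearest-centroid classifier replaces the Random Forest, Logistic Regression, and SVM models used in the paper, purely to keep the example dependency-free. The sketch shows three steps: interpretable relative-frequency features, their concatenation with a dense embedding, and a count of misattributed (true author, predicted author) pairs for error analysis.

```python
from collections import Counter
from math import dist

# Hypothetical, coarse tag inventory for illustration only;
# this is NOT the feature set proposed in the paper.
TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "CONJ", "PREP", "PUNCT"]


def morpho_features(tagged_tokens):
    """Relative frequency of each tag, so every dimension has a readable meaning.

    tagged_tokens: list of (word, tag) pairs, assumed to be pre-tagged.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[t] / total for t in TAGS]


def combine(interpretable, embedding):
    """Concatenate interpretable features with a dense document embedding
    (a stand-in for a doc2vec vector)."""
    return list(interpretable) + list(embedding)


def fit_centroids(vectors, labels):
    """Nearest-centroid 'training': average the feature vectors of each author."""
    sums, counts = {}, Counter(labels)
    for vec, author in zip(vectors, labels):
        acc = sums.setdefault(author, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {a: [x / counts[a] for x in acc] for a, acc in sums.items()}


def predict(centroids, vec):
    """Attribute a document to the author whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda a: dist(centroids[a], vec))


def misattributions(true_labels, predicted):
    """Count (true author, predicted author) pairs for wrong predictions only."""
    return Counter((t, p) for t, p in zip(true_labels, predicted) if t != p)
```

For example, `morpho_features([("дом", "NOUN"), ("стоит", "VERB"), ("дом", "NOUN"), (".", "PUNCT")])` gives 0.5 for `NOUN`, 0.25 for `VERB` and `PUNCT`, and 0 elsewhere. Because each dimension is a named frequency, a trained model's reliance on it can be read directly, which is the property the abstract refers to as interpretability.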

Citation (APA)

Pimonova, E., Durandin, O., & Malafeev, A. (2019). Authorship attribution in Russian with new high-performing and fully interpretable morpho-syntactic features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11832 LNCS, pp. 193–204). Springer. https://doi.org/10.1007/978-3-030-37334-4_18