Authorship attribution in Russian with new high-performing and fully interpretable morpho-syntactic features

Elena Pimonova; Oleg Durandin; Alexey Malafeev

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019), vol. 11832 LNCS, pp. 193-204

DOI: 10.1007/978-3-030-37334-4_18

Abstract

This work tackles the problem of modeling author style in Russian. In particular, we address authorship attribution on a dataset we collected: 1506 texts by 30 authors, written from the 18th to the 21st century. We apply three classifiers to the attribution problem: Random Forest, Logistic Regression, and SVM. For text representation, we use seven models at three language levels: lexis, morphology, and syntax. Most importantly, we propose our own set of morpho-syntactic features that perform at about the same level as doc2vec but are fully interpretable. Our experiments show that these features are effective on their own and that they improve classification quality when combined with the classic doc2vec-based approach. All code, including feature extraction, is made freely available. Additionally, we analyze the performance of individual features as style markers. Finally, we study classification errors to identify patterns in the misattribution of specific authors.
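As a rough illustration of what combining interpretable features with a dense embedding could look like, the following Python sketch builds a named feature vector and appends an externally supplied embedding to it. Everything here is an assumption made for illustration: the three surface features (`mean_sentence_len`, `mean_word_len`, `commas_per_word`) are simple stand-ins and are not the morpho-syntactic features proposed in the paper, and `dense_vector` is a placeholder for a doc2vec embedding computed elsewhere.

```python
import re

# Hypothetical feature names; stand-ins, NOT the paper's morpho-syntactic features.
FEATURE_NAMES = ["mean_sentence_len", "mean_word_len", "commas_per_word"]


def interpretable_features(text):
    """Return a dict of named surface features, so each value stays human-readable."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # \w matches Unicode letters in Python 3, so Cyrillic words are found.
    words = re.findall(r"\w+", text)
    n_words = max(len(words), 1)  # guard against division by zero
    return {
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        "mean_word_len": sum(len(w) for w in words) / n_words,
        "commas_per_word": text.count(",") / n_words,
    }


def combine(text, dense_vector):
    """Concatenate the named features (in fixed order) with a dense embedding.

    dense_vector is assumed to come from a separately trained model
    such as doc2vec; this sketch does not compute it.
    """
    feats = interpretable_features(text)
    return [feats[name] for name in FEATURE_NAMES] + list(dense_vector)


if __name__ == "__main__":
    sample = "Я пришел, увидел. Он ушел."
    print(interpretable_features(sample))
    print(combine(sample, [0.1, 0.2]))
```

The combined vector could then be passed to any of the classifiers named in the abstract. Because the first positions keep fixed, named meanings, a per-feature analysis of style markers remains possible even after the embedding is appended.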

Cite

Pimonova, E., Durandin, O., & Malafeev, A. (2019). Authorship attribution in Russian with new high-performing and fully interpretable morpho-syntactic features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11832 LNCS, pp. 193–204). Springer. https://doi.org/10.1007/978-3-030-37334-4_18