A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception

Abstract

In this work we compare several data-driven approaches to identifying an author's gender in texts written with or without gender imitation. The corpus was gathered specifically for this task through crowdsourcing. The best models are a convolutional neural network with morphological data as input (F1-measure: 88% ± 3) for texts without imitation, and a gradient boosting model with a vector of character n-gram frequencies as input (F1-measure: 64% ± 3) for texts with gender imitation. A method for filtering the crowdsourced corpus with a limited reference sample of texts, in order to increase the accuracy of the results, is also discussed.
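To make the "vector of character n-gram frequencies" and the F1-measure concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the function names (`char_ngram_frequencies`, `to_vector`, `f1_measure`), the n-gram length, and the toy inputs are all assumptions made for illustration, and the abstract does not say which n-gram sizes or which F1 averaging scheme was used.

```python
from collections import Counter


def char_ngram_frequencies(text, n=3):
    """Relative frequency of each character n-gram in `text` (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def to_vector(freqs, vocabulary):
    """Project a frequency dict onto a fixed, ordered n-gram vocabulary."""
    return [freqs.get(gram, 0.0) for gram in vocabulary]


def f1_measure(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage: bigram frequencies of a short string, then a fixed-length vector.
freqs = char_ngram_frequencies("abab", n=2)
vector = to_vector(freqs, vocabulary=["ab", "ba", "zz"])
score = f1_measure([1, 1, 0, 0], [1, 0, 1, 0])
```

A vector built this way could be fed to any gradient boosting classifier. The vocabulary would normally be fixed from the training texts so that every document maps to a vector of the same length.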

Cite

Sboev, A., Moloshnikov, I., Gudovskikh, D., & Rybka, R. (2017). A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception. In Journal of Physics: Conference Series (Vol. 937). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/937/1/012046