A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception

A. Sboev; I. Moloshnikov; D. Gudovskikh; R. Rybka

Conference Proceedings, Open Access

Journal of Physics: Conference Series (2017) 937(1)

DOI: 10.1088/1742-6596/937/1/012046

5 Citations

7 Readers

Abstract

In this work we compare several data-driven approaches to the task of identifying an author's gender from Russian-language texts, both with and without gender imitation. The corpus was gathered specifically for this task through crowdsourcing. The best models are a convolutional neural network taking morphological data as input (F1-measure: 88% ± 3) for texts without imitation, and a gradient boosting model taking a vector of character n-gram frequencies as input (F1-measure: 64% ± 3) for texts with gender imitation. A method for filtering the crowdsourced corpus using a limited reference sample of texts, in order to increase the accuracy of the results, is also discussed.
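
As an illustration of what a "vector of character n-gram frequencies" can look like, here is a minimal sketch in plain Python. It is not the authors' implementation: the n-gram length (n=3), the two sample sentences, and the helper names `char_ngram_frequencies` and `to_vector` are assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter


def char_ngram_frequencies(text, n=3):
    """Return relative frequencies of overlapping character n-grams in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


def to_vector(freqs, vocabulary):
    """Project a frequency dict onto a fixed, ordered list of n-grams."""
    return [freqs.get(gram, 0.0) for gram in vocabulary]


# Hypothetical sample texts (not taken from the paper's corpus).
texts = ["я пошла домой", "я пошёл домой"]

# One frequency profile per text; n=3 is an assumed setting.
profiles = [char_ngram_frequencies(t, n=3) for t in texts]

# Shared vocabulary: every n-gram seen in any text, in a stable order.
vocabulary = sorted(set().union(*profiles))

# Fixed-length feature vectors, one per text.
vectors = [to_vector(p, vocabulary) for p in profiles]
```

Vectors of this kind could then be passed to a classifier such as a gradient boosting model; the choice of library and hyperparameters is left open here because the abstract does not state them.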

Cite

Sboev, A., Moloshnikov, I., Gudovskikh, D., & Rybka, R. (2017). A comparison of Data Driven models of solving the task of gender identification of author in Russian language texts for cases without and with the gender deception. In Journal of Physics: Conference Series (Vol. 937). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/937/1/012046
