The class imbalance problem in construction of training datasets for authorship attribution

Urszula Stańczyk

Conference Proceedings


Advances in Intelligent Systems and Computing, vol. 391, pp. 535-547 (2016)

DOI: 10.1007/978-3-319-23437-3_46


Abstract

The paper presents research on class imbalance in the construction of training sets for authorship recognition. In the experiments, the sets are first artificially imbalanced and then rebalanced by under-sampling and over-sampling. The prepared sets are used to train two predictors, one connectionist and one rule-based, and their performance is evaluated. The tests show that for artificial neural networks the predictive accuracy is, in several cases, not degraded but actually improved, whereas the rule classifier is highly sensitive to class balance: it never performs better than when trained on the original balanced set, and in many cases it performs worse.
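The abstract does not specify the exact resampling procedures, so the following is only a minimal sketch under stated assumptions. It assumes the simplest variants: a balanced set is artificially imbalanced by discarding part of one class, random under-sampling then shrinks every class to the size of the smallest one, and random over-sampling grows every class to the size of the largest one by duplicating randomly chosen existing samples. The function names `imbalance`, `undersample` and `oversample`, the `keep_fraction` parameter, and the `(features, label)` data layout are hypothetical illustrations, not the paper's implementation.

```python
import random
from collections import Counter, defaultdict


def group_by_class(samples):
    """Group (features, label) pairs by their class label."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append((features, label))
    return groups


def imbalance(samples, label, keep_fraction, rng):
    """Artificially imbalance a set by keeping only a fraction of one class
    (hypothetical helper; the paper's reduction scheme is not given)."""
    groups = group_by_class(samples)
    kept = rng.sample(groups[label], int(len(groups[label]) * keep_fraction))
    others = [s for lbl, grp in groups.items() if lbl != label for s in grp]
    return others + kept


def undersample(samples, rng):
    """Random under-sampling: reduce every class to the smallest class size."""
    groups = group_by_class(samples)
    target = min(len(grp) for grp in groups.values())
    result = []
    for grp in groups.values():
        result.extend(rng.sample(grp, target))
    return result


def oversample(samples, rng):
    """Random over-sampling: enlarge every class to the largest class size
    by duplicating randomly drawn existing samples (with replacement)."""
    groups = group_by_class(samples)
    target = max(len(grp) for grp in groups.values())
    result = []
    for grp in groups.values():
        result.extend(grp)
        result.extend(rng.choices(grp, k=target - len(grp)))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical balanced two-author set: 100 samples per author.
    data = [([i], "author_A") for i in range(100)] + [([i], "author_B") for i in range(100)]
    skewed = imbalance(data, "author_B", 0.3, rng)
    print(Counter(lbl for _, lbl in skewed))                    # 100 author_A, 30 author_B
    print(Counter(lbl for _, lbl in undersample(skewed, rng)))  # 30 of each
    print(Counter(lbl for _, lbl in oversample(skewed, rng)))   # 100 of each
```

Under-sampling discards information from the majority class, while over-sampling by duplication adds no new information and can encourage overfitting to the repeated samples; more elaborate schemes (for example synthetic sample generation) exist, but whether the paper uses any of them is not stated in the abstract.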


Citation (APA)

Stańczyk, U. (2016). The class imbalance problem in construction of training datasets for authorship attribution. In Advances in Intelligent Systems and Computing (Vol. 391, pp. 535–547). Springer Verlag. https://doi.org/10.1007/978-3-319-23437-3_46
