Text categorization for authorship attribution in English poetry

Abstract

Authorship attribution can be considered a style-based text categorization problem. This paper presents an empirical study of style-based poetry categorization with the bag-of-words representation on 406 English poems on the same theme by five poets from the World War I era. We investigated the impact of stop-word removal, stemming, and feature selection methods on the categorization performance of the Support Vector Machine and the Naïve Bayes Classifier. We found that these two models achieve their best performance when stop-word removal and stemming are not applied to the training datasets, and that the performance of the Naïve Bayes Classifier improves when feature selection methods are applied. We also compared the best categorization performance of the bag-of-words representation with that of a stylometric representation that includes lexical features, such as function words and high-frequency words, and found that the bag-of-words representation outperforms the stylometric representation.
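To make the setup concrete, the following is a minimal sketch of bag-of-words authorship categorization with a multinomial Naïve Bayes classifier and an optional stop-word removal step. It is not the authors' implementation: the stop-word list, the `NaiveBayes` class, the `tokenize` helper, the poet labels, and the toy lines are all assumptions made for illustration, and none of the text comes from the paper's corpus.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def tokenize(text, remove_stop_words=False):
    """Lowercase the text, split it into word tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels, remove_stop_words=False):
        self.remove_stop_words = remove_stop_words
        self.class_docs = Counter(labels)  # number of documents per author
        self.word_counts = {label: Counter() for label in self.class_docs}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text, remove_stop_words))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text, self.remove_stop_words) if t in self.vocab]
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            score = math.log(self.class_docs[label] / total_docs)  # log prior
            denom = sum(counts.values()) + len(self.vocab)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy lines and author labels, not drawn from the paper's corpus.
train_texts = [
    "the mud and the wire and the guns",
    "the guns of the night in the mud",
    "o sweet dawn, o bright lark, o morning",
    "o lark, o dawn, sing bright",
]
train_labels = ["poet_a", "poet_a", "poet_b", "poet_b"]

# Train once with stop words kept and once with them removed.
model_keep = NaiveBayes().fit(train_texts, train_labels)
model_drop = NaiveBayes().fit(train_texts, train_labels, remove_stop_words=True)

prediction_keep = model_keep.predict("the wire in the mud")
prediction_drop = model_drop.predict("the wire in the mud")
```

Toggling `remove_stop_words` mirrors the preprocessing comparison described in the abstract; on a corpus this small the two settings cannot be meaningfully compared, so the sketch only shows where such a switch would sit in the pipeline.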

Gallagher, C., & Li, Y. (2019). Text categorization for authorship attribution in English poetry. In Advances in Intelligent Systems and Computing (Vol. 858, pp. 249–261). Springer Verlag. https://doi.org/10.1007/978-3-030-01174-1_19
