Stylometric features for authorship attribution of Polish texts

Abstract

Authorship attribution aims to distinguish texts written by different authors using text features that represent their styles. In this paper we investigate stylometric features for the Polish language based on Part of Speech (POS) tagging (including POS bigrams) and function words. Because Polish is highly inflected, the feature space tends to be very large, particularly for POS n-grams. Focusing on POS bigrams, we propose a simplified representation that keeps the feature space compact. We report experiments in which authorship attribution was conducted for documents of varying lengths, using classifiers from the Weka library. We evaluate classification results for combinations of the following features: POS tags, POS bigrams, function words and simple document statistics. The experiments indicate that the developed features provide good classification performance.
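
As a rough illustration of the feature types named in the abstract, the Python sketch below computes function-word frequencies and POS-bigram frequencies for a pre-tagged token sequence. It is not the paper's method: the function-word list, the coarse tag names (`subst`, `conj`, `qub`, `fin`) and the helper names are assumptions chosen for the example, and the simplified bigram representation proposed in the paper is not reproduced here. Using coarse tags without inflectional detail is only one plausible way to keep the number of distinct bigrams small.

```python
from collections import Counter

# Illustrative subset of Polish function words (an assumption, not the paper's list).
FUNCTION_WORDS = ["i", "w", "na", "nie", "z", "że", "się"]


def function_word_features(tokens):
    """Relative frequency of each listed function word among all tokens."""
    counts = Counter(t.lower() for t in tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return {w: counts[w] / total for w in FUNCTION_WORDS}


def pos_bigram_features(tags):
    """Relative frequency of each adjacent pair of POS tags."""
    bigrams = Counter(zip(tags, tags[1:]))
    total = sum(bigrams.values()) or 1
    return {f"{a}_{b}": n / total for (a, b), n in bigrams.items()}


# Hypothetical pre-tagged sentence: (token, coarse POS tag) pairs.
tagged = [("Ala", "subst"), ("i", "conj"), ("Ola", "subst"),
          ("nie", "qub"), ("śpią", "fin")]
tokens = [tok for tok, _ in tagged]
tags = [tag for _, tag in tagged]

# One combined feature dictionary per document; a classifier would
# consume such dictionaries after conversion to fixed-length vectors.
features = {**function_word_features(tokens), **pos_bigram_features(tags)}
```

In practice the tags would come from a Polish morphosyntactic tagger, and the resulting vectors could be exported to a classifier toolkit such as Weka, which the abstract mentions.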

Cite

Szwed, P. (2017). Stylometric features for authorship attribution of Polish texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10246 LNAI, pp. 171–182). Springer Verlag. https://doi.org/10.1007/978-3-319-59060-8_17
