Stylometric features for authorship attribution of Polish texts

Piotr Szwed

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10246 LNAI 171-182

DOI: 10.1007/978-3-319-59060-8_17

Abstract

Authorship attribution aims to distinguish texts written by different authors using text features that represent their styles. In this paper we investigate stylometric features for the Polish language based on part-of-speech (POS) tagging (including POS bigrams) and function words. Because Polish is highly inflected, the feature space tends to be very large, particularly for POS n-grams. Focusing on POS bigrams, we propose a simplified representation that keeps the feature space compact. We report experiments in which authorship attribution was conducted for documents of varying length, using classifiers from the Weka library. We evaluate classification results for combinations of the following features: POS tags, POS bigrams, function words, and simple document statistics. The experiments indicate that the developed features provide good classification performance.
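
To make the four feature families named in the abstract concrete, the following is a minimal sketch of how such a feature vector might be assembled from already-tagged text. It is not the paper's method: the simplified POS bigram representation is not described in the abstract, so plain tag bigrams are used instead. The function names, the feature-name prefixes, the coarse tags in the sample, and the tiny function-word list are all assumptions made for illustration.

```python
from collections import Counter


def relative_freqs(items):
    """Map each distinct item to its share of the total count."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def stylometric_features(tagged_tokens, function_words):
    """Build a feature dict from (word, pos_tag) pairs.

    Hypothetical helper: combines POS tag frequencies, plain POS bigram
    frequencies (NOT the paper's simplified representation), function-word
    frequencies, and one simple document statistic.
    """
    words = [w.lower() for w, _ in tagged_tokens]
    tags = [t for _, t in tagged_tokens]
    n = len(words)
    features = {}

    # POS tag relative frequencies
    for tag, freq in relative_freqs(tags).items():
        features[f"pos:{tag}"] = freq

    # POS bigram relative frequencies over adjacent tag pairs
    for (a, b), freq in relative_freqs(zip(tags, tags[1:])).items():
        features[f"bigram:{a}_{b}"] = freq

    # Function-word frequencies, one fixed slot per listed word
    fw_counts = Counter(w for w in words if w in function_words)
    for w in sorted(function_words):
        features[f"fw:{w}"] = fw_counts[w] / n if n else 0.0

    # Simple document statistic: average word length in characters
    features["stat:avg_word_len"] = sum(map(len, words)) / n if n else 0.0
    return features


# Illustrative input only; the tags are assumed coarse labels.
sample = [("Ala", "subst"), ("i", "conj"), ("kot", "subst"),
          ("nie", "qub"), ("śpią", "fin")]
feats = stylometric_features(sample, {"i", "nie", "w"})
```

Even in this toy form, the number of distinct bigram keys grows with the square of the tagset size, which is the feature-space growth the abstract attributes to fine-grained tagging of a highly inflected language.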

Citation (APA)

Szwed, P. (2017). Stylometric features for authorship attribution of Polish texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10246 LNAI, pp. 171–182). Springer Verlag. https://doi.org/10.1007/978-3-319-59060-8_17
