Improving cross-topic authorship attribution: The role of pre-processing

Ilia Markov; Efstathios Stamatatos; Grigori Sidorov

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10762 LNCS 289-302

DOI: 10.1007/978-3-319-77116-8_21

17 Citations

42 Readers

Abstract

The effectiveness of character n-gram features for representing the stylistic properties of a text has been demonstrated in various independent Authorship Attribution (AA) studies. Moreover, it has been shown that some categories of character n-grams perform better than others under both single-topic and cross-topic AA conditions. In this work, we present an improved algorithm for cross-topic AA. We demonstrate that the effectiveness of character n-gram representations can be significantly enhanced by performing simple pre-processing steps and appropriately tuning the number of features, especially in cross-topic conditions.
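The abstract does not say which pre-processing steps are used or how the number of features is tuned, so the following Python sketch only illustrates the general idea under stated assumptions. The digit masking and lowercasing in `preprocess`, the `min_count` frequency cutoff, and all function names are hypothetical choices made for illustration, not the authors' method.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def preprocess(text):
    """Assumed pre-processing step: lowercase and mask every digit as '0'."""
    return "".join("0" if ch.isdigit() else ch for ch in text.lower())


def build_vocabulary(texts, n=3, min_count=2):
    """Keep only n-grams seen at least `min_count` times across `texts`.

    Raising or lowering `min_count` is one simple (assumed) way to
    control the number of features.
    """
    counts = Counter()
    for text in texts:
        counts.update(char_ngrams(preprocess(text), n))
    return sorted(g for g, c in counts.items() if c >= min_count)


def vectorize(text, vocabulary, n=3):
    """Map a text to a list of n-gram counts, ordered as in `vocabulary`."""
    counts = Counter(char_ngrams(preprocess(text), n))
    return [counts[g] for g in vocabulary]
```

Masking digits is meant to show how a pre-processing step might remove content that depends on topic rather than on the author, while the frequency cutoff discards rare n-grams. The resulting count vectors could then be passed to any standard classifier.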

Cite

Markov, I., Stamatatos, E., & Sidorov, G. (2018). Improving cross-topic authorship attribution: The role of pre-processing. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10762 LNCS, pp. 289–302). Springer Verlag. https://doi.org/10.1007/978-3-319-77116-8_21
