Authorship attribution in Portuguese using character N-grams

Abstract

For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. For English, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams and various combinations of them. The paper also experiments with different feature representations and machine-learning algorithms. Moreover, it demonstrates that the performance of the character n-gram approach can be improved by fine-tuning the feature set and by appropriately selecting the length and type of character n-grams. This relatively simple and language-independent approach to the AA task outperforms both a bag-of-words baseline and other approaches evaluated on the same corpus.
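
To make the idea of character n-gram features concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the abstract does not specify the n-gram types, feature representations, or learning algorithms used, so this example simply counts overlapping character trigrams and assigns an unknown text to the author with the most similar profile under cosine similarity. The function names (`char_ngrams`, `ngram_profile`, `cosine`, `attribute`) and the toy texts are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_profile(text, n=3):
    """Count how often each character n-gram occurs in text."""
    return Counter(char_ngrams(text, n))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_p * norm_q)


def attribute(unknown, known_texts, n=3):
    """Return the author whose profile is closest to the unknown text.

    known_texts maps an author label to one sample text (an assumption
    made for brevity; a real corpus would have many texts per author).
    """
    target = ngram_profile(unknown, n)
    return max(
        known_texts,
        key=lambda author: cosine(target, ngram_profile(known_texts[author], n)),
    )


# Toy usage with made-up samples, one per hypothetical author label.
samples = {
    "A": "o gato preto dorme no sofá",
    "B": "the black cat sleeps on the sofa",
}
guess = attribute("o gato dorme", samples)
```

In this sketch the n-gram length `n` is the only tunable parameter; the abstract indicates that both the length and the type of n-gram matter, so a fuller experiment would vary these and compare against a bag-of-words baseline.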

Cite

Markov, I., Baptista, J., & Pichardo-Lagunas, O. (2017). Authorship attribution in Portuguese using character N-grams. Acta Polytechnica Hungarica, 14(3), 59–78. https://doi.org/10.12700/APH.14.3.2017.3.4
