Authorship attribution using content based features and N-gram features



Textual content on the internet is growing exponentially, primarily through social websites, and the problems caused by anonymous text are growing with it. Researchers are therefore looking for techniques to identify the author of an unknown document. Authorship attribution is one such technique: it predicts the author of a document of unknown origin. Earlier work has extracted various classes of stylistic features, including character, lexical, syntactic, structural, content and semantic features, to distinguish authors' writing styles. In this work, experiments were performed with the most frequent content-specific features and with n-grams of characters, words and POS tags. A standard dataset was used for the experiments, which showed that the combination of content-based and n-gram features achieved the best accuracy for author prediction. Two standard classification algorithms were used. The Random Forest classifier attained better accuracy than the Naïve Bayes Multinomial classifier. The results compare favorably with many existing approaches to authorship attribution.
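
The feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the n-gram sizes (3 for characters, 2 for words), the feature prefixes and the `top_k` cut-off are all assumptions made for illustration, and plain lowercase word counts stand in for the "content specific" features, whose exact definition the abstract does not give. POS-tag n-grams could reuse `token_ngrams` on a sequence of tags produced by an external tagger, which is not shown here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def token_ngrams(tokens, n):
    """Return overlapping n-grams over a token sequence (words or POS tags)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, char_n=3, word_n=2):
    """Count word, character n-gram and word n-gram features in one document.

    The prefixes ("w:", "c:", "wn:") are hypothetical and only keep the
    three feature families from colliding in a single Counter.
    """
    lowered = text.lower()
    words = lowered.split()
    counts = Counter()
    counts.update("w:" + w for w in words)
    counts.update("c:" + g for g in char_ngrams(lowered, char_n))
    counts.update("wn:" + g for g in token_ngrams(words, word_n))
    return counts


def build_vocabulary(docs, top_k):
    """Keep the `top_k` most frequent features across all documents."""
    total = Counter()
    for doc in docs:
        total.update(extract_features(doc))
    return [feature for feature, _ in total.most_common(top_k)]


def vectorize(text, vocabulary):
    """Map one document to a count vector aligned with `vocabulary`."""
    counts = extract_features(text)
    return [counts[feature] for feature in vocabulary]
```

Vectors produced this way, one per document with the author as label, could then be passed to any classifier that accepts count features, such as a random forest or a multinomial naive Bayes model. The classifier settings used in the paper are not stated in the abstract.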




Dara, R., & Raghunadha Reddy, T. (2019). Authorship attribution using content based features and N-gram features. International Journal of Engineering and Advanced Technology, 9(1), 1152–1156.
