Improving authorship attribution in Twitter through topic-based sampling

Abstract

Aliases are used as a means of anonymity on the Internet in environments such as Internet Relay Chat (IRC), forums and micro-blogging websites such as Twitter. While there are genuine reasons for using aliases, such as journalists operating in politically oppressive countries, they are increasingly used by cybercriminals and extremist organisations. Recent years have seen increased research on authorship attribution of Twitter messages, including authorship analysis of aliases. Previous studies have shown that anti-aliasing of randomly generated sub-aliases links the sub-aliases with high accuracy, but becomes much less accurate when topic-based sub-aliases are used. N-gram methods have previously been demonstrated to perform better than other methods in this situation. This paper investigates the effect of topic-based sampling on authorship attribution accuracy for the popular micro-blogging website Twitter. Features are extracted using character n-grams, which accurately capture differences in authorship style. These features are analysed with support vector machines in a one-versus-all classification scheme. The predictive performance of the algorithm is then evaluated under two different sampling methodologies: authors sampled through a context-sensitive topic-based search and authors sampled randomly. Topic-based sampling of authors is found to produce more accurate authorship predictions. This paper presents several theories as to why this might be the case.
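
As a rough illustration of the character n-gram features mentioned in the abstract, the following minimal Python sketch counts overlapping character n-grams and pools them into a per-author frequency profile. It is not the authors' implementation: the function names, the choice of n = 3, the normalisation to relative frequencies and the toy messages are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a single message."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def author_profile(messages: list[str], n: int = 3) -> dict[str, float]:
    """Relative n-gram frequencies pooled over one author's messages.

    Normalising by the total count is an assumption of this sketch,
    not a detail taken from the paper.
    """
    counts = Counter()
    for message in messages:
        counts.update(char_ngrams(message, n))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical toy messages; no data from the paper is reproduced here.
tweets = ["so excited!!", "so tired!!"]
profile = author_profile(tweets, n=3)
```

Profiles of this kind could then be turned into fixed-length vectors and passed to a one-versus-all support vector machine, with one binary classifier trained per candidate author.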

Cite

Pan, L., Gondal, I., & Layton, R. (2017). Improving authorship attribution in Twitter through topic-based sampling. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10400 LNAI, pp. 250–261). Springer Verlag. https://doi.org/10.1007/978-3-319-63004-5_20
