Using relative entropy for authorship attribution

Ying Zhao; Justin Zobel; Phil Vines

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4182 LNCS 92-105

DOI: 10.1007/11880592_8

36 Citations

68 Readers

Abstract

Authorship attribution is the task of deciding who wrote a particular document. Several attribution approaches have been proposed in recent research, but none of them is particularly satisfactory; some are ad hoc, and most have defects in terms of scalability, effectiveness, and efficiency. In this paper, we propose a principled approach, motivated by information theory, to identify authors based on elements of writing style. We make use of the Kullback-Leibler divergence, a measure of how different two distributions are, and explore several different approaches to tokenizing documents to extract style markers. We use several data collections to examine the performance of our approach. We have found that our proposed approach is as effective as the best existing attribution methods for two-class attribution, and is superior for multi-class attribution. It has lower computational cost and is cheaper to train. Finally, our results suggest this approach is a promising alternative for other categorization problems. © Springer-Verlag Berlin Heidelberg 2006.
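The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: model each author and the disputed document as a probability distribution over style markers, then attribute the document to the author whose distribution has the smallest Kullback-Leibler divergence from it, where D(p || q) = sum over x of p(x) log(p(x) / q(x)). Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`token_distribution`, `kl_divergence`, `attribute`), the use of a small function-word vocabulary as style markers, the additive smoothing with parameter `alpha`, and the toy data.

```python
import math
from collections import Counter


def token_distribution(tokens, vocabulary, alpha=1.0):
    """Additively smoothed relative frequencies over a fixed vocabulary.

    Smoothing (an assumption of this sketch) keeps every probability
    non-zero, so the divergence below is always finite.
    """
    counts = Counter(t for t in tokens if t in vocabulary)
    total = sum(counts.values()) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log2(p(x) / q(x)), measured in bits."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def attribute(doc_tokens, author_tokens, vocabulary):
    """Return the author with the smallest divergence, plus all scores."""
    doc_dist = token_distribution(doc_tokens, vocabulary)
    scores = {
        author: kl_divergence(doc_dist, token_distribution(tokens, vocabulary))
        for author, tokens in author_tokens.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical toy data: a few function words stand in for style markers.
vocabulary = {"the", "of", "and", "but", "however"}
author_tokens = {
    "author_a": "the of the and the of and the".split(),
    "author_b": "but however but and however but however".split(),
}
doc = "the and of the the".split()

best, scores = attribute(doc, author_tokens, vocabulary)
print(best, scores)
```

Because this kind of scoring needs only token counts per author, adding an author means building one more distribution rather than retraining a classifier, which is one plausible reading of the abstract's claim that the approach is cheap to train.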

Citation (APA)

Zhao, Y., Zobel, J., & Vines, P. (2006). Using relative entropy for authorship attribution. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4182 LNCS, pp. 92–105). Springer Verlag. https://doi.org/10.1007/11880592_8
