Author attribution of email messages using parse-tree features

Abstract

Most existing research on authorship attribution uses various types of lexical, syntactic, and structural features for classification. Some of these features are not meaningful for short texts such as email messages. In this paper we demonstrate a very effective use of one syntactic feature of an author’s writing, the characteristics of the text’s parse trees, for authorship analysis of email messages. We define author templates consisting of the frequencies of context-free grammar (CFG) productions occurring in an author’s training set of email messages. We then extract the same frequencies from a new email message and match them against the various authors’ templates to identify the best match. We evaluate our approach on the Enron email dataset and show that CFG production frequencies work very well and are robust in attributing the authorship of email messages.
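
The template-matching idea in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the paper's implementation: parse trees are hand-written nested tuples rather than parser output, lexical productions (a tag rewriting to a word) are ignored, cosine similarity stands in for whatever matching measure the authors use, and the author names `alice` and `bob` are hypothetical.

```python
from collections import Counter
import math

# Assumed tree format: (label, child, child, ...); a leaf is a plain string (a word).

def productions(tree):
    """Yield CFG productions such as 'S -> NP VP', skipping tag -> word rules."""
    label, *children = tree
    subtrees = [c for c in children if not isinstance(c, str)]
    if subtrees:
        yield f"{label} -> {' '.join(c[0] for c in subtrees)}"
        for child in subtrees:
            yield from productions(child)

def template(trees):
    """Relative frequency of each production over a set of parse trees."""
    counts = Counter(p for t in trees for p in productions(t))
    total = sum(counts.values())
    return {p: n / total for p, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors (an assumed measure)."""
    dot = sum(v * b.get(p, 0.0) for p, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(message_trees, author_templates):
    """Return the author whose template is most similar to the message's profile."""
    profile = template(message_trees)
    return max(author_templates, key=lambda author: cosine(profile, author_templates[author]))

# Toy training data: one hand-written parse tree per hypothetical author.
t1 = ("S", ("NP", ("PRP", "I")), ("VP", ("VBP", "agree")))
t2 = ("S", ("VP", ("VB", "send"), ("NP", ("DT", "the"), ("NN", "report"))))
author_templates = {"alice": template([t1]), "bob": template([t2])}

# A new message with the same syntactic shape as t1.
new_message = [("S", ("NP", ("PRP", "We")), ("VP", ("VBP", "agree")))]
best_match = attribute(new_message, author_templates)
print(best_match)
```

In practice the trees would come from a syntactic parser run over each sentence of an email, and the templates would be built from many messages per author.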

Cite

Patchala, J., Bhatnagar, R., & Gopalakrishnan, S. (2015). Author attribution of email messages using parse-tree features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9166, pp. 313–327). Springer Verlag. https://doi.org/10.1007/978-3-319-21024-7_21
