Most existing research on authorship attribution uses various types of lexical, syntactic, and structural features for classification. Some of these features are not meaningful for short texts such as email messages. In this paper we demonstrate a very effective use of one syntactic feature of an author's writing, the characteristics of the text's parse trees, for authorship analysis of email messages. We define author templates consisting of the frequencies of context-free grammar (CFG) productions occurring in an author's training set of email messages. We then extract the same kind of frequencies from a new email message and match them against the templates of the candidate authors to identify the best match. We evaluate our approach on the Enron email dataset and show that CFG production frequencies work very well and are robust in attributing the authorship of email messages.
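The template-matching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each sentence has already been parsed by some external constituency parser into a Penn-style bracketed string, that only non-lexical productions are counted, and that cosine similarity is used to compare a message with a template; the abstract does not name the matching measure, so that choice is purely an assumption. All function names (`parse_bracketed`, `productions`, `frequency_profile`, `attribute`) and the toy parse trees are hypothetical.

```python
from collections import Counter
from math import sqrt


def parse_bracketed(s):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word under a preterminal
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def productions(tree):
    """Yield non-lexical CFG productions such as 'VP -> VB NP'."""
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if kids:
        yield f"{label} -> {' '.join(k[0] for k in kids)}"
        for k in kids:
            yield from productions(k)


def frequency_profile(bracketed_trees):
    """Relative frequency of each production over a set of parsed sentences."""
    counts = Counter(
        p for t in bracketed_trees for p in productions(parse_bracketed(t))
    )
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def cosine(a, b):
    """Cosine similarity between two sparse frequency profiles (assumed measure)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(message_trees, templates):
    """Return the author whose template is most similar to the message profile."""
    profile = frequency_profile(message_trees)
    return max(templates, key=lambda author: cosine(profile, templates[author]))


# Toy training data: hand-written parse trees, not taken from the Enron corpus.
templates = {
    "alice": frequency_profile([
        "(S (NP (PRP I)) (VP (VBP think) (SBAR (S (NP (PRP we)) "
        "(VP (MD should) (VP (VB meet)))))))",
    ]),
    "bob": frequency_profile([
        "(S (VP (VB Send) (NP (DT the) (NN report))))",
        "(S (VP (VB Call) (NP (PRP me))))",
    ]),
}

new_message = ["(S (VP (VB Forward) (NP (DT the) (NN file))))"]
predicted_author = attribute(new_message, templates)
```

In this toy setting the new message shares its imperative productions only with the second template, so that author scores highest. A realistic setup would build each template from many messages and would need a real parser to produce the bracketed trees.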
CITATION STYLE
Patchala, J., Bhatnagar, R., & Gopalakrishnan, S. (2015). Author attribution of email messages using parse-tree features. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9166, pp. 313–327). Springer Verlag. https://doi.org/10.1007/978-3-319-21024-7_21