The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution


Abstract

Authorship attribution is the problem of identifying the most plausible author of an anonymous text from a set of candidate authors. Researchers have investigated same-topic and cross-topic scenarios of authorship attribution, which differ according to whether new, unseen topics are used in the testing phase. However, neither scenario allows us to explain whether errors are caused by a failure to capture authorship writing style or by a topic shift. Motivated by this, we propose the topic confusion task where we switch the author-topic configuration between the training and testing sets. This setup allows us to distinguish two types of errors: those caused by the topic shift and those caused by the features' inability to capture the writing styles. We show that stylometric features with part-of-speech tags are the least susceptible to topic variations. We further show that combining them with other features leads to significantly lower topic confusion and higher attribution accuracy. Finally, we show that pretrained language models such as BERT and RoBERTa perform poorly on this task and are surpassed by simple features such as word-level n-grams.
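The swapped author-topic configuration can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two author groups, the two topics, the `Doc` record, the function names, and the rule used to label an error are all assumptions made for illustration. Here a misattribution is labelled "topic_confusion" when the predicted author was seen writing on the test document's topic during training, and "style_failure" otherwise.

```python
from collections import namedtuple

# Hypothetical record type; the field names are assumptions for this sketch.
Doc = namedtuple("Doc", ["author", "topic", "text"])


def topic_confusion_split(docs, group_a, group_b, topic_1, topic_2):
    """Train: group_a writes on topic_1, group_b on topic_2. Test: swapped."""
    train_topic = {a: topic_1 for a in group_a}
    train_topic.update({a: topic_2 for a in group_b})
    test_topic = {a: topic_2 for a in group_a}
    test_topic.update({a: topic_1 for a in group_b})
    train = [d for d in docs if train_topic.get(d.author) == d.topic]
    test = [d for d in docs if test_topic.get(d.author) == d.topic]
    return train, test, train_topic


def error_type(doc, predicted_author, train_topic):
    """Label one prediction under the assumed error rule described above."""
    if predicted_author == doc.author:
        return "correct"
    # The wrongly chosen author wrote on this document's topic in training,
    # so the error is plausibly driven by topic rather than style.
    if train_topic.get(predicted_author) == doc.topic:
        return "topic_confusion"
    return "style_failure"


# Toy data: every author has one document on each topic.
authors_a, authors_b = ["a1", "a2"], ["b1", "b2"]
docs = [Doc(a, t, f"{a} on {t}") for a in authors_a + authors_b for t in ("t1", "t2")]
train, test, train_topic = topic_confusion_split(docs, authors_a, authors_b, "t1", "t2")
```

In this toy setup, a test document by `a1` on topic `t2` that is attributed to `b1` (who wrote on `t2` in training) counts as topic confusion, while attributing it to `a2` counts as a failure to capture writing style.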

Citation (APA)

Altakrori, M. H., Cheung, J. C. K., & Fung, B. C. M. (2021). The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution. In Findings of the Association for Computational Linguistics: EMNLP 2021 (pp. 4242–4256). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.359
