Developing a Benchmark for Reducing Data Bias in Authorship Attribution

Abstract

Authorship attribution is the task of assigning a document of unknown authorship to one author from a set of candidates. In the past, studies in this field have used a variety of evaluation datasets to demonstrate the effectiveness of preprocessing steps, features, and models, but only a small fraction of them use more than one dataset to support their claims. In this paper, we present a collection of highly diverse authorship attribution datasets, which allows evaluation results in authorship attribution research to generalize better. Furthermore, we implement a wide variety of previously used machine learning models and show that many approaches perform very differently when applied to different datasets. We also include pre-trained language models, testing them systematically in this field for the first time. Finally, we propose a set of aggregated scores to evaluate different aspects of the dataset collection.
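To make the task definition concrete, here is a minimal sketch of authorship attribution framed as supervised text classification, using a character n-gram TF-IDF representation with a linear SVM, one common baseline in this field. The toy documents and author labels are purely illustrative assumptions; this is not the paper's benchmark data or its exact pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: documents with known authors (the candidate set).
train_docs = [
    "the quick brown fox jumps over the lazy dog",
    "a quick movement of the enemy will jeopardize six gunboats",
    "pack my box with five dozen liquor jugs",
    "how vexingly quick daft zebras jump",
]
train_authors = ["A", "A", "B", "B"]

# Character n-grams tend to capture writing style (punctuation, affixes,
# function words) rather than topic, which is why they are a popular
# feature choice for attribution.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(train_docs, train_authors)

# Assign a document of unknown authorship to one of the candidates.
print(model.predict(["five quick brown zebras jump over lazy gunboats"]))
```

A benchmark of the kind the paper proposes would run such a pipeline, and many others, across every dataset in the collection rather than reporting results on a single corpus.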

Citation (APA)

Murauer, B., & Specht, G. (2021). Developing a benchmark for reducing data bias in authorship attribution. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems (Eval4NLP 2021) (pp. 179–188). Association for Computational Linguistics. https://doi.org/10.26615/978-954-452-056-4_018
