CAG : Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph

Abstract

Stylometry has been successfully applied to authorship identification of single-author documents (AISD), the task of identifying the original author of an anonymous document from a group of candidate authors. However, AISD techniques are not applicable to the authorship identification of multi-author documents (AIMD). Unlike AISD, where each document is written by a single author, AIMD handles documents with multiple authors. Because of the combinatorial nature of multi-author documents, AIMD lacks ground-truth information, that is, information on which listed authors did and did not write a given multi-author document, which makes the problem more challenging to solve. Previous AIMD solutions have several limitations: (i) the best stylometry-based AIMD solution has low accuracy, below 30%; (ii) increasing the number of co-authors per paper degrades the performance of AIMD solutions; and (iii) AIMD solutions were not designed to handle non-writing authors (NWAs). NWAs nevertheless exist in real-world cases: there are papers for which not every listed co-author contributed as a writer. This paper proposes an AIMD framework called the Co-Authorship Graph (CAG) that can be used to (i) capture the stylistic information of each author in a corpus of multi-author documents and (ii) make a multi-label prediction for a multi-author query document. We conducted extensive experimental studies on one synthetic and three real-world corpora. Experimental results show that the proposed framework (i) significantly outperformed competing techniques; (ii) handles a larger number of co-authors more effectively than competing techniques; and (iii) can effectively handle NWAs in multi-author documents.
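
To make the multi-label formulation concrete, the following is a minimal toy baseline, not the CAG framework itself, whose internals the abstract does not describe. It assumes character trigram counts as the stylometric feature and a similarity-weighted nearest-neighbour vote over the listed co-authors. The function names (`profile`, `cosine`, `predict_authors`) and the parameters `k` and `threshold` are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def profile(text, n=3):
    """Character n-gram counts (assumed feature; not taken from the paper)."""
    t = text.lower()
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_authors(query, corpus, k=2, threshold=0.5):
    """Return a set of predicted authors for a query document.

    corpus: list of (text, set_of_listed_authors) pairs.
    Each of the k most similar documents votes for all of its listed
    authors, weighted by similarity; an author is predicted when their
    share of the total weight reaches the threshold.
    """
    q = profile(query)
    scored = sorted(
        ((cosine(q, profile(text)), authors) for text, authors in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(score for score, _ in scored)
    if total == 0:
        return set()
    votes = Counter()
    for score, authors in scored:
        for author in authors:
            votes[author] += score
    return {author for author, v in votes.items() if v / total >= threshold}
```

Because this sketch credits every listed co-author of a neighbouring document equally, it cannot tell writing from non-writing authors, which is the limitation of earlier approaches that the abstract highlights.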

Citation (APA)

Sarwar, R., Urailertprasert, N., Vannaboot, N., Yu, C., Rakthanmanon, T., Chuangsuwanich, E., & Nutanong, S. (2020). CAG : Stylometric Authorship Attribution of Multi-Author Documents Using a Co-Authorship Graph. IEEE Access, 8, 18374–18393. https://doi.org/10.1109/ACCESS.2020.2967449
