Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap

George Mikros; Dimitris Boumparis

Journal Article, Open Access

Digital Scholarship in the Humanities (2024) 39(3) 954-967

DOI: 10.1093/llc/fqae028

Abstract

This study explores the feasibility of cross-linguistic authorship attribution and author gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English with Google's Neural MT. A Random Forest algorithm was employed for authorship attribution and gender profiling, using three feature groups [Author's Multilevel N-gram Profiles, quantitative linguistics (QL), and cross-lingual word embeddings (CLWE)] on both the original and the translated texts. Results indicate that MT is a viable method for converting a multilingual corpus into a single language for authorship attribution and gender profiling research, with considerable accuracy when the training and testing datasets are in the same language. In the pure cross-linguistic scenario, accuracies higher than the baselines were obtained using CLWE and QL features.
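To make the n-gram feature group mentioned in the abstract more concrete, the following is a minimal sketch of how a multilevel n-gram profile could be computed for a single text. It is not the authors' implementation: the function names `ngrams` and `multilevel_profile`, the choice of n-gram sizes 2 and 3, the lowercasing, whitespace tokenization, and relative-frequency normalization are all assumptions made for illustration. The resulting dictionaries could, in principle, be vectorized and passed to a classifier such as a Random Forest.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def multilevel_profile(text, sizes=(2, 3)):
    """Relative frequencies of character and word n-grams for one text.

    Hypothetical helper: the sizes, lowercasing, and whitespace
    tokenization are illustrative assumptions, not taken from the paper.
    """
    lowered = text.lower()
    levels = (("char", list(lowered), ""), ("word", lowered.split(), " "))
    profile = {}
    for level, seq, sep in levels:
        for n in sizes:
            counts = Counter(ngrams(seq, n))
            total = sum(counts.values())
            for gram, count in counts.items():
                # Key encodes the level and size, e.g. "char2:at" or "word2:the cat".
                profile[f"{level}{n}:{sep.join(gram)}"] = count / total
    return profile


profile = multilevel_profile("the cat sat on the mat")
```

Each key records the level (character or word) and the n-gram size, so profiles from original and translated texts can be compared feature by feature or restricted to one level.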

Cite

Mikros, G., & Boumparis, D. (2024). Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap. Digital Scholarship in the Humanities, 39(3), 954–967. https://doi.org/10.1093/llc/fqae028
