Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap

Abstract

This study explores the feasibility of cross-linguistic authorship attribution and author gender identification using Machine Translation (MT). Computational stylistics experiments were conducted on a Greek blog corpus translated into English with Google's Neural MT. A Random Forest algorithm was employed for authorship attribution and gender profiling, using different feature groups [Author's Multilevel N-gram Profiles, quantitative linguistics (QL) features, and cross-lingual word embeddings (CLWE)] in both the original and the translated texts. Results indicate that MT is a viable method for converting a multilingual corpus into a single language for authorship attribution and gender profiling research, achieving considerable accuracy when the training and testing datasets are in the same language. In the purely cross-linguistic scenario, the CLWE and QL features yielded accuracies above the baselines.
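
The following is a minimal sketch, not the authors' pipeline, of how two of the feature groups named in the abstract might be turned into vectors for a classifier. It assumes that a multilevel n-gram profile means relative frequencies of character 2- and 3-grams plus word 1- to 3-grams, and that the QL features are simple lexical measures (type-token ratio, mean word length, hapax ratio). These orders, measures, the `top_k` cutoff and every function name here are hypothetical choices for illustration. CLWE features and the Random Forest itself are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w matches Unicode letters, digits and underscore,
    # so Greek originals and English translations are handled the same way
    return re.findall(r"\w+", text.lower())


def multilevel_profile(text, char_orders=(2, 3), word_orders=(1, 2, 3)):
    """Relative frequencies of character and word n-grams at several orders.

    The orders are illustrative assumptions, not the published configuration.
    """
    profile = {}
    chars = text.lower()
    for n in char_orders:
        counts = Counter(chars[i:i + n] for i in range(len(chars) - n + 1))
        total = sum(counts.values()) or 1
        for gram, c in counts.items():
            profile[f"c{n}:{gram}"] = c / total
    words = tokenize(text)
    for n in word_orders:
        counts = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        total = sum(counts.values()) or 1
        for gram, c in counts.items():
            profile[f"w{n}:{gram}"] = c / total
    return profile


def ql_features(text):
    """A few simple quantitative-linguistics measures (assumed, for illustration)."""
    words = tokenize(text)
    if not words:
        return {"ttr": 0.0, "mean_word_len": 0.0, "hapax_ratio": 0.0}
    freqs = Counter(words)
    hapax = sum(1 for c in freqs.values() if c == 1)
    return {
        "ttr": len(freqs) / len(words),
        "mean_word_len": sum(map(len, words)) / len(words),
        "hapax_ratio": hapax / len(words),
    }


def to_matrix(profiles, top_k=500):
    """Align per-document feature dicts on a shared vocabulary.

    Keeps the top_k features by summed relative frequency across all documents;
    features absent from a document are filled with 0.0.
    """
    totals = Counter()
    for p in profiles:
        totals.update(p)
    vocab = [f for f, _ in totals.most_common(top_k)]
    rows = [[p.get(f, 0.0) for f in vocab] for p in profiles]
    return vocab, rows


if __name__ == "__main__":
    docs = ["The weather is nice today.", "Today the weather is not nice at all."]
    vocab, X = to_matrix([multilevel_profile(d) for d in docs], top_k=50)
    print(len(vocab), len(X), ql_features(docs[1]))
```

Rows of `X` (optionally concatenated with the `ql_features` values) could then be passed to any Random Forest implementation, for example scikit-learn's `RandomForestClassifier`. Training on translated texts and testing on translated texts corresponds to the same-language setting described above.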

Citation (APA)

Mikros, G., & Boumparis, D. (2024). Cross-linguistic authorship attribution and gender profiling. Machine translation as a method for bridging the language gap. Digital Scholarship in the Humanities, 39(3), 954–967. https://doi.org/10.1093/llc/fqae028