Detecting machine-obfuscated plagiarism

Tomáš Foltýnek; Terry Ruas; Philipp Scharpf; Norman Meuschke; Moritz Schubotz; William Grosky; Bela Gipp

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2020) 12051 LNCS 816-827

DOI: 10.1007/978-3-030-43687-2_68

Citations: 12

Readers: 19

Abstract

Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available.
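
The pipeline the abstract describes, representing a text with word embeddings and passing that representation to a classifier, can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration: the two-dimensional `TOY_VECTORS`, the four training sentences, and the nearest-centroid classifier are invented stand-ins, since the abstract does not name the six embedding models or the five classifiers that were evaluated.

```python
import math

# Hypothetical toy embeddings (assumption). A real experiment would load
# pretrained word vectors with hundreds of dimensions.
TOY_VECTORS = {
    "study": [1.0, 0.0],
    "shows": [0.9, 0.1],
    "results": [0.8, 0.2],
    "examination": [0.1, 0.9],
    "demonstrates": [0.0, 1.0],
    "outcomes": [0.2, 0.8],
}


def embed(text):
    """Represent a text as the average of its known word vectors."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0]  # no known words: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train(labelled_texts):
    """Compute one centroid per label from (text, label) pairs."""
    by_label = {}
    for text, label in labelled_texts:
        by_label.setdefault(label, []).append(embed(text))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def classify(model, text):
    """Return the label whose centroid is closest to the text embedding."""
    vec = embed(text)
    return min(model, key=lambda label: math.dist(vec, model[label]))


# Invented training data (assumption), labelled by origin.
TRAIN = [
    ("the study shows results", "original"),
    ("study results", "original"),
    ("the examination demonstrates outcomes", "paraphrased"),
    ("examination outcomes", "paraphrased"),
]
model = train(TRAIN)
```

Calling `classify(model, "demonstrates outcomes")` then assigns the text to whichever class centroid lies nearest. Averaging word vectors is only one simple way to turn word embeddings into a text-level feature, and the classifier here was chosen for brevity, not because the study used it.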

Citation (APA)

Foltýnek, T., Ruas, T., Scharpf, P., Meuschke, N., Schubotz, M., Grosky, W., & Gipp, B. (2020). Detecting machine-obfuscated plagiarism. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12051 LNCS, pp. 816–827). Springer. https://doi.org/10.1007/978-3-030-43687-2_68
