Fooling MOSS Detection with Pretrained Language Models


Abstract

As artificial intelligence (AI) technologies become increasingly powerful and prominent in society, their misuse is a growing concern. In educational settings, AI technologies could be used by students to cheat on assignments and exams. In this paper, we explore whether transformers can be used to solve introductory-level programming assignments while bypassing commonly used AI tools for detecting similarities between pieces of software. We find that a student using GPT-J [60] can complete introductory-level programming assignments without triggering suspicion from MOSS [2], a widely used software similarity and plagiarism detection tool. This holds despite the fact that GPT-J was not trained on the problems in question and is not provided with any examples to work from. We further find that the code written by GPT-J is diverse in structure, lacking any particular tells that future plagiarism detection techniques might use to identify algorithmically generated code. We conclude with a discussion of the ethical and educational implications of large language models and directions for future research.

Citation (APA)

Biderman, S., & Raff, E. (2022). Fooling MOSS Detection with Pretrained Language Models. In International Conference on Information and Knowledge Management, Proceedings (pp. 2933–2943). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557079
