The Plagiarism Singularity Conjecture

Sriram Ranga; Rui Mao; Erik Cambria; Anupam Chattopadhyay

Conference Proceedings, Open Access


Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (2025), Vol. 1, pp. 10245–10255

DOI: 10.18653/v1/2025.naacl-long.514


Abstract

Large language models (LLMs) have replaced the metaphorical monkeys in the “infinite monkeys” thought experiment with machines that mirror human writing. With LLMs being used to generate content at an unprecedented scale, concerns over their misuse and over the saturation of the content space with artificially generated material are growing. We foresee a point in the future where the vast majority of all possible text in a given language will already have been generated, leading to a “Plagiarism Singularity”. In this paper, we predict how far we are from this singularity by estimating the volume of content that needs to be generated to reach it. We use an LLM to calculate the probability distribution of English sentences collected from a large dataset. Then, assuming sentences follow the calculated distribution and treating the problem as an instance of the coupon collector's problem, we estimate the minimum number of sentences that must be generated to cover different percentiles of the probability mass of the set of all sentences. We find that breaching the standard 20% plagiarism limit would require only around 10^30 generated sentences, which we estimate will happen approximately 40 years from now.
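As a rough illustration of the coupon-collector framing, the sketch below assumes sentences are drawn i.i.d. from a known distribution p over a finite set. Under that assumption, the expected probability mass covered after n draws is sum_i p_i * (1 - (1 - p_i)^n), and the smallest n reaching a target coverage can be found by search, since coverage grows monotonically with n. This is the textbook non-uniform coupon-collector expectation, not necessarily the exact estimator used in the paper. The function names and the Zipf-like toy distribution are hypothetical, and a target of 0.2 only loosely mirrors the 20% threshold mentioned above.

```python
import math
from typing import Sequence


def expected_covered_mass(probs: Sequence[float], n: int) -> float:
    """Expected total probability of the distinct items seen after n i.i.d. draws.

    Item i is seen at least once with probability 1 - (1 - p_i)^n. This is
    computed as -expm1(n * log1p(-p_i)) so tiny p_i and huge n stay stable.
    Assumes every p_i lies strictly between 0 and 1.
    """
    return sum(p * -math.expm1(n * math.log1p(-p)) for p in probs)


def min_draws_for_coverage(probs: Sequence[float], target: float) -> int:
    """Smallest n whose expected covered mass reaches `target`."""
    if not 0.0 < target < sum(probs):
        raise ValueError("target must be positive and below the total mass")
    # Invariant: coverage(lo) < target <= coverage(hi) once the bound is found.
    lo, hi = 0, 1
    # Exponential search for an upper bound on n.
    while expected_covered_mass(probs, hi) < target:
        lo, hi = hi, hi * 2
    # Bisection, valid because coverage is monotone in n.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_covered_mass(probs, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical toy distribution: Zipf-like weights over 1,000 "sentences".
    weights = [1 / k ** 1.1 for k in range(1, 1001)]
    total = sum(weights)
    probs = [w / total for w in weights]
    for target in (0.2, 0.5, 0.8):
        print(f"{target:.0%} coverage needs ~{min_draws_for_coverage(probs, target)} draws")
```

Working in log space matters here because realistic sentence probabilities are astronomically small, which is why the required number of draws can reach magnitudes like 10^30. For such magnitudes, the exponential search needs only around a hundred doublings.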

Cite


Ranga, S., Mao, R., Cambria, E., & Chattopadhyay, A. (2025). The Plagiarism Singularity Conjecture. In Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 (Vol. 1, pp. 10245–10255). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2025.naacl-long.514
