Improved evaluation framework for complex plagiarism detection

3 citations · 91 Mendeley readers

Abstract

Plagiarism is a major issue in science and education. It appears in many forms, ranging from verbatim copying to intelligent paraphrasing and summarization. Complex plagiarism, such as plagiarism of ideas, is hard to detect, so it is especially important to measure progress correctly rather than overfit to the structure of particular datasets. In this paper, we study the performance of plagdet, the main measure for evaluating plagiarism detection systems, on manually paraphrased plagiarism datasets (such as PAN Summary). We reveal its fallibility under certain conditions and propose an evaluation framework with normalization of inner terms, which is resilient to dataset imbalance. We conclude with an experimental justification of the proposed measure. The implementation of the new framework is publicly available as a GitHub repository.
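For context, the plagdet measure discussed in the abstract (Potthast et al., 2010) combines a character-level F1 score with a granularity penalty that punishes systems for splitting one true plagiarism case into many detections. Below is a minimal sketch of that standard formula. The span-based F1 here is a simplified micro-average (the official PAN measure macro-averages over cases and detections), and the function names are illustrative, not taken from the authors' released code.

```python
from math import log2

def char_f1(true_spans, detected_spans):
    """Simplified micro-averaged character-level F1 over (start, end) spans.

    Illustration only: the official PAN measure macro-averages precision
    and recall over plagiarism cases rather than pooling characters.
    """
    truth = {c for s, e in true_spans for c in range(s, e)}
    found = {c for s, e in detected_spans for c in range(s, e)}
    tp = len(truth & found)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def plagdet(f1, granularity):
    """plagdet = F1 / log2(1 + granularity), per Potthast et al. (2010).

    granularity >= 1 is the average number of detections covering a true
    case; granularity == 1 leaves F1 unchanged.
    """
    return f1 / log2(1 + granularity)

# Example: one true case covered perfectly, but split into two detections.
f1 = char_f1([(0, 100)], [(0, 50), (50, 100)])   # character-level F1 = 1.0
print(plagdet(f1, granularity=2.0))              # penalized to ~0.63
```

The example shows the behavior the paper scrutinizes: even with perfect character overlap, the score drops sharply once detections are fragmented, which interacts with how a dataset's cases are structured.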

Cite (APA)

Belyy, A., Dubova, M., & Nekrasov, D. (2018). Improved evaluation framework for complex plagiarism detection. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 157–162). Association for Computational Linguistics. https://doi.org/10.18653/v1/p18-2026
