This paper aims to improve intrinsic plagiarism detection. It investigates the character n-gram profile method proposed by Stamatatos and improves its performance by tuning its parameter settings and by proposing new modifications and rich feature sets. Results are reported on the PAN-PC09 and PAN-PC11 corpora, which are especially well suited to this task and were previously used in the Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN) competitions. The overall plagdet score rose from 24.67% to 33.41% on the PAN-PC09 corpus and from 18.83% to 26.66% on the PAN-PC11 corpus. © 2014 Springer International Publishing.
CITATION STYLE
Kuta, M., & Kitowski, J. (2014). Optimisation of character n-gram profiles method for intrinsic plagiarism detection. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8468 LNAI, pp. 500–511). Springer Verlag. https://doi.org/10.1007/978-3-319-07176-3_44