Plagiarism is specifically defined as literary theft of paragraphs or sentences from unreferenced source. This unauthorized behavior is a real problem that targets scientific research scope. This paper proposes a Hybrid Arabic Plagiarism Detection System (HYPLAG). The HYPLAG approach combines corpus-based and knowledge-based approaches by utilizing an Arabic semantic resource (Arabic WordNet). A preliminary study on texts from undergraduate students was conducted to understand their behavior and the patterns used in plagiarism. The results of the study show that students apply different techniques to plagiarized sentences, also it shows changes in sentence’s components (verbs, nouns, and adjectives). HYPLAG was evaluated on the ExAraPlagDet-2015 dataset against several other approaches that participated in the AraPlagDet PAN@FIRE shared task on Extrinsic Arabic plagiarism detection obtaining a higher performance (F-score 89% vs. 84% obtained by the best performing system at AraPlagDet) with less computational time.
CITATION STYLE
Ghanem, B., Arafeh, L., Rosso, P., & Sánchez-Vega, F. (2018). HYPLAG: Hybrid arabic text plagiarism detection system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10859 LNCS, pp. 315–323). Springer Verlag. https://doi.org/10.1007/978-3-319-91947-8_33
Mendeley helps you to discover research relevant for your work.