Semantic and similarity measure methods for plagiarism detection of students’ assignments

Abstract

This paper addresses the detection of semantic plagiarism in Czech texts. It combines a similarity measure previously used for text compression with a structured thesaurus of synonyms and a stemming algorithm to detect rewording and restructuring of Czech-language texts. From a 100 GB corpus, we extracted 884 files of B.A., M.A., and Ph.D. students’ assignments, semester works, and theses from the Computer Science major. The extracted test data used in our initial experiment totaled 1.98 GB of plain text. The method is first tested on short texts and then applied to the longer texts of students’ assignments. On short texts, the method was more accurate at detecting paraphrased texts with semantic similarity, but less accurate for identical texts with rearranged paragraphs. The experiment on the long-text corpus of students’ assignments and theses shows a semantic plagiarism rate of 23.9%. However, manual inspection of the documents revealed some noise in the results, caused by different documents sharing the same technical terms, scientific definitions, and bibliography references. In future work, these results will be fine-tuned and optimized by building a file-specific stop word list, adding an exact-match method, and removing references and other standard text templates that often appear in certain parts of students’ assignments and theses.
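
The pipeline described in the abstract (stemming, synonym normalization, then a compression-based similarity measure) can be illustrated with a minimal sketch. The abstract does not name the exact similarity measure, so this example swaps in the Normalized Compression Distance computed with zlib as a stand-in. The tiny English thesaurus `SYNONYMS`, the suffix list `SUFFIXES`, and the helper names `stem`, `normalize`, and `ncd` are all hypothetical and are not the authors' Czech resources or implementation.

```python
import zlib

# Hypothetical toy thesaurus: maps each synonym to one canonical word.
# A real system would use a structured Czech thesaurus instead.
SYNONYMS = {"automobile": "car", "vehicle": "car", "quick": "fast", "rapid": "fast"}

# Hypothetical suffix list standing in for a real Czech stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, stem, and replace synonyms by a canonical word."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    stems = [stem(t) for t in tokens if t]
    return " ".join(SYNONYMS.get(s, s) for s in stems)


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance (assumed stand-in measure).

    Values near 0 suggest the texts share most of their content;
    values near 1 suggest they share very little.
    """
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    original = "The rapid automobiles stopped."
    reworded = "The quick vehicle stopped"
    # After normalization, the reworded sentence matches the original.
    print(normalize(original))
    print(normalize(reworded))
    print(ncd(normalize(original), normalize(reworded)))
```

Normalizing before measuring distance is what lets a purely string-based measure react to rewording: synonyms and inflected forms collapse to the same tokens, so the compressor finds more shared material. Compression-based distances are noisy on very short inputs, so the printed distance is only indicative.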

Cite

Soori, H., Prilepok, M., Platos, J., & Snasel, V. (2016). Semantic and similarity measure methods for plagiarism detection of students’ assignments. In Advances in Intelligent Systems and Computing (Vol. 427, pp. 117–125). Springer Verlag. https://doi.org/10.1007/978-3-319-29504-6_12
