A two-phase plagiarism detection system based on multi-layer LSTM networks

Nguyen Van Son; Le Thanh Huong; Nguyen Chi Thanh

Journal Article (Open Access)

IAES International Journal of Artificial Intelligence (2021) 10(3) 636-648

DOI: 10.11591/ijai.v10.i3.pp636-648

4 Citations

31 Readers

Abstract

The main task in plagiarism detection is to find the plagiarized strings shared by two given documents. Traditional string-matching approaches are of limited use when the plagiarized text is only semantically similar to the source. Deep learning approaches address this by measuring the semantic similarity between pairs of sentences. However, these approaches still face two challenges. First, they cannot handle cases where only part of a sentence belongs to a plagiarized passage. Second, measuring sentence similarity without considering the context of the surrounding sentences reduces accuracy. To address these problems, this paper proposes a two-phase plagiarism detection system based on a multi-layer LSTM network model and a feature extraction technique: (i) a passage phase that recognizes plagiarized passages, and (ii) a word phase that determines the exact plagiarized strings. In experiments on the PAN 2014 corpus, the system reached an F-measure of 94.26%, higher than existing research in this field.
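
The two-phase structure can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no model details, so the multi-layer LSTM scorers are replaced here by simple stand-ins (token-set Jaccard overlap for the passage phase, `difflib.SequenceMatcher` for the word phase). The function names, the threshold, and the minimum run length are all hypothetical, and the stand-ins only catch literal overlap, not the semantic similarity the paper targets.

```python
from difflib import SequenceMatcher

PUNCT = ".,;:!?\"'()"


def tokenize(text):
    # Lowercased whitespace tokens with surrounding punctuation stripped.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def jaccard(a, b):
    # Stand-in scorer (assumption); the paper uses a multi-layer LSTM model.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def passage_phase(susp_sents, src_sents, threshold=0.3):
    """Phase (i): flag (suspicious, source) sentence index pairs that look similar."""
    pairs = []
    for i, susp in enumerate(susp_sents):
        for j, src in enumerate(src_sents):
            if jaccard(tokenize(susp), tokenize(src)) >= threshold:
                pairs.append((i, j))
    return pairs


def word_phase(susp_sent, src_sent, min_len=3):
    """Phase (ii): return the shared word runs inside one flagged pair."""
    a, b = tokenize(susp_sent), tokenize(src_sent)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [" ".join(a[m.a:m.a + m.size])
            for m in matcher.get_matching_blocks() if m.size >= min_len]


def detect(susp_sents, src_sents):
    # Run the word phase only on pairs the passage phase flagged.
    return {(i, j): word_phase(susp_sents[i], src_sents[j])
            for i, j in passage_phase(susp_sents, src_sents)}
```

Running the cheap passage-level filter first means the finer word-level comparison only has to examine flagged pairs, which is also how a sentence that is only partly copied can still yield an exact span.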

Citation (APA)

Van Son, N., Huong, L. T., & Thanh, N. C. (2021). A two-phase plagiarism detection system based on multi-layer LSTM networks. IAES International Journal of Artificial Intelligence, 10(3), 636–648. https://doi.org/10.11591/ijai.v10.i3.pp636-648
