Vector representation of words for plagiarism detection based on string matching

Abstract

Plagiarism detection in documents requires an appropriate definition of document similarity and an efficient way to compute it. This paper evaluates whether vector representations of words are a valid basis for defining document similarity, judged by both processing time and plagiarism-detection accuracy. It proposes a plagiarism detection algorithm based on a score vector weighted by vector representations of words. For two documents, the score vector gives the number of matches between corresponding words for every possible offset (gap) between the documents' starting positions. Both the score vector and its weighted version can be computed efficiently using convolutions. Two types of word vector representation are evaluated with the proposed algorithm: randomly generated vectors, and a distributed representation learned from training data by a neural-network-based method. The experimental results show two things. First, using the weighted score vector instead of the ordinary one can reduce processing time at the cost of a slight decrease in accuracy. Second, randomly generated vectors suit the algorithm better than the distributed representation in terms of the tradeoff between processing time and accuracy.
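The convolution-based computation of the score vector can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. Several choices here are assumptions made for illustration:

- the function names score_vector and embed;
- the offset convention s[k + m - 1] = sum over i of <v(a_i), v(b_{i+k})>;
- reading the "weighted score vector" as this inner-product form;
- using Gaussian vectors normalized to unit length as the "randomly generated vectors".

With one-hot word vectors the inner product is 1 exactly when two words match, so the result is the ordinary score vector. With random unit vectors of a fixed dimension d, non-matching words contribute only small noise terms. The result then approximates the match count while needing d cross-correlations rather than one per vocabulary word.

```python
import numpy as np


def score_vector(a_vecs: np.ndarray, b_vecs: np.ndarray) -> np.ndarray:
    """Return s with s[k + m - 1] = sum_i <a_vecs[i], b_vecs[i + k]>.

    Offsets run over k = -(m - 1), ..., n - 1, where m = len(a_vecs) and
    n = len(b_vecs); terms with i + k outside [0, n) are skipped.
    """
    m, n = len(a_vecs), len(b_vecs)
    size = m + n - 1  # length of the full cross-correlation
    # Cross-correlating a with b equals convolving b with a reversed;
    # the FFT of each zero-padded column turns that convolution into a product.
    fa = np.fft.rfft(a_vecs[::-1], n=size, axis=0)
    fb = np.fft.rfft(b_vecs, n=size, axis=0)
    # Summing over the d vector dimensions in the frequency domain means a
    # single inverse FFT yields the whole score vector.
    return np.fft.irfft((fa * fb).sum(axis=1), n=size)


def embed(words: list[str], table: np.ndarray, index: dict[str, int]) -> np.ndarray:
    """Hypothetical helper: stack the row of `table` assigned to each word."""
    return table[[index[w] for w in words]]


a = "the cat sat on the mat".split()
b = "a cat sat on a mat today".split()
vocab = sorted(set(a) | set(b))
index = {w: i for i, w in enumerate(vocab)}

# Exact score vector: one-hot rows, so <v(x), v(y)> is 1 iff x == y.
onehot = np.eye(len(vocab))
exact = score_vector(embed(a, onehot, index), embed(b, onehot, index))

# Approximate score vector: random unit vectors (assumed Gaussian, then
# normalized); inner products between different words are close to 0.
rng = np.random.default_rng(0)
rand = rng.standard_normal((len(vocab), 1024))
rand /= np.linalg.norm(rand, axis=1, keepdims=True)
approx = score_vector(embed(a, rand, index), embed(b, rand, index))

print(np.round(exact).astype(int))  # -> [0 0 0 0 0 4 0 0 0 0 0 0]
print(np.round(approx, 2))
```

The single peak of 4 sits at index m - 1 = 5, i.e. offset k = 0, where "cat", "sat", "on" and "mat" line up. The random version shows a value near 4 there and small noise elsewhere. In this toy example the vocabulary has only eight words, so d = 1024 is far larger than needed. The point of a random representation is that d can stay fixed as the vocabulary grows, which trades a little accuracy for fewer FFTs.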

Citation (APA)

Baba, K., Nakatoh, T., & Minami, T. (2017). Vector representation of words for plagiarism detection based on string matching. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10274 LNCS, pp. 341–350). Springer Verlag. https://doi.org/10.1007/978-3-319-58524-6_28