Source code representations for plagiarism detection

Michal Ďuračík; Emil Kršák; Patrik Hrkút

Conference Proceedings

Communications in Computer and Information Science, vol. 870 (2018), pp. 61-69

DOI: 10.1007/978-3-319-95522-3_6

Abstract

Plagiarism is a growing problem because so many resources are easily accessible, and many papers address the topic. New algorithms are constantly being created, but few systems are currently available that we could use for plagiarism detection. Our aim is to explore plagiarism on a large scale. This paper focuses on selecting an appropriate representation of source code, which is very important when searching for plagiarism. We give an overview of the current representation options and focus on representing source code with an abstract syntax tree (AST). Comparing tree structures is a time-consuming operation, so we look for a way to represent the AST efficiently in order to facilitate comparison. There are two ways to represent the AST: by hashing or by characteristic vectors. We present an experiment and its results, on the basis of which we choose the appropriate form of representation.
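
The two AST representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python's standard `ast` module as the parser, treats a "characteristic vector" as a simple count of node types, and uses SHA-1 over node type names as the hash. The helper names (`subtree_hash`, `characteristic_vector`, `cosine_similarity`) and the sample snippets are hypothetical and chosen only for illustration.

```python
import ast
import hashlib
from collections import Counter


def subtree_hash(node: ast.AST) -> str:
    """Hash a subtree using node type names only (assumption: identifiers
    and literal values are ignored so that renaming does not change the hash)."""
    child_hashes = [subtree_hash(child) for child in ast.iter_child_nodes(node)]
    payload = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def characteristic_vector(node: ast.AST) -> Counter:
    """Count how often each node type occurs in the subtree rooted at node
    (a simplified stand-in for a characteristic vector)."""
    return Counter(type(n).__name__ for n in ast.walk(node))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sum(value * value for value in a.values()) ** 0.5
    norm_b = sum(value * value for value in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample inputs: RENAMED differs from ORIGINAL only in identifiers,
# while DIFFERENT has another control-flow structure.
ORIGINAL = "def add(a, b):\n    return a + b\n"
RENAMED = "def plus(x, y):\n    return x + y\n"
DIFFERENT = "def add(a, b):\n    for i in range(b):\n        a += 1\n    return a\n"

original_tree = ast.parse(ORIGINAL)
renamed_tree = ast.parse(RENAMED)
different_tree = ast.parse(DIFFERENT)

same_structure = subtree_hash(original_tree) == subtree_hash(renamed_tree)
similarity = cosine_similarity(
    characteristic_vector(original_tree),
    characteristic_vector(different_tree),
)
```

In this sketch, hashing gives an exact yes/no answer about structural equality, whereas the vector form yields a graded similarity score that tolerates small structural changes. Both replace a direct tree-to-tree comparison with a cheaper comparison of compact values.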

Cite

Ďuračík, M., Kršák, E., & Hrkút, P. (2018). Source code representations for plagiarism detection. In Communications in Computer and Information Science (Vol. 870, pp. 61–69). Springer Verlag. https://doi.org/10.1007/978-3-319-95522-3_6
