This paper presents a new method for cross-language plagiarism analysis. The task is to detect plagiarized passages in suspicious documents and to identify the corresponding fragments in the source documents. We propose a plagiarism detection method consisting of five main phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and post-processing. To evaluate the method, we created a corpus containing artificial plagiarism cases. We conducted two experiments: the first considers only monolingual plagiarism cases, while the second considers only cross-language plagiarism cases. In the cross-language experiment, the method achieved 86% of the performance of the monolingual baseline. We also analyzed how the length of the plagiarized text affects the overall performance of the method. This analysis showed that the method achieves better results on medium-length and long plagiarized passages. © 2010 Springer-Verlag Berlin Heidelberg.
CITATION STYLE
Corezola Pereira, R., Moreira, V. P., & Galante, R. (2010). A new approach for cross-language plagiarism analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6360 LNCS, pp. 15–26). https://doi.org/10.1007/978-3-642-15998-5_4