Text similarity based on data compression in Arabic

Hussein Soori; Michal Prilepok; Jan Platos; Eshetie Berhan; Vaclav Snasel

Conference Proceedings

Lecture Notes in Electrical Engineering (2014) 282 LNEE 211-220

DOI: 10.1007/978-3-642-41968-3_22


Abstract

With the huge amount of written data available online and offline, plagiarism detection has become a pressing need in many fields of science and knowledge. Various context-based plagiarism detection methods have been published in the literature. This paper develops a new plagiarism detection method that uses text similarity for Arabic-language text, using 150 documents and 330 paragraphs (159 from the source document and 171 from the Al-Khaleej corpus). The findings show that similarity measurement based on Lempel-Ziv comparison algorithms is very efficient at detecting the plagiarized parts of Arabic text documents, with a success rate of 71.42%. Future studies can improve the efficiency of the algorithms by combining more sophisticated computational, statistical, and linguistic hybrid detection methods. © Springer-Verlag Berlin Heidelberg 2014.
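The abstract does not spell out how the Lempel-Ziv based similarity is computed, so the following is only a minimal sketch of one common way to build such a measure, not the authors' implementation. It assumes an LZ78-style parse of each text into a phrase dictionary and scores similarity as the share of phrases the two dictionaries have in common. The function names `lz78_phrases` and `lz_similarity`, the choice of LZ78, and the normalization by the smaller dictionary are all illustrative assumptions.

```python
def lz78_phrases(text: str) -> set:
    """Parse text LZ78-style and return the set of phrases found.

    Assumption: character-level parsing; the paper's tokenization
    is not described in the abstract.
    """
    phrases = set()
    current = ""
    for ch in text:
        current += ch
        # A phrase ends as soon as it is no longer in the dictionary.
        if current not in phrases:
            phrases.add(current)
            current = ""
    # Any leftover `current` is already a known phrase, so nothing to add.
    return phrases


def lz_similarity(a: str, b: str) -> float:
    """Share of common phrases, normalized by the smaller dictionary.

    Returns a value in [0, 1]; 0.0 if either text is empty.
    The normalization is a hypothetical choice for illustration.
    """
    dict_a = lz78_phrases(a)
    dict_b = lz78_phrases(b)
    if not dict_a or not dict_b:
        return 0.0
    common = dict_a & dict_b
    return len(common) / min(len(dict_a), len(dict_b))


if __name__ == "__main__":
    # Hypothetical sample sentences; Python strings are Unicode,
    # so Arabic text needs no special handling here.
    source = "الكتاب على الطاولة في الغرفة"
    suspect = "الكتاب على الطاولة في المكتبة"
    print(lz_similarity(source, suspect))
```

A paragraph could then be flagged as suspicious when its score against a source paragraph exceeds a chosen threshold; the threshold itself would have to be tuned on labeled data and is not given in the abstract.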

Cite

Soori, H., Prilepok, M., Platos, J., Berhan, E., & Snasel, V. (2014). Text similarity based on data compression in Arabic. In Lecture Notes in Electrical Engineering (Vol. 282 LNEE, pp. 211–220). Springer Verlag. https://doi.org/10.1007/978-3-642-41968-3_22
