Similar meaning analysis for original documents identification in Arabic language


Abstract

Advances in technology have made it easy to present someone else's language expression as one's own by using semantically similar words. This phenomenon has increased the potential for plagiarism. Detecting it is a challenge, especially for Arabic paraphrase, because of the semantic ambiguity of the language. In recent decades, research has been hindered by the very limited availability of well-structured datasets. In this context, our main objectives are to construct a corpus for Arabic and then to present its impact on paraphrase identification. We generated the suspect documents from the Open Source Arabic Corpora (OSAC). Distributed word representation (word2vec) and part-of-speech methods were used to replace each original word with its most similar one having the same grammatical class. Moreover, we captured the structure of Arabic sentences with different window sizes and vector dimensions. Then, we studied how this corpus could be used efficiently to evaluate Natural Language Processing (NLP) methods, namely Term Frequency-Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), word2vec, Global Vector Representation (GloVe), and Convolutional Neural Network (CNN), for paraphrase detection. Experiments revealed which method significantly outperformed the others, in terms of precision and recall, at preserving the semantic properties of Arabic words with various linear regularities, alleviating data sparseness, and increasing the degree of semantic similarity.
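
The substitution step described in the abstract (replace each word with its most similar neighbour of the same grammatical class) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the toy vectors, the part-of-speech table, the English placeholder words, and the function names `cosine`, `most_similar_same_pos`, and `paraphrase` are all hypothetical. A real pipeline would presumably use word2vec vectors trained on OSAC and an Arabic part-of-speech tagger.

```python
import math

# Hypothetical toy embeddings standing in for trained word2vec vectors.
# English placeholder words are used purely for readability.
VECTORS = {
    "big": (0.9, 0.1, 0.0),
    "large": (0.85, 0.15, 0.05),
    "enlarge": (0.9, 0.1, 0.02),  # close to "big" in vector space, but a verb
    "house": (0.1, 0.9, 0.1),
    "home": (0.12, 0.88, 0.15),
    "run": (0.0, 0.2, 0.9),
}

# Hypothetical part-of-speech tags; a real system would get these from a tagger.
POS = {
    "big": "ADJ", "large": "ADJ",
    "enlarge": "VERB", "run": "VERB",
    "house": "NOUN", "home": "NOUN",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_same_pos(word):
    """Return the nearest neighbour of `word` that shares its POS tag.

    Words without a vector, or without a same-class neighbour, are kept as-is.
    """
    if word not in VECTORS:
        return word
    candidates = [w for w in VECTORS if w != word and POS[w] == POS[word]]
    if not candidates:
        return word
    return max(candidates, key=lambda w: cosine(VECTORS[word], VECTORS[w]))


def paraphrase(tokens):
    """Build a 'suspect' sentence by substituting every token."""
    return [most_similar_same_pos(t) for t in tokens]


suspect = paraphrase(["the", "big", "house"])
```

In this toy data, "enlarge" is the closest vector to "big" overall, so the part-of-speech filter is what keeps the substitution within the adjective class.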

Cite

Mahmoud, A., & Zrigui, M. (2019). Similar meaning analysis for original documents identification in Arabic language. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11683 LNAI, pp. 193–206). Springer Verlag. https://doi.org/10.1007/978-3-030-28377-3_16
