Word embedding-based approaches for measuring semantic similarity of Arabic-English sentences

El Moatez Billah Nagoudi; Jérémy Ferrero; Didier Schwab; Hadda Cherroun

Conference Proceedings

Communications in Computer and Information Science (2018) 782 19-33

DOI: 10.1007/978-3-319-73500-9_2

Abstract

Semantic Textual Similarity (STS) is a key component of many Natural Language Processing (NLP) applications and plays an important role in areas such as information retrieval, machine translation, information extraction and plagiarism detection. In this paper we propose two word embedding-based approaches for measuring the semantic similarity between Arabic-English cross-language sentences. The main idea is to exploit Machine Translation (MT) and improved word embedding representations in order to capture the syntactic and semantic properties of words. MT is used to translate the English sentences into Arabic so that a classical monolingual comparison can be applied. Two word embedding-based methods are then developed to rate the semantic similarity. Additionally, Word Alignment (WA), Inverse Document Frequency (IDF) and Part-of-Speech (POS) weighting are applied to the examined sentences to help identify the words that are most descriptive in each sentence. The performance of our approaches is evaluated on a cross-language dataset containing more than 2400 Arabic-English sentence pairs. Moreover, the proposed methods are validated through the Pearson correlation between our similarity scores and human ratings.
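
The abstract does not give formulas, so the following is only a minimal sketch of one common way to combine the ingredients it names: word embeddings, IDF weighting and a Pearson-based evaluation. It assumes the English sentence has already been machine-translated into Arabic and tokenized. The function names, the IDF-weighted sum of word vectors, and the cosine comparison are illustrative assumptions, not the authors' published method; word alignment and POS weighting are left out.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Return a hypothetical IDF table: log(N / document frequency) per token."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for sentence in corpus for tok in set(sentence))
    return {tok: math.log(n_docs / df) for tok, df in doc_freq.items()}


def sentence_vector(tokens, embeddings, idf):
    """Sum the word vectors of a sentence, each scaled by its IDF weight.

    Tokens without an embedding are skipped; tokens without an IDF entry
    get weight 1.0 (an arbitrary choice made for this sketch).
    """
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for tok in tokens:
        if tok in embeddings:
            weight = idf.get(tok, 1.0)
            for i, value in enumerate(embeddings[tok]):
                vec[i] += weight * value
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson(xs, ys):
    """Pearson correlation between system scores and human ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0
```

In use, each Arabic sentence and each translated English sentence would be turned into a vector with `sentence_vector`, the pair scored with `cosine`, and the resulting list of scores compared against the human ratings with `pearson`.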

Nagoudi, E. M. B., Ferrero, J., Schwab, D., & Cherroun, H. (2018). Word embedding-based approaches for measuring semantic similarity of Arabic-English sentences. In Communications in Computer and Information Science (Vol. 782, pp. 19–33). Springer Verlag. https://doi.org/10.1007/978-3-319-73500-9_2
