Paraphrase Detection in Indian Languages Using Deep Learning

Durairaj Thenmozhi; C. Jerin Mahibha; S. Kayalvizhi; M. Rakesh; Y. Vivek; V. Poojesshwaran

Conference Proceedings

Communications in Computer and Information Science (2023) 1802 CCIS 138-154

DOI: 10.1007/978-3-031-33231-9_9

Abstract

Multiple sentences that convey the same meaning are considered to be paraphrases. A paraphrase restates a given text, passage or statement in different words while keeping the original context and meaning intact. Paraphrasing can be used to expand, clarify or summarize the content of essays, research papers and journals. Paraphrase detection is the process of detecting the semantic identity of sentences. Within Natural Language Processing it is relevant to applications such as plagiarism detection, text summarization, text mining, question answering, and query ranking. Effective paraphrase detection can be implemented if the semantics of the language and their interactions are adequately captured. The task is considered difficult and challenging because of the wide range of complex morphological structures and vocabulary found in most Indian languages. Existing approaches to paraphrase detection include machine learning techniques such as the Multinomial Logistic Regression model and Recursive Auto Encoders, which are lacking with respect to hand-crafted feature engineering. This problem can be addressed by using deep learning approaches for paraphrase detection. In the proposed system, sentence pairs are classified as paraphrase, semi-paraphrase or non-paraphrase using an ensemble of three deep learning algorithms: BERT (Bidirectional Encoder Representations from Transformers), USE (Universal Sentence Encoder) and Seq2Seq (Sequence to Sequence). The DPIL corpus has been used to evaluate the proposed system, and the highest accuracies obtained for Hindi and Punjabi are 85.22% and 85.80%, respectively.
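
The abstract does not state how the outputs of the three models are combined. As an illustrative sketch only, the following assumes simple majority voting over per-pair labels. The function name `majority_vote`, the label strings `P`, `SP` and `NP`, and the tie-breaking rule (fall back to the first model) are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# Assumed label strings: paraphrase, semi-paraphrase, non-paraphrase.
LABELS = ("P", "SP", "NP")


def majority_vote(bert_preds, use_preds, seq2seq_preds):
    """Combine three per-pair label lists by majority vote.

    Each argument is a list of labels, one per sentence pair.
    If all three models disagree, the first model's label is kept
    (an arbitrary tie-breaking choice made for this sketch).
    """
    if not (len(bert_preds) == len(use_preds) == len(seq2seq_preds)):
        raise ValueError("all prediction lists must have the same length")

    combined = []
    for votes in zip(bert_preds, use_preds, seq2seq_preds):
        label, count = Counter(votes).most_common(1)[0]
        # With three voters, count == 1 means a three-way disagreement.
        combined.append(label if count > 1 else votes[0])
    return combined


if __name__ == "__main__":
    # Hypothetical predictions for four sentence pairs.
    bert = ["P", "SP", "NP", "P"]
    use = ["P", "NP", "NP", "SP"]
    seq2seq = ["SP", "NP", "P", "NP"]
    print(majority_vote(bert, use, seq2seq))
```

Majority voting is only one possible combination rule; averaging class probabilities or weighting the models by validation accuracy would be equally plausible readings of "ensemble" here.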

Cite

Thenmozhi, D., Mahibha, C. J., Kayalvizhi, S., Rakesh, M., Vivek, Y., & Poojesshwaran, V. (2023). Paraphrase Detection in Indian Languages Using Deep Learning. In Communications in Computer and Information Science (Vol. 1802 CCIS, pp. 138–154). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-33231-9_9
