Analysis of Deep Learning Model Combinations and Tokenization Approaches in Sentiment Classification


This article is free to access.

Abstract

Sentiment classification is a natural language processing task that identifies the opinions expressed in texts such as product or service reviews. In this work, we analyze the effects of different deep learning model combinations, embedding methods, and tokenization approaches on sentiment classification. We feed non-contextualized embeddings (Word2Vec and GloVe), contextualized embeddings (BERT and RoBERTa/XLM-RoBERTa), and the output of the pretrained BERT and RoBERTa/XLM-RoBERTa models as input to the neural models. We conduct a comprehensive analysis of eleven tokenization approaches, including the commonly used subword methods and morphologically motivated segmentations. The experiments are performed on three English and two Turkish datasets from different domains. The results show that the BERT- and RoBERTa-/XLM-RoBERTa-based models and contextualized embeddings outperform the other neural models. We also observe that using words in raw or preprocessed form, stemming the words, and applying WordPiece tokenization give the most promising results in the sentiment analysis task. Finally, we ensemble the models to determine which tokenization approaches produce better results together.
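To make the WordPiece tokenization mentioned above concrete, here is a minimal sketch of its greedy longest-match-first segmentation. The tiny vocabulary is hypothetical and for illustration only; real models such as BERT learn a vocabulary of roughly 30k subwords from a corpus, and production tokenizers add further details (casing, special tokens) not shown here.

```python
# Toy sketch of WordPiece-style subword tokenization (greedy longest match).
# The vocabulary below is hypothetical; BERT learns its vocabulary from data.

def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split a single word into subwords by longest-match-first lookup."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces carry the ## prefix
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no vocabulary entry covers this position
        tokens.append(piece)
        start = end
    return tokens

vocab = {"un", "##aff", "##able", "##happi", "##ness", "happy"}
print(wordpiece_tokenize("unaffable", vocab))    # ['un', '##aff', '##able']
print(wordpiece_tokenize("unhappiness", vocab))  # ['un', '##happi', '##ness']
```

This contrasts with the word-level and stem-based approaches also studied in the paper: those keep one token per (possibly stemmed) word, while WordPiece decomposes rare words into frequent subword units.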

Citation (APA)

Erkan, A., & Gungor, T. (2023). Analysis of Deep Learning Model Combinations and Tokenization Approaches in Sentiment Classification. IEEE Access, 11, 134951–134968. https://doi.org/10.1109/ACCESS.2023.3337354
