Abstract
User reviews are an important resource for applications such as recommender systems and decision-making programs. Sentiment analysis is a useful way to extract valuable information from these reviews. Data preprocessing is an important step in sentiment analysis, and the choice of suitable preprocessing methods matters. Most existing studies of the effect of preprocessing methods focus on small, balanced datasets. In this research, we apply different preprocessing methods to build a domain lexicon from a large, unbalanced set of reviews. These methods examine the effects of stopwords, negation words, and the number of occurrences of a word. We then apply different preprocessing methods to determine which words have a high sentiment orientation when calculating the total review sentiment score. Two main experiments with five cases are tested on the Amazon dataset for the movie domain. The most suitable preprocessing method is then selected for building the domain lexicon and for calculating the total review sentiment score using the generated lexicon. Finally, we evaluate the proposed lexicon by comparing it with a general lexicon. The proposed lexicon outperforms the general lexicon in calculating the total review sentiment score in terms of accuracy and F1-measure. Furthermore, the results show that sentiment words are not restricted to adjectives and adverbs (as commonly claimed); nouns and verbs also contribute to the sentiment score and thus affect the sentiment analysis process. Moreover, the results show that negation words have a positive effect on the sentiment analysis process.
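As a rough illustration of the kind of pipeline the abstract describes, the following Python sketch builds a small domain lexicon from labeled reviews and uses it to compute a total review sentiment score. It is not the authors' method: the stopword and negation lists, the `min_count` occurrence threshold, the mean-label word score, and the "flip the next scored word" negation rule are all assumptions made for this example, and every function name is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical word lists for illustration only; the paper does not specify them here.
STOPWORDS = {"the", "a", "is", "was", "this", "it", "and"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_lexicon(reviews, min_count=2):
    """Build a word -> score mapping from (text, label) pairs.

    label is +1 for a positive review and -1 for a negative one.
    A word's score is the mean label of the reviews it occurs in
    (assumed scoring rule); words seen fewer than min_count times are dropped.
    """
    totals = defaultdict(int)
    counts = Counter()
    for text, label in reviews:
        for tok in tokenize(text):
            if tok in STOPWORDS or tok in NEGATIONS:
                continue
            totals[tok] += label
            counts[tok] += 1
    return {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}


def review_score(text, lexicon):
    """Sum lexicon scores over a review, flipping the sign of the first
    non-stopword token that follows a negation word (assumed rule)."""
    score = 0.0
    negate = False
    for tok in tokenize(text):
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in STOPWORDS:
            continue
        value = lexicon.get(tok, 0.0)
        score += -value if negate else value
        negate = False
    return score


# Toy data, invented for the example.
reviews = [
    ("good movie", 1),
    ("good plot", 1),
    ("bad movie", -1),
    ("bad acting", -1),
]
lexicon = build_lexicon(reviews, min_count=2)
```

With this toy data, "plot" and "acting" fall below the occurrence threshold and are excluded, while "movie" appears equally in positive and negative reviews and receives a neutral score. Calling `review_score("this was not a bad movie", lexicon)` shows how a negation word changes the contribution of the sentiment word that follows it.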
AL-Ghuribi, S. M., Noah, S. A., & Tiun, S. (2021). Various Pre-processing Strategies for Domain-Based Sentiment Analysis of Unbalanced Large-Scale Reviews. In Advances in Intelligent Systems and Computing (Vol. 1261 AISC, pp. 204–214). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58669-0_19