Sentiment analysis has become an important opinion mining technique and, in recent years, one of the most active fields in artificial intelligence. Pre-processing is a significant stage in sentiment analysis, but it receives little attention in the literature or in published models. Data collected from different sources may contain redundant and duplicate records, so the datasets need to undergo a detection process for any occurrence of redundancy. This paper reviews, analyzes, and compares data cleaning algorithms such as DySNI, PSNM, and brushing for identifying redundancy in datasets. It also analyzes how general data cleaning methods affect accuracy when they are applied with different classifiers. The results show that the DySNI algorithm gives the highest accuracy, while the brushing algorithm (BAA-DD) reduces the dataset size to a greater extent. In addition, applying negation replacement and acronym expansion improves accuracy.
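As a rough illustration of the general cleaning steps named in the abstract, the sketch below applies negation replacement and acronym expansion to short texts and then drops exact duplicates. It is a minimal sketch under stated assumptions: the lookup tables and the function names `clean` and `drop_exact_duplicates` are hypothetical, and the exact-match duplicate check is a much simpler stand-in for DySNI, PSNM, or brushing, not an implementation of any of them.

```python
import re

# Hypothetical lookup tables for illustration only; not taken from the paper.
NEGATIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "isn't": "is not",
}
ACRONYMS = {
    "gr8": "great",
    "imo": "in my opinion",
    "lol": "laughing out loud",
}


def clean(text: str) -> str:
    """Lowercase the text, then expand negation contractions and acronyms."""
    # Keep apostrophes inside tokens so contractions such as "isn't" survive.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    expanded = []
    for token in tokens:
        token = NEGATIONS.get(token, token)
        token = ACRONYMS.get(token, token)
        expanded.append(token)
    return " ".join(expanded)


def drop_exact_duplicates(texts):
    """Keep the first occurrence of each text, comparing cleaned forms.

    Assumption: exact matching after cleaning. This is not a sorted
    neighborhood or brushing method; it only shows where such a step fits.
    """
    seen = set()
    unique = []
    for text in texts:
        key = clean(text)
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Cleaning before comparison lets "I can't wait" and "i cannot wait!" be treated as the same record, which is one reason the order of pre-processing steps can matter.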
Mohamed Zakir, H., & Vinila Jinny, S. (2020). A comparative study on data cleaning approaches in sentiment analysis. In Lecture Notes in Electrical Engineering (Vol. 656, pp. 421–431). Springer. https://doi.org/10.1007/978-981-15-3992-3_35