A comparative study on data cleaning approaches in sentiment analysis

H. Mohamed Zakir; S. Vinila Jinny

Conference Proceedings

Lecture Notes in Electrical Engineering (2020) 656 421-431

DOI: 10.1007/978-981-15-3992-3_35

Abstract

Sentiment analysis has become an important opinion mining technique and, in recent years, one of the most interesting fields in artificial intelligence. Pre-processing is considered a significant stage in sentiment analysis, but it receives little attention in the literature or in models. Data collected from different sources may contain redundant and duplicate records, so the datasets need to undergo a detection process for any occurrence of redundancy. This paper reviews, analyzes, and compares different data cleaning algorithms, such as DySNI, PSNM, and brushing, for identifying redundancy in datasets. It further analyzes how general data cleaning methods affect accuracy when they are applied with different classifiers. The results reveal that the DySNI algorithm gives the highest accuracy, while the brushing algorithm (BAA-DD) helps to reduce the dataset size to a greater extent. In addition, applying negation replacement and acronym expansion techniques helps to enhance accuracy.
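
To make the cleaning steps named in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It is not the paper's implementation: the lookup tables NEGATIONS and ACRONYMS, the function names, the window size, and the similarity threshold are all hypothetical. The duplicate-detection step is a generic fixed-window sorted-neighborhood pass, a simpler relative of the algorithms named above rather than DySNI, PSNM, or BAA-DD themselves.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; the abstract does not list the paper's own.
NEGATIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "isn't": "is not",
}
ACRONYMS = {
    "gr8": "great",
    "imo": "in my opinion",
    "omg": "oh my god",
}


def clean(text):
    """Lowercase, strip edge punctuation, then apply negation
    replacement and acronym expansion token by token."""
    cleaned = []
    for token in text.lower().split():
        core = token.strip(".,!?")
        if core in NEGATIONS:
            cleaned.append(NEGATIONS[core])
        elif core in ACRONYMS:
            cleaned.append(ACRONYMS[core])
        elif core:
            cleaned.append(core)
    return " ".join(cleaned)


def sorted_neighborhood_dedupe(texts, window=3, threshold=0.9):
    """Sort the records, then compare each one only with the
    (window - 1) records before it in sorted order. A record whose
    similarity to an earlier kept record reaches the threshold is
    dropped as a duplicate."""
    ordered = sorted(texts)
    is_duplicate = [False] * len(ordered)
    for i in range(len(ordered)):
        for j in range(max(0, i - window + 1), i):
            if is_duplicate[j]:
                continue  # only compare against records that were kept
            ratio = SequenceMatcher(None, ordered[j], ordered[i]).ratio()
            if ratio >= threshold:
                is_duplicate[i] = True
                break
    return [t for t, dup in zip(ordered, is_duplicate) if not dup]


reviews = [
    "OMG, this isn't gr8!",
    "omg this isn't gr8",
    "The plot is good.",
    "the plot is goood",
]
unique_reviews = sorted_neighborhood_dedupe([clean(r) for r in reviews])
print(unique_reviews)
```

Cleaning before deduplication is a design choice in this sketch: normalizing negations and acronyms first lets near-identical reviews sort next to each other, so a small window is enough to catch them.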

Citation (APA)

Mohamed Zakir, H., & Vinila Jinny, S. (2020). A comparative study on data cleaning approaches in sentiment analysis. In Lecture Notes in Electrical Engineering (Vol. 656, pp. 421–431). Springer. https://doi.org/10.1007/978-981-15-3992-3_35
