A FEATURE EXTRACTION BASED IMPROVED SENTIMENT ANALYSIS ON APACHE SPARK FOR REAL-TIME TWITTER DATA

Abstract

This paper aims to improve the accuracy of sentiment analysis on Apache Spark for real-time, general Twitter data. Much prior work addresses sentiment analysis of offline or stored Twitter data, applying various classification algorithms to relevant features extracted from pre-processed text with well-known feature extraction methods. However, little work exists on sentiment analysis of real-time Twitter data, especially generic data, on big data processing platforms such as Apache Spark. This paper proposes real-time sentiment analysis of generic Twitter data through Apache Spark, using six classification algorithms with N-gram and Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction on the pre-processed data. An exhaustive comparison is done using the Logistic Regression (LR), Multinomial Naive Bayes (MNB), Random Forest Classifier (RFC), Support Vector Machine (SVM), K-Nearest Neighbour (K-NN), and Decision Tree (DT) classification algorithms. The trigram feature extraction method is observed to perform best with LR and SVM, and the RFC results are comparable on the general tweet data considered.
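
To make the feature extraction step concrete, the following is a minimal sketch of trigram extraction followed by TF-IDF weighting. It is plain Python, not the authors' Spark implementation. The function names `ngrams` and `tfidf`, the sample tweets, and the choice of a smoothed IDF, log((m + 1) / (df + 1)), are assumptions made for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=3):
    """Return one {n-gram: weight} dict per tokenized document.

    Term frequency is the raw count of the n-gram in the document.
    IDF is the smoothed form log((m + 1) / (df + 1)), where m is the
    number of documents and df the number of documents containing the
    n-gram (an assumed choice, not taken from the paper).
    """
    counts_per_doc = [Counter(ngrams(doc, n)) for doc in docs]
    m = len(docs)
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())  # each document counts once per n-gram
    idf = {gram: math.log((m + 1) / (d + 1)) for gram, d in df.items()}
    return [
        {gram: tf * idf[gram] for gram, tf in counts.items()}
        for counts in counts_per_doc
    ]


# Hypothetical pre-processed (lowercased, tokenized) tweets.
tweets = [
    "i love this phone".split(),
    "i love this movie".split(),
    "worst service ever seen".split(),
]
weights = tfidf(tweets, n=3)
```

Each resulting dictionary is a sparse feature vector that could be passed to a classifier such as LR or SVM. A trigram shared by several tweets ("i love this") receives a lower weight than one unique to a single tweet ("love this phone"), which is the effect the IDF term is meant to have.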

Cite

Kanungo, P., & Singh, H. (2023). A FEATURE EXTRACTION BASED IMPROVED SENTIMENT ANALYSIS ON APACHE SPARK FOR REAL-TIME TWITTER DATA. Scalable Computing, 24(4), 847–855. https://doi.org/10.12694/scpe.v24i4.2343
