A comparison among significance tests and other feature building methods for sentiment analysis: A first study

Raksha Sharma; Dibyendu Mondal; Pushpak Bhattacharyya

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10762 LNCS 3-19

DOI: 10.1007/978-3-319-77116-8_1

Abstract

Words that participate in the sentiment (positive or negative) classification decision are known as significant words for sentiment classification. Identifying such significant words in the corpus and using them as features reduces the amount of irrelevant information in the feature set in supervised sentiment classification. In this paper, we conceptually study and compare several feature building methods for the sentiment analysis task, viz., unigrams, TFIDF, Relief, Delta-TFIDF, the χ² test and Welch's t-test. Unigrams and TFIDF are the classic ways of building features from a corpus. Relief, Delta-TFIDF and the χ² test have recently attracted much attention for their potential use as feature building methods in sentiment analysis. In contrast, the t-test is the least explored method for identifying significant words in the corpus as features. We show the effectiveness of significance tests over other feature building methods for three types of sentiment analysis tasks, viz., in-domain, cross-domain and cross-lingual. Delta-TFIDF, the χ² test and Welch's t-test compute the significance of a word for classification in the corpus, whereas unigrams, TFIDF and Relief do not take this significance into account. Furthermore, significance tests can be divided into two categories: bag-of-words-based tests and distribution-based tests. A bag-of-words-based test observes the total count of a word in the different classes to determine its significance, while a distribution-based test observes the distribution of the word. In this paper, we substantiate that the distribution-based Welch's t-test is more accurate than the bag-of-words-based χ² test and Delta-TFIDF in identifying significant words in the corpus.
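The distinction between bag-of-words-based and distribution-based tests can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the formulas, so the χ² score below is assumed to be a goodness-of-fit statistic against an equal split of the word's total count between the two classes, and Welch's t statistic is computed on hypothetical per-document counts of the word. The toy data and the function names are made up for illustration.

```python
import math
from statistics import mean, variance


def chi_square_equal_split(pos_counts, neg_counts):
    """Bag-of-words-style score: uses only the total count per class.

    Assumption: the expected count in each class is half of the
    word's total count (an equal split between the two classes).
    """
    c_pos, c_neg = sum(pos_counts), sum(neg_counts)
    expected = (c_pos + c_neg) / 2
    if expected == 0:
        return 0.0
    return ((c_pos - expected) ** 2 + (c_neg - expected) ** 2) / expected


def welch_t(pos_counts, neg_counts):
    """Distribution-based score: Welch's t statistic on per-document counts."""
    n_pos, n_neg = len(pos_counts), len(neg_counts)
    diff = mean(pos_counts) - mean(neg_counts)
    # Squared standard error with unequal variances (sample variances).
    se2 = variance(pos_counts) / n_pos + variance(neg_counts) / n_neg
    if se2 == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


# Hypothetical per-document counts of one word in 10 positive
# and 10 negative documents.
neg = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
spread_pos = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]    # once in every positive document
bursty_pos = [10, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # ten times in a single document

chi_spread = chi_square_equal_split(spread_pos, neg)
chi_bursty = chi_square_equal_split(bursty_pos, neg)
t_spread = welch_t(spread_pos, neg)
t_bursty = welch_t(bursty_pos, neg)

print("chi-square:", chi_spread, chi_bursty)
print("Welch's t: ", t_spread, t_bursty)
```

Both words have the same class totals (10 versus 2), so the count-based score cannot tell them apart. Welch's t statistic is much larger for the word that occurs consistently across the positive documents than for the word concentrated in a single document, because the per-document variance enters the denominator.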

Cite

Sharma, R., Mondal, D., & Bhattacharyya, P. (2018). A comparison among significance tests and other feature building methods for sentiment analysis: A first study. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10762 LNCS, pp. 3–19). Springer Verlag. https://doi.org/10.1007/978-3-319-77116-8_1
