Language-independent Twitter sentiment analysis

  • Narr S
  • Hulfenhaus M
  • Albayrak S

Abstract

Millions of tweets posted daily contain opinions and sentiment of users in a variety of languages. Sentiment classification can benefit companies by providing data for analyzing customer feedback on products or for conducting market research. Sentiment classifiers need to handle tweets in multiple languages to cover a larger portion of the available tweets. Traditional classifiers, however, are often language-specific and require much work to be applied to a different language. We analyze the characteristics and feasibility of a language-independent, semi-supervised sentiment classification approach for tweets. We use emoticons as noisy labels to generate training data from a completely raw set of tweets. We train a Naive Bayes classifier on our data and evaluate it on over 10,000 tweets in 4 languages that were human-annotated using the Mechanical Turk platform. As part of our contribution, we make the sentiment evaluation dataset publicly available. We present an evaluation of the performance of classifiers for each of the 4 languages and of the effects of using multilingual classifiers on tweets of mixed languages. Our experiments show that the classification approach can be applied effectively to multiple languages without requiring extra effort per additional language.
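The emoticon-labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the emoticon sets, the whitespace tokenizer, the Laplace smoothing, and every name below (`noisy_label`, `features`, `NaiveBayes`) are assumptions made for illustration only. Tweets containing emoticons of exactly one polarity become training examples, the emoticons are stripped from the features so the label is not leaked, and a multinomial Naive Bayes model is fitted on the remaining tokens.

```python
import math
from collections import Counter

# Hypothetical emoticon sets; the abstract does not list the ones the paper uses.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", "=("}
EMOTICONS = POSITIVE | NEGATIVE


def noisy_label(tweet):
    """Return 'pos' or 'neg' if the tweet has emoticons of exactly one polarity, else None."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:  # no emoticon at all, or both polarities present
        return None
    return "pos" if has_pos else "neg"


def features(tweet):
    """Lowercased whitespace tokens with emoticons removed (no language-specific resources)."""
    return [t.lower() for t in tweet.split() if t not in EMOTICONS]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (alpha is an assumed default)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()  # number of training tweets per label
        self.word_counts = {}          # label -> Counter of token frequencies
        self.vocab = set()

    def fit(self, raw_tweets):
        for tweet in raw_tweets:
            label = noisy_label(tweet)
            if label is None:
                continue  # tweets without a usable noisy label are skipped
            self.class_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in features(tweet):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def predict(self, tweet):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for token in features(tweet):
                if token in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


raw = [
    "I love this phone :)",
    "great day today :D",
    "this is terrible :(",
    "I hate waiting :-(",
    "no emoticon here",
    "mixed feelings :) :(",
]
model = NaiveBayes().fit(raw)
print(model.predict("love this great phone"))
print(model.predict("terrible hate"))
```

Because the sketch relies only on emoticons and whitespace-separated tokens, the same code can be fed raw tweets in another language without modification, which is the property the abstract emphasizes. A real system would need a far larger raw corpus and more careful tokenization.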

Cite

Narr, S., Hulfenhaus, M., & Albayrak, S. (2012). Language-independent Twitter sentiment analysis. Knowledge Discovery and Machine Learning (KDML), LWA, 12–14.
