Millions of tweets posted daily contain users' opinions and sentiment in a variety of languages. Sentiment classification can benefit companies by providing data for analyzing customer feedback on products or for conducting market research. To cover a larger portion of the available tweets, sentiment classifiers need to handle tweets in multiple languages. Traditional classifiers, however, are often language-specific and require considerable work to be applied to a different language. We analyze the characteristics and feasibility of a language-independent, semi-supervised sentiment classification approach for tweets. We use emoticons as noisy labels to generate training data from a completely raw set of tweets. We train a Naïve Bayes classifier on these data and evaluate it on over 10,000 tweets in four languages that were annotated by humans using the Mechanical Turk platform. As part of our contribution, we make this sentiment evaluation dataset publicly available. We present an evaluation of classifier performance for each of the four languages and of the effects of applying multilingual classifiers to tweets of mixed languages. Our experiments show that the approach can be applied effectively to multiple languages without requiring extra effort for each additional language.
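The pipeline described above (emoticons as noisy labels over raw tweets, then a Naïve Bayes classifier) can be illustrated with a minimal sketch. It assumes small, hypothetical emoticon lists, plain whitespace tokenization (which needs no language-specific resources), a multinomial Naïve Bayes with Laplace smoothing, and that tweets containing both or neither emoticon class are discarded; the paper's actual emoticon sets, features, and preprocessing are not given in this abstract. The names `noisy_label`, `tokenize`, `build_training_set`, and `NaiveBayes` are illustrative only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical emoticon lists; the paper's actual sets are not specified here.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def noisy_label(tweet):
    """Return 'pos' or 'neg' from emoticons; None if absent or contradictory (assumed rule)."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokenize(tweet):
    # Language-independent tokenization: whitespace split plus lowercasing.
    # Emoticons are removed so the classifier cannot simply memorize the noisy labels.
    return [t.lower() for t in tweet.split() if t not in ALL_EMOTICONS]


def build_training_set(raw_tweets):
    """Keep only tweets that receive a noisy label; return (token lists, labels)."""
    docs, labels = [], []
    for tweet in raw_tweets:
        label = noisy_label(tweet)
        if label is not None:
            docs.append(tokenize(tweet))
            labels.append(label)
    return docs, labels


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[label][t]
                    score += math.log((count + self.alpha) / (n_words + self.alpha * vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


raw = [
    "great game tonight :)",
    "love this phone :D",
    "terrible service :(",
    "so sad about the news :-(",
    "no emoticon here",
]
docs, labels = build_training_set(raw)
clf = NaiveBayes().fit(docs, labels)
print(clf.predict(tokenize("what a great phone")))  # prints pos
```

Because neither the labeling step nor the tokenizer uses a dictionary, stemmer, or other language-specific resource, the same code can in principle be run unchanged on tweets in any language that separates words with whitespace; this is one reading of "without requiring extra effort per additional language", not a description of the authors' implementation.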
Narr, S., Hülfenhaus, M., & Albayrak, S. (2012). Language-independent Twitter sentiment analysis. Knowledge Discovery and Machine Learning (KDML), LWA, 12–14.