Entity Based Sentiment Analysis on Twitter
The aim of our work is to use the Twitter corpus to ascertain opinions about entities that matter and to enable consumption of these opinions in a user-friendly way. We focus on classifying opinions as positive, negative, or neutral. Since there are no sufficiently large datasets of labeled tweets, limiting the sentiment categories to these three enables us to leverage similar but larger datasets for training custom sentiment language models. We begin by extracting entities from the Twitter dataset using the Stanford NER [8]. URLs and username tags (person) are also treated as entities to augment those found by the NER. To learn a sentiment language model, we use a corpus of 200,000 product reviews that have been labeled as positive or negative. From this corpus, the sentiment language model computes the probability that a given unigram or bigram is used in a positive context and the probability that it is used in a negative context. Using this model, we analyze all tweets associated with an entity and classify whether the overall opinion of that entity is positive or negative, and by how much.
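The n-gram sentiment model and per-entity aggregation described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names, the regular expressions, the add-one smoothing, and the averaging of log-odds are all assumptions made for illustration, and the Stanford NER step is omitted, so only URLs and username tags are picked up as entities.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical patterns for the extra entity types (URLs and @username tags).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def ngrams(text):
    """Return lowercase unigrams plus bigrams (bigrams as space-joined strings)."""
    tokens = TOKEN_RE.findall(text.lower())
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def train(labeled_reviews):
    """Count n-gram occurrences per label; labels are 'pos' or 'neg'."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in labeled_reviews:
        counts[label].update(ngrams(text))
    return counts


def p_positive(gram, counts, alpha=1.0):
    """Smoothed estimate that `gram` appears in a positive context.

    The add-alpha smoothing is an assumption; unseen n-grams get 0.5.
    The negative-context probability is 1 - p_positive.
    """
    pos = counts["pos"][gram] + alpha
    neg = counts["neg"][gram] + alpha
    return pos / (pos + neg)


def tweet_score(text, counts):
    """Mean log-odds over the tweet's n-grams; above 0 leans positive."""
    grams = ngrams(text)
    if not grams:
        return 0.0
    total = 0.0
    for gram in grams:
        p = p_positive(gram, counts)
        total += math.log(p / (1.0 - p))
    return total / len(grams)


def extra_entities(text):
    """URLs and username tags found in a tweet (the NER step is omitted)."""
    return URL_RE.findall(text) + MENTION_RE.findall(text)


def entity_scores(tweets, counts):
    """Average the scores of all tweets that mention each entity."""
    per_entity = defaultdict(list)
    for text in tweets:
        score = tweet_score(text, counts)
        for entity in extra_entities(text):
            per_entity[entity].append(score)
    return {e: sum(s) / len(s) for e, s in per_entity.items()}
```

In this sketch the sign of an entity's score gives the polarity and its magnitude gives "by how much"; a neutral class could be assigned to scores near zero, although the threshold would have to be chosen empirically.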