Sentiment analysis of informal textual communication in cyberspace

Georgios Paltoglouᵃ, Stephane Gobronᵇ, Marcin Skowronᶜ, Mike Thelwallᵃ, and Daniel Thalmannᵇ

ᵃ School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK
ᵇ Ecole Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland
ᶜ Austrian Research Institute for Artificial Intelligence, 1010 Vienna, Austria

firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org

Abstract. The ability to correctly identify the existence and polarity of emotion in informal, textual communication is an important part of a realistic and immersive 3D environment where people communicate with one another through avatars or with an automated system. Such a feature would give the system the ability to realistically represent the mood and intentions of the participants, thus greatly enhancing their experience. In this paper, we study and compare a number of approaches for detecting whether a textual utterance is of objective or subjective nature and, in the latter case, detecting the polarity of the utterance (i.e. positive vs. negative). Experiments are carried out on a real corpus of social exchanges in cyberspace and general conclusions are presented.

Keywords: Opinion Mining, Sentiment Analysis, Conversational Systems, Virtual Reality, Virtual Human, Emotional Profile

1 Introduction

The proliferation of social networks such as blogs, forums and other online means of expression and communication has resulted in a landscape where people are able to freely discuss online through a variety of means and applications. Probably one of the most novel and interesting ways of communicating in cyberspace is through 3D virtual environments.
In such environments, people, represented by their avatars, socialize and interact with each other and with virtual humans operated by machines, i.e., computer systems. Examples of such virtual environments are flourishing and include Second Life¹, World of Warcraft², There³, IMVU⁴, Moove⁵, Activeworlds⁶, Bluemars⁷, Club Cooee⁸, etc. Despite the fact that the graphics of those environments remain relatively poor, futuristic movies such as Avatar⁹ provide an example of sophisticated landscapes and renderings that will be attainable by such environments in the foreseeable future. However, regardless of how attractive and realistic such artificial 3D worlds become, they will always remain heavily dependent on the quality of the human communication that takes place within them. As shown in [17, 4, 15], communication in environments that are not limited to a single, textual modality consists not just of semantic data transfer, but also of dense non-verbal communication in which sentiment plays an important role. Moreover, without emotion no consistent and coherent (virtual) body language is possible. Such primordial movements include facial expressions, eye looks, arm-language coordination, etc.

Sentiment detection from textual utterances can play an important role in the development of realistic and interactive dialog systems. Such systems serve various educational, business or entertainment oriented functions and also include systems that are deployed in 3D virtual environments. With the aid of “dialog coherence” modules, conversational systems aim at a realistic interaction flow at the emotional level, e.g., Affect Listeners, and can greatly benefit from the correct identification of the emotional state of their participants. Taking into consideration that the majority of input to practical conversational systems consists of short, informal, textual exchanges, it is essential that the sentiment analysis component integrated in the dialog system is able to cope with this informal, often incomplete or ill-formed type of communication.

[Running head: Comparison between lexicon-based and machine learning classification]
Sentiment analysis, the process of automatically detecting whether a text segment contains emotional or opinionated content and extracting its polarity or valence, is a field of research that has received significant attention in recent years, both in academia and in industry. The aforementioned increase of user-generated content on the web has resulted in a wealth of information that is potentially of vital importance to institutions and companies, providing them with data to research their consumers, manage their reputations and identify new opportunities. As a result, most of the research in the field has been limited to product reviews (e.g. [12,42]), where the aim is to predict whether the reviewer recommends a product or not, based on the textual content of the review.

The focus of this paper is different. Instead of focusing our attention on product reviews, we explore the more ubiquitous field of informal, social interactions in cyberspace. The unprecedented popularity of social platforms such as Facebook, Twitter and MySpace, as well as 3D virtual worlds, has resulted in an unparalleled increase of textual exchanges that remains relatively unexplored, especially in terms of its emotional content.

¹ http://secondlife.com
² http://www.worldofwarcraft.com
³ http://www.there.com
⁴ http://www.imvu.com
⁵ http://www.moove.com
⁶ http://www.activeworlds.com
⁷ http://bluemars.com
⁸ http://www.clubcooee.com
⁹ http://www.avatarmovie.com/

Specifically, we aim to answer the following question: can lexicon-based approaches perform more effectively than machine-learning approaches in this domain? This question is particularly important because previous research in sentiment analysis using product reviews has shown that machine-learning approaches typically outperform lexicon-based ones, but no exploration of whether the same holds for informal, social interactions has been carried out in the past. The differences between the two domains are numerous. Firstly, reviews tend to be longer and more verbose than typical social interactions, which may be only a few words long and often contain significant spelling errors. Secondly, no clear “gold standard” exists in the domain of informal communication with which to train a machine-learning classifier, in contrast to the “thumbs up” or “thumbs down” feature of reviews. Lastly, social exchanges on the web tend to be much more diverse in terms of their topics, with issues ranging from politics and recent news to religion, whereas product reviews by definition have a specific subject, i.e. the product under discussion.

The study of emotional and social interactions in virtual worlds implies the study of virtual human (VH) behaviors. Two types of VH exist: avatars (i.e. the projection of a real human in the 3D environment) and agents (i.e. the projection of an autonomous machine simulating a human in the virtual world). These VH types result in three possible types of communication: avatar to avatar, agent to agent and avatar to agent. Each of these has, respectively, the following interesting aspect:

- A non-verbal body language based on VH emotional states and mind profile.
- A potential visualization of the interaction from a third VH that should be represented by an avatar.
- A non-verbal communication for the human representation and an action of the agent strongly influenced by interpreted emotions from the avatar.

It seems only logical that artificial intelligence and conversation systems would strongly benefit these aspects in order to make the communication more realistic.

The structure of this paper is as follows. The next section provides a brief overview of relevant work in sentiment analysis. Section 3 presents the lexicon-based classifier and section 4 presents the two machine-learning classifiers that will be used in this study. Section 5 describes the data sets that were used and explains the experimental setup, while section 6 presents and analyzes the results. Finally, we conclude and present some potential future directions of research.

2 Prior Work

Sentiment analysis, also known as opinion mining, has attracted considerable interest recently. Most research has focused on analyzing the content of either movie or general product reviews (e.g. [31,5,12]). Attempts to expand the application