MANAGEMENT SCIENCE
Vol. 53, No. 9, September 2007, pp. 1375–1388
issn 0025-1909, eissn 1526-5501, 07, 5309, 1375
doi 10.1287/mnsc.1070.0704
© 2007 INFORMS

Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web

Sanjiv R. Das
Department of Finance, Leavey School of Business, Santa Clara University, Santa Clara, California 95053, srdas@scu.edu

Mike Y. Chen
Ludic Labs, San Mateo, California 94401, mike@ludic-lab.com

Extracting sentiment from text is a hard semantic problem. We develop a methodology for extracting small investor sentiment from stock message boards. The algorithm comprises different classifier algorithms coupled together by a voting scheme. Accuracy levels are similar to widely used Bayes classifiers, but false positives are lower and sentiment accuracy higher. Time series and cross-sectional aggregation of message information improves the quality of the resultant sentiment index, particularly in the presence of slang and ambiguity. Empirical applications evidence a relationship with stock values: tech-sector postings are related to stock index levels, and to volumes and volatility. The algorithms may be used to assess the impact on investor opinion of management announcements, press releases, third-party news, and regulatory changes.

Key words: text classification; index formation; computers-computer science; artificial intelligence; finance; investment
History: Accepted by David A. Hsieh, finance; received May 4, 2004. This paper was with the authors 1 year and 2 weeks for 1 revision. Published online in Articles in Advance July 20, 2007.

1. Introduction

Language is itself the collective art of expression, a summary of thousands upon thousands of individual intuitions. The individual gets lost in the collective creation, but his personal expression has left some trace in a certain give and flexibility that are inherent in all collective works of the human spirit.
Edward Sapir, cited in Society of Mind by Minsky (1985, p. 270)
We develop hybrid methods for extracting opinions in an automated manner from discussions on stock message boards, and analyze the performance of various algorithms in this task, including that of a widely used classifier available in the public domain. The algorithms are used to generate a sentiment index, and we analyze the relationship of this index to stock values. As we will see, this analysis is efficient, and useful relationships are detected.

The volume of information flow on the Web has accelerated. For example, in the case of Amazon Inc., there were cumulatively 70,000 messages by the end of 1998 on Yahoo's message board, and this had grown to about 900,000 messages by the end of 2005. There are almost 8,000 stocks for which message board activity exists, across a handful of message board providers. The message flow comprises valuable insights, market sentiment, manipulative behavior, and reactions to other sources of news. Message boards have attracted the attention of investors, corporate management, and of course, regulators.1

In this paper, "sentiment" takes on a specific meaning, that is, the net of positive and negative opinion expressed about a stock on its message board. Hence, we explicitly distinguish our measure from other market conventions of sentiment, such as deviations from the expected put-call ratio. Our measure is noisy because it comprises information, sentiment, noise, and estimation error.

Large institutions express their views on stocks via published analyst forecasts. The advent of stock chat and message boards enables small investors to express their views too, frequently and forcefully. We show that it is possible to capture this sentiment using statistical language techniques. Our algorithms are validated using revealed sentiment on message boards and the statistical relationship between sentiment and stock returns, which track each other.

1 Das et al. (2005) present an empirical picture of the regularities found in messages posted to stock boards. The recent case of Emulex Corp. highlights the sensitivity of the Internet as a sentiment channel. Emulex's stock declined 62% when an anonymous, false news item on the Web claimed reduced earnings and the resignation of the CEO. The Securities and Exchange Commission (SEC) promptly apprehended the perpetrator, a testament to the commitment of the SEC to keeping this sentiment channel free and fair. In relation to this, see the fascinating article on the history of market manipulation by Leinweber and Madhavan (2001).
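Because sentiment is defined here as the net of positive and negative opinion, it can be computed directly once messages carry labels. The sketch below assumes each message has already been labeled bullish (+1), bearish (-1), or neutral (0); the label encoding, the function names, and the use of a running (cumulative) sum as the time-series aggregate are illustrative assumptions, not the paper's specification.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical label encoding: bullish = +1, bearish = -1, neutral (spam or neither) = 0.
BULLISH, NEUTRAL, BEARISH = 1, 0, -1


def net_sentiment(labels):
    """Net of positive and negative opinion: (# bullish) - (# bearish)."""
    n_bull = sum(1 for x in labels if x == BULLISH)
    n_bear = sum(1 for x in labels if x == BEARISH)
    return n_bull - n_bear


def daily_sentiment_index(messages):
    """Aggregate (day, label) pairs into daily and cumulative net sentiment.

    Neutral messages fall in neither count, so they do not move the index.
    Days with no messages are simply absent from the output.
    """
    by_day = defaultdict(list)
    for day, label in messages:
        by_day[day].append(label)
    days = sorted(by_day)  # ISO date strings sort chronologically
    daily = [net_sentiment(by_day[d]) for d in days]
    cumulative = list(accumulate(daily))  # assumed time-series aggregate (running sum)
    return days, daily, cumulative


if __name__ == "__main__":
    msgs = [("2001-01-02", BULLISH), ("2001-01-02", BULLISH), ("2001-01-02", BEARISH),
            ("2001-01-03", BEARISH), ("2001-01-03", NEUTRAL)]
    print(daily_sentiment_index(msgs))
    # (['2001-01-02', '2001-01-03'], [1, -1], [1, 0])
```

In this toy input, two bullish and one bearish message on the first day give a daily value of 1; one bearish and one neutral message on the second day give -1, so the cumulative index moves from 1 to 0.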
Posted messages offer opinions that are bullish or bearish, and many that are confused or vitriolic, as well as rumor and spam (null messages). Some are very clear in their bullishness, as is the following message on Amazon's board (Msg 195006):

The fact is The value of the company increases because the leader (Bezos) is identified as a commodity with a vision for what the future may hold. He will now be a public figure until the day he dies. That is value.

In sharp contrast, this message was followed by one that was strongly bearish (Msg 195007):

Is it famous on infamous? A commodity dumped below cost without profit, I agree. Bezos had a chance to make a profit without sales tax and couldn't do it. The future looks grim here.

These (often ungrammatical) opinions provide a basis for extracting small investor sentiment from discussions on stock message boards.

While financial markets are just one case in point, the Web has been used as a medium for information extraction in fields such as voting behavior, consumer purchases, political views, quality of information equilibria, etc. (see Godes and Mayzlin 2004, Lam and Myers 2001, Wakefield 2001, Admati and Pfleiderer 2000 for examples). In contrast to older approaches such as investor questionnaires, sentiment extraction from Web postings is relatively new. It constitutes a real-time approach to sentiment polling, as opposed to traditional point-in-time methods.

We use statistical and natural language processing techniques to elicit emotive sentiment from a posted message; we implement five different algorithms, some language dependent, others not, using varied parsing and statistical approaches. The methodology used here has antecedents in the text classification literature (see Koller and Sahami 1997, Chakrabarti et al. 1998).
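Running five different algorithms on each message requires a rule for reconciling their outputs; Section 2.1 describes a simple-majority vote, with messages lacking a majority discarded. A minimal sketch, assuming each classifier is a Python callable that maps a message to one of the labels "bullish", "bearish", or "neutral", and reading "simple majority" as more than half of the votes (an assumption). The function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

Label = str  # "bullish", "bearish" or "neutral"


def majority_vote(votes: Sequence[Label]) -> Optional[Label]:
    """Return the label backed by more than half of the votes, else None (discard)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # A strict majority can never be tied, so most_common(1) is unambiguous here.
    return label if 2 * count > len(votes) else None


def classify_message(message: str,
                     classifiers: Sequence[Callable[[str], Label]]) -> Optional[Label]:
    """Ask every classifier for a label, then keep the message only on a majority."""
    return majority_vote([clf(message) for clf in classifiers])


if __name__ == "__main__":
    print(majority_vote(["bullish", "bullish", "bearish", "bullish", "neutral"]))  # bullish
    print(majority_vote(["bullish", "bullish", "bearish", "bearish", "neutral"]))  # None
```

With five classifiers and three labels, this rule keeps a message only when at least three classifiers agree; a 2-2-1 split is discarded, which trades coverage for a cleaner signal.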
Koller and Sahami (1997) and Chakrabarti et al. (1998) classify textual content into natural hierarchies, a popular approach employed by Web search engines.

Extracting the emotive content of text, rather than factual content, is a complex problem. Not all messages are unambiguously bullish or bearish. Some require context, which a human reader is more likely to have, making such messages even harder to classify for a computer algorithm with limited background knowledge. For example, consider the following from Amazon's board (Msg 195016):

You're missing this Sonny, the same way the cynics pronounced that "Gone with the Wind" would be a total bust.

Simple, somewhat ambiguous messages like this also often lead to incorrect classification even by human subjects. We analyze the performance of various algorithms in the presence of ambiguity, and explore approaches to minimizing its impact.

The technical contribution of this paper lies in the coupling of various classification algorithms into a system that compares favorably with standard Bayesian approaches, popularized by the phenomenal recent success of spam-filtering algorithms. We develop metrics to assess algorithm performance that are well suited to the finance focus of this work. There are unique contributions within the specific algorithms used as well as accuracy improvements overall, most noticeably in the reduction of false positives in sentiment classification. An approach for filtering ambiguity in known message types is also devised and shown to be useful in characterizing algorithm performance.

Recent evidence suggests a link between small investor behavior and stock market activity. Notably, day-trading volume has surged.2 Choi et al. (2002) analyze the impact of a Web-based trading channel on the trading activity in corporate 401(k) plans, and find that the "Web effect" is very large: trading frequency doubles, and portfolio turnover rises by over 50%, when investors are permitted to use the Web as an information and transaction channel. Wysocki (1998), using pure message counts, reports that variation in daily message posting volume is related to news and earnings announcements. Lavrenko et al. (2000) use computer algorithms to identify news stories that influence markets, and then trade successfully on this information. Bagnoli et al. (1999) examine the predictive validity of whisper forecasts, and find them to be superior to those of First Call (Wall Street) analysts.3 Antweiler and Frank (2004) examine the bullishness of messages, and find that while Web talk does not predict stock movements, it is predictive of volatility. Tumarkin and Whitelaw (2001) find similar results using self-reported sentiment (not message content) on the Raging Bull message board. Antweiler and Frank (2002) argue that message posting volume is a priced factor, and that higher posting activity presages high volatility and poor returns. Tetlock (2005) and Tetlock et al. (2006) show that negative sentiment from these boards may be predictive of future downward moves in firm values.

2 Stone (2001) cites a Bear Stearns report documenting a huge surge in volume and a total number of day-traders in excess of 50,000.
3 The "whisper" number, an aggregate of informal earnings forecasts self-reported by individual investors, is now watched extensively by market participants, large and small. Whispers are forecasts of the quarterly earnings of a firm posted to the Web by individuals in a voluntary manner. The simple average of these forecasts is presented on the whisper Web page, along with the corresponding forecast from First Call, which is an aggregate of the sentiment of Wall Street analysts.
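The standard Bayesian benchmark referred to above is typically a multinomial naive Bayes classifier of the kind used in spam filters. The following is a generic textbook sketch, not the authors' implementation and not the public-domain package they evaluate; it assumes lowercase whitespace tokenization, add-one (Laplace) smoothing, and that words unseen in training are ignored at prediction time. The class name NaiveBayes and the toy training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over word counts with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing constant (1.0 = add-one)
        self.class_counts = Counter()            # number of training messages per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def fit(self, docs, labels):
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior P(class)
            for t in tokens:
                if t not in self.vocab:
                    continue  # skip words never seen in training
                # log P(word | class) with additive smoothing
                score += math.log((self.word_counts[label][t] + self.alpha)
                                  / (total + self.alpha * v))
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    docs = ["buy now strong growth", "great buy", "sell now weak", "terrible sell"]
    labels = ["bullish", "bullish", "bearish", "bearish"]
    nb = NaiveBayes().fit(docs, labels)
    print(nb.predict("strong buy"), nb.predict("weak sell"))  # bullish bearish
```

Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow on long messages, and smoothing keeps a word that never appeared in one class from forcing that class's probability to zero.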
Figure 1 Schematic of the Algorithms and System Design Used for Sentiment Extraction
[Flowchart: World Wide Web → Web-scraper program → message file → preprocessing (clean up HTML, expand abbreviations, tag negations) → classifiers (naive classifier, vector distance classifier, discriminant classifier, adjective/adverb classifier, Bayesian classifier) → output (classified messages, statistical summary). Data inputs: stock data, dictionary, lexicon, grammar. Helper programs: message parsing, multinomial calculation, dictionary handler.]

The empirical findings reviewed above suggest the need for algorithms that can rapidly access and classify messages with a view to extracting sentiment, the goal of this paper.4 The illustrative analyses presented in this paper confirm many of these prior empirical findings, and extend them as well.

Overall, this paper comprises two parts: (i) methodology and validation, in §2, which presents the algorithms used and their comparative performance, and (ii) the empirical relationship of market activity and sentiment, in §3. Section 4 contains discussion and conclusions.

4 In contrast, Antweiler and Frank (2005) recently used computational linguistic algorithms to sort news stories into topics, instead of sentiment, and uncovered many interesting empirical regularities relating news stories and stock values.

2. Methodology

2.1. Overview

The first part of the paper is the extraction of opinions from message board postings to build a sentiment index. Messages are classified by our algorithms into one of three types: bullish (optimistic), bearish (pessimistic), and neutral (comprising either spam or messages that are neither bullish nor bearish). We use five algorithms, each with different conceptual underpinnings, to classify each message. These comprise a blend of language features, such as part-of-speech tagging, and more traditional statistical methods.5

Before initiating classification, the algorithms are tuned on a training corpus, i.e., a small subset of preclassified messages used for training the algorithms.6 The algorithms "learn" sentiment classification rules from the preclassified data set, and then apply these rules out-of-sample. A simple majority across the five algorithms is required before a message is finally classified; otherwise it is discarded. This voting approach results in a better signal-to-noise ratio for extracting sentiment.

5 This paper complements techniques such as support vector machines (SVMs), which are optimization methods that classify content. See the papers by Vapnik (1995), Vapnik and Chervonenkis (1964), and Joachims (1999) for a review. A recent paper by Antweiler and Frank (2004) uses SVMs to carry out an exercise similar to the one in this paper. These approaches are computationally intensive and are often run on parallel processors. Moreover, they have been used for more than 30 years, and the technology is well developed. In this paper, we did not employ support vector machines, instead choosing to focus on purely analytic techniques that did not require optimization methods, in the interests of computational efficiency.
6 The training corpus is kept deliberately small to avoid overfitting, which is a common ailment of text classification algorithms.

Figure 1 presents the flowchart for the methodology and online Appendix A (provided in the