Learning opinions in user-generated web content
- ISSN: 13513249
- DOI: 10.1017/S135132491100012X
Abstract
User-generated Web content has been intensively analyzed in Information Extraction and Natural Language Processing research. Web-posted reviews of consumer goods are studied to find customer opinions about the products. We hypothesize that descriptions that are not emotionally charged can be applied to predict those opinions. The descriptions may include indicators of product size (tall), commonplace (some), frequency of happening (often), and reviewer certainty (maybe). We first construct patterns of how the descriptions are used in consumer-written texts and then represent individual reviews through these patterns. We propose a semantic hierarchy that organizes individual words into opinion types. We run machine learning algorithms on five data sets of user-written product reviews: four are used in classification experiments and the fifth in both regression and classification. The results support the use of non-emotional descriptions in opinion learning.
M. SOKOLOVA¹ and G. LAPALME²
¹Department of Pediatrics, Faculty of Medicine,
Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa,
401 Smyth Rd., Ottawa, Ontario, Canada, K1H 8L1
email: sokolova@uottawa.ca
²Département d’informatique et de recherche opérationnelle, Université de Montréal,
C.P. 6128, Succ Centre-Ville, Montréal, Quebec, Canada, H3C 3J7
email: lapalme@iro.umontreal.ca
(Received 29 October 2009; revised 1 October 2010; accepted 25 January 2011)
1 Opinions in user-generated Web content
User-generated Web content refers to publicly available data produced by
the Web end users (Directorate for Science, Technology and Industry, 2007). For
instance, blogs, social network profiles, and consumer-written product reviews are
parts of the user-generated textual content. In those texts, users share their personal
stories, discuss life experience, and comment on various events, making the Web
content more personalized and subjective. This rapidly growing phenomenon has
attracted the attention of many researchers and has added new analysis areas
to the field of text and language studies. In traditional text mining applications,
texts (documents) were often classified according to their topics. In studies of
user-generated texts, the focus has shifted from topic classification to sentiment and
opinion analysis. Opinion studies often analyze user-written reviews to predict opinions
about the consumed goods. Our work focuses on opinion analysis methods applicable when
product consumers reveal their opinions in a non-emotional way. We want to
establish a link between non-emotional expressions and reviewers’ opinions.
Table 1. Examples of emotional and descriptive reviews written by consumers

Reviews with emotional markers            | Descriptive reviews
------------------------------------------+------------------------------------------
[purchase] is a good value for the money  | I have a PC running some very high
[product] is very well designed           | specifications with over six USB devices,
                                          | LCD monitor, printer and Ethernet
                                          | equipment, with everything on at the
                                          | same time.
------------------------------------------+------------------------------------------
I would NOT buy this [product] again      | Also with some models the batteries need
                                          | to be replaced after a few years if you
                                          | end up using them a lot.

An opinion is a private state that may not be objectively observed or verified (Quirk
et al. 1985). Opinions, nevertheless, can be reliably predicted if we gain access
to useful language indicators. If the reviewer explicitly states her attitude, then
the expressed sentiments can be used to determine whether the opinion is positive
or negative. In other cases, consumers describe their experience without invoking
positive or negative emotions. Table 1 presents reviews with emotional markers in the
left column and descriptive, non-emotional reviews in the right column. Hereinafter,
all examples in this font are taken from user-generated Web content.
In the three examples of emotional opinions in the left column of Table 1, we can
rely on positive and negative markers to determine the opinion label: [product] is
very well designed and [purchase] is a good value for the money imply positive
opinions, whereas I would NOT buy this [product] again implies a negative opinion.
Opinion analysis becomes harder if the users avoid emotional positive and negative
statements. In order to understand the points of view expressed in the two descriptive,
non-emotional opinions in the right column of Table 1, we look at both
sentences in the context of the product use:
(i) I have a PC running some very high specifications with over six USB devices,
LCD monitor, printer, and Ethernet equipment, with everything on at the same
time.
(ii) Also with some models the batteries need to be replaced after a few years
if you end up using them a lot.
The first sentence says that the PC runs high specifications with several
devices attached; the second sentence tells about the models’ need for battery
replacement after a few years. Comparison of several positive and negative reviews
helps us to find that the ability to run high specifications with several devices is a
positive hint for the PC users, whereas the need for a frequent change of batteries is
a negative hint. The first sentence, thus, is positive, whereas the second sentence
leans toward a negative opinion.
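
The comparison of positive and negative reviews described above can be illustrated with a minimal sketch. It is not the method used in this paper: the function name hint_polarity, the plain substring matching, and the toy reviews are all assumptions made for illustration only. The sketch counts how many positive and how many negative reviews mention a descriptive phrase and reports the side on which it occurs more often.

```python
from collections import Counter


def hint_polarity(cue, labeled_reviews):
    """Label a descriptive phrase as a positive, negative, or neutral hint.

    Hypothetical helper: it only compares how many positive and how many
    negative reviews contain the phrase (case-insensitive substring match).
    """
    counts = Counter()
    for text, label in labeled_reviews:
        if cue.lower() in text.lower():
            counts[label] += 1
    if counts["positive"] > counts["negative"]:
        return "positive"
    if counts["negative"] > counts["positive"]:
        return "negative"
    return "neutral"


# Toy reviews invented for this illustration; they are not from the paper's data.
toy_reviews = [
    ("It runs six USB devices at the same time.", "positive"),
    ("Everything stays on at the same time without trouble.", "positive"),
    ("The batteries need to be replaced after a few years.", "negative"),
    ("Batteries need to be replaced far too often.", "negative"),
]

print(hint_polarity("at the same time", toy_reviews))
print(hint_polarity("need to be replaced", toy_reviews))
```

A phrase that occurs equally often on both sides, or not at all, is reported as neutral; a realistic setting would need far more reviews and a less brittle matching scheme than substring search.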
We hypothesize that it is possible to predict opinions from general, non-emotional
descriptions, which discuss place, time, manner, comparison, and physical conditions.
Those descriptions are not topic- or domain-specific. We elicit them with the questions
where, when, how, what shape, and what size. Free-form, free-content format stimulates