Polarity Classification of Subjective Words Using Common-Sense Knowledge-Base
- ISSN: 03029743
- ISBN: 9783642106453
Abstract
Semantic orientation of a word indicates whether the word denotes a positive or a negative evaluation. We present an approach to compute the semantic orientation of words using machine-interpretable commonsense knowledge. We employ ConceptNet (a large semantic network of commonsense knowledge) to determine the polarity, or semantic orientation, of a sentiment-expressing word. We apply heuristics to certain pre-defined predicates that express a semantic relationship between two concepts, in order to classify words as positive or negative and to find words that have similar polarity. An advantage of the proposed approach is that it does not require any pre-annotated training dataset or manually created seed list. The proposed solution relies on a lexical resource that is created by volunteers on the Internet and not by trained or specialized knowledge engineers. We test our approach on a publicly available, pre-classified sentiment lexicon, present the results of our experiments, and examine the tradeoffs and limitations of the proposed solution. We conclude that it is possible to determine the polarity of words with high accuracy by exploiting machine-understandable layman's knowledge and basic facts that ordinary people know about the world.
Ashish Sureka, Vikram Goyal, Denzil Correa, and Anirban Mondal
Indraprastha Institute of Information Technology (IIIT), India
{ashish,vikram,denzilc,anirban}@iiitd.ac.in
http://www.iiitd.edu.in/
Keywords: Word-Level Polarity Classification, Common-Sense Knowledge Base,
Sentiment Analysis, Opinion Mining.
1 Introduction
Semantic orientation of a word indicates whether the word denotes a positive
evaluation (such as praise or positive opinion) or a negative evaluation (such
as criticism or negative opinion) [8][7]. Semantic orientation of a word is also
referred to as the valence or polarity of a word, and systems that automatically
determine the semantic orientation of a word have applications in sentiment
analysis, opinion mining, multi-perspective question answering and filtering
of abusive messages. Opinion mining and sentiment analysis of a product review
or any subjective statement is an area that has received significant interest in
recent times, and polarity determination of a word is fundamental to the problem
of sentiment analysis (refer to the detailed survey on opinion mining and
sentiment analysis by Bo Pang and Lillian Lee [5]). Polarity determination at
word level (fine-grained analysis) forms a component of a larger system wherein
polarity determination at sentence, paragraph or document level needs to be
performed. There are two sub-problems within the problem of determining the
semantic orientation of a word. One sub-problem consists of computing the
direction (positive or negative) and the other consists of computing the
intensity or strength (weak or strong) within the computed direction. For
example, the word good is a weak positive word whereas excellent, fabulous and
astonishing are strong positive words. Similarly, the word bad is a weak
negative word whereas horrible and terrible are strong negative words.
Automatically determining the semantic orientation of words is required for
developing a sentiment lexicon, as it is tedious and time consuming to manually
label all the words in a language with their polarity and intensity.
H. Sakai et al. (Eds.): RSFDGrC 2009, LNAI 5908, pp. 486–493, 2009.
© Springer-Verlag Berlin Heidelberg 2009
The earliest work on automatically determining the semantic orientation of a
word was done by Hatzivassiloglou et al. [8]. The basis of their approach is
that adjectives conjoined by words such as and or or share the same polarity,
whereas adjectives conjoined by words such as but have opposite polarity or
orientation. The method consists of extracting pairs of adjectives joined by
conjunctions like and, or, but, either-or, or neither-nor from the 1987 Wall
Street Journal Corpus (a document set consisting of 21 million words) and
assigning similar or different polarities to the adjectives based on the type of
conjunction. Turney et al. proposed a general strategy for inferring the
semantic orientation of a word based on their hypothesis that the semantic
orientation of a word tends to correspond to the semantic orientation of its
neighbors [7]. Neighborhood between words is determined using statistical
association or statistical dependence between words (word co-occurrence). Kamps
et al. use WordNet to measure the semantic orientation of adjectives by
exploiting the graph-theoretic model of WordNet's synonym relations [3]. Esuli
et al. present a technique for determining the semantic orientation of terms
through gloss classification (a quantitative analysis of the glosses or
definitions of terms given in on-line dictionaries) [1]. Wilson et al. present
an approach to recognizing the contextual polarity of phrases (a two-step
process that employs machine learning, begins with a large stable of clues
marked with prior polarity, and then identifies the contextual polarity of the
phrases that contain instances of those clues in a corpus) [9]. Takamura et al.
present a technique that consists of constructing a lexical network by
connecting similar or related words and adopting the Potts model as the
probability model of the lexical network [6].
1.1 Paper Contributions
We propose a novel technique for determining the polarity of a word by making
use of a semantic network of common-sense knowledge. Previous approaches
compute the semantic orientation of words in a corpus-driven manner by
performing statistical analysis on a corpus, or rely on lexical resources
created by experts and trained knowledge engineers. Previous approaches also
rely on a pre-annotated training dataset or a seed list of pre-classified
sentiment words for performing their task. In this paper, we present a new
approach that differs from the previous approaches and has the following
advantages. The main advantage of our solution is that it relies on a lexical
resource (called ConceptNet) that represents common-sense knowledge created by
volunteers on the Internet (14,000 contributors from around the world, as
mentioned in the paper by Liu et al. [4]) and not by trained or specialized
knowledge engineers. Also, the proposed approach does not require any
pre-annotated training dataset or manually created seed list to perform its
tasks. Creating a training dataset of pre-classified words and manually
building specialized lexical resources for sentiment analysis applications
requires trained and specialized people and can be a time-consuming as well as
tedious process. The proposed solution overcomes the dependency on experts by
automatically creating a sentiment lexicon and computing the semantic
orientation of words based on common-sense knowledge created by ordinary people
as volunteers and not by specialized knowledge engineers. The proposed approach
performs polarity classification of sentiment words belonging to any lexical
category (adjective, adverb, noun and verb), unlike some approaches that are
able to classify only adjectives. We present empirical results (based on
experiments performed on a publicly available test dataset that is a standard
benchmark for this task) which show that it is possible to predict the polarity
of a word with good accuracy by using layman's common-sense knowledge. The
limitation of our approach is that the accuracy and coverage of the words is a
function of the number of concepts, assertions and relations, and of the quality
of data, in the common-sense knowledge-base. The work presented in this paper is
a step in the direction of our research on investigating the usefulness of
machine-understandable commonsense knowledge in the application domain of
sentiment analysis and opinion mining.
2 Solution Approach
We leverage ConceptNet (a machine-interpretable semantic network representing
common-sense knowledge) for polarity classification of words. The common-sense
knowledge present in ConceptNet has been collected from volunteers on the
Internet since the year 2000 and represents facts that ordinary people know
about the world [2]. The data present in ConceptNet is contributed by ordinary
people, unlike lexical resources such as WordNet and FrameNet, which are mainly
created by trained and specialized knowledge engineers. As ConceptNet is a
semantic network, it consists of nodes connected by edges. The nodes represent
the concepts and the edges represent predicates. Predicates express semantic
relationships between two concepts. Some relationships between concepts in the
ConceptNet semantic network are: IsA, MadeOf, UsedFor, CapableOf, DesireOf,
CreatedBy, InstanceOf, PartOf, PropertyOf and EffectOf [2]. In ConceptNet, an
assertion is uniquely defined by five properties: language, relation, concept1,
concept2 and frequency. The Language property defines the language an assertion
is expressed in (such as English). The Relation property defines the relation,
or the name of the predicate, that connects the two concepts in the assertion
(such as IsA, PartOf). Concept 1 and Concept 2 define the first and the second
argument of the relation (words and phrases). The Frequency property expresses
how often the given concepts would be related by the given relation, ranging
from never to always. Also, for each assertion there is a field which defines
the assertion type. The value of the assertion type is +1 if the assertion makes
a positive statement (such as Diamonds are pretty) and -1 if it makes a negative
statement (such as a person doesn't want anxiety).
Table 1. Pre-defined pattern over assertions belonging to the Desires relation
Assertion Property    Value of the Assertion Property
Language              English
Relation              Desires
Concept 1             a person or human or everyone
Concept 2             Word whose polarity needs to be determined
Assertion Type        +1 or -1
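The assertion properties described above can be sketched as a small record type. This is a minimal illustration and not ConceptNet's actual API: the class name, the field names, and the relation and frequency values chosen for the two example statements are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One ConceptNet-style assertion (hypothetical field names)."""
    language: str        # language of the assertion, e.g. "English"
    relation: str        # predicate name, e.g. "IsA" or "Desires"
    concept1: str        # first argument of the relation
    concept2: str        # second argument of the relation
    frequency: str       # how often the relation holds, "never" to "always"
    assertion_type: int  # +1 for a positive statement, -1 for a negative one

    def __post_init__(self):
        # The text allows only two assertion types.
        if self.assertion_type not in (+1, -1):
            raise ValueError("assertion_type must be +1 or -1")


# "Diamonds are pretty" (positive statement); relation and frequency are assumed.
pretty = Assertion("English", "HasProperty", "diamond", "pretty", "always", +1)
# "A person doesn't want anxiety" (negative statement); frequency is assumed.
anxiety = Assertion("English", "Desires", "person", "anxiety", "always", -1)
```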
(Step 1). The first step of the proposed solution consists of checking whether
the word matches the pattern or structure defined in Table 1. The pattern is
based on our hypothesis that if a person or human or everyone (as Concept 1)
desires (Relation type Desires) something (represented as Concept 2), then
Concept 2 (in our case a sentiment-expressing word whose polarity needs to be
determined) will have a positive connotation if the assertion type is positive
(i.e. has a value of +1) and a negative connotation if the assertion type is
negative (i.e. has a value of -1). This step does not require any seed list or
pre-classified sentiment words and therefore has an advantage over approaches
that depend on a training dataset or a manually created seed list. We validated
our hypothesis by entering a few terms on the web-based interface provided at
the ConceptNet website. For example, some of the words which are expressed as
Concept 2 where Concept 1 is person, Relation is Desires and Assertion Type is
+1 are: accomplish (verb), admiration (noun), affection (noun), beautiful
(adjective), bliss (noun), clever (adjective), comfort (verb) etc. Similarly,
some of the words which are expressed as Concept 2 where Concept 1 is person,
Relation is Desires and Assertion Type is -1 are: agonize (verb), annoyance
(noun), anxiety (noun), bad (adjective), boredom (noun), cancer (noun), confuse
(verb), criminal (noun), criticism (noun), damaging (adjective) etc. We noticed
that some words fall into a category where Concept 1 is person (or human or
everyone), Relation is Desires and Assertion Type is both +1 and -1. Since
there is a conflict in assertion type, we do not predict the polarity of such
words and leave them blank to be computed in the next steps of the overall
process.
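Step 1 can be sketched as a single pass over the assertions. The following is a minimal illustration under stated assumptions: the tuple layout, the function name and the sample data (in particular the conflicting word "change") are hypothetical and are not taken from ConceptNet.

```python
from collections import defaultdict

# Concept 1 values accepted by the Table 1 pattern.
PERSON_CONCEPTS = {"person", "human", "everyone"}


def classify_by_desires(assertions):
    """Map each word to +1 or -1 using the Desires pattern of Table 1.

    `assertions` is an iterable of
    (language, relation, concept1, concept2, assertion_type) tuples;
    this layout is an assumption of the sketch.
    """
    seen = defaultdict(set)
    for language, relation, concept1, concept2, assertion_type in assertions:
        if (language == "English" and relation == "Desires"
                and concept1.lower() in PERSON_CONCEPTS):
            seen[concept2.lower()].add(assertion_type)
    # Words seen with both +1 and -1 conflict and are left unclassified.
    return {word: next(iter(types))
            for word, types in seen.items() if len(types) == 1}


sample = [
    ("English", "Desires", "person", "bliss", +1),
    ("English", "Desires", "person", "anxiety", -1),
    ("English", "Desires", "everyone", "change", +1),
    ("English", "Desires", "human", "change", -1),  # conflicts with the line above
    ("English", "IsA", "person", "mammal", +1),     # other relation, ignored
]
polarity = classify_by_desires(sample)
```

Here `polarity` contains entries for bliss and anxiety only; the conflicting word is deferred to the later steps, as described above.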
(Step 2). The second step of the solution consists of checking a pattern based
on the DefinedAs relationship. The pattern is based on the hypothesis that two
concepts connected to each other by a DefinedAs relation in the same assertion
will have the same polarity (a synonym or semantically similar relationship).
Hence, if the polarity of one of the concepts in such a relation is known, then
the polarity of the connected word can also be computed. This step uses the
classifications from the previous step to classify unclassified words. The seed
for this step comes from the previous step, and hence this step as well as the
subsequent steps do not require any pre-created seed list or training dataset.
Unlike Step 1 (which is applied once), Step 2 is executed repeatedly until there
is no additional coverage between two consecutive runs. This is done because the
first run of Step 2 may determine the polarity of certain words that in turn
help in predicting the polarities of words which could not be determined during
the first run. For example, let us say that there are two assertions "A
DefinedAs B" and "B DefinedAs C", where A, B and C are three concepts in the
ConceptNet semantic network. If the polarity of A is known and that of B is
unknown after Step 1, then at the end of the first run of Step 2 the polarity of
B can be determined. The polarity of concept C can be determined after the
second run of Step 2. Thus, Step 2 is repeated as long as the coverage is
increasing. We validated our hypothesis by entering a few terms on the web-based
interface provided at the ConceptNet website. Some illustrative examples of two
concepts connected to each other by the DefinedAs relation are: (blossom,
flower), (devil, Satan), (eliminate, exclude), (grotesque, bizarre),
(indelicate, indecent), (savage, vicious), (advance, progress), and (whip,
beat). The concepts in ConceptNet are natural language fragments, and we noticed
that the relationship is often of the type "A DefinedAs Same B" or "A DefinedAs
Opposite B", where A and B are concepts. For example, one of the assertions in
ConceptNet is "Advance DefinedAs same Progress" (which can be interpreted as
synonymy). Some of the illustrative examples of concepts having the same
polarity that we have provided belong to the assertion type "A DefinedAs Same
B". This can be handled by locating the word same in the concept and removing it
from the concept string to extract the word whose polarity needs to be
determined. We also noticed several assertions of the type "A DefinedAs Opposite
B" (which can be interpreted as antonymy). Such assertions can be handled by
extracting the term opposite from the concept string and flipping the polarity
of B, i.e. applying the inference that concept B's polarity is opposite to the
polarity of concept A. Some illustrative examples of two concepts connected to
each other by the DefinedAs relation where the assertion is of the type "A
DefinedAs Opposite B" are: (dawn, dusk), (selfishness, selflessness), (slow,
fast), (abnormal, normal), (bad, good), (clean, dirty), (cruel, kind), (evil,
good), (evil, nice), (happiness, sadness), (hard, soft), and (yes, no).
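The repeated execution of Step 2, together with the handling of the same and opposite markers, can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the function names and the pair layout are hypothetical, the pair (vicious, same cruel) is invented for the example, and keeping the first label when two assertions disagree is a choice made here because the text does not specify one.

```python
def parse_target(concept2):
    """Return (word, sign): 'opposite X' flips polarity, 'same X' keeps it."""
    tokens = concept2.lower().split()
    sign = -1 if "opposite" in tokens else +1
    words = [t for t in tokens if t not in ("same", "opposite")]
    return " ".join(words), sign


def propagate_defined_as(pairs, seed):
    """Repeat Step 2 until a run classifies no additional word.

    `pairs` holds (concept1, concept2) strings from DefinedAs assertions;
    `seed` maps words classified in Step 1 to +1 or -1.
    """
    polarity = dict(seed)
    while True:
        new = {}
        for concept1, concept2 in pairs:
            a = concept1.lower()
            b, sign = parse_target(concept2)
            # Each run only uses labels known at the start of that run.
            if a in polarity and b not in polarity:
                new.setdefault(b, sign * polarity[a])
            elif b in polarity and a not in polarity:
                new.setdefault(a, sign * polarity[b])
        if not new:
            return polarity
        polarity.update(new)


pairs = [
    ("fancy", "same like"),
    ("cruel", "opposite kind"),
    ("savage", "vicious"),
    ("vicious", "same cruel"),  # hypothetical pair, used to form a chain
]
lexicon = propagate_defined_as(pairs, {"like": +1, "kind": +1})
```

In this example fancy and cruel are labelled in the first run, vicious in the second and savage in the third, which mirrors the A, B, C chain discussed above.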
(Step 3 and Step 4). Similar to Step 2, the third and fourth steps consist of
classifying a word using the polarity of words computed in previous steps
(viewed as the pre-annotated dataset or seed list for this step) and exploiting
the IsA and HasProperty predicates of ConceptNet. This is based on the
hypothesis that concepts (in our case sentiment-expressing nouns, verbs, adverbs
or adjectives) connected to each other by the IsA relationship are semantically
related (though they may not be similar, as in the case of the DefinedAs
predicate) and share the same polarity. Similar to the previous step, we check
the value of the assertion type (+1 or -1) and the presence of terms like same
and opposite in the concept for computing the semantic orientation of an
unclassified word connected to a word (whose polarity is known) through the IsA
or HasProperty predicate. Steps 2, 3 and 4 are executed repeatedly (DefinedAs
analysis followed by IsA analysis
Table 2. Total coverage and category-wise coverage after executing Step 1
                      Positive & Negative   Positive   Negative
Test Data             2007                  830        1177
Coverage Absolute     550                   245        305
Coverage Percentage   27.40%                29.52%     25.91%
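The coverage percentages in Table 2 follow from dividing the number of classified words by the number of test words. A minimal worked example is given below; the function name is hypothetical and the counts are those reported in Table 2.

```python
def coverage(classified, total):
    """Return the percentage of test words that received a polarity label."""
    return 100.0 * classified / total


overall = coverage(550, 2007)   # all test words
positive = coverage(245, 830)   # positive category
negative = coverage(305, 1177)  # negative category

for name, value in [("overall", overall), ("positive", positive), ("negative", negative)]:
    print(f"{name}: {value:.2f}%")
```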
Table 3. Confusion matrix and classification accuracy after executing Step 1
                           Predicted Positive   Predicted Negative
Actual Positive            227                  7
Actual Negative            18                   298
Correct Classification     (227+298)/550 = 95.45%
Incorrect Classification   (7+18)/550 = 4.55%
Table 4. Confusion matrix and accuracy after executing Steps 2, 3 and 4
                           Predicted Positive   Predicted Negative
Actual Positive            288                  23
Actual Negative            51                   398
Correct Classification     (288+398)/760 = 90.26%
Incorrect Classification   (23+51)/760 = 9.74%
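The accuracy figures in Tables 3 and 4 can be recomputed from the four cells of each confusion matrix. The sketch below is only a worked check; the function and parameter names are hypothetical, with the positive class treated as the reference class.

```python
def accuracy_from_confusion(tp, fn, fp, tn):
    """Return (correct %, incorrect %) for a 2x2 confusion matrix.

    tp: actual positive, predicted positive
    fn: actual positive, predicted negative
    fp: actual negative, predicted positive
    tn: actual negative, predicted negative
    """
    total = tp + fn + fp + tn
    correct = tp + tn
    return 100.0 * correct / total, 100.0 * (total - correct) / total


step1 = accuracy_from_confusion(tp=227, fn=7, fp=18, tn=298)   # Table 3
final = accuracy_from_confusion(tp=288, fn=23, fp=51, tn=398)  # Table 4
```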
connected to each other using the DefinedAs predicate and having the same
polarity were: (fancy, like), (gratitude, thank), (liberal, generous), (murky,
dark), (paranoia, fear). The polarities of like, thank, generous, dark and fear
were computed in the previous step, which resulted in correctly classifying the
polarity of the words fancy, gratitude, liberal, murky and paranoia in Step 2.
In this step, the system was also able to correctly classify (with 100%
accuracy) words connected by the DefinedAs relationship but having opposite
polarity (as implied by the presence of the word opposite in the concept):
(cold, warm), (cruel, kind), (hard, easy) and (rich, poor). We noticed that in
this step the accuracy was 100% but the coverage was low. Table 4 presents the
final results obtained after executing Steps 2, 3 and 4 repeatedly (Step 2
followed by Step 3 followed by Step 4) until no further classifications were
observed. As shown in Table 4, the approach correctly predicted 686 words from a
total of 760 words that it could classify (an accuracy of 90.26%). The system
was thus able to create a sentiment lexicon of 760 words from a common-sense
knowledge base, without using any training dataset or seed list, with an
accuracy of around 90%.
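The repeated cycle of Step 2, Step 3 and Step 4 described above can be sketched as a loop that stops when a full cycle adds no new word. This is a simplified illustration, not the system evaluated in the paper: the edge layout (relation, word_a, word_b, sign), where sign is +1 for same polarity and -1 for opposite polarity, is an assumption, and the IsA and HasProperty edges in the example are invented.

```python
RELATION_ORDER = ("DefinedAs", "IsA", "HasProperty")  # Steps 2, 3 and 4


def run_once(edges, relation, polarity):
    """One run of one step: label words linked to an already labelled word."""
    new = {}
    for rel, a, b, sign in edges:
        if rel != relation:
            continue
        if a in polarity and b not in polarity:
            new.setdefault(b, sign * polarity[a])
        elif b in polarity and a not in polarity:
            new.setdefault(a, sign * polarity[b])
    return new


def build_lexicon(edges, seed):
    """Cycle Step 2, Step 3, Step 4 until a full cycle adds no word."""
    polarity = dict(seed)
    while True:
        added = 0
        for relation in RELATION_ORDER:
            new = run_once(edges, relation, polarity)
            polarity.update(new)
            added += len(new)
        if added == 0:
            return polarity


edges = [
    ("DefinedAs", "paranoia", "fear", +1),
    ("DefinedAs", "cold", "warm", -1),
    ("IsA", "phobia", "fear", +1),          # hypothetical edge
    ("HasProperty", "winter", "cold", +1),  # hypothetical edge
]
result = build_lexicon(edges, {"fear": -1, "warm": +1})
```

Because each step sees the labels added by the step before it within the same cycle, winter is labelled through cold in the first cycle here, and the second cycle adds nothing, which ends the loop.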
4 Conclusions
This paper investigates the usefulness of commonsense knowledge for classifying
the polarity of sentiment-expressing words as positive or negative. Evaluation
on test data consisting of a publicly available pre-annotated subjectivity
lexicon shows that leveraging common-sense knowledge that is shared by the vast
majority of people for determining the semantic orientation of words is
feasible. The main advantage of the system is that it does not require any
training data, hand-crafted seed list or any external resource that is created
by trained and specialized knowledge engineers. The accuracy and coverage of the
words is a function of the number of concepts, assertions and relations, and of
the quality of data, in the common-sense knowledge-base.
References
1. Esuli, A., Sebastiani, F.: Determining the semantic orientation of terms
through gloss classification. In: Proceedings of the 2005 ACM CIKM International
Conference on Information and Knowledge Management, pp. 617–624 (2005)
2. Havasi, C., Speer, R., Alonso, J.: ConceptNet 3: A Flexible Multilingual
Semantic Network for Common Sense Knowledge. In: Proceedings of Recent Advances
in Natural Language Processing, Borovets (2007)
3. Kamps, J., Marx, M., Mokken, R.J., de Rijke, M.: Using WordNet to measure
semantic orientation of adjectives. In: Proceedings of the 4th International
Conference on Language Resources and Evaluation, vol. IV, pp. 1115–1118 (2004)
4. Liu, H., Singh, P.: Commonsense Reasoning in and over Natural Language. In:
Negoita, M.G., Howlett, R.J., Jain, L.C. (eds.) KES 2004. LNCS (LNAI), vol. 3215,
pp. 293–306. Springer, Heidelberg (2004)
5. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and
Trends in Information Retrieval 2, 1–135 (2008)
6. Takamura, H., Inui, T., Okumura, M.: Extracting Semantic Orientations of
Phrases from Dictionary. In: The Conference of the North American Chapter of the
Association for Computational Linguistics, Rochester, pp. 292–299 (2007)
7. Turney, P.D., Littman, M.L.: Measuring praise and criticism: Inference of
semantic orientation from association. ACM Transactions on Information Systems
21(4), 315–346 (2003)
8. Hatzivassiloglou, V., McKeown, K.R.: Predicting the semantic orientation of
adjectives. In: Proceedings of the 35th Annual Meeting of the Association for
Computational Linguistics and the 8th Conference of the European Chapter of the
ACL, New Brunswick, pp. 174–181 (1997)
9. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in
phrase-level sentiment analysis. In: Proceedings of Conference on Empirical
Methods in Natural Language Processing, Vancouver (2005)



