Uniting a priori and a posteriori knowledge : A research framework
Available from ir-kr.okkam.org
Michael Witbrock, Elizabeth Coppock, and Robert Kahlert
Cycorp, Inc.
{witbrock,ecoppock,rck}@cyc.com
Abstract
The ability to perform machine classification is a
critical component of an intelligent system. We
propose to unite the logical, a priori approach
to this problem with the empirical, a posteriori
approach. We describe in particular how the a
priori knowledge encoded in Cyc can be merged
with technology for probabilistic inference using
Markov logic networks. We describe two problem
domains – the Whodunit Problem and noun phrase
understanding – and show that Cyc’s commonsense
knowledge can be fruitfully combined with proba-
bilistic reasoning.
1 Introduction
Machine classification is a general problem of fundamental
importance to the field of artificial intelligence. The ability
to harness the vast amount of information freely available
on the World Wide Web, for example, depends on technol-
ogy for solving the entity resolution problem: Determining
whether two expressions refer to the same entity. Classifi-
cation is important in military and law enforcement domains
as well; consider the Whodunit Problem: given features of
a criminal act, who is the most likely perpetrator? Classifi-
cation problems in natural language processing include word
sense disambiguation and noun phrase understanding: in the
phrase, Swiss bank, what sense of bank is involved, and what
relation to Switzerland does the referent have? With good
machine classification technology, it will be possible to solve
many important problems across a wide range of domains.
Our research agenda is to develop systems that are capable
of taking into account both empirical statistics and a priori
knowledge in order to solve classification problems. Descrip-
tion logic [Baader et al., 2003] is an example of a purely log-
ical, a priori approach to the problem of classification. De-
scription logic provides a medium for encoding facts about
the real world, and can be used to classify objects based on
their attributes. This approach, however, is fundamentally too
brittle; failure to meet any one of the conditions for classify-
ing an object into a particular class is equivalent to meeting
none of the conditions.
On the other end of the spectrum are machine-learning
techniques. This type of approach succeeds in being more
flexible than approaches like description logic, by having
weighted constraints that may combine in a gradient fash-
ion. The pure machine-learning approach suffers, however,
from being overly reliant on having large quantities of train-
ing data. Training data is often lacking: It is costly to produce
labelled data, and even labelled data may be sparse for other
reasons. For instance, because most words are infrequent (by
Zipf’s law), training data for many natural language process-
ing tasks is likely to be missing. Moreover, information that is
already known should not have to be re-learned; it should be
possible to combine what is known already with knowledge
gained from empirical statistics.
In recent years, the gap between statistical and logical ap-
proaches to classification has begun to narrow. In the field
of information extraction, statistical and non-statistical meth-
ods have been combined, for example in the TextRunner
system [Banko and Etzioni, 2008] and SOFIE [Suchanek et
al., 2009]. The fields of relational data mining [Dzeroski
and Lavrac, 2001] and statistical relational learning com-
bine ideas from probability and statistics with tools from logic
and databases [Getoor and Taskar, 2007]. A wide variety of
techniques within these fields have been developed, such as
probabilistic relational models, knowledge-based model con-
struction, and stochastic logic programs, and many of these
techniques are special cases of Markov logic [Domingos and
Richardson, 2007], [Domingos et al., 2006]. In Markov
logic, weights are attached to arbitrary formulas in first order
logic, defining a probability distribution over possible worlds
[Richardson and Domingos, 2006].
We propose to integrate Markov logic with the Cyc project,
a large-scale effort to represent commonsense knowledge.
The Cyc system cannot trivially be converted into a Markov
logic network, because the Cyc knowledge base is quite large,
and Cyc uses higher order logic. However, it is possible to
create Markov logic networks over subsets of the Cyc knowl-
edge base, and to bridge these two resources in a way that
usefully combines logical and statistical approaches to artifi-
cial intelligence.
2 Background
2.1 Cyc
For over 20 years, the Cyc project has been devoted to the de-
velopment of a system that is capable of reasoning with com-
monsense knowledge. At the core, Cyc consists of a power-
ful inference engine combined with a knowledge base (KB)
that contains over 6 million assertions. These assertions are
expressed in a language (CycL) based on first-order logic, en-
hanced by a quoting mechanism and higher-order extensions
[Matuszek et al., 2006]. In normal inference, the assertions
in the Cyc KB function as “hard constraints” in the sense that
if a formula contradicts an existing fact (within a given con-
text), it is considered simply to be false. Thus, for the most
part, Cyc represents the logical, symbolic, a priori approach
to artificial intelligence.[1]
A portion of the information in the Cyc KB is taxonomic,
expressing (i) the class membership of terms, using the
binary predicate isa, which relates an instance to a collec-
tion e.g. (isa Snoopy Dog), where Snoopy is an individual
dog and Dog stands for the collection of all dogs; (ii) the
subsumption relationships among those classes, expressed
with genls, relating a subcollection to a supercollection
e.g. (genls Dog Animal); (iii) disjointness information,
expressed with disjointWith, which holds of collections
that do not share any members. Cyc predicates (including
isa and genls) are associated with definitional information,
which constrains the types of entities that may appear as
arguments to the predicate. Consider some of the argument
constraint information for the predicate biologicalMother
(read “has as the biological mother”).
(arg1Isa biologicalMother Animal)
(arg2Isa biologicalMother FemaleAnimal)
The argument constraint information states that the no-
tion “has as a biological mother” is defined for pairs of
instances whose first member is an Animal, and whose sec-
ond member is a FemaleAnimal. Definitional information,
combined with the taxonomic hierarchy, makes Cyc into a
higher-order system.
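The interaction between the taxonomy and the argument constraints can be illustrated with a minimal Python sketch. It is not Cyc's implementation or API: the dictionaries, the helper names (all_genls, is_instance, well_formed), and the individual Missy are assumptions made for illustration, and only the isa, genls, and argument-constraint facts mirror the CycL examples above.

```python
# Toy facts mirroring the CycL examples in the text. "Missy" is a hypothetical
# individual; none of these structures correspond to a real Cyc API.
ISA = {("Snoopy", "Dog"), ("Missy", "Dog"), ("Missy", "FemaleAnimal")}
GENLS = {("Dog", "Animal"), ("FemaleAnimal", "Animal")}
# arg1Isa / arg2Isa constraints, stored per predicate as (arg1 type, arg2 type).
ARG_ISA = {"biologicalMother": ("Animal", "FemaleAnimal")}


def all_genls(col):
    """Return col together with every supercollection reachable via genls."""
    seen, stack = {col}, [col]
    while stack:
        current = stack.pop()
        for sub, sup in GENLS:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_instance(term, col):
    """True if term is asserted to be in col or in some subcollection of col."""
    return any(t == term and col in all_genls(c) for t, c in ISA)


def well_formed(pred, arg1, arg2):
    """Check both arguments of a binary predicate against its type constraints."""
    type1, type2 = ARG_ISA[pred]
    return is_instance(arg1, type1) and is_instance(arg2, type2)


print(well_formed("biologicalMother", "Snoopy", "Missy"))  # arguments fit
print(well_formed("biologicalMother", "Missy", "Snoopy"))  # arg2 is not a FemaleAnimal
```

In this sketch, Snoopy qualifies as an Animal only through the genls link from Dog to Animal, which is the sense in which the definitional information depends on the taxonomic hierarchy.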
Another higher-order feature of Cyc is that predi-
cates are also arranged in a generalization hierarchy;
biologicalMother is a more specific predicate than
relatives. This relation between the two predicates is
expressed with the second-order predicate genlPreds as
follows:
(genlPreds biologicalMother relatives)
This means that biologicalMother inherits all of the
constraints on the predicate relatives, including the
following rule (CycL variables, noted with question marks,
are implicitly universally bound by default):
(implies
  (isa ?COL BiologicalSpecies)
  (interArgIsa1-2 relatives ?COL ?COL))

[1] Some assertions in Cyc are defeasible; CycL contains five possible
truth values: monotonically false, default false, unknown, default
true, and monotonically true. Default assertions can be overridden
when two rules conclude P, but one concludes that P is monotonically
true and the other concludes that P is default false. Then, all
else being equal, Cyc sets the truth value of P to the one suggested
by the monotonic rule [Panton et al., 2006]. However, formulas are
not associated with probabilities in Cyc.
The predicate interArgIsa1-2 specifies a type con-
straint on one argument, given the type of another. This rule
requires, for example, that if X is Y's relative, and X is a
bird, then Y is also a bird.
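The way a more specific predicate inherits this constraint through genlPreds can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Cyc's inference procedure: each individual is given exactly one species, the species set stands in for the instances of BiologicalSpecies, and the names Tweety, Polly, inherits_from, and violates_inter_arg_isa are hypothetical.

```python
# (genlPreds biologicalMother relatives), stored as child -> parent.
GENL_PREDS = {"biologicalMother": "relatives"}
# Stand-ins for instances of BiologicalSpecies (an assumption of this sketch).
SPECIES = {"Bird", "Dog"}
# Simplification: every individual belongs to exactly one species.
SPECIES_OF = {"Tweety": "Bird", "Polly": "Bird", "Snoopy": "Dog"}


def inherits_from(pred, ancestor):
    """True if pred equals ancestor or reaches it by following genlPreds links."""
    while pred is not None:
        if pred == ancestor:
            return True
        pred = GENL_PREDS.get(pred)
    return False


def violates_inter_arg_isa(pred, arg1, arg2):
    """Apply the rule on relatives: if arg1 is in species COL, arg2 must be too."""
    if not inherits_from(pred, "relatives"):
        return False  # the constraint is only stated for relatives and its specializations
    return any(
        SPECIES_OF.get(arg1) == col and SPECIES_OF.get(arg2) != col
        for col in SPECIES
    )


print(violates_inter_arg_isa("biologicalMother", "Tweety", "Polly"))   # both birds
print(violates_inter_arg_isa("biologicalMother", "Tweety", "Snoopy"))  # bird and dog
```

A check of this kind lets a candidate binding be rejected before any further search is attempted, which is the pruning role described in the next paragraph.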
This higher-order information is used extensively by the
Cyc inference engine to prune search when answering
queries. The reduction in search space makes it feasible to
perform inference over a KB of the size of Cyc. This means
that the Cyc KB cannot be converted as a whole into Markov
logic, but it is possible to use this higher-order information to
identify subsets of the KB with which to build Markov logic
networks.
2.2 Markov Logic Networks
Markov logic is a language that unifies first order logic with
probabilistic graphical models [Richardson and Domingos,
2006]. In Markov logic, logical formulas are associated with
weights. Intuitively, the higher the weight is for a given for-
mula, the less likely it is to be contradicted. Formally, weights
are interpreted using a Markov logic network, which defines
a probability distribution X over assignments of truth values
to propositional variables, or worlds. Given a set of formulas
and their associated weights, the probability of a world x is
defined as:
$$P(X = x) = \frac{1}{Z} \exp\left( \sum_{i=1}^{F} w_i \, n_i(x) \right)$$
where $F$ is the number of formulas, $Z$ is a normalization constant
ensuring legal probabilities, $w_i$ is the weight of the $i$th
formula, and $n_i(x)$ is the number of true groundings of the
$i$th formula in $x$. This means that the more times a world violates
a formula, the less likely the world is (when the weight
is positive), and the higher the weight, the stronger the effect.
When the weight is infinite, violations of the formula
are impossible; this is how “hard constraints” are modelled.
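To make the definition concrete, the following Python sketch computes the distribution by brute-force enumeration for a single weighted formula, Bird(x) => Flies(x), over one constant. The formula, the constant Tweety, and the weight 1.5 are illustrative assumptions; exhaustive enumeration is only feasible for toy domains and is not how Alchemy performs inference.

```python
import itertools
import math

# Ground atoms of a toy domain with the single (hypothetical) constant Tweety.
ATOMS = ["Bird(Tweety)", "Flies(Tweety)"]


def n_bird_implies_flies(world):
    """Number of true groundings of Bird(x) => Flies(x); there is one grounding."""
    return int((not world["Bird(Tweety)"]) or world["Flies(Tweety)"])


# (w_i, n_i) pairs; the weight 1.5 is an arbitrary illustrative value.
FORMULAS = [(1.5, n_bird_implies_flies)]


def unnormalized(world):
    """exp of the weighted sum of true-grounding counts for this world."""
    return math.exp(sum(w * n(world) for w, n in FORMULAS))


# Every assignment of truth values to the ground atoms is one world.
WORLDS = [
    dict(zip(ATOMS, values))
    for values in itertools.product([False, True], repeat=len(ATOMS))
]
Z = sum(unnormalized(world) for world in WORLDS)  # normalization constant


def probability(world):
    """P(X = x) as defined above."""
    return unnormalized(world) / Z


for world in WORLDS:
    print(world, round(probability(world), 4))
```

The only world that violates the formula (Tweety is a bird that does not fly) is less probable than each of the other three by a factor of exp(1.5); as the weight grows, that world's probability tends to zero, which is the hard-constraint limit.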
Software for weight learning and inference with Markov
logic networks is provided through the Alchemy system [Kok
et al., 2007]. Alchemy is a flexible software package provid-
ing generative and discriminative methods for weight learn-
ing and several methods of performing probabilistic infer-
ence, including MC-SAT, Gibbs Sampling, and Belief Prop-
agation (ibid). In what follows, we describe a framework for
integrating Cyc with Alchemy.
3 Merging Cyc with Markov Logic
3.1 The Whodunit Problem
The Cyc Analyst’s Knowledge Base (AKB) is a portion of
the Cyc KB that contains over 4500 events of terrorism, with
information about each event including the type of attack, the
location, and the agent. Given facts about an event, the goal is
to predict who was the perpetrator – the Whodunit Problem.[2]

[2] Several approaches to this problem were presented by [Halstead
and Forbus, 2007].