
Explorations of an incremental, Bayesian algorithm for categorization

by John R Anderson, Michael Matessa
Machine Learning (1992)

Abstract

An incremental categorization algorithm is described which, at each step, assigns the next instance to the most probable category. Probabilities are estimated by a Bayesian inference scheme which assumes that instances are partitioned into categories and that within categories features are displayed independently and probabilistically. This algorithm can be shown to be an optimization of an ideal Bayesian algorithm in which predictive accuracy is traded for computational efficiency. The algorithm can deliver predictions about any dimension of a category and does not treat specially the prediction of category labels. The algorithm has successfully modeled much of the empirical literature on human categorization. This paper describes its application to a number of data sets from the machine learning literature. The algorithm performs reasonably well, having its only serious difficulty because the assumption of independent features is not always satisfied. Bayesian extensions to deal with nonindependent features are described and evaluated.


Machine Learning, 9, 275-308 (1992)
© 1992 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
JOHN R. ANDERSON AND MICHAEL MATESSA JAOS@ANDREW.CMU.EDU
Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213
Editor: Dennis Kibler
Keywords. Bayesian inference, concept learning, human learning, incremental algorithms
1. Introduction
We have been engaged in a project to understand human categorization which has led us
to develop a machine learning algorithm. Our research began as an exploration of the issue
of whether human categorization can be considered optimal. We were interested in this
both as a philosophical issue and as a practical means for predicting human behavior. On
the philosophical side, if human categorization can be shown to be optimal, this would
be further evidence for the view that human cognition in general is strongly adapted to
its environment. As a practical matter, if categorization is optimal, one can predict it
by investigating what is optimal in a particular categorization situation, thus bypassing the
traditional path of proposing specific cognitive mechanisms and all the murky issues of
identifiability that come with a mechanistic approach (Anderson, 1990).
To pursue the issue of whether human cognition is optimal requires specifying two things.
First, we need a definition of optimality. Second, we need a specification of the structure
of the environment so we can determine what behavior is optimal in that environment.
These are the first two issues that we will address in this paper.
1.1. Preliminary definition of optimization
Our assumption has been that the goal of categorization is to predict unknown features
of various objects that we encounter. For instance, when one sees a creature on a path
one would like to predict whether it is dangerous or not. One can gain accuracy in prediction
of certain features by identifying the category (e.g., tiger) from which the object comes.
Optimal prediction behavior is behavior that achieves the best trade-off between accuracy
of prediction and cost of computing the prediction. It is clear we need this trade-off. An
exquisitely accurate estimate of the danger of this object would do no good if it took hours
to compute. It is the constraint of minimizing computation that leads to a concern with
the efficiency of the algorithm for computing the prediction.
Thus, there are the issues of how to measure accuracy, how to measure computational cost, and
how to combine them. With respect to accuracy, we have adopted in this paper the goal of minimizing
absolute error. In the case of predicting discrete features, this comes down to predicting
the most probable value. In the case of predicting features with continuous normal distributions,
this comes down to predicting the mean value. In terms of a Bayesian decision
framework (e.g., DeGroot, 1970), one might not always want to measure accuracy in terms
of absolute error. For instance, one might not always want to predict the most probable
discrete value. A possible example of this is treating an animal as dangerous even if it is
more likely friendly because the cost of misclassifying a dangerous animal as friendly is
greater than the cost of misclassifying a friendly animal as dangerous. However, since we
do not have such complex utility metrics available in our applications, we have opted for
minimizing absolute error.
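
The contrast between predicting the most probable value and minimizing an expected cost can be illustrated with a small sketch. This is only an illustration of the Bayesian decision idea mentioned above, not part of the algorithm described in this paper; the probabilities, the cost values, and the function names are hypothetical and chosen for the example.

```python
def most_probable_value(probs):
    """Return the value with the highest probability."""
    return max(probs, key=probs.get)


def min_expected_cost_action(probs, cost):
    """Return the action with the lowest expected cost.

    probs: dict mapping true state -> probability.
    cost:  dict mapping (action, true state) -> cost of that action.
    """
    actions = sorted({action for action, _ in cost})

    def expected_cost(action):
        return sum(p * cost[(action, state)] for state, p in probs.items())

    return min(actions, key=expected_cost)


# Hypothetical numbers: the animal is probably friendly, but mistaking a
# dangerous animal for a friendly one is assumed to be ten times as costly
# as the opposite mistake.
probs = {"friendly": 0.8, "dangerous": 0.2}
cost = {
    ("treat_as_friendly", "friendly"): 0.0,
    ("treat_as_friendly", "dangerous"): 10.0,
    ("treat_as_dangerous", "friendly"): 1.0,
    ("treat_as_dangerous", "dangerous"): 0.0,
}

print(most_probable_value(probs))
print(min_expected_cost_action(probs, cost))
```

With these assumed numbers the most probable value is "friendly", yet the expected cost of treating the animal as friendly (0.2 x 10 = 2.0) exceeds that of treating it as dangerous (0.8 x 1 = 0.8), so the two criteria disagree.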
With respect to computational cost, we have chosen to focus on minimizing time. This
ignores potentially relevant considerations such as space, but time is generally viewed as
a more precious commodity in the human case. Moreover, the steps we will
take to minimize time will also substantially reduce storage costs. Minimizing time is a
somewhat underspecified goal and will require further statement of the constraints under
which the minimization takes place. We will develop these later in the paper.
To have a precise definition of optimization, we need a rule for combining error and
time to come up with a total cost. Assuming each unit of time has a cost a and each unit
of error has a cost b, the total cost should be cast as a weighted sum of time and error,
i.e., a function of the form aT + bE, where T is time and E is absolute error. Before we
can more precisely specify time or error, we need to discuss the structure of the environment.
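
As a rough sketch of how the weighted sum aT + bE trades time against error, the following compares two candidate procedures under different unit costs. The candidate names and their (T, E) values are hypothetical, invented for illustration; they are not measurements from this paper.

```python
def total_cost(time_units, abs_error, a, b):
    """Weighted sum a*T + b*E of time T and absolute error E."""
    return a * time_units + b * abs_error


def cheapest(candidates, a, b):
    """Return the name of the candidate with the lowest total cost.

    candidates: dict mapping name -> (T, E).
    """
    return min(candidates, key=lambda name: total_cost(*candidates[name], a, b))


# Hypothetical procedures: one slow and accurate, one fast and less accurate.
candidates = {
    "slow_accurate": (100.0, 0.05),
    "fast_approximate": (1.0, 0.10),
}

# When time is expensive relative to error, the fast procedure wins.
print(cheapest(candidates, a=1.0, b=100.0))
# When time is nearly free, the accurate procedure wins.
print(cheapest(candidates, a=0.001, b=100.0))
```

Which procedure counts as optimal therefore depends on the ratio of a to b, which is why both quantities have to be specified before optimality can be assessed.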
1.2. The structure of the environment
Our theory of the structure of the environment has been focused on the structure of living
things (arguably, the largest portion of the objects in the world) because of the aid biology
gives in objectively specifying the organization of these objects. In particular the theory
developed rests on the structure of living objects produced by the phenomenon of species.
Species form a nearly disjoint partitioning of the living things because of the inability to
interbreed between species. Within a species there is a common genetic pool which means
that individual members of the species will display particular feature values with probabilities
that reflect the proportion of that phenotype in the population. Another useful feature of
species structure is that the display of features within a freely-interbreeding species is largely
