Functional Compositionality and a new view of Knowledge Representation
- DOI: 10.1.1.25.7063
James A. Hammerton
School of Computer Science
The University of Birmingham
Edgbaston, Birmingham, B15 2TT
J.A.Hammerton@cs.bham.ac.uk
Abstract
It is argued that recent connectionist techniques for compositional representation have
opened up a new view of knowledge representation which extends the old view, by demon-
strating that compositionality can be achieved in more than one way, and that the form of
representation is just as important as the choice of representational language for constrain-
ing the processing that can be done. In my view this could be one of the most important
contributions of connectionism to AI in general thus far.
1 Introduction
In any science there are bursts of optimism and activity as a new theory is introduced which solves
various problems and becomes a fashionable topic on which to do research. This is often followed
by a more sober appreciation of (or even a loss of confidence in) the theory in question. Inevitably
it seems, the earlier optimism fades, perhaps along with a sense of confidence in the field as a
whole. However progress in the field will still have occurred, if only through the realisation that an
initially promising idea was flawed. Thus it could be said that many scientific subjects experience
an "ebb and flow" of confidence as they progress, but with each "ebb" leaving the level of progress
higher than before the previous "flow". In order to make progress, each "ebb" in a field must
include a proper evaluation of the previous "flow". That is, the new theory must be properly
evaluated and what is of value must be preserved and built upon.
Currently, in Artificial Intelligence (AI) one might feel that it is a field which has made almost
no progress toward its early goals. According to this view (which I shall term the "AI in crisis"
view), AI seems to be a collection of disparate techniques and approaches such as neural nets,
generate and test search, genetic algorithms or fuzzy systems, each with their own strengths and
weaknesses, but with no real progress being made. I believe this view to be mistaken, and in this
paper¹ I will discuss a development (and the relevance of my work to it) which I believe has involved
significant progress and will also lead to future progress. Like other fields, Artificial Intelligence
has had its own ebbs and flows. An example is the rise of connectionism as an approach to AI in
the 1980s. A combination of frustration as a response to the perceived limitations of symbolic
approaches and the publication of some promising research in the Rumelhart & McClelland volumes
[12] led many to believe that connectionism was the breakthrough AI had been waiting for. Thus,
by the early 1990s, connectionism had become a fashionable sub-discipline within AI with its own
journals and respected researchers. Many were then optimistic at the prospect of connectionism
replacing symbolic AI. Today connectionism is becoming regarded as complementary to the symbolic
approach with its own strengths and weaknesses, and it is clear that an artificial mind is still
a distant goal. This is the "ebb" that has followed from the earlier "flow" of connectionism which I
¹ This work was supported by a research studentship from the School of Computer Science, University of
Birmingham, UK.
long for shortcomings with the earlier work to be discovered and criticisms of the whole approach
to be developed (e.g. [5]) and much of recent connectionist research has focused on dealing with
these criticisms. Nevertheless the challenge that connectionism made to the symbolic approach
has probably led to a better understanding of what is required for a true AI to be built.
For connectionism, and AI generally, to progress, critical evaluation of the field's work must
take place. The debate between connectionists and symbolists has made valuable contributions to
this process and may lead to a better understanding of the potential that connectionism offers.
One area of research that has been spurred on by this debate is the attempt to provide symbol
processing capabilities within connectionist models. My work involves evaluating some claims made
for the techniques that have thus been developed. In the rest of this paper I shall argue that
the project of providing connectionist systems with symbol processing capabilities may already
have made an important contribution to AI. One of the most fundamental concepts in AI is
that of knowledge representation, and I shall argue that the connectionist methods that have
been developed for representing compositional structures, such as phrase structure trees in natural
language processing (NLP) systems, have opened up a new view of knowledge representation which
extends the traditional view on which symbolic AI is based. I shall describe how my work is related
to this new view, and discuss the implications of both the new view and my work for the future
development of the field.
2 Connectionism and Knowledge Representation
2.1 The traditional view of Knowledge Representation
Any intelligent system must have a means of representing the knowledge that it uses, and therefore
how that knowledge is represented is fundamental. The method chosen to represent knowledge
constrains what can be done with the knowledge and thus how well an intelligent system can
perform. Traditionally finding a method of knowledge representation has been viewed as finding a
language (e.g. a formal language such as predicate logic) for expressing the knowledge in a concise
manner. This linguistic approach to knowledge representation is exemplified in the description of
knowledge representation contained in the popular AI textbook by Russell & Norvig [13]:
From page 157:
"The object of knowledge representation is to express knowledge in a computer tractable
form, such that it can be used to help agents perform well. A knowledge representation
language is defined by two aspects:
- The syntax of a language describes the possible configurations that can constitute
  sentences. Usually we describe syntax in terms of how sentences are represented on
  the printed page but the real representation is inside the computer: each sentence
  is implemented by a physical configuration or physical property of some part of
  the agent...
- The semantics determines the facts in the world to which sentences refer..."
(emphasis in original)
And from page 163:
"It is possible in principle to define a language in which every sentence has a completely
arbitrary interpretation. But in practice, all representation languages impose a
systematic relationship between sentences and facts. The languages we will deal with
are all compositional - the meaning of a sentence is a function of the meaning of its
parts."
The above emphasises a key point in knowledge representation, namely that the representations
are normally compositional. (Fodor & Pylyshyn [5] argue that compositionality is essential in a
up the representation of a complex proposition directly out of the representation of its constituents.
For example the sentence "John loves Mary" concatenates the constituents "John", "loves" and
"Mary". Symbolic systems may concatenate symbols in more or less the same manner, or they may
build up the data structure using pointers to indicate the relationship between constituents. Either
way the representation of the complex structure contains the representation of its constituents as
its building blocks. The resources used for the representation are directly proportional to the
number of constituents in the structure. Following the lead of van Gelder [16] I shall refer to this
form of compositionality as concatenative compositionality.
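
A minimal sketch of concatenative compositionality, assuming nested Python tuples stand in for the pointer-linked data structures of a symbolic system. The names `Tree`, `count_symbols` and `contains` are hypothetical and chosen only for illustration. The sketch shows the two properties noted above: the constituents are literally present in the complex representation, and the storage used grows with the number of constituents.

```python
from typing import Tuple, Union

# Hypothetical type for illustration: a tree is either an atomic symbol
# or a tuple of sub-trees (the tuple plays the role of pointers).
Tree = Union[str, Tuple["Tree", ...]]

def count_symbols(tree: Tree) -> int:
    """Number of atomic symbols stored in the structure."""
    if isinstance(tree, str):
        return 1
    return sum(count_symbols(child) for child in tree)

def contains(tree: Tree, symbol: str) -> bool:
    """Constituents are tokened explicitly, so they can be found by search."""
    if isinstance(tree, str):
        return tree == symbol
    return any(contains(child, symbol) for child in tree)

simple = ("John", "loves", "Mary")
nested = ("Peter", "believes", simple)   # the sub-structure is embedded whole

print(count_symbols(simple), count_symbols(nested))
print(contains(nested, "Mary"))
```

Locating a constituent here requires walking the structure, which is the kind of search that the holistic techniques discussed later aim to avoid.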
The final point to note is that these representations require a method of denoting a symbol,
such as the bit patterns that represent a symbol in a computer. Abstractly one has a collection
of representational elements (e.g. the bit patterns) which are arranged in some sort of representa-
tional space (e.g. the computer's memory). Together these form the representational medium in
which the representations are formed. This point is mentioned in Russell & Norvig's discussion,
but its importance is not discussed. Indeed traditional knowledge representation focuses on choos-
ing the representational language which is best suited to a particular task and typically ignores
the representational medium. However, as I shall illustrate below, the choice of representational
medium may constrain the forms of processing of the knowledge just as much as the choice of
representational language.
3 How Connectionism has altered the traditional view of
Knowledge Representation
From the start, connectionism had a rather different concept of representations. Where symbolic
representations involved collections of discrete symbols, connectionist representations involved patterns
of activity across a set of neurons which amount to being vectors of (typically) real-valued
numbers. In much of the work, the representation of some object was done across several units
(as opposed to a single unit), giving rise to the notion of a distributed representation². It became
apparent, especially after the critique of connectionism by Fodor & Pylyshyn [5], that neural networks
had difficulty with handling tasks that required compositional representations of the sort
that symbolic systems routinely use. Since a compositional representation could be of arbitrary
size, the assignation of a node or a set of nodes to each possible proposition was infeasible. What
was needed was a way of creating a compositional representation that did not use a number of
units that was directly related to the number of constituents in the structure. In response to Fodor
& Pylyshyn's challenge, several techniques for representing compositional structures in neural net-
works were soon developed (e.g. BoltzCONS [15], Recursive Auto-Associative Memories (RAAMs)
[11] and Holographic Reduced Representations (HRRs) [9, 10]).
Many of these connectionist representational techniques involve methods of combining several
vectors representing the constituents of a structure to create a single vector representing the struc-
ture itself. In the case of RAAMs and HRRs (in my view the two most promising techniques
developed thus far), the representation of a complex structure was a vector of the same size as the
vectors representing its constituents, which made these representations amenable to processing by
standard neural models such as feed-forward networks. These representations differ from the traditional
compositional representations quite fundamentally. A complex structure in this instance
is represented not by the concatenation of the constituents but by a vector which is systematically
related to the constituents via mathematical transformations (either hand picked in the case of
HRRs or learned in the case of RAAMs). Compositionality is thus achieved by the process of com-
bining the vectors representing the constituents of a structure to create a single vector from which
the constituents can be extracted. Van Gelder [16] has coined the term functional compositionality
to describe this. It is this development of a new way of achieving compositionality that lies behind
my claim that a new view of knowledge representation has been opened up. Until connectionists
² The Rumelhart & McClelland volumes [12] favoured this approach.
a given that arose naturally from the linguistic form of traditional representations. Showing that
compositionality can be achieved in more than one way has by itself altered the traditional view
of knowledge representation.
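
Functional compositionality can be sketched in the style of HRRs, which combine vectors by circular convolution and extract constituents by circular correlation. The sketch below is an illustration under stated assumptions, not the implementation used in the cited work: the vector width `DIM`, the role names, the three-word lexicon and the helper names (`random_vector`, `bind`, `unbind`, `decode`) are all hypothetical, and pure Python loops are used only to keep the example self-contained.

```python
import random

DIM = 512                  # vector width (an arbitrary choice for this sketch)
rng = random.Random(0)     # fixed seed so the run is repeatable

def random_vector(n=DIM):
    """Elements drawn i.i.d. from N(0, 1/n), so the expected squared length is 1."""
    sd = (1.0 / n) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]

def bind(a, b):
    """Circular convolution: combines two vectors into one of the same width."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def unbind(role, trace):
    """Circular correlation: an approximate inverse of bind with respect to role."""
    n = len(role)
    return [sum(role[j] * trace[(k + j) % n] for j in range(n)) for k in range(n)]

def add(*vectors):
    """Element-wise sum (superposition) of vectors of equal width."""
    return [sum(xs) for xs in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

roles = {name: random_vector() for name in ("agent", "verb", "patient")}
lexicon = {word: random_vector() for word in ("John", "loves", "Mary")}

# The whole sentence is ONE vector of width DIM, however many constituents it has.
sentence = add(
    bind(roles["agent"], lexicon["John"]),
    bind(roles["verb"], lexicon["loves"]),
    bind(roles["patient"], lexicon["Mary"]),
)

def decode(role_name, trace):
    """Extract a noisy filler, then clean it up against the known lexicon."""
    noisy = unbind(roles[role_name], trace)
    return max(lexicon, key=lambda word: dot(noisy, lexicon[word]))

print(len(sentence), [decode(r, sentence) for r in roles])
```

The constituents are not stored as separate parts of `sentence`; they are recovered by applying a function to the whole vector, and the recovery is approximate, which is why a clean-up step against the lexicon is needed.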
Representing structures as fixed-width vectors makes them amenable to processing by standard
neural models, but it has been argued that it has a more important consequence. Symbolic
systems operate on compositional structures atomistically, by finding each symbol that needs
to be changed individually and altering it. Neural networks operate on vectors holistically, and
by representing the structures in fixed-width vectors it is possible to operate on symbolic structures
holistically (i.e. where all the constituents of the structure are processed simultaneously).
This is known as holistic computation [4, 6] and potentially allows complex structure sensitive
transformations to be performed in constant time. Even on parallel machines, standard symbolic
representations would require time proportional to the depth of the structure, and thus holistic
computation offers the possibility of developing more efficient forms of inference³.
One of the earliest demonstrations of holistic computation was performed by Chalmers [3] who
trained a network to perform active to passive transformations of RAAM representations of simple
sentences holistically. Applications of holistic computation have included logical inference [8]
and the holistic unification of logical terms [2, 18]. Furthermore via holistic representations (also
known as superpositional representations [17]), holistic computation is possible even on sequential
machines. In a holistic representation, when a complex structure is represented, each representa-
tional element is involved in representing every constituent of the structure and every element of
the structure is represented across all the representational elements taking part. Thus one cannot
isolate part of the structure by isolating some of the elements, and a change to a single element
results in changes across all the constituents of the structure. HRRs are holistic representations in
this sense. Thus not only do we have a new form of compositionality and new forms of representa-
tion but these also allow a new way of processing the compositional structures that are formed with
these techniques. The use of a representational medium involving vectors of real-valued numbers
makes it possible to achieve compositionality in this way, and thus illustrates that the properties of
the representational medium, such as the nature of the representational elements, do in fact con-
strain what forms of processing are possible with a particular method of representation. Thus the
discovery of holistic computation has shown that the traditional view of knowledge representation
was incomplete. A new view thus emerges where the representational medium and the mode of
compositionality used determine the forms of processing possible with the sentences of the repre-
sentational language and therefore how well an intelligent agent can perform. In the next section
I shall illustrate how my work on holistic computation is relevant to this new view of knowledge
representation.
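
The claim that a change to a single element of a holistic representation affects every constituent can be illustrated with a small sketch. It assumes an HRR-style encoding (circular convolution to combine, circular correlation to extract) for the holistic case and a simple slot-by-slot concatenation for the localist case. The width `DIM`, the role names, the size of the perturbation and all helper names are hypothetical choices made for illustration.

```python
import random

DIM = 64                   # small width: no clean-up is needed for this comparison
rng = random.Random(1)     # fixed seed so the run is repeatable
NAMES = ("agent", "verb", "patient")

def random_vector(n=DIM):
    sd = (1.0 / n) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]

def bind(a, b):
    """Circular convolution of two equal-width vectors."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def unbind(role, trace):
    """Circular correlation: approximate extraction of the filler bound to role."""
    n = len(role)
    return [sum(role[j] * trace[(k + j) % n] for j in range(n)) for k in range(n)]

roles = {name: random_vector() for name in NAMES}
fillers = {name: random_vector() for name in NAMES}

# Localist encoding: one slot of DIM elements per constituent, concatenated.
localist = [x for name in NAMES for x in fillers[name]]

# Holistic encoding: role/filler bindings superimposed in a single DIM-wide vector.
holistic = [sum(xs) for xs in zip(*(bind(roles[n], fillers[n]) for n in NAMES))]

def read_localist(vec, name):
    start = NAMES.index(name) * DIM
    return vec[start:start + DIM]

def read_holistic(vec, name):
    return unbind(roles[name], vec)

def perturb(vec, index=0, amount=0.5):
    """Return a copy of vec with a single representational element changed."""
    changed = list(vec)
    changed[index] += amount
    return changed

changed_localist = [n for n in NAMES
                    if read_localist(localist, n) != read_localist(perturb(localist), n)]
changed_holistic = [n for n in NAMES
                    if read_holistic(holistic, n) != read_holistic(perturb(holistic), n)]

print("localist:", changed_localist)
print("holistic:", changed_holistic)
```

In the localist encoding only the constituent whose slot contains the altered element is affected, whereas in the holistic encoding every extracted constituent shifts, because each element of the vector takes part in representing all of them.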
4 Holistic Computation and Holistic Representations
The discovery of holistic computation and representation, and the new view of knowledge repre-
sentation that arises, open up various lines of research, such as:
- Investigation of how holistic computation can be achieved and exploited.
- Investigating and explaining how compositional structures can be processed within neural
  architectures and how such processing can be learned.
- Further investigations into exactly how the form of representation, e.g. the representational
  medium and the mode of composition, influences and constrains processing.
- Investigating whether other forms of representation exist.
- Investigation into whether and how the brain exploits holistic computation and holistic
  representations.
³ The downside to this is that there is a maximum size to the structure that can be encoded in a vector of given
width, since you can only store so much information in one vector.
Connectionist representations and holistic computation require critical evaluation if the most is to
be gained from them. My work has two distinct aspects which contribute to this:
4.1.1 Clarifying the exact nature of holistic computation.
I have argued elsewhere [6] that in much of the literature holistic computation has been only
vaguely defined and has often been confused with holistic representation. For example, Chrisman
[4] describes holistic computation (which he refers to as holistic inference) thus:
"This form of inference occurs in a gestalt fashion by deriving a solution to the inference
problem directly from a representation of structured data without decomposing, locating
or accessing its constituent elements." (page 346).
"Inferences that use a representation directly without accessing its compositional structure
are said to perform holistic inference." (page 349).
Here it is not clear what Chrisman means by an inference occurring in a gestalt fashion:
does this mean that all the constituents of a structure are processed simultaneously or that all
the constituents of the representation of the structure are processed simultaneously? With any
representation, processing all the constituents of the representation simultaneously will involve
processing all the constituents of the structure simultaneously. However with a holistic repre-
sentation, changing a single representational element involves changes across the entire structure
occurring simultaneously, thus every constituent of the structure is being altered simultaneously
even though only one element of the representation is being changed. In such representations the
constituents of a structure are not directly tokenised in the elements of the representation and
holistic computation will occur however you operate on the representation. Chalmers [3] claimed
that the representations developed by his RAAMs had this property:
"In the distributed representations formed by RAAMs, there is no such explicit tokening
of the original words" (from the reprint in [14], page 54).
He contrasted this with the explicit tokening of constituents in classical representations, im-
plying that this enabled holistic computation. Chalmers did not actually test his representations
directly to see if this was the case. However, Imre Balogh [1] did test the representations to see
if this property held. He found that the representations were in fact localist representations with
explicit tokening of the words in Chalmers' sentences. He concluded that holistic computation was
not occurring, a conclusion which can only hold if you believe that distributed or holistic representations
are necessary for holistic computation - an assumption shared by Chalmers (and other
connectionist researchers). I believe this assumption to be mistaken if holistic computation is regarded
as any computational process that can act on all the constituents of an object simultaneously
without the need to perform a search to locate or access those constituents [6].
But why should we prefer this definition to earlier ones? As noted, earlier definitions don't
specify exactly what is holistic about a computation - e.g. whether it is the representation that is
holistic or whether it is the computation that is holistic. This definition at least makes it clear that
it is the computation that is holistic. Whilst any computation involving holistic representations
is a holistic computation in this sense, other sorts of computation can be too. For example the
parallel processing of a fixed length localist representation where each element of the representation
is processed in parallel would be holistic on the above definition. This is what I argue is
going on with Chalmers' work, and it thus qualifies as an example of holistic computation, despite
Balogh's findings. The primary reason for preferring this definition is therefore conceptual clarity
and with it a better picture of what is needed to achieve holistic computation. Since both forms of
holistic computation open up the possibility of performing complex structure sensitive operations
in constant time, I see no reason to focus purely on the holistic computations that involve holistic
representations. Indeed if your goal is to build systems that can perform structure sensitive operations
in constant time then regarding only computations on holistic representations as holistic gives
of the literature on holistic computation claims that the possibility of doing this is an advantage
of using connectionist representations, the definition also fits (and perhaps better fits) in with the
purposes of the research which used the earlier definitions.
Finally, one consequence of my definition is that holistic computation is not unique to connectionist
systems, although connectionist systems may be well suited to exploiting it. If an array of
N elements is permuted on a parallel machine with N or more processors by assigning a processor
to move each element from its old position to its new position, then a holistic computation can be
said to have been performed on the array. The connectionists' innovation is to find ways of representing
compositional structures in a manner which allows holistic computation to be performed
on them. It may well be worth investigating whether any non-connectionist representations might
support holistic computation.
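
The array permutation example can be sketched as follows, assuming one worker thread per element stands in for one processor per element. The function name `permute_in_parallel` and the example data are hypothetical. Python threads do not give true simultaneous execution, so the sketch illustrates only the structure of the computation: each worker acts on its own element directly, and no search is needed to locate any constituent.

```python
from concurrent.futures import ThreadPoolExecutor

def permute_in_parallel(items, new_index):
    """Move items[i] to position new_index[i], using one worker per element."""
    n = len(items)
    result = [None] * n

    def move(i):
        # Each worker knows its source and destination; nothing is searched for.
        result[new_index[i]] = items[i]

    with ThreadPoolExecutor(max_workers=max(1, n)) as pool:
        list(pool.map(move, range(n)))   # consume the iterator to wait for all workers
    return result

words = ["John", "loves", "Mary"]
swapped = permute_in_parallel(words, [2, 1, 0])   # exchange first and last positions
print(swapped)
```

Because the representation has a fixed length and every element's position is known in advance, the number of sequential steps does not grow with the number of elements when enough processors are available.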
4.1.2 Investigating the use of SRAAMs for holistic computation
Much of the work on holistic computation thus far involves using RAAM representations. RAAMs
are noted for being complex and difficult to train, and Kwasny & Kalman [7] have suggested
that sequential RAAMs or SRAAMs may be easier to use. They reported that by using a simple
modification of the training method for a simple recurrent network and representing trees as
sequences of symbols, they could train SRAAMs to encode and decode a set of parse trees more
quickly and more easily than is normal with RAAMs. Their preliminary investigation into holistic
computation with such representations produced mixed results, which raises the question of their
suitability for holistic computation. I am currently investigating whether the unification of feature
structures could be performed, holistically, using SRAAM representations of the feature structures.
Holistic term unification has already been performed successfully [18] using a representation derived
from HRRs, and a standard feed-forward network to perform the unification itself, so it is certainly
possible with hand created representations. My work extends this to the more complicated task of
feature-structure unification and the use of a representation that has been learned. If this could be
done, it would have applications in unification-based parsers. This work will indicate whether
serious applications of holistic computation can be performed using standard simple recurrent
and feed-forward networks and standard training techniques (e.g. conjugate gradient or
back-propagation), which not only learn to perform holistic unification but learn how to represent
the structures. It will also involve an investigation of the types of representations that SRAAMs
develop. The results of this work, whether positive or negative, will go some way towards indicating
how much needs to be done for effective exploitation of holistic computation.
4.1.3 My work and the new view of KR
Taken together, these two aspects of my current work will contribute to the investigations of
how holistic computation can be exploited: the first by making it clearer exactly what holistic
computation is and therefore what one is trying to achieve when exploiting it; the second by
examining the suitability of some specific techniques for doing so. With respect to the new view
of knowledge representation being developed, the new definition of holistic computation opens
up a clearer view of what forms of representation can support holistic computation. All that
appears to be required is a fixed length representation, which makes the information relevant
to a particular task directly available (i.e. without search) to the processes operating on it. If
these conditions are satisfied, then the task may be performed holistically (at least with a parallel
machine). Finally the second aspect of my work contributes to the investigation into the forms
of specifically connectionist representation that are best suited to exploiting holistic computation
and whether such forms can be learned. This line of work is of considerable importance to the
development of connectionism. If methods of representation can be developed which allow effective
exploitation of holistic computation within connectionist systems, then not only will such systems
provide examples of how to achieve symbol processing in neural architectures, but they will also
provide more efficient methods of symbol processing. It may turn out that instead of replacing
symbolic AI, connectionism will refine symbolic methods.
Connectionism has not (yet) been the profound breakthrough that some might have hoped for.
However it has made genuine contributions to AI generally, and has established itself as a serious
sub-discipline within AI. Amongst these contributions, I believe that the emergence of the
new view of knowledge representation described above will turn out to be the most important and
enduring. By demonstrating that the traditional view of knowledge
representation can be extended, connectionism has opened up the possibility of developing new
forms of representation with different properties to the traditional compositional representations,
such as the representational techniques described in this paper. Even if connectionism achieves
nothing else, this will have made it worthwhile, and perhaps the next "flow" in AI will occur as
methods of exploiting holistic computation effectively are developed.
Acknowledgements
I would like to thank my PhD supervisor, Peter Hancox, for his advice during the writing of this
paper.
References
[1] Imre L. Balogh. An analysis of a connectionist internal representation: Do RAAM networks
produce truly distributed representations? PhD thesis, New Mexico State University, Las
Cruces, NM, 1994.
[2] A. Browne and J. Pilkington. Unification using a distributed representation. SIGART Bulletin,
5(1):33–42, 1994.
[3] D. J. Chalmers. Syntactic transformations on distributed representations. Connection Science,
2(1–2):53–62, 1990. Reprinted in [14], pages 46–55.
[4] L. Chrisman. Learning recursive distributed representations for holistic computation. Connection
Science, 3(4):345–366, 1991.
[5] J. A. Fodor and Z. W. Pylyshyn. Connectionism and cognitive architecture: a critical analysis.
Cognition, 28(1–2):3–71, 1988.
[6] J. A. Hammerton. Holistic computation: Reconstructing a muddled concept. Connection
Science, To appear, 1997.
[7] S. C. Kwasny and B. L. Kalman. Tail-recursive distributed representations and simple recurrent
networks. Connection Science, 7(1):61–80, 1995.
[8] L. Niklasson and N. E. Sharkey. Systematicity and generalization in compositional connectionist
representations. In G. Dorffner, editor, Neural Networks and a New Artificial Intelligence,
pages 217–232. Thomson Computer Press, London, UK, 1997.
[9] T. A. Plate. Holographic Reduced Representations: Convolution algebra for compositional
distributed representations. In J. Mylopoulos and R. Reiter, editors, Proceedings of the 12th
International Joint Conference on Artificial Intelligence, Sydney, Australia, August 1991,
pages 30–35, San Mateo, CA, 1991. Morgan Kaufmann. Reprinted in P. Mehra and B. W. Wah
(Eds). Artificial Neural Networks: Concepts and Theory, Los Alamitos, CA, IEEE Computer
Society Press, 1992.
[10] T. A. Plate. Distributed Representations and Nested Compositional Structure. PhD thesis,
University of Toronto, 1994.
1990.
[12] D. Rumelhart and J. McClelland. Parallel Distributed Processing, volume 1 & 2. MIT Press,
Cambridge, MA, 1986.
[13] S. J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall International,
New Jersey, USA, 1995.
[14] N. E. Sharkey, editor. Connectionist Natural Language Processing: Readings from Connection
Science. Intellect, Oxford, UK, 1992.
[15] D. S. Touretzky. BoltzCONS: Dynamic symbol structures in a connectionist network. Artificial
Intelligence, 46(1–2):5–46, 1990.
[16] T. van Gelder. Compositionality: A connectionist variation on a classical theme. Cognitive
Science, 14:335–364, 1990.
[17] T. van Gelder. What is the "D" in PDP? A survey of the concept of distribution. In W. Ramsey,
S. P. Stich, and D. E. Rumelhart, editors, Philosophy and Connectionist Theory, pages
33–60. Lawrence Erlbaum Associates, Hillsdale, NJ, 1991.
[18] V. Weber. Connectionist unification with a distributed representation. In Proceedings of
the International Joint Conference on Neural Networks - IJCNN '92, Beijing, China, pages
555–560, Piscataway, NJ, 1992. IEEE.


