A fuzzy ontology for medical document retrieval
- ISSN: 15627020
David Parry
School of Computer and Information Sciences
Auckland University of Technology
Private Bag 92006 Auckland 1020 New Zealand
Dave.parry@aut.ac.nz
Abstract
Ontologies represent a method of formally expressing a
shared understanding of information, and have been seen
by many authors as a prerequisite for the “Semantic
web”. A mapping between query terms and members of
an ontology is usually a key part of any
ontology-enhanced searching tool. However, the relative
importance of a particular mapping to an overloaded term
may be different for different users, and this information
is vital for accurate satisfaction of a query.
One way of overcoming this problem is the postulation of
a “fuzzy ontology”. By adding a value for degree of
membership to each term that is “overloaded”, for each
user or group of users, the documents recovered from an
ontology-mediated search can reflect the likely
information need. The author will discuss means of
ontology fuzzification, both by analysis of a corpus of
documents and by the use of a relevance feedback
mechanism, and some possible extensions to this scheme.
Keywords: Ontology, Ontology Combination, Fuzzy
Logic, Information retrieval.
1 Introduction
“‘When I use a word,’ Humpty Dumpty said, in a rather
scornful tone, ‘it means just what I choose it to mean,
neither more nor less.’” (Carroll 1872).
In computer science terms an ontology is used to
“express formally a shared understanding of information”
(Noy, Sintek et al. 2001). There has been a great deal of
interest recently in the construction of ontologies for
representing medical knowledge; indeed, the construction
and use of ontologies has been described as the main task
of medical informatics (Musen 2001). A number of
ontologies already exist in the medical domain, such as
the Unified Medical Language System (UMLS) and its
component parts, including the Medical Subject Headings
(MeSH) and the Semantic Network. The use, reuse and
sharing of information between ontologies is especially
important in the medical field with the growth of
evidence-based medicine (Moody and G. 1999), and the
consequent requirement for appropriate information to be
available to clinicians and patients (Grutter, Eikemeier
et al. 2001).
Currently, ontologies are seen as one of the key
technologies involved in the “Semantic web” (Berners-Lee,
Hendler et al. 2001), and representations in dedicated
formats such as Protégé, and in particular as
implementations of XML documents, have been constructed.
Communication between and merging of ontologies remains
problematic, however, although there have been attempts
to solve this problem, for example the SMART system (Noy
and Musen 1999), which is related to the PROTÉGÉ ontology
development and validation tool (Musen, Gennari et al.
1995). In particular, there is a pressing need to be able
to use multiple ontologies in order to relate, for
example, knowledge stored in reference sources to data
collected from clinical records.
However, the constructor of an ontology is faced with an
essential paradox: by increasing the suitability of an
ontology for a particular part of a domain, the coverage of
the ontology decreases, and its use as a communication
tool decreases as the potential audience becomes more
specialised. At the limit, an ontology that perfectly
expresses one person's understanding of the world is
useless for anyone else with a different view of the world.
Communication between ontologies is necessary to avoid
this type of solipsism.
Fuzzy set theory has been extensively used in the context
of information retrieval (Bordogna and Pasi 2000). A
number of different schemes have been devised to
implement fuzzy logic in IR. This work has covered such
concepts as fuzzy construction of queries, the retrieval of
fuzzy sets of documents and fuzzy relevance measures.
However, this has not been combined with the use of an
ontology, although Widyantoro (2001) introduces the term
“fuzzy ontology” in the context of the fuzzy
combination of query terms.
As previously stated, the UMLS MeSH ontology is a
particularly useful one for the medical domain. However,
of the 21836 terms within it, 10072 (about 46%) appear in
more than one place. Thus the “overloading” of terms in a
mature ontology can be seen to be significant. In addition,
as ontologies are extended, for example outside their home
domain, this problem is likely to get worse.
In their comprehensive review of ontology roles and
structures, Lassila and McGuinness (2001) describe a
spectrum of structures, from a catalogue to a full ontology
with extremely complicated relations between the terms.
The UMLS represents a number of points on this
spectrum, from the fairly simple hierarchy represented by
the MeSH tree to the rich relationships embodied in the
Metathesaurus. One of the drawbacks of the
Metathesaurus is that because it is effectively a [...]
other systems, the exact relationships within the parent
systems cannot always be replicated within it.

Copyright © 2004, Australian Computer Society, Inc. This
paper appeared at The Australasian Workshop on
Data Mining and Web Intelligence (DMWI2004), Dunedin.
Conferences in Research and Practice in Information
Technology, Vol. 32. Reproduction for academic, not-for-profit
purposes permitted provided this text is included.

The aim of this work is to suggest that a fruitful approach
to the reuse and generalisability of ontologies may be
made by the introduction of the concept of a “fuzzy
ontology”. In this paper the ontology described is a
modification of the MeSH hierarchy, which is much
simpler than many of the other ontologies described by
Noy and McGuinness (2001). However, some future
extensions to this work are raised in Section 5.
Section 2 outlines the nature of the fuzzy
ontology; Section 3 gives some indications of ways of
assigning membership values within the ontology; Section 4
describes the current implementation of a system to
support FuzzOnt for information retrieval; and Section 5
describes current and future experimental work.
This work has been performed in the context of the
development of an intelligent searching system for
finding useful medical information (Parry 2001).
One of the issues that arise in ontology construction is
why ontologies are needed. Often Semantic Web research
focuses on the advantages of having ontologies that
artificial agents can use to enhance searching, and on
reconciling differences between the interpretations of the
ontologies used by different documents (Stephens and
Huhns 2001). However, this work is focused on
attempting to represent the ontology of the users, and
in particular on what a user believes his or her
query means. A collective ontology is also proposed,
which is intended to allow users of a searching system to
act as human agents in a collective intelligence,
improving the performance of the system for all users by
means of a special form of fuzzy ontology updating,
described in Section 2.
2 Theory of the fuzzy ontology
2.1 Fuzzy Logic
Zadeh originally introduced fuzzy logic in 1965 (Zadeh
1965), in the context of set theory. I use the concept of
“membership” as an attribute of an item within an
ontology. By use of the membership value, a fuzzy logic
can be used, modifying the standard Boolean logic
used in classical information retrieval. To replace the
“AND” term the fuzzy MIN term is used, and to replace
the “OR” term the fuzzy MAX term is used. Briefly, the
MIN relation assigns the lowest membership value found in
the antecedents and the MAX relation assigns the highest.
For example, if a document has both “Head” and “Nose”
within it, and the membership values of these terms for the
“Anatomy” part of the ontology are 0.7 and 0.9
respectively, then a query asking for “Head AND Nose”
would assign the lower value, 0.7, to the document.
However, a query asking for “Head OR Nose” would assign
a value of 0.9 to the document.
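
The MIN and MAX combination described above can be sketched in a few lines of Python. This is a minimal illustration rather than the FuzzONT implementation: the names `fuzzy_and`, `fuzzy_or` and `score_document`, and the dictionary layout, are assumptions made for the example. The membership values are those quoted in the text.

```python
# Membership values of two terms in the "Anatomy" part of the ontology,
# taken from the worked example in the text.
anatomy_membership = {"Head": 0.7, "Nose": 0.9}


def fuzzy_and(memberships):
    """Fuzzy AND: the lowest membership value among the query terms."""
    return min(memberships)


def fuzzy_or(memberships):
    """Fuzzy OR: the highest membership value among the query terms."""
    return max(memberships)


def score_document(terms, membership, combine):
    """Score a document that contains every term in `terms`.

    Hypothetical helper: looks up each term's membership value and
    combines the values with the supplied fuzzy operator.
    """
    return combine(membership[term] for term in terms)


# "Head AND Nose" takes the lower value; "Head OR Nose" takes the higher.
and_score = score_document(["Head", "Nose"], anatomy_membership, fuzzy_and)
or_score = score_document(["Head", "Nose"], anatomy_membership, fuzzy_or)
```

Passing the operator as an argument keeps the lookup separate from the combination rule, so other fuzzy operators could be substituted without changing the scoring helper.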
The fuzzy ontology (FuzzONT) is based on modification
of an existing crisp ontology. Currently there are
ontologies with an extremely rich set of relations between
members; for example, component parts of the UMLS
have over 80 types of relations between ontology
members, ranging from the simple “is a” to such
specialised relations as “uniquely_mapped_to” and
“developmental_form_of”. By preserving these relations,
an extremely rich set of relations can be retained and can
form the framework of the ontology when beginning
fuzzification.
The modification is entirely incremental: conversion to a
fuzzy ontology adds membership values to the currently
existing relations, and may also add new entries to the
ontology. The ontology membership is normalised with
respect to each of the terms in the ontology; that is, the
sum of the membership values of each term in the
ontology is equal to 1. This is because the fuzzy ontology
is primarily concerned with mapping from queries to the
ontology. The normalisation is justified on the basis that
for each term in a query, only one of the meanings will be
required, and that these meanings are exclusive.
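
The normalisation constraint can be illustrated as follows. This is a sketch under stated assumptions: the location labels and raw weights are invented for illustration and are not MeSH data, and `normalise` is a hypothetical helper rather than part of FuzzONT.

```python
import math


def normalise(raw_weights):
    """Scale the weights of one term's locations so that they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("at least one location must have a positive weight")
    return {location: weight / total for location, weight in raw_weights.items()}


# Hypothetical raw weights for one overloaded term at three ontology locations.
raw = {"location_a": 3.0, "location_b": 1.0, "location_c": 1.0}
memberships = normalise(raw)

# The memberships of a single term across its locations sum to 1.
assert math.isclose(sum(memberships.values()), 1.0)
```

Normalising per term, rather than across the whole ontology, matches the assumption that a query term takes exactly one of its meanings.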
In the vocabulary of Noy, this is a “merging” process
rather than an alignment, because the new ontology
contains both of the old ones but is itself only one. The
advantage of this process, however, is that no information
is lost. The fuzzification process is shown
diagrammatically in Figure 1.
Figure 1: Membership values in a fuzzy ontology
The membership value can be assigned in one of two
ways – via a user preference assigned using a
membership function, as described in section 3.1, or
automatically as described in section 3.2.
The main objection to this scheme is that the ontology
becomes more complex. However, this complexity
already exists in the case of terms that are located in
multiple positions in the ontology. Take for example the
term “Pain”, which occurs in the MeSH ontology 5 times in
different trees at different levels (see Table 1).
Because the term is located in a number of different
places, query expansion for this term is difficult: there
are large numbers of “related” terms. In the case of
Pain, for example, a fairly standard expansion using the
immediate parent and the immediate “offspring”, i.e.
terms below Pain in the ontology, yields the following
potential expansion of 5 Parent terms + 19 Child terms,


