Associating Clinical Archetypes Through UMLS Metathesaurus Term Clusters.
Journal of Medical Systems (2010)
- ISSN: 01485598
- DOI: 10.1007/s10916-010-9586-9
- PubMed: 20827566
Abstract
Clinical archetypes are modular definitions of clinical data, expressed using standard or open constraint-based data models such as CEN EN13606 and openEHR. Archetype specification activity is increasing, which raises the need for techniques that associate archetypes in order to support better management and user navigation in archetype repositories. This paper reports on a computational technique to generate tentative archetype associations by mapping them through term clusters obtained from the UMLS Metathesaurus. The terms are used to build a bipartite graph model, on which graph connectivity measures can be used to derive associations.
Leonardo Lezcano · Salvador Sánchez-Alonso ·
Miguel-Angel Sicilia
Received: 29 May 2010 / Accepted: 24 August 2010
© Springer Science+Business Media, LLC 2010
Keywords Clinical archetypes · UMLS · Graphs
L. Lezcano · S. Sánchez-Alonso · M.-A. Sicilia (✉)
Information Engineering Research Unit, Computer Science Department, University of Alcalá, Alcalá, Spain
e-mail: msicilia@uah.es
L. Lezcano
e-mail: leonardo.lezcano@uah.es
S. Sánchez-Alonso
e-mail: salvador.sanchez@uah.es
Introduction
The archetype philosophy is a two-level approach to address the lack of interoperability between health information systems. Under the openEHR¹ two-level model, a stable reference information model constitutes the first level of modeling, while formal definitions of clinical content in the form of archetypes constitute the second. Only the first level (the Reference Model) is implemented in software, significantly reducing the dependency of deployed systems and data on variable content definitions. The only other parts of the model universe implemented in software are highly stable languages/models of representation. As a consequence, systems can be far smaller and more maintainable than single-level systems. The openEHR approach, as well as concepts like interoperability, two-level modeling and the formal language for the distributed definition of archetypes (ADL), are explained in the “Background” section.
¹ http://www.openehr.org/
Such an environment allows archetypes to be developed by disparate groups that work independently and that eventually publish their results in archetype repositories. Nevertheless, as this approach is becoming widely accepted, it is certain that the number of available archetypes will become very large and hard to manage. Besides, while one of the greatest advantages of two-level modeling is that archetype definitions are developed in a decentralized process, that process is exposed to overlapping content and limits the scope of normalization.
This paper addresses how to provide better integration and management of archetypes by performing their semantic classification and clustering. Such a framework could then support navigation across archetype repositories by providing similarities between definitions. To accomplish those tasks, this
research reports on a computational approach that allows ADL definitions to be categorized with the existing archetypes by means of the Unified Medical Language System (UMLS).² The UMLS has been used for many different purposes [1–5] and its structure is also described in the Background section.
Along with the theoretical issues, this paper describes a first case study built on 40 archetype samples. The ADL instances were taken from the openEHR repository, which contains well-known clinical statements such as the Health Rate OBSERVATION,³ the Pregnancy EVALUATION and the Transfusion ACTION. The outcome of the case study reveals uncovered areas within the domain of the input archetypes, as well as saturated fields. The semantic nature of the obtained associations goes beyond the static classification provided by the Reference Model, connecting archetypes that are subsumed by different Reference Model entries. The case study has also detected isolated archetypes such as Intravenous fluid administration.
The rest of this paper is structured as follows. The next section reviews related research, including similarities to and differences from this approach. Then Section “Background” gives a background overview, introducing the standards and technologies involved, such as the ADL and the UMLS Knowledge Source Server. Section “Mapping archetype terms to UMLS concepts” describes a method to map the local terms of archetypes onto UMLS term clusters. Then, Section “Analyzing archetypes intersection” describes how archetype cohesion can be measured using graph techniques, especially recommending the m-slices method. The paper continues with Section “Working the Metathesaurus relations”, where the associations obtained so far are enriched with Metathesaurus relations. Section “Conclusions and further work” closes the article with conclusions and directions for further work.
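As an informal illustration of the pipeline outlined above, the following Python sketch builds a small bipartite model (archetypes on one side, UMLS concepts on the other), projects it onto the archetypes, and extracts m-slices. It is not the authors' implementation. The archetype names and concept identifiers are placeholders invented for the example, and the m-slice is assumed to follow its usual network-analysis definition: the subnetwork formed by the lines whose multiplicity is at least m, together with the vertices incident with those lines.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical archetype -> UMLS concept mappings (placeholder identifiers,
# not real UMLS CUIs and not data from the case study).
archetype_concepts = {
    "archetype_A": {"CUI_1", "CUI_2", "CUI_3"},
    "archetype_B": {"CUI_2", "CUI_3"},
    "archetype_C": {"CUI_3", "CUI_4"},
    "archetype_D": {"CUI_5"},
}


def project(mapping):
    """One-mode projection of the bipartite graph onto archetypes.

    The weight (line multiplicity) of a pair is the number of concepts
    that both archetypes are mapped to.
    """
    weights = {}
    for a, b in combinations(sorted(mapping), 2):
        shared = len(mapping[a] & mapping[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def m_slice(weights, m):
    """Return the connected groups of archetypes in the m-slice.

    Only lines with multiplicity >= m are kept; archetypes with no such
    line do not appear in the result.
    """
    adjacency = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= m:
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups


def isolated(mapping, weights):
    """Archetypes that share no concept with any other archetype."""
    linked = {name for pair in weights for name in pair}
    return set(mapping) - linked


weights = project(archetype_concepts)
print(m_slice(weights, 2))
print(isolated(archetype_concepts, weights))
```

Raising m keeps only the more strongly associated archetypes, while archetypes returned by `isolated` correspond to the kind of isolated definitions mentioned in the case study.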
² http://umlsinfo.nlm.nih.gov
³ Elements in the openEHR Reference Model are in Courier font from here on.
Related work
Qamar and Rector [6] discussed the principles and methods that allow for semantic mapping of data in the Archetype model, amongst others, to formal biomedical terminologies. The first step of the current approach is similar to their first stage, as both end by offering mappings between ADL terms and terminology codes on the basis of context and non-context methods, including lexical and semantic techniques. However, they considered those results as candidate mappings that are then processed by filtering mechanisms based on certain description logic axioms, as well as on the intervention of a clinical modeler, in order to ensure the accuracy and correctness of the mapping. In this manner, the main goal of [6] is to assist experts during archetype modeling, that is to say, it is a pre-definition aid. In contrast, our final objective is to find relevant intersections and similarities between already defined archetypes. It is a post-definition approach whose input consists of existing ADL files. The methodology presented below relies on patterns from graph theory and on network analysis techniques that allow large repositories of archetypes to be handled rapidly and automatically.
The research of Bisbal and Berry [7] is designed to find alignments between archetypes about the same clinical concept that are defined in heterogeneous two-level approaches. Matching archetypes from different sources facilitates the interoperability of the three major players in this domain: CEN 13606, HL7’s CDA RIM and openEHR. Therefore the objective of [7] does not overlap with our goals either. We are instead concerned with management, categorization and semantic browsing of archetypes within a given repository. Moreover, their methods are also different, as the mapping step relies on the bindings to biomedical terminologies provided by the archetype’s ontology section. Subontologies of the biomedical terminologies are then aligned to complete the match of archetypes.
Related work also includes the master’s thesis [8], which elucidates methods for connecting digital publications within the biomedical literature. It is presented as an automated approach to biological knowledge discovery from PubMed abstracts that involves the systematic application of a storytelling algorithm followed by a series of filtering and compression operations over the mined stories. According to the author, the automatic integration of information across multiple publications is a key task for gaining insight into the functioning of biological systems as a whole.
Background
This section briefly covers the underlying technologies supporting the results described in this paper. Subsection “The openEHR RM, AOM & ADL” describes the basics of the openEHR Reference Model, the Archetype Object Model and the Archetype Definition Language. Then Subsection “The Unified Medical Language System (UMLS)” briefly presents the UMLS