Environmental knowledge in EcoLexicon

by Pilar León Araúz, Arianne Reimerink, Pamela Faber
Proceedings of the Computational Linguistics-Applications Conference (2011)

Abstract

EcoLexicon is a multilingual terminological knowledge base (TKB) on the environment that targets different user groups who wish to expand their knowledge of the environment for the purpose of text comprehension and/or generation. Users can freely access EcoLexicon and find the information they need thanks to a user-friendly visual interface with different modules for conceptual, linguistic, and graphical data. The main goal of this TKB is user knowledge acquisition. This paper briefly explains the theoretical premises and methodology applied in EcoLexicon for knowledge extraction and representation. It also shows how environmental concepts are represented, interrelated, and contextualized. EcoLexicon combines the advantages of a relational database, which allows the platform to be deployed and populated quickly, with those of an ontology, which enhances user queries. The internal coherence at all levels of a dynamic knowledge representation shows that even complex domains can be represented in a user-friendly way.

Available from lexicon.ugr.es
Department of Translation and Interpreting, University of Granada
Buensuceso 11, 18002, Granada, Spain
Email: {pleon, arianne, pfaber}@ugr.es
I. INTRODUCTION
EcoLexicon¹ is a multilingual terminological knowledge base (TKB) on the environment. The knowledge base was initially implemented in Spanish, English and German. Currently, three more languages are being added: Modern Greek, Russian and Dutch. So far it contains 3,250 concepts and 14,550 terms. It targets different user groups, such as translators, technical writers and environmental experts, who wish to expand their knowledge of the environment for the purpose of text comprehension or generation. These users can freely access EcoLexicon and find the information they need thanks to a user-friendly visual interface with different modules for conceptual, linguistic, and graphical data. The ultimate goal of EcoLexicon is user knowledge acquisition, which can only be achieved if TKBs account for the natural dynamism of knowledge, caused mainly by context and multidimensionality.
II. THEORETICAL UNDERPINNINGS OF ECOLEXICON
EcoLexicon is primarily based on theoretical and methodological premises derived from cognitive linguistics and corpus linguistics. Context and situated cognition are the semantic foundations of our knowledge representation framework, whereas corpus analysis guides our knowledge extraction procedures.
A. Knowledge extraction
According to corpus-based studies, when a term is studied in its linguistic context, information about its meaning and its use can be extracted [1]–[3]. For EcoLexicon, two corpora were created: a textual corpus and a visual corpus. The English textual corpus (5 million words) consists of specialized texts (e.g., scientific journal articles and PhD theses), semi-specialized texts (e.g., textbooks and manuals), and texts for the general public, all belonging to the multidisciplinary domain of the environment. The visual corpus consists of images selected according to three criteria (iconicity, abstraction, and dynamism), which are ways of referring to and representing specific attributes of specialized concepts. Images were classified according to the morphological features described by Marsh and White regarding the functional relationship between images and texts [4].

¹ http://ecolexicon.ugr.es
The extraction of conceptual knowledge from the textual corpus combines manual direct term searches and knowledge pattern analysis. Many studies consider knowledge patterns (KPs) to be one of the most reliable methods for knowledge extraction [5]–[9]. KP analysis involves several complementary steps. Normally, the most recurrent knowledge patterns for each conceptual relation identified in previous research are used to find related term pairs [10], [11]. Afterwards, these terms become seed words that are used in direct term searches to find new KPs and relations. The methodology consists of the cyclic repetition of both procedures. Although previous studies propose a semi-automatic, annotation-based approach, certain selection criteria must first be defined by manually identifying what information is useful, why it is useful, and how it is structured.
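The cyclic procedure described above (known KPs retrieve term pairs, and those terms then act as seeds for finding new KPs) can be illustrated with a minimal Python sketch. It is written in the spirit of generic pattern-bootstrapping approaches and is not EcoLexicon's tooling: the toy corpus, the simplification of terms to single words, and the names `extract_pairs`, `find_new_patterns` and `bootstrap` are all hypothetical. In practice, as noted above, the selection criteria and the validation of candidate KPs are manual.

```python
import re
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical toy corpus (not EcoLexicon data).
CORPUS = [
    "coastal erosion caused by waves",
    "dune erosion caused by wind",
    "erosion due to wind",
    "erosion due to rain",
]


def extract_pairs(corpus: Iterable[str], patterns: Set[str]) -> Set[Pair]:
    """Step 1: use known KPs to retrieve related term pairs.
    Terms are simplified to single words (assumption)."""
    pairs: Set[Pair] = set()
    for sentence in corpus:
        for kp in patterns:
            for m in re.finditer(rf"(\w+) {re.escape(kp)} (\w+)", sentence):
                pairs.add((m.group(1), m.group(2)))
    return pairs


def find_new_patterns(corpus: Iterable[str], pairs: Set[Pair]) -> Set[str]:
    """Step 2: use the retrieved terms as seeds; the text linking two
    seed terms in a sentence is kept as a candidate KP."""
    patterns: Set[str] = set()
    for sentence in corpus:
        for left, right in pairs:
            m = re.search(rf"\b{re.escape(left)} (.+?) {re.escape(right)}\b", sentence)
            if m:
                patterns.add(m.group(1))
    return patterns


def bootstrap(corpus: Iterable[str], seed_patterns: Set[str], rounds: int = 2):
    """Repeat both steps cyclically, accumulating KPs and term pairs."""
    corpus = list(corpus)
    patterns = set(seed_patterns)
    pairs: Set[Pair] = set()
    for _ in range(rounds):
        pairs |= extract_pairs(corpus, patterns)
        patterns |= find_new_patterns(corpus, pairs)
    return patterns, pairs


patterns, pairs = bootstrap(CORPUS, {"caused by"})
print(sorted(patterns))  # ['caused by', 'due to']
print(sorted(pairs))     # [('erosion', 'rain'), ('erosion', 'waves'), ('erosion', 'wind')]
```

Starting from the single seed KP *caused by*, the first round finds WIND as an agent of EROSION; WIND then exposes the new KP *due to*, which in the second round retrieves RAIN. On real corpora, each cycle also produces noisy candidates, which is why a human filter remains necessary.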
Conceptual concordances of EROSION show how different KPs convey different relations with other specialized concepts. The main relations reflected in EROSION concordances are caused by, affects, has location, and has result, which highlight the procedural nature of the concept and the important role played by non-hierarchical relations in knowledge representations.
In Figure 1, EROSION is related to various kinds of agents, such as STORM SURGE (1, 7), WAVE ACTION (2, 13), RAIN (3), WIND (4), JETTY (5), CONSTRUCTION PROJECTS (6), MANGROVE REMOVAL (8), SURFACE RUNOFF (9), FLOOD (10), HUMAN-INDUCED FACTORS (11), STORM (12) and MEANDERING CHANNELS (14). These agents can be retrieved through the KPs expressing the relation caused by, such as resultant (1), agent for (2, 3), due to (6, 7), responsible for (11) and lead to (13). This relation can also be conveyed through compound adjectives, such as flood-induced (10) or storm-caused (12), and through any expression containing cause as a verb or noun: one of the causes of (9), cause (4, 5, 8) and caused by (14).

Fig. 1. Non-hierarchical relations associated with EROSION

Proceedings of the Computational Linguistics-Applications Conference pp. 9–16 ISBN 978-83-60810-47-7
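The caused by KPs listed in this paragraph, including the compound adjectives and the various forms of cause, can be approximated with regular expressions, for instance to flag candidate concordance lines for manual review. The following is a minimal sketch: the regex forms, the name `caused_by_kps`, and the decision to capture a trailing *by* or *of* after cause are our own assumptions, not part of EcoLexicon.

```python
import re
from typing import List

# Regex approximations of the caused-by KPs cited in the text (our own rendering).
CAUSED_BY_KPS = [
    r"\bresultant\b",
    r"\bagents? for\b",
    r"\bdue to\b",
    r"\bresponsible for\b",
    r"\b(?:lead|leads|leading|led) to\b",
    r"\b\w+-(?:induced|caused)\b",           # compound adjectives: flood-induced, storm-caused
    r"\bcaus(?:e|es|ed|ing)(?: by| of)?\b",  # "cause" as verb or noun: cause, caused by, causes of
]
CAUSED_BY = re.compile("|".join(CAUSED_BY_KPS), re.IGNORECASE)


def caused_by_kps(sentence: str) -> List[str]:
    """Return every caused-by KP found in a concordance line."""
    return [m.group(0) for m in CAUSED_BY.finditer(sentence)]


print(caused_by_kps("Flood-induced erosion of the dunes"))  # ['Flood-induced']
print(caused_by_kps("Erosion is caused by wind"))           # ['caused by']
```

A match only signals a candidate relation; deciding which term is the agent and which is the process still depends on the surrounding context, which is one reason extraction is not fully automated.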
EROSION is also linked to the patients it affects, such as WATER (15), SEDIMENTS (16), COASTLINES (16), BEACHES (17), BUILDINGS (18), DELTAS (19) and CLIFFS (20). However, the affected entities, or patients, are often equivalent to locations (e.g., if EROSION affects BEACHES, it actually takes place at the BEACH). The difference lies in the kind of KPs linking the propositions. The affects relation is often reflected by the preposition of (10) or by verbal expressions such as threatens (18), damaged by (17) or provides (19). In contrast, the has location relation is conveyed through directional prepositions (around, 21; along, 22; downdrift, 23) or spatial expressions, such as takes place (24). In this way, EROSION is linked to the following locations: LITTORAL BARRIERS (21), COASTS (22) and STRUCTURES (23). Result is an essential dimension in the description of any process, since a process is not only initiated by an agent affecting a patient in a particular location, but also has certain effects. These effects can be the creation of a new entity (SEDIMENTS, 25; PRIMARY COASTS, 26; BEACH MATERIAL, 27; SHORELINES, 28; MARSHES, 29; BAYS, 30) or the beginning of another process (SEAWATER INTRUSION, 31; PROFILE STEEPENING, 32).
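Because the difference between affects and has location lies in the KP rather than in the related concept, a minimal sketch can map the KPs cited in this paragraph to relations and store what is found as (subject, relation, object) triples. The table covers only the KPs named above; the relation labels `affects` and `has_location` and the function names are hypothetical. A real system would need context to disambiguate a KP as polysemous as *of*.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Only the KPs cited in the text; relation labels follow the paper's relation names.
KP_TO_RELATION: Dict[str, str] = {
    "of": "affects",  # highly ambiguous in real text; needs context
    "threatens": "affects",
    "damaged by": "affects",
    "provides": "affects",
    "around": "has_location",
    "along": "has_location",
    "downdrift": "has_location",
    "takes place": "has_location",
}


def relation_for(kp: str) -> Optional[str]:
    """Look up the relation signalled by a KP (case-insensitive)."""
    return KP_TO_RELATION.get(kp.lower())


def build_triples(subject: str, observations: List[Tuple[str, str]]) -> List[Triple]:
    """observations: (kp, related concept) pairs read off concordance lines.
    Pairs whose KP is not in the table are skipped."""
    triples: List[Triple] = []
    for kp, concept in observations:
        relation = relation_for(kp)
        if relation is not None:
            triples.append((subject, relation, concept))
    return triples


def group_by_relation(triples: List[Triple]) -> Dict[str, List[str]]:
    """Collect the objects of each relation, preserving their order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for _, relation, obj in triples:
        grouped[relation].append(obj)
    return dict(grouped)


observations = [
    ("threatens", "BUILDINGS"),
    ("damaged by", "BEACHES"),
    ("around", "LITTORAL BARRIERS"),
    ("along", "COASTS"),
]
print(group_by_relation(build_triples("EROSION", observations)))
# {'affects': ['BUILDINGS', 'BEACHES'], 'has_location': ['LITTORAL BARRIERS', 'COASTS']}
```

Keeping EROSION as the subject even for passive KPs such as *damaged by* reflects the fact that the relation is stated from the point of view of the process, whatever the surface word order.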
As can be seen, all these related concepts are quite heterogeneous. They belong to different paradigms in terms of category membership and/or hierarchical range. For instance, some of the agents of EROSION are natural (WIND, WAVE ACTION) whereas others are artificial (JETTY, MANGROVE REMOVAL), and some are general concepts (STORM) while others are very specific (MEANDERING CHANNEL). This heterogeneity explains why knowledge extraction must still be performed manually. It also illustrates one of the major problems in knowledge representation: multidimensionality [12]. This is better exemplified in the following concordances (Figure 2), since multidimensionality is most often codified in the is a relation.
In the scientific discourse community, concepts are not always described in the same way, because their description depends on perspective and subject field. For instance, EROSION is described as a natural process of REMOVAL (33), a GEOMORPHOLOGICAL PROCESS (34), a COASTAL PROCESS (35) or a STORMWATER IMPACT (36). The first two cases can be regarded as traditional ontological hyperonyms; the choice between them depends on the upper-level structure of the representational system and its level of abstraction. However, COASTAL PROCESS and STORMWATER IMPACT frame the concept in more concrete subject fields and referential settings. The same applies to subtypes, where the multidimensional nature of EROSION is clearly shown. EROSION can thus be classified according to the dimensions of result (SHEET, RILL, GULLY, 37; DIFFERENTIAL EROSION, 38), direction (LATERAL, 39; HEADWARD EROSION, 40), agent (WAVE, 41; FLUVIAL, 42; WIND, 43, 46; WATER, 44; GLACIAL EROSION, 45) and patient (SEDIMENT, 47; DUNE, 48; SHORELINE EROSION, 49). Section III shows the consequences of multidimensionality for knowledge representation.
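One common way to keep such subtypes from collapsing into a single flat list of co-hyponyms is a faceted representation, in which each subtype is filed under the dimension that distinguishes it. The sketch below transcribes the EROSION subtypes listed above into that form. It is an illustration under our own assumptions, not EcoLexicon's representation: the intermediate node names (e.g. "EROSION by agent") and the functions `dimension_of` and `faceted_tree` are hypothetical.

```python
from typing import Dict, List

# Subtypes of EROSION grouped by the dimension that distinguishes them,
# transcribed from the examples in the text (concordance numbers omitted).
EROSION_SUBTYPES: Dict[str, List[str]] = {
    "result": ["SHEET EROSION", "RILL EROSION", "GULLY EROSION", "DIFFERENTIAL EROSION"],
    "direction": ["LATERAL EROSION", "HEADWARD EROSION"],
    "agent": ["WAVE EROSION", "FLUVIAL EROSION", "WIND EROSION", "WATER EROSION", "GLACIAL EROSION"],
    "patient": ["SEDIMENT EROSION", "DUNE EROSION", "SHORELINE EROSION"],
}


def dimension_of(subtype: str, facets: Dict[str, List[str]]) -> str:
    """Return the dimension under which a subtype is classified."""
    for dimension, subtypes in facets.items():
        if subtype in subtypes:
            return dimension
    raise KeyError(subtype)


def faceted_tree(concept: str, facets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Insert one intermediate node per dimension (hypothetical naming),
    so that subtypes from different dimensions are not presented as
    direct siblings of one another."""
    return {f"{concept} by {dimension}": sorted(subtypes) for dimension, subtypes in facets.items()}


tree = faceted_tree("EROSION", EROSION_SUBTYPES)
print(tree["EROSION by direction"])                  # ['HEADWARD EROSION', 'LATERAL EROSION']
print(dimension_of("WIND EROSION", EROSION_SUBTYPES))  # agent
```

With the intermediate nodes, a user browsing EROSION sees, for example, WIND EROSION alongside other agent-based subtypes rather than next to DUNE EROSION, which is classified by patient.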
B. Knowledge representation
According to Meyer et al. [13], TKBs should reflect conceptual structures similarly to how concepts are related in the mind. The organization of semantic information in the brain should thus underlie any theoretical assumption concerning the
Proceedings of the Computational Linguistics-Applications Conference. Jachranka, 2011
