Developing annotation solutions for online Data Driven Learning
- ISSN: 09583440
- DOI: 10.1017/S0958344009000093
Abstract
Although annotation is a widely researched topic in Corpus Linguistics (CL), its potential role in Data Driven Learning (DDL) has not been addressed in depth by Foreign Language Teaching (FLT) practitioners. Furthermore, most research on the use of DDL methods pays little attention to annotation in the design and implementation of corpus-based/driven language teaching. In this paper, we set out to examine the process of development of SACODEYL Annotator, an application that seeks to assist SACODEYL system users in annotating XML multilingual corpora. First, we discuss the role of annotation in DDL and the dominating paradigm in general corpus applications. In the context of the language classroom, we argue that it is essential that corpora should be pedagogically motivated (Braun, 2005 and 2007a). Then, we move on to deal with the analysis and design stages of our annotation solution by illustrating its main features. Some of these include a user-friendly hierarchical and extensible taxonomy tree to facilitate the learner-oriented annotation of the corpora; a real-time graphical representation of the annotated corpus that complies with the XML TEI (Text Encoding Initiative) standard; and intuitive management of the different data sections and associated metadata. SACODEYL (System Aided Compilation and Open Distribution of European Youth Language) is an EU-funded MINERVA project which aims to develop an ICT-based system for the assisted compilation and open distribution of multimedia European teen talk in the context of language education. This research lays emphasis on the functionalities of the application within the SACODEYL context. However, our paper similarly addresses the needs of potential multimedia language corpus administrators in general who are on the lookout for powerful annotation-assisting software. SACODEYL Annotator is free to use and can be downloaded from our website.
like WordSmith or Monoconc or with online resources such as Spaceless.1 Other
proposals use annotated corpora for language classroom activities. However, the
very notion of annotation has remained bound to the linguistic research paradigm,
which has led teachers to identify corpus applications in DDL with the exploitation
of grammatical features and text genres. The usefulness of this approach is
indisputable, but we were concerned that efforts to implement DDL were
mediated by a dominant morphosyntactic approach to language data. This
becomes even more striking when one reviews mainstream publications in the field of
foreign language teaching (FLT) and realizes that in none of them does the morphology or the
syntax of a language drive pedagogy. So the question is why Data in
DDL should be restricted to morphological tagging. The challenge would be to try
and turn Data into a pedagogical construct, just like any other CALL or published
material.
In this paper we present the analytical framework that guided our research in
providing annotation solutions for the implementation of the SACODEYL system,
an effort to deliver the actual voices of European young people online. In particular,
it is devoted to the design of the annotation application that has been developed for
SACODEYL: SACODEYL Annotator.2 The role of pedagogic annotation is central
here as the resulting annotation will condition and shape the learning experiences of
learners using our system. For reasons of space, areas of the system other than the
annotation solution cannot be fully discussed here.
2 Annotation in corpus linguistics
Annotation in the context of linguistics can be seen as both the process and the
resulting product of adding information to electronic texts. In the field of corpus
linguistics (CL), annotation is primarily conceived of as an add-on, a way to describe
the information that matches the needs of language researchers, linguists or corpus
users. In this way, Leech (1991) equates annotation with analysis, while McEnery
and Wilson (1996) describe corpus annotation in terms of processing. Within
this scheme, the whole point of annotating corpora is that annotation
gives corpus users both refined information retrieval capabilities and the means for subsequent
treatment of the data. Once captured, this information may serve a very wide array
of purposes, from language description to natural language processing.
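
The idea of annotation as both process and product can be illustrated with a minimal sketch. It assumes a hypothetical XML layout: the `section` element, the `categories` attribute and the `topic:` labels are illustrative names only and do not reproduce the SACODEYL or TEI schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-corpus; element and attribute names are assumptions
# made for illustration, not the SACODEYL or TEI schema.
CORPUS_XML = """
<corpus>
  <section id="s1">I usually play football with my friends after school.</section>
  <section id="s2">My favourite subject is history.</section>
  <section id="s3">At weekends we go to the cinema or play video games.</section>
</corpus>
""".strip()


def annotate(root, section_id, categories):
    """Annotation as process: attach category labels to one section."""
    for section in root.iter("section"):
        if section.get("id") == section_id:
            existing = set(section.get("categories", "").split())
            merged = sorted(existing | set(categories))
            section.set("categories", " ".join(merged))
            return section
    raise KeyError(section_id)


def retrieve(root, category):
    """Annotation as product: list ids of sections carrying a label."""
    return [
        section.get("id")
        for section in root.iter("section")
        if category in section.get("categories", "").split()
    ]


root = ET.fromstring(CORPUS_XML)
annotate(root, "s1", ["topic:hobbies", "topic:school"])
annotate(root, "s2", ["topic:school"])
annotate(root, "s3", ["topic:hobbies"])

print(retrieve(root, "topic:hobbies"))  # ['s1', 's3']
print(retrieve(root, "topic:school"))   # ['s1', 's2']
```

Once the labels are stored with the text, retrieval reduces to a filter over the added information, which is the sense in which annotation refines what corpus users can extract.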
There exist different approaches to the conceptualization of the annotation process.
Annotation can be (a) automatic, semi-automatic or manual, depending on
the degree of human intervention in the process, or (b) it can be done by one single
annotator or a group of annotators. In either case, its raison d’être inevitably
(c) reflects the different nature of the ultimate aim of the meta-information being
added to the corpus. As an illustration, annotators’ needs may range from an interest
1 Available at http://www.spaceless.com/concordancer.php; Sabine Braun offers an excellent
selection of tools and resources at http://www.corpora4learning.net/
2 SACODEYL Annotator is free to use and can be downloaded from http://www.um.es/sacodeyl
P. Pérez-Paredes and J. M. Alcaraz-Calero


