TEI and LMF crosswalks

  • Romary L

Abstract

The present paper explores various arguments in favour of making the Text Encoding Initiative (TEI) guidelines an appropriate serialisation for ISO standard 24613:2008 (LMF, Lexical Markup Framework). It also identifies the issues that would have to be resolved in order to reach an appropriate implementation of these ideas, in particular in terms of informational coverage. We show how the customisation facilities offered by the TEI guidelines can provide an adequate background, not only to cover missing components within the current Dictionary chapter of the TEI guidelines, but also to allow specific lexical projects to deal with local constraints. We expect this proposal to be a basis for a future ISO project in the context of the ongoing revision of LMF. Since this paper adopts the specific viewpoint of the TEI guidelines, no precise description of LMF is made here. For an introduction to LMF, see section 4 of (ROMARY 2013).

1 Towards a more intimate relationship between the TEI and the LMF standards

This chapter is about a simple thesis: the TEI framework could be the optimal serialisation background for the LMF standard, since it provides both an ideal XML specification platform and a representation vocabulary that can be easily tuned (or customised) to cover the various LMF packages and components. This thesis does not come out of the blue. It arises naturally when one observes the history of both initiatives and their current impact in various communities in the humanities and in computational linguistics. It also arises when one considers how relevant an LMF-specific serialisation can be, given that lexical data are in essence to be interconnected with various other types of linguistic resources. As a matter of fact, the current XML serialisation of LMF suffers from both generic and specific problems that have prevented it from being widely used by the various communities interested in digital lexical resources.
Right from the outset, the lack of consensus on the strategy to define a reliable and stable XML serialisation has forced the ISO working group on LMF to confine it to an informative annex, with the following main shortcomings:

  • Being carved in stone within the ISO standard, rather than being pointed to as an external and stable online resource, prevents it from being properly maintained, whether to correct identified weak points or bugs or to add features;
  • It is only defined as a DTD, a vestigial XML schema language that hardly any XML developer uses anymore and which deeply limits its capacity to express constraints on types or to factorise global attributes. For the sake of simplicity (and this can be easily understood when one has to finalise a text for an ISO standard) no parallel definition of a RelaxNG or W3C schema was provided;
  • It does not reflect the intrinsic extensibility of LMF, as it does not contain any dedicated mechanism for customisation, for instance when the developer of a new lexical model would like to discard some packages or add her own extensions;

Romary, L. (2015). TEI and LMF crosswalks. Journal for Language Technology and Computational Linguistics, 30(1), 47–70. https://doi.org/10.21248/jlcl.30.2015.195