Spanish treebank annotation of informal non-standard web text

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

This paper presents the Latin American Spanish Discussion Forum Treebank (LAS-DisFo). This corpus consists of 50,291 words and 2,846 sentences that are part-of-speech tagged, lemmatized and syntactically annotated with constituents and functions. We describe how it was built and the methodology followed for its annotation, the annotation scheme and criteria applied for dealing with the most problematic phenomena commonly encountered in this kind of informal unedited web text. This is the first available Latin American Spanish corpus of non-standard language that has been morphologically and syntactically annotated. It is a valuable linguistic resource that can be used for the training and evaluation of parsers and PoS taggers.

Cite

CITATION STYLE

APA

Taulé, M., Martí, M. A., Bies, A., Nofre, M., Garí, A., Song, Z., … Ellis, J. (2015). Spanish treebank annotation of informal non-standard web text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9396, pp. 15–27). Springer Verlag. https://doi.org/10.1007/978-3-319-24800-4_2

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free