Annotation of lexical bundles with discourse functions in a Spanish academic corpus

Abstract

This paper describes the annotation of 996 lexical bundles (LBs) assigned to 39 different discourse functions in a Spanish academic corpus. The purpose of the annotation is to obtain a new Spanish gold-standard corpus of 1,800,000 words that is useful both for training and evaluating computational models capable of automatically identifying LBs for each context in new corpora and for linguistic analysis of the role of LBs in academic discourse. The annotation process revealed that the correspondence between LBs and discourse functions is not one-to-one and that the degree of ambiguity is high, so linguists’ contribution has been essential for improving the automatic assignment of tags.

Cite

Guzzi, E., Ramos, M. A., Garcia, M., & Salido, M. G. (2023). Annotation of lexical bundles with discourse functions in a Spanish academic corpus. In 19th Workshop on Multiword Expressions, MWE 2023 - Proceedings (pp. 99–105). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.mwe-1.14
