Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives


Abstract

We describe two new resources that have been prepared for European Portuguese and how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks that has been annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both the lexicon and the corpus are used in a preliminary experiment on identifying discourse connectives in texts. In many cases this includes the difficult task of disambiguating between connective and non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency and focused on the 10 most frequent connectives in the corpus. The best approach considers word-form+POS+syntactic annotation and leads to 85% precision.
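The lexicon-plus-POS part of the identification step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon entries, the POS tag names, and the helpers `find_connectives` and `precision` are all hypothetical, and the syntactic-constituency features mentioned in the abstract are left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: word form -> POS tags under which the form is
# accepted as a discourse connective. The entries and tag names are
# illustrative assumptions and are not taken from LDM-PT.
TOY_LEXICON: Dict[str, Set[str]] = {
    "mas": {"CCONJ"},
    "porque": {"SCONJ"},
    "então": {"ADV"},
}

Token = Tuple[str, str]  # (word form, POS tag)


def find_connectives(tokens: List[Token],
                     lexicon: Dict[str, Set[str]] = TOY_LEXICON) -> List[int]:
    """Return indices of tokens whose form is in the lexicon with a licensed POS."""
    hits = []
    for i, (form, pos) in enumerate(tokens):
        allowed = lexicon.get(form.lower())
        if allowed is not None and pos in allowed:
            hits.append(i)
    return hits


def precision(predicted: List[int], gold: List[int]) -> float:
    """Fraction of predicted connective positions that also appear in the gold set."""
    if not predicted:
        return 0.0
    gold_set = set(gold)
    return sum(1 for i in predicted if i in gold_set) / len(predicted)


# Made-up tagged sentence: the last token has a lexicon form but a POS tag
# that the toy lexicon does not license, so it is filtered out.
tagged = [("Mas", "CCONJ"), ("ele", "PRON"), ("ficou", "VERB"),
          ("porque", "SCONJ"), ("então", "NOUN")]
predicted = find_connectives(tagged)
```

Filtering on POS is what separates a word-form-only baseline from the word-form+POS setting; a form such as "então" is kept only when its tag matches one the lexicon licenses.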

Mendes, A., & del Río, I. (2018). Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 211–221). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_22
