Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives

Amália Mendes; Iria del Río

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 11122 LNAI 211-221

DOI: 10.1007/978-3-319-99722-3_22



Abstract

We describe two new resources prepared for European Portuguese and show how they are used for discourse parsing: the Portuguese subpart of the TED-MDB corpus, a multilingual corpus of TED Talks annotated in the PDTB style, and the Lexicon of Discourse Markers for Portuguese (LDM-PT). Both the lexicon and the corpus are used in a preliminary experiment on identifying discourse connectives in text, which in many cases involves the difficult task of disambiguating connective from non-connective uses. We annotated the PT-TED-MDB corpus with POS, lemma and syntactic constituency, and focused on the 10 most frequent connectives in the corpus. The best approach combines word form, POS and syntactic annotation and reaches 85% precision.
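As a rough illustration of the word-form+POS idea mentioned in the abstract, the sketch below looks up each token in a small lexicon and accepts it as a connective only when its POS tag is one the lexicon allows. It is a minimal sketch under stated assumptions, not the authors' implementation: the lexicon entries, the tag names, and the functions `find_connectives` and `precision` are all hypothetical, only single-token connectives are handled, and the syntactic constituency features are left out.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (word form, POS tag)

# Hypothetical toy lexicon: connective word form -> POS tags under which the
# form is accepted as a connective. Illustrative only; not taken from LDM-PT.
TOY_LEXICON: Dict[str, Set[str]] = {
    "mas": {"CCONJ"},
    "porque": {"SCONJ"},
    "então": {"ADV"},
}


def find_connectives(tokens: List[Token],
                     lexicon: Dict[str, Set[str]] = TOY_LEXICON) -> List[int]:
    """Return indices of tokens whose form is in the lexicon with an accepted POS."""
    hits = []
    for i, (form, pos) in enumerate(tokens):
        allowed = lexicon.get(form.lower())
        if allowed is not None and pos in allowed:
            hits.append(i)
    return hits


def precision(predicted: List[int], gold: List[int]) -> float:
    """Fraction of predicted connective positions that appear in the gold set."""
    if not predicted:
        return 0.0
    gold_set = set(gold)
    return sum(1 for i in predicted if i in gold_set) / len(predicted)


# Usage: a made-up tagged sentence and a made-up gold annotation.
sentence = [("Mas", "CCONJ"), ("ele", "PRON"), ("ficou", "VERB"),
            ("porque", "SCONJ"), ("queria", "VERB")]
predicted = find_connectives(sentence)
score = precision(predicted, gold=[0])
```

The POS check is what separates connective from non-connective uses in this toy version: a form that appears in the lexicon but carries a tag outside its allowed set is skipped.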

Citation (APA)

Mendes, A., & del Río, I. (2018). Using a Discourse Bank and a Lexicon for the Automatic Identification of Discourse Connectives. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11122 LNAI, pp. 211–221). Springer Verlag. https://doi.org/10.1007/978-3-319-99722-3_22
