A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations


Abstract

We present a large-scale corpus of e-mail conversations with domain-agnostic, two-level dialogue act (DA) annotations, towards the goal of a better understanding of asynchronous conversations. We annotate over 6,000 messages and 35,000 sentences from more than 2,000 threads. For domain- and application-independent DA annotation, we adopt the ISO 24617-2 standard as the annotation scheme. To assess the difficulty of DA recognition on our corpus, we evaluate several models, including a pre-trained contextual representation model, as baselines. The experimental results show that BERT outperforms the other neural network models, including previous state-of-the-art models, but falls short of human performance. We also demonstrate that DA tags of two-level granularity enable a DA recognition model to learn efficiently via multi-task learning. An evaluation of a model trained on our corpus against other domains of asynchronous conversation shows the domain independence of our DA annotations.
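The multi-task setup described in the abstract can be sketched as a shared sentence encoder feeding two classification heads, one per tag level, trained with a joint loss. The following is a minimal illustrative sketch, not the authors' implementation: the tag inventories, dimensions, and parameter shapes are hypothetical placeholders, not the actual ISO 24617-2 tag set.

```python
# Hedged sketch of two-level multi-task DA recognition:
# a shared encoder feeds a coarse-tag head and a fine-tag head,
# and the training objective sums the two cross-entropy losses.
# All names and sizes here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

COARSE_TAGS = ["task", "feedback", "social"]          # hypothetical coarse level
FINE_TAGS = ["inform", "question", "request",
             "autoPositive", "thanking", "greeting"]  # hypothetical fine level

DIM = 16  # toy sentence-embedding size

# Shared encoder weights plus one linear head per tag level.
W_enc = rng.normal(size=(DIM, DIM))
W_coarse = rng.normal(size=(DIM, len(COARSE_TAGS)))
W_fine = rng.normal(size=(DIM, len(FINE_TAGS)))

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    """Shared representation, then one distribution per tag level."""
    h = np.tanh(W_enc @ x)     # shared encoder
    return softmax(h @ W_coarse), softmax(h @ W_fine)

def multitask_loss(x, y_coarse, y_fine):
    """Joint objective: sum of coarse and fine cross-entropies."""
    p_c, p_f = forward(x)
    return -np.log(p_c[y_coarse]) - np.log(p_f[y_fine])

# Usage on a stand-in sentence embedding.
x = rng.normal(size=DIM)
loss = multitask_loss(x, COARSE_TAGS.index("task"),
                      FINE_TAGS.index("question"))
print(float(loss) > 0.0)
```

Summing the per-level losses lets gradients from both granularities update the shared encoder, which is the usual way two-level tags make training more sample-efficient.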

Citation (APA)

Taniguchi, M., Ueda, Y., Taniguchi, T., & Ohkuma, T. (2020). A Large-Scale Corpus of E-mail Conversations with Standard and Two-Level Dialogue Act Annotations. In COLING 2020 - 28th International Conference on Computational Linguistics, Proceedings of the Conference (pp. 4969–4980). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.coling-main.436
