A Large-Scale Dataset for Empathetic Response Generation

Abstract

Recent developments in NLP show a strong trend towards refining pre-trained models with domain-specific datasets. This is especially the case for response generation, where emotion plays an important role. However, existing empathetic datasets remain small, which slows research in this area, for example the development of emotion-aware chatbots. One main technical challenge has been the cost of manually annotating dialogues with the right emotion labels. In this paper, we describe a large-scale silver dataset consisting of 1M dialogues annotated with 32 fine-grained emotions, eight empathetic response intents, and the Neutral category. To achieve this goal, we have developed a novel data curation pipeline that starts with a small seed of manually annotated data and eventually scales it to a satisfactory size. We compare the dataset's quality against a state-of-the-art gold dataset using offline experiments and visual validation methods. The resulting procedure can be used to create similar datasets in the same domain as well as in other domains.

Welivita, A., Xie, Y., & Pu, P. (2021). A Large-Scale Dataset for Empathetic Response Generation. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 1251–1264). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.96
