CMCC: A Comprehensive and Large-Scale Human-Human Dataset for Dialogue Systems

Abstract

Dialogue modeling problems severely limit the real-world deployment of neural conversational models, and building a human-like dialogue agent is an extremely challenging task. Data-driven models, which need a huge amount of conversation data, have recently become more and more prevalent. In this paper, we release around 100,000 dialogues, which come from real-world dialogue transcripts between real users and customer-service staff. We call this dataset the CMCC (China Mobile Customer Care) dataset; it differs significantly from existing dialogue datasets in both size and nature. The dataset reflects several characteristics of human-human conversations, e.g., task-driven and care-oriented interaction and long-term dependency within the context. It also covers various dialogue types, including task-oriented dialogue, chitchat, and conversational recommendation in real-world scenarios. To our knowledge, CMCC is the largest real human-human spoken dialogue dataset, dozens of times larger than other such datasets, which should significantly promote the training and evaluation of dialogue modeling methods. The results of extensive experiments indicate that CMCC is challenging and needs further effort. We hope that this resource will allow more effective models to be built across various dialogue sub-problems in the future.

Cite

Huang, Y., Wu, X., Chen, S., Hu, W., Zhu, Q., Feng, J., … Zhao, J. (2022). CMCC: A Comprehensive and Large-Scale Human-Human Dataset for Dialogue Systems. In SereTOD 2022 - Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems, Proceedings of the Workshop (pp. 48–61). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.seretod-1.7
