Data scarcity has been a long-standing issue in the field of open-domain social dialogue. To quench this thirst, we present SODA: the first publicly available, million-scale, high-quality social dialogue dataset. By contextualizing social commonsense knowledge from a knowledge graph, we are able to distill an exceptionally broad spectrum of social interactions from a large language model. Human evaluation shows that conversations in SODA are more consistent, specific, and (surprisingly) natural than those in prior human-authored datasets. Using SODA, we train COSMO: a generalizable conversation model that is significantly more natural and consistent on unseen datasets than the best-performing conversation models (e.g., GODEL, BlenderBot-1, Koala, Vicuna). Experiments reveal that COSMO is sometimes even preferred to the original human-written gold responses. Additionally, our results shed light on the distinction between knowledge-enriched conversations and natural social chitchats. We make our data, models, and code public.
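The abstract compresses the method into one sentence, so here is a minimal Python sketch of how such a contextualize-then-distill pipeline could look: verbalize a commonsense triple from a social knowledge graph, expand it into a short narrative with an LLM, then prompt the LLM again to write a conversation grounded in that narrative. The example triple, relation template, prompt wording, and the `llm` callable are all illustrative assumptions, not the authors' released pipeline.

```python
"""Rough sketch of a SODA-style distillation pipeline, based only on the
abstract. All names, prompts, and example values are assumptions."""

from typing import Callable

# A commonsense triple in the style of a social knowledge graph such as
# ATOMIC: (head event, relation, tail inference). Example values are made up.
TRIPLE = ("PersonX moves a step closer to the goal",
          "xNeed",
          "to take the first step")


def triple_to_sentence(head: str, relation: str, tail: str) -> str:
    # Verbalize the symbolic triple; the relation template is an assumption.
    templates = {"xNeed": "{head}. Before that, PersonX needed {tail}."}
    return templates[relation].format(head=head, tail=tail)


def contextualize(sentence: str, llm: Callable[[str], str]) -> str:
    # Stage 1: expand the verbalized triple into a short narrative context.
    return llm(f"Expand the following into a two-sentence story:\n{sentence}")


def narrative_to_dialogue(narrative: str, llm: Callable[[str], str]) -> str:
    # Stage 2: distill a multi-turn conversation grounded in the narrative.
    return llm(
        "Write a conversation between the people in this story:\n"
        f"{narrative}\n\nConversation:"
    )


if __name__ == "__main__":
    # Stand-in for a real LLM completion call; swap in any text-generation API.
    echo = lambda prompt: f"<LLM output for: {prompt[:40]}...>"
    sentence = triple_to_sentence(*TRIPLE)
    story = contextualize(sentence, llm=echo)
    print(narrative_to_dialogue(story, llm=echo))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider; at scale, the same chain would be run over every triple in the knowledge graph to produce the million-scale corpus the abstract describes.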
Citation
Kim, H., Hessel, J., Jiang, L., West, P., Lu, X., Yu, Y., … Choi, Y. (2023). SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023) (pp. 12930–12949). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.emnlp-main.799