Improving Large-Scale Conversational Assistants using Model Interpretation based Training Sample Selection


Abstract

Natural language understanding (NLU) models are a core component of large-scale conversational assistants. Collecting training data for these models through manual annotation is slow and expensive, which impedes the pace of model improvement. We present a three-stage approach to address this challenge: First, we identify a large set of relatively infrequent utterances from live traffic where the users implicitly communicated satisfaction with a response (such as by not interrupting), along with the existing model outputs as candidate annotations. Second, we identify a small subset of these utterances using Integrated Gradients-based importance scores computed with the current models. Finally, we augment our training sets with these utterances and retrain our models. We demonstrate the effectiveness of our approach in a large-scale conversational assistant, processing billions of utterances every week. By augmenting our training set with just 0.05% more utterances through our approach, we observe statistically significant improvements for infrequent tail utterances: a 0.45% reduction in semantic error rate (SemER) in offline experiments, and a 1.23% reduction in defect rates in online A/B tests.
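The abstract does not specify how the Integrated Gradients scores are computed; as a generic illustration of the second stage, here is a minimal sketch of Integrated Gradients on a toy differentiable scorer. The logistic-regression model, its weights, and the `steps` parameter are hypothetical stand-ins for the assistant's NLU model, not the authors' implementation:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Midpoint Riemann-sum approximation of the Integrated Gradients
    path integral from `baseline` to `x` for a scalar-valued model."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Toy "model": a logistic-regression score standing in for the NLU model.
w = np.array([0.8, -1.2, 0.5])

def f(x):
    return 1.0 / (1.0 + np.exp(-np.dot(w, x)))

def grad_f(x):
    s = f(x)
    return w * s * (1.0 - s)

x = np.array([1.0, 2.0, 0.5])       # hypothetical utterance features
baseline = np.zeros(3)               # all-zero reference input
attr = integrated_gradients(grad_f, x, baseline)

# Completeness axiom: attributions sum to f(x) - f(baseline).
assert abs(attr.sum() - (f(x) - f(baseline))) < 1e-3

# A simple (assumed) selection heuristic: rank candidate utterances by
# total attribution magnitude and keep the top fraction for annotation.
importance = np.abs(attr).sum()
```

In practice the per-utterance importance scores would be computed with the deployed model's gradients over token embeddings, and the ranking criterion used to pick the small augmentation subset is a design choice the abstract leaves open.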

Citation (APA)
Schroedl, S., Kumar, M., Hajebi, K., Ziyadi, M., Venkathapaty, S., Ramakrishna, A., … Natarajan, P. (2022). Improving Large-Scale Conversational Assistants using Model Interpretation based Training Sample Selection. In EMNLP 2022 - Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track (pp. 381–388). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-industry.37
