Reducing cohort bias in natural language understanding systems with targeted self-training scheme


Abstract

Bias in machine learning models can arise when the models are trained on particular types of data that do not generalize well, causing underperformance for certain groups of users. In this work, we focus on reducing the bias related to new customers in a digital voice assistant system. Natural language understanding models are often observed to perform worse on requests from new users than on requests from experienced users. To mitigate this problem, we propose a framework that consists of two phases: (1) a fixing phase with four active learning strategies used to identify important samples coming from new users, and (2) a self-training phase where a teacher model trained in the first phase is used to annotate semi-supervised samples to expand the training data with relevant cohort utterances. We describe practical strategies that involve identifying representative cohort-based samples through density clustering as well as employing implicit customer feedback to improve new customers' experience. We demonstrate the effectiveness of our approach in a real-world, large-scale voice assistant system for two languages, German and French, through a number of experiments.
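The self-training phase described above can be sketched as a confidence-filtered pseudo-labeling loop: a teacher model labels unlabeled utterances from the new-customer cohort, and only confident pseudo-labels are added to the training data. This is a minimal illustrative sketch, not the paper's implementation; the function name, the `teacher_predict` interface, and the confidence threshold are all assumptions introduced here.

```python
# Hypothetical sketch of the self-training phase: a teacher model (trained
# in the fixing phase) annotates unlabeled cohort utterances, and only
# high-confidence pseudo-labels are kept to expand the training data.
# All names and the 0.9 threshold are illustrative, not from the paper.

def self_train_expand(teacher_predict, labeled, unlabeled, threshold=0.9):
    """teacher_predict(utterance) -> (label, confidence); returns the
    labeled set expanded with confident pseudo-labeled cohort samples."""
    expanded = list(labeled)
    for utterance in unlabeled:
        label, confidence = teacher_predict(utterance)
        if confidence >= threshold:  # discard uncertain teacher labels
            expanded.append((utterance, label))
    return expanded

# Toy usage: a hypothetical teacher that is confident only on "play" requests.
teacher = lambda u: ("PlayMusic", 0.95) if "play" in u else ("Unknown", 0.30)
labeled = [("turn on the lights", "SmartHome")]
unlabeled = ["play some jazz", "uh what was that"]
expanded = self_train_expand(teacher, labeled, unlabeled)
# expanded now holds the original pair plus one confident pseudo-label
```

In practice the paper additionally selects representative cohort samples via density clustering before this step; the threshold-based filter here stands in for whatever confidence criterion the deployed system uses.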

Citation (APA)

Le, D. T., Cortes, G., Chen, B., & Bradford, M. (2023). Reducing cohort bias in natural language understanding systems with targeted self-training scheme. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 5, pp. 552–560). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.acl-industry.53
