Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Intent Classification

Abstract

Open-world classification in dialog systems requires models to detect open intents while ensuring the quality of in-domain (ID) intent classification. In this work, we revisit methods that leverage distance-based statistics for unsupervised out-of-domain (OOD) detection. We show that despite their superior performance on threshold-independent metrics like AUROC on the test set, threshold values chosen based on validation-set performance do not generalize well to the test set, resulting in substantially lower ID and OOD detection accuracy and F1 scores. Our analysis shows that this lack of generalizability can be successfully mitigated by setting aside a holdout set from the validation data for threshold selection (sometimes achieving relative gains as high as 100%). Extensive experiments on seven benchmark datasets show that this fix puts the performance of these methods on par with, or sometimes even better than, current state-of-the-art OOD detection techniques.
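
To make the described fix concrete, below is a minimal sketch of the kind of pipeline the abstract outlines, not the authors' implementation. It assumes a Mahalanobis distance to the nearest intent centroid as the distance-based OOD score (one common choice for such statistics), synthetic 2-D features standing in for real encoder embeddings, and binary F1 on a holdout half of the validation set to pick the detection threshold; all names and data are illustrative.

    # Sketch: distance-based OOD scoring with threshold selection on a
    # validation holdout. Assumptions (not from the paper): Mahalanobis
    # distance to the nearest ID class centroid as the OOD score, and
    # synthetic 2-D features in place of encoder embeddings.
    import numpy as np
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)

    # Synthetic setup: 3 ID intent clusters in 2-D feature space.
    id_train = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2))
                          for c in ([0, 0], [4, 0], [0, 4])])
    id_labels = np.repeat([0, 1, 2], 200)

    def make_eval_split(n_id=150, n_ood=150):
        """Mixed ID/OOD split; label 1 marks OOD examples."""
        feats = np.vstack([
            rng.normal(loc=[0, 0], scale=0.5, size=(n_id // 3, 2)),
            rng.normal(loc=[4, 0], scale=0.5, size=(n_id // 3, 2)),
            rng.normal(loc=[0, 4], scale=0.5, size=(n_id // 3, 2)),
            rng.normal(loc=[8, 8], scale=0.5, size=(n_ood, 2)),  # OOD cluster
        ])
        is_ood = np.concatenate([np.zeros(n_id, int), np.ones(n_ood, int)])
        return feats, is_ood

    # Distance-based confidence score: squared Mahalanobis distance to the
    # nearest class centroid, with a shared covariance fit on ID training data.
    centroids = np.stack([id_train[id_labels == k].mean(0) for k in range(3)])
    prec = np.linalg.inv(np.cov((id_train - centroids[id_labels]).T))

    def ood_score(x):
        diffs = x[:, None, :] - centroids[None, :, :]         # (n, 3, 2)
        d2 = np.einsum("nkd,de,nke->nk", diffs, prec, diffs)  # per-class distances
        return d2.min(axis=1)                                 # nearest centroid

    # The fix: carve a holdout out of validation data purely for picking tau.
    val_feats, val_ood = make_eval_split()
    mask = rng.random(len(val_feats)) < 0.5                   # 50/50 holdout split
    hold_scores = ood_score(val_feats[mask])
    cands = np.quantile(hold_scores, np.linspace(0.01, 0.99, 99))
    f1s = [f1_score(val_ood[mask], (hold_scores > t).astype(int)) for t in cands]
    tau = cands[int(np.argmax(f1s))]

    # Check how well the chosen threshold generalizes to the test set.
    test_feats, test_ood = make_eval_split()
    pred = (ood_score(test_feats) > tau).astype(int)
    print(f"threshold={tau:.2f}  test OOD-F1={f1_score(test_ood, pred):.3f}")

Using quantiles of the holdout scores as threshold candidates keeps the sweep scale-free, so the same selection logic works regardless of the magnitude of the distance statistic.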

Cite

APA

Khosla, S., & Gangadharaiah, R. (2022). Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Intent Classification. In Insights 2022 - 3rd Workshop on Insights from Negative Results in NLP, Proceedings of the Workshop (pp. 18–23). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.insights-1.3
