An evaluation dataset for intent classification and out-of-scope prediction

Stefan Larson; Anish Mahendran; Joseph J. Peper; Christopher Clarke; Andrew Lee; Parker Hill; Jonathan K. Kummerfeld; Kevin Leach; Michael A. Laurenzano; Lingjia Tang; Jason Mars

Conference Proceedings (Open Access)

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2019) 1311-1316

DOI: 10.18653/v1/d19-1131

307 Citations

249 Readers

Abstract

Task-oriented dialog systems need to know when a query falls outside their range of supported intents, but current text classification corpora only define label sets that cover every example. We introduce a new dataset that includes queries that are out-of-scope, i.e., queries that do not fall into any of the system's supported intents. This poses a new challenge because models cannot assume that every query at inference time belongs to a system-supported intent class. Our dataset also covers 150 intent classes over 10 domains, capturing the breadth that a production task-oriented agent must handle. We evaluate a range of benchmark classifiers on our dataset along with several different out-of-scope identification schemes. We find that while the classifiers perform well on in-scope intent classification, they struggle to identify out-of-scope queries. Our dataset and evaluation fill an important gap in the field, offering a way of more rigorously and realistically benchmarking text classification in task-driven dialog systems.

Citation (APA)

Larson, S., Mahendran, A., Peper, J. J., Clarke, C., Lee, A., Hill, P., … Mars, J. (2019). An evaluation dataset for intent classification and out-of-scope prediction. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 1311–1316). Association for Computational Linguistics. https://doi.org/10.18653/v1/d19-1131
