We investigate potential label errors in the popular BANKING77 dataset and their negative impact on intent classification methods. Motivated by our own negative results when constructing an intent classifier, we applied two automated approaches to identify potential label errors in the dataset. We found that over 1,400 (14%) of the 10,003 training utterances may have been incorrectly labelled. In a simple experiment, removing the utterances with potential errors increased our intent classifier's F1-score by 4.5% in supervised classification and its Adjusted Rand Index by 8% in unsupervised classification. This paper serves as a warning about the potential presence of noisy labels in popular NLP datasets. Further study is needed to fully identify the breadth and depth of label errors in BANKING77 and other datasets.
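The abstract does not specify which automated approaches were used, so as an illustration only, the following is a minimal sketch of one common family of methods: the self-confidence heuristic from confident learning. An utterance is flagged as a potential label error when the model's out-of-sample predicted probability for its given label falls below the average self-confidence of that label's class. The function name, toy probabilities, and labels are all hypothetical, not taken from the paper.

```python
# Hedged sketch of a confident-learning-style label-error check.
# This is an assumed illustration, not the paper's actual method.

def flag_potential_label_errors(probs, labels):
    """Return indices of examples whose given label looks suspicious.

    probs:  list of per-class probability lists (one per example),
            ideally out-of-sample (e.g. from cross-validation).
    labels: list of given label indices, one per example.
    """
    classes = sorted(set(labels))
    # Per-class threshold: the mean predicted probability of the given
    # label over all examples assigned to that class.
    thresholds = {}
    for c in classes:
        scores = [p[c] for p, y in zip(probs, labels) if y == c]
        thresholds[c] = sum(scores) / len(scores)
    # Flag examples whose self-confidence is below their class threshold.
    return [i for i, (p, y) in enumerate(zip(probs, labels))
            if p[y] < thresholds[y]]

# Toy example: 4 utterances, 2 intents; utterance 2 is labelled intent 0
# but the model assigns it high probability of intent 1.
probs = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.85, 0.15]]
labels = [0, 1, 0, 0]
print(flag_potential_label_errors(probs, labels))  # flags index 2
```

In practice, flagged utterances would then be reviewed by hand (or removed, as in the paper's experiment) before retraining the classifier.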
Citation
Ying, C., & Thomas, S. (2022). Label Errors in BANKING77. In Insights 2022 - 3rd Workshop on Insights from Negative Results in NLP, Proceedings of the Workshop (pp. 139–143). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.insights-1.19