To annotate or not? Predicting performance drop under domain shift

Hady Elsahar; Matthias Gallé

Conference Proceedings, Open Access

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2019) 2163-2173

DOI: 10.18653/v1/d19-1222

77 Citations

148 Readers

Abstract

Performance drop due to domain shift is an endemic problem for NLP models in production. It creates pressure to continuously annotate evaluation datasets in order to measure the expected drop in model performance, which can be prohibitively expensive and slow. In this paper, we study the problem of predicting the performance drop of modern NLP models under domain shift, in the absence of any target-domain labels. We investigate three families of methods (H-divergence, reverse classification accuracy, and confidence measures), show how they can be used to predict the performance drop, and study their robustness to adversarial domain shifts. Our results on sentiment classification and sequence labelling show that our method is able to predict performance drops with an error rate as low as 2.15% for sentiment analysis and 0.89% for POS tagging.
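As a rough illustration of the confidence-measure family named in the abstract, the sketch below uses the fall in average top-class probability between source and target data as a label-free proxy for the performance drop. This is an assumption-laden toy, not the authors' implementation: the function names `average_confidence` and `estimated_drop` and the probability values are hypothetical, and the abstract does not state how the paper turns such signals into a final prediction.

```python
from statistics import mean


def average_confidence(prob_rows):
    """Mean of the highest class probability over all examples."""
    return mean(max(row) for row in prob_rows)


def estimated_drop(source_probs, target_probs):
    """Proxy for performance drop: fall in average confidence
    from source-domain to target-domain predictions.
    (Hypothetical proxy; no target labels are needed.)"""
    return average_confidence(source_probs) - average_confidence(target_probs)


# Made-up class-probability outputs of one classifier on two domains.
source_probs = [[0.90, 0.10], [0.20, 0.80], [0.95, 0.05]]
target_probs = [[0.60, 0.40], [0.55, 0.45], [0.70, 0.30]]

drop = estimated_drop(source_probs, target_probs)
print(f"Estimated confidence drop: {drop:.3f}")
```

A positive value suggests the model is less certain on the target domain. Whether that gap tracks the true accuracy drop would still need to be checked on domain pairs where labels exist.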

Cite (APA)

Elsahar, H., & Gallé, M. (2019). To annotate or not? Predicting performance drop under domain shift. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 2163–2173). Association for Computational Linguistics. https://doi.org/10.18653/v1/d19-1222
