Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets

Citations of this article: 1
Mendeley readers: 41

Abstract

For interpreting the behavior of a probabilistic model, it is useful to measure the model's calibration, that is, the extent to which it produces reliable confidence scores. We address the open problem of calibration for tagging models with sparse tagsets, and recommend strategies to measure and reduce calibration error (CE) in such models. We show that several post-hoc recalibration techniques all reduce calibration error across the marginal distribution for two existing sequence taggers. We also propose tag frequency grouping (TFG) as a way to measure calibration error in different frequency bands, and show that recalibrating each group separately promotes a more equitable reduction of calibration error across the tag frequency spectrum.
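
The following is a minimal, illustrative sketch (not the authors' released code) of the two ideas mentioned in the abstract: calibration error estimated by binning predictions by confidence, and tag frequency grouping, i.e., partitioning the tagset into frequency bands so calibration can be measured (and, in principle, recalibrated) per band. The function names and the toy data below are assumptions for illustration only.

from collections import Counter

import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Average |accuracy - mean confidence| over confidence bins, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


def tag_frequency_groups(training_tags, n_groups=3):
    """Partition tags into contiguous frequency bands, from head (frequent) to tail (rare)."""
    ranked = [tag for tag, _ in Counter(training_tags).most_common()]
    return [set(band) for band in np.array_split(np.array(ranked, dtype=object), n_groups)]


# Toy usage: report calibration error separately for head and tail tags.
train_tags = ["NOUN"] * 50 + ["VERB"] * 30 + ["ADV"] * 5 + ["INTJ"] * 2
gold = ["NOUN", "VERB", "ADV", "INTJ", "NOUN"]
conf = [0.95, 0.80, 0.60, 0.55, 0.90]  # model confidence in its predicted tag
hit = [1, 1, 0, 0, 1]                  # whether the predicted tag was correct

for band in tag_frequency_groups(train_tags, n_groups=2):
    idx = [i for i, tag in enumerate(gold) if tag in band]
    if idx:
        ce = expected_calibration_error([conf[i] for i in idx], [hit[i] for i in idx])
        print(sorted(band), round(ce, 3))

Measuring calibration error within each frequency band, rather than only over the pooled tag distribution, is what exposes the head-versus-tail disparities the paper targets; recalibrating each band separately then addresses them.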

Citation (APA)

Kranzlein, M., Liu, N. F., & Schneider, N. (2021). Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets. In Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 (pp. 4919–4928). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.findings-emnlp.423
