Large-scale pre-training is widely used in recent document understanding tasks. During deployment, one may expect models to trigger a conservative fallback policy when they encounter out-of-distribution (OOD) samples, which highlights the importance of OOD detection. However, most existing OOD detection methods focus on unimodal inputs such as images or text. Although documents are inherently multimodal, whether and how their multimodal information can be exploited for OOD detection remains underexplored. In this work, we first provide a systematic and in-depth analysis of OOD detection for document understanding models, studying the effects of model modality, pre-training, and fine-tuning across various types of OOD inputs. In particular, we find that spatial information is critical for document OOD detection. To better exploit spatial information, we propose a spatial-aware adapter, a parameter-efficient add-on module that adapts transformer-based language models to the document domain. Extensive experiments show that adding the spatial-aware adapter significantly improves OOD detection performance compared to using the language model directly, and that it outperforms competitive baselines.
Gu, J., Ming, Y., Zhou, Y., Kuen, J., Morariu, V. I., Zhao, H., … Nenkova, A. (2023). A Critical Analysis of Document Out-of-Distribution Detection. In Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4973–4999). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-emnlp.332