Probing the Robustness of Pre-trained Language Models for Entity Matching

Abstract

Fine-tuning Pre-trained Language Models (PLMs) has been successful in Entity Matching (EM). Despite their remarkable performance, PLMs exhibit a tendency to learn spurious correlations from training data. In this work, we investigate whether PLM-based entity matching models can be trusted in real-world applications where the data distribution differs from that of the training data. To this end, we design an evaluation benchmark that assesses the robustness of EM models, to facilitate their deployment in real-world settings. Our assessments reveal that imbalance in the training data is a key problem for robustness. We also find that data augmentation alone is not sufficient to make a model robust. As a remedy, we prescribe simple modifications that can improve the robustness of PLM-based EM models. Our experiments show that, compared to state-of-the-art EM models, our proposed model significantly improves robustness while also yielding superior results for in-domain generalization.

Cite

Akbarian Rastaghi, M., Kamalloo, E., & Rafiei, D. (2022). Probing the Robustness of Pre-trained Language Models for Entity Matching. In International Conference on Information and Knowledge Management, Proceedings (pp. 3786–3790). Association for Computing Machinery. https://doi.org/10.1145/3511808.3557673
