CoSiNES: Contrastive Siamese Network for Entity Standardization

Abstract

Entity standardization maps noisy mentions from free-form text to standard entities in a knowledge base. Unlike other entity-related tasks, it must cope with mentions that lack surrounding context and vary widely in surface form, and these difficulties are compounded when generalizing across domains where labeled data is scarce. Previous research has mostly focused on models that either rely heavily on context or are dedicated to a single domain. In contrast, we propose CoSiNES, a generic and adaptable framework with a Contrastive Siamese Network for Entity Standardization that effectively adapts a pretrained language model to capture the syntax and semantics of the entities in a new domain. We construct a new dataset in the technology domain, which contains 640 technical stack entities and 6,412 mentions collected from industrial content management systems. We demonstrate that CoSiNES yields higher accuracy and faster runtime than baselines derived from leading methods in this domain. CoSiNES also achieves competitive performance on four standard datasets from the chemistry, medicine, and biomedical domains, demonstrating its cross-domain applicability. Code and data are available at https://github.com/konveyor/tackle-container-advisor/tree/main/entity_standardizer/cosines.
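
To make the general idea concrete, the following is a minimal sketch of a Siamese setup with a contrastive objective, not the authors' implementation. It assumes a triplet-style hinge loss over cosine distance and swaps the pretrained language model for a toy character n-gram encoder so that it runs with the standard library alone. The names `encode`, `cosine`, `triplet_loss`, and `standardize`, the margin value, and the example entities are all hypothetical.

```python
import math
from collections import Counter


def encode(text, n=3):
    """Toy shared encoder: bag of character n-grams.

    Assumption: this stands in for the fine-tuned pretrained language
    model; it is NOT the encoder used in the paper.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on cosine distance (an assumed form of the objective).

    All three strings pass through the same encoder, which is what makes
    the setup Siamese. The loss is zero once the positive is closer to
    the anchor than the negative by at least `margin`.
    """
    d_pos = 1.0 - cosine(encode(anchor), encode(positive))
    d_neg = 1.0 - cosine(encode(anchor), encode(negative))
    return max(0.0, d_pos - d_neg + margin)


def standardize(mention, entities):
    """Map a mention to the most similar standard entity."""
    m = encode(mention)
    return max(entities, key=lambda e: cosine(m, encode(e)))


# Hypothetical knowledge base of technology-stack entities.
entities = ["PostgreSQL", "MySQL", "Apache Tomcat"]
print(standardize("postgres sql 9.6", entities))
```

In a trainable version, the loss would be backpropagated through the shared encoder so that mentions move toward their standard entity and away from others; at inference, standardization reduces to a nearest-neighbor lookup in the embedding space, which needs no surrounding context.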

Cite

Yuan, J., Merler, M., Choudhury, M., Pavuluri, R., Singh, M. P., & Vukovic, M. (2023). CoSiNES: Contrastive Siamese Network for Entity Standardization. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 109–119). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.matching-1.9
