Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement

Abstract

Mentions of new concepts appear regularly in texts and require automated approaches to harvest and place them into Knowledge Bases (KB), e.g., ontologies and taxonomies. Existing datasets suffer from three issues: (i) they mostly assume that a new concept is pre-discovered and thus cannot support out-of-KB mention discovery; (ii) they use only the concept label as the input along with the KB and thus lack the contexts of a concept label; and (iii) they mostly focus on concept placement w.r.t. a taxonomy of atomic concepts, instead of complex concepts, i.e., concepts with logical operators. To address these issues, we propose a new benchmark, adapting the MedMentions dataset (PubMed abstracts) with SNOMED CT versions in 2014 and 2017 under the Diseases subcategory and the broader categories of Clinical finding, Procedure, and Pharmaceutical/biologic product. We show how the dataset can be used to evaluate out-of-KB mention discovery and concept placement, adapting recent Large Language Model based methods.

Cite

Dong, H., Chen, J., He, Y., & Horrocks, I. (2023). Ontology Enrichment from Texts: A Biomedical Dataset for Concept Discovery and Placement. In International Conference on Information and Knowledge Management, Proceedings (pp. 5316–5320). Association for Computing Machinery. https://doi.org/10.1145/3583780.3615126
