Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system

Abstract

Background: We develop medical-specialty-specific ontologies that contain the settled science and common term usage. We leverage current practices in information and relationship extraction to streamline the ontology development process. Our system combines different text types with information and relationship extraction techniques in a low-overhead, modifiable system. Our SEmi-Automated ontology Maintenance (SEAM) system features a natural language processing pipeline for information extraction. Synonym and hierarchical groups are identified using corpus-based semantics and lexico-syntactic patterns. The semantic vectors we use are term frequency by inverse document frequency (TF-IDF) and context vectors. Clinical documents contain the terms we want in an ontology, but they also contain idiosyncratic usage and are unlikely to contain the linguistic constructs associated with synonym and hierarchy identification. By including both clinical and biomedical texts, SEAM can recommend terms from those appearing in both document types. The set of recommended terms is then used to filter the synonyms and hierarchical relationships extracted from the biomedical corpus. We demonstrate the generality of the system across three use cases: ontologies for acute changes in mental status, Medically Unexplained Syndromes, and echocardiogram summary statements.

Results: Across the three use cases, we held the number of recommended terms relatively constant by changing SEAM's parameters. Experts seem to find more than 300 recommended terms to be overwhelming. The approval rate of recommended terms increased as the number and specificity of clinical documents in the corpus increased: it was 60% when there were 199 clinical documents that were not specific to the ontology domain and 90% when there were 2879 documents very specific to the target domain. We found that fewer than 100 recommended synonym groups were also preferred. Approval rates for synonym recommendations remained low, varying from 43% to 25% as the number of journal articles increased from 19 to 47. Overall, the number of recommended hierarchical relationships was very low, although their approval rate was good, varying between 67% and 31%.

Conclusion: SEAM produced a concise list of recommended clinical terms, synonyms and hierarchical relationships regardless of medical domain.
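
The three ideas named in the abstract (TF-IDF vectors, recommending terms that appear in both document types, and filtering pattern-based hierarchical relationships by the recommended terms) can be illustrated with a minimal Python sketch. This is not SEAM's implementation. The function names, the toy documents, the single "X such as Y" pattern, and the rule that a relationship is kept when at least one of its terms is recommended are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def recommend_terms(clinical_docs, biomedical_docs):
    """Assumed rule: recommend terms that occur in both document types."""
    clinical = {t for d in clinical_docs for t in tokenize(d)}
    biomedical = {t for d in biomedical_docs for t in tokenize(d)}
    return clinical & biomedical


# One illustrative lexico-syntactic pattern ("X such as Y").
# The abstract does not say which patterns SEAM uses.
SUCH_AS = re.compile(r"([a-z]+) such as ([a-z]+)")


def hierarchical_pairs(biomedical_docs, recommended):
    """Extract (broader, narrower) pairs and keep those touching a recommended term."""
    pairs = []
    for doc in biomedical_docs:
        for broader, narrower in SUCH_AS.findall(doc.lower()):
            # Assumed filter: at least one side must be a recommended term.
            if broader in recommended or narrower in recommended:
                pairs.append((broader, narrower))
    return pairs


# Toy corpora, invented for this example.
clinical_docs = ["Patient shows delirium and agitation today."]
biomedical_docs = [
    "Syndromes such as delirium are common.",
    "Agitation was measured.",
]

recommended = recommend_terms(clinical_docs, biomedical_docs)
pairs = hierarchical_pairs(biomedical_docs, recommended)
vectors = tf_idf(biomedical_docs)
```

In this toy setup only the terms shared by both corpora are recommended, and the single extracted relationship survives the filter because its narrower term is one of them. A real pipeline would work on multi-word terms rather than single tokens and would tune thresholds to keep the recommendation lists short, as the Results describe.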

Cite

Doing-Harris, K., Livnat, Y., & Meystre, S. (2015). Automated concept and relationship extraction for the semi-automated ontology management (SEAM) system. Journal of Biomedical Semantics, 6(1). https://doi.org/10.1186/s13326-015-0011-7
