Knowledge Engineering for Open Science: Building and Deploying Knowledge Bases for Metadata Standards

Mark A. Musen; Martin J. O'Connor; Josef Hardi; Marcos Martínez-Romero

Journal Article · Open Access

AI Magazine (2026) 47(1)

DOI: 10.1002/aaai.70048

Abstract

For more than a decade, scientists have been striving to make their datasets available in open repositories, with the goal that they be findable, accessible, interoperable, and reusable (FAIR). Although it is hard for most investigators to remember all the “guiding principles” associated with FAIR data, there is one overarching requirement: The data need to be annotated with “rich,” discipline-specific, standardized metadata that can enable third parties to understand who performed the experiment, who or what the subjects were, what the experimental conditions were, and what the results appear to show. Most areas of science lack standards for such metadata and, when such standards exist, it can be difficult for investigators or data curators to apply them. The Center for Expanded Data Annotation and Retrieval (CEDAR) builds technology that enables scientists to encode descriptive metadata standards as templates that enumerate the attributes of different kinds of experiments and that link those attributes to ontologies or value sets that may supply controlled values for those attributes. These metadata templates capture the preferences of groups of investigators regarding how their data should be described and what a third party needs to know to make sense of their datasets. CEDAR templates describing community metadata preferences have been used to standardize metadata for a variety of scientific consortia. They have been used as the basis for data-annotation systems that acquire metadata through Web forms or through spreadsheets, and they can help correct metadata to ensure adherence to standards. Like the declarative knowledge bases that underpinned intelligent systems decades ago, CEDAR templates capture the knowledge of a community of practice in symbolic form, and they allow that knowledge to be applied in a variety of settings. They provide a mechanism for scientific communities to create shared metadata standards and to encode their preferences for the application of those standards, and for deploying those standards in a range of intelligent systems to promote open science.
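
To make the template idea concrete, the following is a minimal sketch, not CEDAR's actual template model or API. It assumes a template is simply a list of named attributes, each optionally restricted to a set of controlled values that stands in for an ontology or value set. The class names (`Attribute`, `Template`), the `validate` method, and the example attributes and values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Attribute:
    """One template attribute (hypothetical structure, not CEDAR's model)."""
    name: str
    required: bool = True
    # Stand-in for an ontology or value set; None means free text is allowed.
    allowed_values: Optional[FrozenSet[str]] = None


@dataclass
class Template:
    """A metadata template: an enumerated list of attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)

    def validate(self, record: Dict[str, str]) -> List[str]:
        """Return a list of problems; an empty list means the record conforms."""
        problems = []
        known = {a.name for a in self.attributes}
        for attr in self.attributes:
            value = record.get(attr.name)
            if value is None or value == "":
                if attr.required:
                    problems.append(f"missing required attribute: {attr.name}")
                continue
            if attr.allowed_values is not None and value not in attr.allowed_values:
                problems.append(f"{attr.name}: '{value}' is not a controlled value")
        # Flag keys that the template does not define.
        for key in record:
            if key not in known:
                problems.append(f"unknown attribute: {key}")
        return problems


# Illustrative template and record; every name and value here is made up.
template = Template(
    name="ExampleAssay",
    attributes=[
        Attribute("investigator"),
        Attribute("organism", allowed_values=frozenset({"Homo sapiens", "Mus musculus"})),
        Attribute("tissue", required=False),
    ],
)
record = {"investigator": "A. Researcher", "organism": "human", "sample_id": "S1"}
problems = template.validate(record)
```

In this sketch the record fails on two counts: "human" is not one of the controlled values for `organism`, and `sample_id` is not an attribute the template defines. The same declarative template could, in principle, drive a Web form, a spreadsheet check, or a metadata-correction step, which is the reuse across settings that the abstract describes.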

Citation (APA)

Musen, M. A., O’Connor, M. J., Hardi, J., & Martínez-Romero, M. (2026). Knowledge Engineering for Open Science: Building and Deploying Knowledge Bases for Metadata Standards. AI Magazine, 47(1). https://doi.org/10.1002/aaai.70048
