An Extensive Methodology and Framework for Quality Assessment of DCAT-AP Datasets

Bianca Wentzel; Fabian Kirstein; Torben Jastrow; Raphael Sturm; Michael Peters; Sonja Schimmler

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2023) 14130 LNCS 262-278

DOI: 10.1007/978-3-031-41138-0_17

Abstract

The DCAT Application Profile for Data Portals is a crucial cornerstone for publishing and reusing Open Data in Europe. It supports the harmonization and interoperability of Open Data by providing an expressive set of properties, guidelines, and reusable vocabularies. However, a high-quality and accurate implementation by Open Data providers remains challenging. To improve the informative value and the compliance with RDF-based specifications, we propose a methodology to measure and assess the quality of DCAT-AP datasets. Our approach is based on the FAIR and the 5-star principles for Linked Open Data. We define a set of metrics, each of which covers a specific quality aspect: for example, whether a certain property has a compliant value, whether mandatory vocabularies are applied, or whether the actual data is available. The metric values are stored in a custom data model based on the Data Quality Vocabulary and are used to calculate an overall quality score for each dataset. We implemented our approach as a scalable and reusable Open Source solution to demonstrate its feasibility. It is applied in a large-scale production environment (data.europa.eu), where it continuously checks more than 1.6 million DCAT-AP datasets and delivers quality reports.
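The idea of per-aspect metrics feeding an overall score can be illustrated with a minimal Python sketch. It is not the authors' implementation: the metric names, the point weights, the licence URI, and the flat dictionary standing in for a DCAT-AP dataset are all assumptions chosen for illustration, and the sketch does not model RDF or the Data Quality Vocabulary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A dataset is modelled as a plain dictionary here (assumption);
# real DCAT-AP metadata is RDF.
Dataset = Dict[str, object]

# Hypothetical controlled vocabulary of accepted licence URIs.
KNOWN_LICENSES = {"https://example.org/licence/cc-by-4.0"}


@dataclass(frozen=True)
class Metric:
    name: str                          # identifier of the quality aspect
    points: int                        # illustrative weight, not from the paper
    check: Callable[[Dataset], bool]   # returns True if the aspect is satisfied


# Three illustrative metrics mirroring the kinds of checks named in the abstract:
# a compliant property value, use of a vocabulary, and availability of the data.
METRICS: List[Metric] = [
    Metric("has_title", 10, lambda d: bool(d.get("title"))),
    Metric("license_from_vocabulary", 20,
           lambda d: d.get("license") in KNOWN_LICENSES),
    Metric("access_url_reachable", 30,
           lambda d: d.get("access_url_status") == 200),
]


def assess(dataset: Dataset) -> Tuple[Dict[str, bool], int]:
    """Evaluate every metric and sum the points of those that pass."""
    results = {m.name: m.check(dataset) for m in METRICS}
    earned = sum(m.points for m in METRICS if results[m.name])
    return results, earned


def max_score() -> int:
    """Highest score a dataset can reach with the metrics defined above."""
    return sum(m.points for m in METRICS)


example = {
    "title": "Air quality measurements",
    "license": "https://example.org/licence/cc-by-4.0",
    "access_url_status": 404,  # pretend a previous HTTP check failed
}
results, earned = assess(example)
print(results, f"{earned}/{max_score()}")
```

Keeping the individual metric results next to the aggregated score means a report can show which aspects lowered the score, not only the total.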

Cite

Wentzel, B., Kirstein, F., Jastrow, T., Sturm, R., Peters, M., & Schimmler, S. (2023). An Extensive Methodology and Framework for Quality Assessment of DCAT-AP Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14130 LNCS, pp. 262–278). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-41138-0_17
