The DCAT Application Profile for Data Portals is a crucial cornerstone for publishing and reusing Open Data in Europe. It supports the harmonization and interoperability of Open Data by providing an expressive set of properties, guidelines, and reusable vocabularies. However, a qualitative and accurate implementation by Open Data providers remains challenging. To improve the informative value and the compliance with RDF-based specifications, we propose a methodology to measure and assess the quality of DCAT-AP datasets. Our approach is based on the FAIR and the 5-star principles for Linked Open Data. We define a set of metrics, where each one covers a specific quality aspect. For example, if a certain property has a compliant value, if mandatory vocabularies are applied or if the actual data is available. The values for the metrics are stored as a custom data model based on the Data Quality Vocabulary and is used to calculate an overall quality score for each dataset. We implemented our approach as a scalable and reusable Open Source solution to demonstrate its feasibility. It is applied in a large-scale production environment (data.europa.eu) and constantly checks more than 1.6 million DCAT-AP datasets and delivers quality reports.
CITATION STYLE
Wentzel, B., Kirstein, F., Jastrow, T., Sturm, R., Peters, M., & Schimmler, S. (2023). An Extensive Methodology and Framework for Quality Assessment of DCAT-AP Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 14130 LNCS, pp. 262–278). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-41138-0_17
Mendeley helps you to discover research relevant for your work.