Modeling the Data Provenance of Relational Databases Supporting Full-Featured SQL and Procedural Languages

Abstract

Data provenance is information about where data come from (provenance data) and how they are transformed (provenance transformation). It is widely used to evaluate data quality, trace errors, audit data, and understand references among data. Current studies on data provenance in relational database management systems (RDBMS) remain limited in their support for full-featured SQL or procedural languages. To address these limitations, we present a formal definition of provenance data and provenance transformation for relational data. We then propose a solution for supporting data provenance in relational databases, including provenance graphs and provenance routes. Our method not only solves the complicated problem of modeling provenance in a DBMS but can also be extended to procedural languages in SQL. We also present ProvPg, a PostgreSQL-based prototype database system that supports data provenance at multiple granularities. ProvPg implements the extraction, calculation, query, and visualization of provenance. We run TPC-H tests on both ProvPg and PostgreSQL. Experimental results show that ProvPg supports data provenance with little extra computation overhead for the execution engine, which indicates that our model is applicable to lineage tracing applications.

Cite

Tang, D., Zhao, R., Lin, Y., Zhang, T., & Zhang, P. (2023). Modeling the Data Provenance of Relational Databases Supporting Full-Featured SQL and Procedural Languages. Applied Sciences (Switzerland), 13(1). https://doi.org/10.3390/app13010064
