Modeling the Data Provenance of Relational Databases Supporting Full-Featured SQL and Procedural Languages

Deyou Tang; Rong Zhao; Yuebang Lin; Tangqing Zhang; Pingjian Zhang

Journal Article (Open Access)

Applied Sciences (Switzerland) (2023) 13(1)

DOI: 10.3390/app13010064

Abstract

Data provenance is information about where data come from (provenance data) and how they are transformed (provenance transformation). Data provenance is widely used to evaluate data quality, trace errors, audit data, and understand references among data. Existing work on data provenance in relational database management systems (RDBMSs) still offers limited support for full-featured SQL and procedural languages. With these challenges in mind, we present a formal definition of provenance data and provenance transformation for relational data. We then propose a solution for supporting data provenance in relational databases that includes provenance graphs and provenance routes. Our method not only addresses the complex problem of modeling provenance in a DBMS but also supports the procedural-language extensions of SQL. We also present ProvPg, a PostgreSQL-based prototype database system that supports data provenance at multiple granularities. ProvPg implements provenance extraction, calculation, querying, and visualization. We run TPC-H tests on both ProvPg and PostgreSQL. Experimental results show that ProvPg supports data provenance while adding little computational overhead to the execution engine, which indicates that our model is applicable to lineage-tracing applications.
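
To make the split between provenance data (where a result came from) and provenance transformation (how it was derived) concrete, here is a minimal Python sketch of tuple-level lineage tracing. It is not ProvPg's model and does not reproduce the paper's provenance graphs or routes: the `ProvenanceGraph` class, its methods, and the tuple identifiers such as `orders:7` are hypothetical, and the sketch assumes each derived tuple simply records its input tuples and the operation that produced it.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Hypothetical tuple-level lineage graph (illustration only, not ProvPg's design).

    Each edge says: an output tuple was derived from an input tuple by a named
    transformation (for example a SQL operator or a procedural-language step).
    """

    def __init__(self) -> None:
        # derived tuple id -> list of (input tuple id, transformation label)
        self._sources = defaultdict(list)

    def record(self, output_id: str, input_ids: list, transformation: str) -> None:
        """Record that output_id was produced from input_ids by transformation."""
        for input_id in input_ids:
            self._sources[output_id].append((input_id, transformation))

    def base_sources(self, tuple_id: str) -> set:
        """Provenance data: the base tuples that tuple_id ultimately depends on."""
        result, seen, stack = set(), set(), [tuple_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            parents = self._sources.get(current)  # .get avoids creating empty entries
            if not parents:
                result.add(current)  # no recorded inputs: treat as stored base data
            else:
                stack.extend(src for src, _ in parents)
        return result

    def derivation_edges(self, tuple_id: str) -> list:
        """Provenance transformation: every (output, input, operation) edge behind tuple_id."""
        edges, seen, stack = [], set(), [tuple_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for src, label in self._sources.get(current, []):
                edges.append((current, src, label))
                stack.append(src)
        return edges


# Hypothetical example: two join results over TPC-H-style rows feed one aggregate row.
g = ProvenanceGraph()
g.record("join:1", ["orders:7", "lineitem:3"], "JOIN")
g.record("join:2", ["orders:7", "lineitem:4"], "JOIN")
g.record("agg:1", ["join:1", "join:2"], "SUM")
print(sorted(g.base_sources("agg:1")))  # ['lineitem:3', 'lineitem:4', 'orders:7']
```

Keeping the operation label on every edge lets one structure answer both questions: `base_sources` walks back to stored tuples, while `derivation_edges` lists the operations along the way. A real system would also need coarser granularities (columns, tables) and persistent storage, which this sketch leaves out.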

Citation (APA)

Tang, D., Zhao, R., Lin, Y., Zhang, T., & Zhang, P. (2023). Modeling the Data Provenance of Relational Databases Supporting Full-Featured SQL and Procedural Languages. Applied Sciences (Switzerland), 13(1). https://doi.org/10.3390/app13010064