Implementing a data lineage tracker

Colin Puri; Doo Soon Kim; Peter Z. Yeh; Kunal Verma

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7448 LNCS 390-403

DOI: 10.1007/978-3-642-32584-7_32

Abstract

Every day, business users face the task of tracking the origin of information used in calculations and business decisions. Knowing the origin and lineage of data can help in the decision-making process, provide a clear audit trail for regulation, and answer key questions such as who, what, where, when, why, and how. Many challenges arise when tracking data lineage across a heterogeneous enterprise environment. This paper presents one method of tracking data lineage that answers these questions for business users, for both new and old applications in a heterogeneous infrastructure environment. Using trace logs from data sources, we show that our system effectively tracks data lineage and determines data flows as information moves from one data source to another through the execution of applications. Using SQL and NoSQL systems, we demonstrate the recall and precision of our proposed data lineage tracking system. © 2012 Springer-Verlag.

Cite

Puri, C., Kim, D. S., Yeh, P. Z., & Verma, K. (2012). Implementing a data lineage tracker. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7448 LNCS, pp. 390–403). https://doi.org/10.1007/978-3-642-32584-7_32
