Provenance as essential infrastructure for Data Lakes

Abstract

The Data Lake is emerging as a Big Data storage and management solution that can store any type of data at scale and execute data transformations for analysis. However, this greater flexibility in storage increases the risk of a Data Lake becoming a data swamp. In this paper, we show how provenance contributes to data management within a Data Lake infrastructure. We study the challenges of integrating provenance and propose a reference architecture for provenance usage in a Data Lake. Finally, we discuss the applicability of our tools in the proposed architecture.
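To make the idea concrete, one way provenance can help keep a Data Lake navigable is by recording, for each transformation, which datasets it read and which it produced, so that the upstream sources of any derived dataset can be traced later. The sketch below is a minimal illustration under that assumption only. It is loosely modelled on the W3C PROV relations "used" and "wasGeneratedBy", and it is not the reference architecture or tooling proposed in the paper. The class `ProvenanceStore`, its methods `record` and `lineage`, and the dataset and job names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    """Hypothetical in-memory lineage store (illustrative only).

    Loosely follows two W3C PROV relations:
      used:           activity -> datasets it read
      wasGeneratedBy: dataset  -> activity that produced it
    """
    used: dict = field(default_factory=dict)          # activity id -> list of input dataset ids
    generated_by: dict = field(default_factory=dict)  # output dataset id -> activity id

    def record(self, activity, inputs, outputs):
        # Record one transformation run: what it read and what it wrote.
        self.used[activity] = list(inputs)
        for out in outputs:
            self.generated_by[out] = activity

    def lineage(self, dataset):
        # Walk backwards from a dataset to every upstream dataset it derives from.
        seen = []
        stack = [dataset]
        while stack:
            current = stack.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # raw, ingested data: no recorded producing activity
            for source in self.used[activity]:
                if source not in seen:
                    seen.append(source)
                    stack.append(source)
        return seen


# Hypothetical usage: a cleaning job followed by an aggregation job.
store = ProvenanceStore()
store.record("clean_job", ["raw/sensors.csv"], ["curated/sensors.parquet"])
store.record("agg_job", ["curated/sensors.parquet", "raw/stations.json"],
             ["reports/daily.parquet"])
print(store.lineage("reports/daily.parquet"))
```

Keeping the store as two plain mappings makes the backward traversal a simple graph walk; a production system would more likely persist these relations in a graph or PROV-compatible store rather than in memory.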

Suriarachchi, I., & Plale, B. (2016). Provenance as essential infrastructure for Data Lakes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9672, pp. 178–182). Springer Verlag. https://doi.org/10.1007/978-3-319-40593-3_16