Nowadays, big data has become a hot research topic. It gives fresh impetus to the economic and social development. However, the huge value of big data also makes it the focus of attacks. Big data security incidents occur frequently in recent years. The security supervision capacities for big data do not match its important role. Data provenance which describes the origins of data and the process by which it arrived the current state, is an effective approach for data supervision. For the full use of provenance in big data supervision, a provenance model which defines the concepts used to represent the provenance types and relations is required to be built in advance, but current provenance models do not adapt to big data scenarios well. In this paper, we comprehensively consider the characteristics of big data and the requirements of data security supervision, extend the widely used provenance model PROV-DM by subtyping and new relation definition, and propose a big data provenance model (BDPM) for data supervision. BDPM model supports the provenance representation of various data types and diverse data processing modes to represent the entire data transformation process through different components in the big data system, and defines new relations to enrich provenance analysis functions. Based on BDPM model, we introduce the constraints that should be satisfied in the construction of valid provenance graph and present the data security supervision methods via provenance graph analysis. Finally, we evaluated the satisfiability of BDPM model through a case study.
CITATION STYLE
Gao, Y., Chen, X., & Du, X. (2020). A Big Data Provenance Model for Data Security Supervision Based on PROV-DM Model. IEEE Access, 8, 38742–38752. https://doi.org/10.1109/ACCESS.2020.2975820
Mendeley helps you to discover research relevant for your work.