Scalable two-phase co-occurring sensitive pattern hiding using MapReduce

Shivani Sharma; Durga Toshniwal

Journal Article, Open Access

Journal of Big Data (2017) 4(1)

DOI: 10.1186/s40537-017-0064-9

Abstract

Background: The expansion of the Internet and its use for online activities such as e-commerce and social networking is producing large volumes of transactional data. This data facilitates the analysis and understanding of global trends and interesting patterns, which are used in decision making. The analytics involved in these processes can expose sensitive information present in the datasets, which is a serious privacy threat. To address this, a few sequential heuristics were used in the past, when data volumes were small enough for them to handle; at current data volumes they often incur high execution times. This scalability challenge motivates experimenting with Big Data approaches (e.g., the MapReduce framework). We have combined the MapReduce framework with the adopted heuristics to address scalability while providing the needed privacy preservation and yielding efficient analytic results within bounded execution times.

Methods: MapReduce is a parallel programming framework [16] that lets us use the resources of a large distributed system in parallel for Big Data analytics. Its simplicity and high fault tolerance are the key features that make MapReduce a promising framework. We therefore propose a two-phase MapReduce version of the adopted heuristics. The MapReduce framework divides the whole dataset into n data chunks, $D = \{d_1 \cup d_2 \cup d_3 \cup \dots \cup d_n\}$, and distributes them over n computing nodes to achieve parallelization. The first phase of the MapReduce job runs on each data chunk to generate intermediate results, which are then sorted and merged in the second phase to generate the final sanitized dataset.

Results: We conducted three sets of experiments, each with five scenarios corresponding to different cluster sizes, i.e., n = 1, 2, 3, 4, 5, where n is the number of computing nodes. We compared the approaches on real as well as synthetically generated large datasets. For varying data sizes and varying numbers of computing nodes, the sanitization time required by the MapReduce-based algorithm was much lower than that of the traditional sequential approach on a dataset of the same size. Further, scalability can be improved by using more computing nodes. Lastly, another set of experiments explores how sanitization time changes with the amount of sensitive content present in a dataset. We evaluated the effectiveness of the proposed approach in different scenarios, with cluster size varying from 1 to 5 nodes, and the execution time of our approach remained much lower than that of traditional schemes. Further, no hiding failures or artifactual patterns were observed during the experiments, and in terms of misses cost the MapReduce version performs the same as the traditional approaches.

Conclusion: Traditional approaches for data hiding, primarily MaxFIA and SWA, are unable to handle large data volumes. To overcome this scalability challenge, we implemented these basic heuristics with a Big Data approach, i.e., the MapReduce framework. Quantitative evaluations show that combining the MapReduce framework with the adopted heuristics is scalable and many-fold faster while still yielding efficient analytic results.
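
The two-phase flow described under Methods can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names, the `(id, items)` transaction layout, and the victim-item rule (remove the alphabetically smallest item of a matched sensitive pattern from every supporting transaction) are assumptions made for illustration and stand in for the MaxFIA and SWA heuristics, whose details the abstract does not give. On a real cluster, each chunk would be processed on its own node rather than in a loop.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical layout: a transaction is (transaction id, set of items).
Transaction = Tuple[int, FrozenSet[str]]


def split_into_chunks(data: List[Transaction], n: int) -> List[List[Transaction]]:
    """Divide the dataset into n chunks d_1 ... d_n (round-robin split)."""
    return [data[i::n] for i in range(n)]


def phase_one(chunk: List[Transaction],
              sensitive: List[FrozenSet[str]]) -> List[Transaction]:
    """Phase 1: runs independently on one chunk and emits intermediate pairs."""
    intermediate = []
    for tid, items in chunk:
        for pattern in sensitive:
            if pattern <= items:  # the transaction supports a sensitive pattern
                # Placeholder victim rule (an assumption, not MaxFIA or SWA).
                victim = min(pattern)
                items = items - {victim}
        intermediate.append((tid, items))
    return intermediate


def phase_two(parts: List[List[Transaction]]) -> List[Transaction]:
    """Phase 2: merge the per-chunk results and sort them by transaction id."""
    merged = [pair for part in parts for pair in part]
    return sorted(merged, key=lambda pair: pair[0])


def sanitize(data: List[Transaction],
             sensitive: List[FrozenSet[str]],
             n: int) -> List[Transaction]:
    """Run both phases; the loop over chunks stands in for n parallel nodes."""
    chunks = split_into_chunks(data, n)
    parts = [phase_one(chunk, sensitive) for chunk in chunks]
    return phase_two(parts)
```

Because each transaction is sanitized without reference to any other, the merged output in this sketch does not depend on how many chunks the data is split into; only the amount of work per node changes.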

Cite

Sharma, S., & Toshniwal, D. (2017). Scalable two-phase co-occurring sensitive pattern hiding using MapReduce. Journal of Big Data, 4(1). https://doi.org/10.1186/s40537-017-0064-9
