Discovering approximate functional dependencies from distributed big data

Weibang Li; Zhanhuai Li; Qun Chen; Tao Jiang; Zhilei Yin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9932 LNCS, pp. 289–301 (2016)

DOI: 10.1007/978-3-319-45817-5_23

Abstract

Approximate functional dependencies (AFDs) discovered from database relations have proven useful for tasks such as knowledge discovery and query optimization. Previous research has proposed various algorithms for discovering AFDs from a centralized relational database, but none of them is designed for distributed data. In this paper, we devise a scalable and efficient approach to discovering AFDs from distributed big data that is not bound by main-memory limits. To improve efficiency, statistics of the local data at each site are first collected and used to filter and prune the candidate AFD set. The data are then redistributed and the AFDs are discovered in parallel. We balance the load as far as possible before the redistribution and prune the candidate AFD set quickly after it. We evaluate our approach on real and synthetic big datasets; the results show that it is efficient and scales well with both the size of the relations and the number of nodes.
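As a rough illustration of what checking a candidate AFD over several sites can involve, the sketch below is offered under stated assumptions and is not the algorithm of the paper. It assumes the widely used g3 error measure (the smallest fraction of tuples that must be removed for the dependency to hold exactly), which the abstract does not name. The helper names `local_counts`, `merge_counts`, and `g3_error`, the sample rows, and the 0.25 threshold are all hypothetical. Each site summarizes its own rows as per-group value counts, and only those counts are merged.

```python
from collections import Counter, defaultdict


def local_counts(rows, lhs, rhs):
    """Per-site summary: for each LHS value combination, count the RHS values seen."""
    counts = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        counts[key][row[rhs]] += 1
    return counts


def merge_counts(partials):
    """Combine per-site summaries by adding the counters of matching LHS keys."""
    merged = defaultdict(Counter)
    for partial in partials:
        for key, counter in partial.items():
            merged[key].update(counter)
    return merged


def g3_error(counts):
    """g3 error (assumed measure): fraction of tuples outside the majority
    RHS value of their LHS group."""
    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0
    kept = sum(max(c.values()) for c in counts.values())
    return (total - kept) / total


# Hypothetical data held at two sites; candidate AFD: zip -> city.
site_a = [
    {"zip": "10001", "city": "NYC"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "SF"},
]
site_b = [
    {"zip": "10001", "city": "Newark"},
    {"zip": "94105", "city": "SF"},
]

partials = [local_counts(site, ["zip"], "city") for site in (site_a, site_b)]
error = g3_error(merge_counts(partials))
holds = error <= 0.25  # hypothetical error threshold
print(error, holds)
```

In this toy data the dependency holds exactly at each site on its own, yet one of the five tuples violates it once the sites are combined. That is why a candidate cannot be accepted from a single site's data in isolation.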

Li, W., Li, Z., Chen, Q., Jiang, T., & Yin, Z. (2016). Discovering approximate functional dependencies from distributed big data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9932 LNCS, pp. 289–301). Springer Verlag. https://doi.org/10.1007/978-3-319-45817-5_23