In relational databases, functional dependency discovery is an important database analysis technique with a wide range of applications in knowledge discovery, database semantic analysis, data quality assessment, and database design. Existing functional dependency discovery algorithms are mainly designed for centralized data and are usually applicable only when the data size is small. With the rapid growth of database scale, functional dependency discovery in distributed environments is of increasing practical significance. A functional dependency discovery algorithm for big data in a distributed environment is proposed. The basic idea is to first perform functional dependency discovery on a sampled data set and then globally verify the functional dependencies that may hold globally, so that all functional dependencies can be discovered. Parallel computing can be used to improve discovery efficiency while ensuring correctness.
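The sample-then-verify idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: all function names, the `max_lhs` bound on left-hand-side size, and the use of in-memory lists to stand in for distributed partitions are assumptions made for illustration. The sketch relies on one fact: any dependency that holds on the full data also holds on every sample, so the dependencies found on the sample form a candidate superset that only needs global verification.

```python
import random
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds on a list of dict rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def candidate_fds(sample, attrs, max_lhs=2):
    """Enumerate every lhs -> rhs (|lhs| <= max_lhs, an assumed bound) that holds on the sample."""
    cands = []
    for rhs in attrs:
        others = [a for a in attrs if a != rhs]
        for k in range(1, max_lhs + 1):
            for lhs in combinations(others, k):
                if holds(sample, lhs, rhs):
                    cands.append((lhs, rhs))
    return cands


def local_summary(partition, lhs, rhs):
    """Per-partition work: map each lhs value to the set of rhs values seen with it."""
    summary = {}
    for row in partition:
        summary.setdefault(tuple(row[a] for a in lhs), set()).add(row[rhs])
    return summary


def verify_globally(partitions, lhs, rhs):
    """Merge the partition summaries; the dependency holds iff every lhs value has one rhs value."""
    merged = {}
    for part in partitions:  # each summary is independent and could be built in parallel
        for key, values in local_summary(part, lhs, rhs).items():
            merged.setdefault(key, set()).update(values)
            if len(merged[key]) > 1:
                return False
    return True


def discover(partitions, attrs, sample_rate=0.5, max_lhs=2, seed=0):
    """Discover candidates on a sample, verify them globally, and keep the minimal ones."""
    rng = random.Random(seed)
    sample = [row for part in partitions for row in part if rng.random() < sample_rate]
    verified = [fd for fd in candidate_fds(sample, attrs, max_lhs)
                if verify_globally(partitions, *fd)]
    return [(lhs, rhs) for lhs, rhs in verified
            if not any(r == rhs and set(l) < set(lhs) for l, r in verified)]


partitions = [
    [{"id": 1, "zip": "10001", "city": "NYC"}, {"id": 2, "zip": "10001", "city": "NYC"}],
    [{"id": 3, "zip": "94105", "city": "SF"}, {"id": 4, "zip": "94107", "city": "SF"}],
]
result = discover(partitions, ["id", "zip", "city"])
```

Verification merges summaries across partitions instead of checking each partition alone, because two partitions can each satisfy a dependency locally while disagreeing with each other on the same left-hand-side value. Because every candidate is verified against the full data, the result does not depend on which rows are sampled; the sample only reduces the number of candidates that need global verification.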
CITATION STYLE
Gu, C., & Cao, J. (2020). Functional Dependency Discovery on Distributed Database: Sampling Verification Framework. In Communications in Computer and Information Science (Vol. 1179 CCIS, pp. 463–476). Springer. https://doi.org/10.1007/978-981-15-2810-1_43