In recent years, the worldwide open data movement has made more and more open datasets available. However, the quality of these datasets poses problems for learning models. This study focuses on learning Bayesian network structure from datasets containing noise. A novel approach called GBNL (Generalized Bayesian Structure Learning) is proposed. GBNL first uses a greedy algorithm to obtain an appropriate sliding window size for any dataset, then leverages a difference array-based method to quickly improve data quality by locating the noisy data sections and removing them. GBNL can not only evaluate the quality of a dataset but also effectively reduce the noise in the data. We conduct experiments to evaluate GBNL on five large datasets; the experimental results validate the accuracy and generalizability of this novel approach.
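The abstract does not spell out how GBNL scores windows or applies its difference array, so the following is only a generic sketch of the difference-array technique, not the authors' implementation. It assumes that some earlier step has already flagged a set of possibly overlapping row ranges as noisy; the function names `coverage_from_windows` and `drop_flagged_rows` and the half-open `(start, end)` window format are hypothetical, chosen for illustration.

```python
from itertools import accumulate


def coverage_from_windows(n_rows, windows):
    """Count, for each row, how many flagged windows cover it.

    `windows` holds half-open (start, end) row ranges. How a window gets
    flagged as noisy is NOT specified here; that is an assumption.
    """
    diff = [0] * (n_rows + 1)
    for start, end in windows:
        diff[start] += 1   # coverage rises at the window start
        diff[end] -= 1     # and falls just past the window end
    # The prefix sum of the difference array gives per-row coverage.
    return list(accumulate(diff[:-1]))


def drop_flagged_rows(rows, windows):
    """Keep only the rows that no flagged window covers."""
    cover = coverage_from_windows(len(rows), windows)
    return [row for row, c in zip(rows, cover) if c == 0]


rows = list(range(10))
flagged = [(2, 5), (4, 7)]  # two overlapping hypothetical noisy windows
print(coverage_from_windows(len(rows), flagged))
print(drop_flagged_rows(rows, flagged))
```

Marking each window costs constant time regardless of its length, and a single prefix-sum pass recovers the coverage of every row, which is why a difference array suits locating noisy sections quickly when many windows overlap.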
Tang, Y., Chen, Y., & Ge, G. (2019). Generalized Bayesian Structure Learning from Noisy Datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11448 LNCS, pp. 158–169). Springer Verlag. https://doi.org/10.1007/978-3-030-18590-9_11