Dirichlet Bayesian network scores and the maximum relative entropy principle

39 citations · 47 Mendeley readers · This article is free to access.

Abstract

A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian–Dirichlet (BD) scores; the most famous is the Bayesian–Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (Mach Learn 20(3):197–243, 1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network: it makes structure learning computationally efficient, it does not require the elicitation of prior knowledge from experts, and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (Proceedings of the 27th international workshop on Bayesian inference and maximum entropy methods in science and engineering, pp 74–84, 2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari in J Mach Learn Res (Proc Track PGM 2016) 52:438–448, 2016) that the Bayesian–Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend using it for sparse data instead of BDeu. Finally, we will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.
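To make the distinction concrete, below is a minimal sketch (not code from the paper) of the local Bayesian–Dirichlet marginal-likelihood score in the standard form of Heckerman et al. (1995). The `sparse=True` variant follows the BDs idea from Scutari (2016): the imaginary sample size is spread only over the parent configurations actually observed in the data. Function and variable names are illustrative.

```python
# Sketch of a local Bayesian-Dirichlet score for one node (assumption: names
# and interface are illustrative, not taken from the paper or any package).
from math import lgamma

def local_bd_score(counts, alpha, sparse=False):
    """Log marginal likelihood of one node given its parents.

    counts: counts[j][k] = N_jk, observations with parent configuration j
            and node state k.
    alpha:  imaginary sample size of the Dirichlet prior.
    sparse: False -> BDeu (prior mass alpha spread over all q configurations);
            True  -> BDs-style score (prior mass spread only over the
                     parent configurations observed in the data).
    """
    q = len(counts)            # number of parent configurations
    r = len(counts[0])         # number of states of the node
    if sparse:
        q_eff = sum(1 for row in counts if sum(row) > 0) or 1
    else:
        q_eff = q
    a_j = alpha / q_eff        # prior mass per parent configuration
    a_jk = alpha / (r * q_eff) # prior mass per cell of the count table
    score = 0.0
    for row in counts:
        n_j = sum(row)
        if sparse and n_j == 0:
            continue           # BDs: unobserved configurations contribute nothing
        score += lgamma(a_j) - lgamma(a_j + n_j)
        for n_jk in row:
            score += lgamma(a_jk + n_jk) - lgamma(a_jk)
    return score
```

When every parent configuration is observed, the two variants coincide; with sparse data (some all-zero rows in the count table) they differ, which is the regime the abstract is concerned with.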

References (via Scopus)

Modeling by shortest data description
Learning Bayesian Networks: The Combination of Knowledge and Statistical Data
The max-min hill-climbing Bayesian network structure learning algorithm



Citation (APA)

Scutari, M. (2018). Dirichlet Bayesian network scores and the maximum relative entropy principle. Behaviormetrika, 45(2), 337–362. https://doi.org/10.1007/s41237-018-0048-x

Readers' Seniority

PhD / Postgrad / Masters / Doc: 18 (55%)
Researcher: 9 (27%)
Professor / Associate Prof.: 3 (9%)
Lecturer / Post doc: 3 (9%)

Readers' Discipline

Computer Science: 15 (54%)
Engineering: 7 (25%)
Decision Sciences: 3 (11%)
Earth and Planetary Sciences: 3 (11%)
