Structurally aware discretisation for Bayesian networks

Helen Mayfield; Edoardo Bertone; Oz Sahin; Carl Smith

Conference Proceedings (Open Access)

Proceedings - 22nd International Congress on Modelling and Simulation, MODSIM 2017 (2017) 1420-1426

DOI: 10.36334/modsim.2017.k4.mayfield

Abstract

Bayesian networks (BNs) are a versatile probabilistic modelling technique, widely used to tackle problems in many different domains. However, they are discrete models, and a significant decision when designing a BN is how to split continuous variables into discrete bins. Default options offered in most BN packages include assigning an equal number of cases to each bin or using equal-sized bins. These methods discretise nodes independently of each other. When probabilities are learned from data, this can produce conditional probability tables (CPTs) with missing or uninformed probabilities, because data for particular bin combinations (scenarios) are either missing or scarce, and model performance can suffer as a result. We propose that the structure of the network is an important determinant in node discretisation, and that the best bin allocations for a simple naïve network will differ from those for more complicated networks that attempt to model relationships between the predictor variables. Furthermore, a good discretisation algorithm should not require the model designer to specify the exact number of bins as a target for discretisation. Rather, it should be flexible in determining bin allocations within limits specified by the model designer. BN performance can be improved if discretisation results in CPTs that contain fewer combinations with insufficient evidence to confidently estimate probabilities. We have developed a structure-aware discretisation (SAD) algorithm that minimises the number of missing probabilities in CPTs by taking network structure into account. The algorithm requires some parameters to be set, such as the minimum number of cases in each bin, but determines the exact number of bins and their limits from the data.
The algorithm consists of two stages: a structurally unaware discretisation (SUD) stage that distributes the cases into bins until each bin has a minimum number of cases, followed by a structure-aware discretisation (SAD) stage that further reduces the number of bins to account for incomplete CPTs. The algorithm was tested on a real-life water quality case study, using three different network structures (naïve network, tree-augmented network and expert-designed network). The results show that both the SUD and SAD stages of the algorithm have the potential to improve the discretisation process over equal-case discretisation by selecting an appropriate number of bins and their limits. Improvement in performance (area under the receiver operating characteristic curve and the true skill statistic) was greatest in non-naïve network structures. A major benefit of SAD is that model designers are not required to specify the exact number of bins, with the algorithm instead balancing the parsimony and precision of the network.
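
The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the merging rules, so everything below is an assumption made for illustration: the function names (`assign_bins`, `sud_stage`, `empty_cells`, `sad_stage`), the choice to start from fine equal-width bins, the greedy merge order, and the synthetic data. It shows one plausible reading for a single parent and child pair, not the authors' implementation.

```python
import numpy as np

def assign_bins(x, edges):
    """Map each value to a bin index in 0 .. len(edges) - 2."""
    idx = np.searchsorted(edges, x, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)

def sud_stage(x, start_bins, min_cases):
    """Structure-unaware stage (assumed form): start from fine equal-width
    bins and merge the sparsest bin into its sparser neighbour until every
    bin holds at least min_cases cases."""
    edges = np.linspace(x.min(), x.max(), start_bins + 1)
    while len(edges) > 2:
        counts = np.bincount(assign_bins(x, edges), minlength=len(edges) - 1)
        i = int(np.argmin(counts))
        if counts[i] >= min_cases:
            break
        # Bin i spans edges[i]..edges[i + 1]; deleting a shared edge merges it.
        if i == 0:
            drop = 1
        elif i == len(counts) - 1:
            drop = i
        else:
            drop = i if counts[i - 1] <= counts[i + 1] else i + 1
        edges = np.delete(edges, drop)
    return edges

def empty_cells(parent_bins, child_bins, n_parent, n_child):
    """Count parent/child bin combinations with no supporting cases."""
    table = np.zeros((n_parent, n_child), dtype=int)
    np.add.at(table, (parent_bins, child_bins), 1)
    return int((table == 0).sum())

def sad_stage(x, edges, child_bins, n_child, min_bins=2):
    """Structure-aware stage (assumed form): while the parent/child count
    table has empty cells, remove the interior edge whose removal leaves
    the fewest empty cells."""
    while len(edges) - 1 > min_bins:
        current = empty_cells(assign_bins(x, edges), child_bins,
                              len(edges) - 1, n_child)
        if current == 0:
            break
        candidates = [np.delete(edges, k) for k in range(1, len(edges) - 1)]
        scores = [empty_cells(assign_bins(x, e), child_bins, len(e) - 1, n_child)
                  for e in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= current:
            break
        edges = candidates[best]
    return edges

# Synthetic data (hypothetical): a skewed parent and a three-state child.
rng = np.random.default_rng(0)
x = rng.lognormal(size=500)
child_bins = np.digitize(x + rng.normal(scale=0.3, size=500), [0.8, 2.0])
n_child = 3

sud_edges = sud_stage(x, start_bins=40, min_cases=30)
sad_edges = sad_stage(x, sud_edges, child_bins, n_child)
print("bins after SUD:", len(sud_edges) - 1)
print("bins after SAD:", len(sad_edges) - 1)
```

In this sketch the designer sets only limits (`start_bins`, `min_cases`, `min_bins`), and the final number of bins and their edges come from the data, which mirrors the design goal stated in the abstract. A full implementation would need to consider all parents of each node jointly rather than a single pair.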

Citation (APA)

Mayfield, H., Bertone, E., Sahin, O., & Smith, C. (2017). Structurally aware discretisation for Bayesian networks. In Proceedings - 22nd International Congress on Modelling and Simulation, MODSIM 2017 (pp. 1420–1426). Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ). https://doi.org/10.36334/modsim.2017.k4.mayfield
