In this paper, we analyze the problem of data clustering in domains where discrete and continuous variables coexist. We propose the use of hybrid Bayesian networks with a naïve Bayes structure and a hidden class variable. The model integrates discrete and continuous features by representing the conditional distributions as mixtures of truncated exponentials (MTEs). The number of classes is determined through an iterative procedure based on a variation of the data augmentation algorithm. The new model is compared with an EM-based clustering algorithm in which each class model is a product of conditionally independent probability distributions and the number of clusters is chosen by cross-validation. Experiments carried out on real-world and synthetic data sets show that the proposal is competitive with state-of-the-art methods. Even though the methodology introduced in this manuscript is based on MTEs, it can be easily instantiated to other similar models, such as mixtures of polynomials or, more generally, mixtures of truncated basis functions.
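As a rough illustration of the model described above, the following sketch writes out a naïve Bayes mixture with a hidden class variable and a univariate MTE density. The notation is an assumption made for illustration and is not taken from the abstract: $C$ is the hidden class, $X_1,\dots,X_n$ are the observed features, $I_k$ is one interval of a partition of the domain of a continuous feature, and $a_{0}^{(k,c)}, a_{j}^{(k,c)}, b_{j}^{(k,c)}$ are real parameters. The paper's exact parameterization may differ.

```latex
% Joint density of the observed features, marginalizing the hidden class C
% (naive Bayes: features are conditionally independent given C).
f(x_1,\dots,x_n) \;=\; \sum_{c} P(C=c)\,\prod_{i=1}^{n} f(x_i \mid c)

% For a continuous feature X_i, the conditional density is an MTE,
% defined piecewise on each interval I_k of a partition of its domain:
f(x_i \mid c) \;=\; a_{0}^{(k,c)} + \sum_{j=1}^{m} a_{j}^{(k,c)}\,
    \exp\!\bigl(b_{j}^{(k,c)}\,x_i\bigr), \qquad x_i \in I_k

% For a discrete feature, f(x_i | c) is an ordinary probability table.
% Cluster membership of a record follows from the posterior over C:
P(C=c \mid x_1,\dots,x_n) \;\propto\; P(C=c)\,\prod_{i=1}^{n} f(x_i \mid c)
```

Under this reading, clustering amounts to estimating $P(C)$ and the conditional densities from data in which $C$ is never observed, then assigning each record to the class with the highest posterior.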
CITATION STYLE
Fernández, A., Gámez, J. A., Rumí, R., & Salmerón, A. (2014). Data clustering using hidden variables in hybrid Bayesian networks. Progress in Artificial Intelligence, 2(2–3), 141–152. https://doi.org/10.1007/s13748-014-0048-3