Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers

Tahmid F. Mehdi; Gurdeep Singh; Jennifer A. Mitchell; Alan M. Moses

Journal Article, Open Access

Bioinformatics (2019) 35(18) 3232-3239

DOI: 10.1093/bioinformatics/btz064

Abstract

Motivation: Mammalian genomes can contain thousands of enhancers, but only a subset actively drive gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integrating large genomic datasets is their increasing heterogeneity: continuous, binary and discrete features may all be relevant. Because training examples are also typically scarce, semi-supervised approaches for heterogeneous data are needed; however, current enhancer prediction methods are not designed to handle heterogeneous data in the semi-supervised paradigm.

Results: We implemented a Dirichlet Process Heterogeneous Mixture model that infers Gaussian, Bernoulli and Poisson distributions over features. We derived a novel variational inference algorithm to handle semi-supervised learning tasks in which certain observations are forced to cluster together. We applied this model to enhancer candidates in mouse heart tissues, described by heterogeneous features. We constrained a small number of known active enhancers to appear in the same cluster, and 47 additional regions clustered with them. Many of these regions are located near heart-specific genes. The model also predicted 1176 active promoters, suggesting that it can discover new enhancers and promoters.

Availability and implementation: We created the 'dphmix' Python package: https://pypi.org/project/dphmix/.

Supplementary information: Supplementary data are available at Bioinformatics online.
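The idea of a heterogeneous mixture with observations forced to cluster together can be illustrated with a minimal sketch. It is not the authors' variational algorithm and does not use the dphmix API; it only shows, under simplifying assumptions, how one Gaussian, one Bernoulli and one Poisson feature could be scored jointly, and how a constrained group could share a single cluster assignment by summing its log-likelihoods. The feature names (`signal`, `motif`, `count`), the fixed cluster parameters, the truncated stick-breaking weights and the assumption that features are independent within a cluster are all hypothetical choices made for illustration.

```python
import math

def gaussian_logpdf(x, mean, var):
    # Log density of a univariate Gaussian (continuous feature).
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def bernoulli_logpmf(x, p):
    # Log probability of a binary feature.
    return math.log(p) if x == 1 else math.log(1 - p)

def poisson_logpmf(k, rate):
    # Log probability of a count feature.
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def stick_breaking_weights(betas):
    # Truncated stick-breaking: each fraction takes a share of what remains.
    # Setting the last fraction to 1.0 makes the weights sum to one.
    weights, remaining = [], 1.0
    for b in betas:
        weights.append(b * remaining)
        remaining *= 1 - b
    return weights

def cluster_loglik(obs, params):
    # Assumption: features are independent given the cluster, so logs add.
    return (gaussian_logpdf(obs["signal"], params["mean"], params["var"])
            + bernoulli_logpmf(obs["motif"], params["p"])
            + poisson_logpmf(obs["count"], params["rate"]))

def shared_responsibilities(group, clusters, weights):
    # One assignment for the whole group: sum the members' log-likelihoods,
    # add the log mixture weight, then normalise with the log-sum-exp trick.
    scores = [math.log(w) + sum(cluster_loglik(o, c) for o in group)
              for c, w in zip(clusters, weights)]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical parameters for two clusters and a two-member constrained group.
clusters = [
    {"mean": 2.0, "var": 1.0, "p": 0.9, "rate": 5.0},
    {"mean": 0.0, "var": 1.0, "p": 0.1, "rate": 1.0},
]
weights = stick_breaking_weights([0.5, 1.0])
group = [
    {"signal": 2.1, "motif": 1, "count": 6},
    {"signal": 1.8, "motif": 1, "count": 4},
]
print(shared_responsibilities(group, clusters, weights))
```

A single unconstrained observation is the special case of a group of size one. In a full variational treatment the cluster parameters would carry their own posterior distributions rather than the point values fixed here.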

Cite (APA)

Mehdi, T. F., Singh, G., Mitchell, J. A., & Moses, A. M. (2019). Variational infinite heterogeneous mixture model for semi-supervised clustering of heart enhancers. Bioinformatics, 35(18), 3232–3239. https://doi.org/10.1093/bioinformatics/btz064
