Abstract
Modeling correlation structures is a challenge in bioinformatics, especially when dealing with high-throughput genomic data. A compound hierarchical correlated beta mixture (CBM) with an exchangeable correlation structure is proposed to cluster genetic vectors into mixture components. The correlation coefficient, ρ, is homogeneous within a mixture component and heterogeneous between mixture components. A random CBM with ρ∼f(ρ/π) brings more flexibility in explaining correlation variations among genetic variables. The Expectation-Maximization (EM) and Stochastic Expectation-Maximization (SEM) algorithms are used to estimate the parameters of the CBM. The number of mixture components can be determined using model selection criteria such as AIC, BIC, and ICL-BIC. Extensive simulation studies were conducted to compare EM, SEM, and the model selection criteria. Simulation results suggest that the CBM outperforms the traditional beta mixture model, with lower estimation bias and higher classification accuracy. The proposed method is applied to cluster transcription factor-DNA binding probabilities in mouse genome data generated by Lahdesmaki and others (2008, Probabilistic inference of transcription factor binding from multiple data sources. PLoS One, 3, e1820). The results reveal distinct clusters of transcription factors when binding to promoter regions of genes in the JAK-STAT, MAPK, and two other pathways.
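As an illustration of how the number of mixture components might be chosen with the criteria named above, the following is a minimal sketch using the standard textbook forms of AIC, BIC, and ICL-BIC (BIC plus twice the entropy of the posterior membership probabilities). It is not the authors' implementation: the function names, the `fits` dictionary, and the numeric values are hypothetical, and the log-likelihoods are assumed to come from an already fitted mixture model.

```python
import math

def aic(loglik, n_params):
    # Akaike information criterion: smaller is better.
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    # Bayesian information criterion: smaller is better.
    return -2.0 * loglik + n_params * math.log(n_obs)

def icl_bic(loglik, n_params, n_obs, posteriors):
    # ICL-BIC adds twice the classification entropy to BIC.
    # posteriors: one row per observation, each row holding the
    # posterior membership probabilities over components.
    entropy = -sum(p * math.log(p) for row in posteriors for p in row if p > 0.0)
    return bic(loglik, n_params, n_obs) + 2.0 * entropy

def select_components(fits, n_obs):
    # fits maps a candidate number of components K to a tuple
    # (maximized log-likelihood, number of free parameters).
    return min(fits, key=lambda k: bic(fits[k][0], fits[k][1], n_obs))

# Hypothetical fitted results for K = 1, 2, 3 (illustrative values only).
fits = {1: (-120.0, 2), 2: (-100.0, 5), 3: (-99.0, 8)}
best_k = select_components(fits, n_obs=100)
```

With perfectly separated clusters the entropy term is zero and ICL-BIC reduces to BIC; overlapping clusters are penalized more heavily, which is why ICL-BIC tends to favor fewer, better separated components.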
Dai, H., & Charnigo, R. (2015). Compound hierarchical correlated beta mixture with an application to cluster mouse transcription factor DNA binding data. Biostatistics, 16(4), 641–654. https://doi.org/10.1093/biostatistics/kxv016