Integrative clustering of multi-level omics data for disease subtype discovery using sequential double regularization

Sunghwan Kim; Steffi Oesterreich; Seyoung Kim; Yongseok Park; George C. Tseng

Journal Article, Open Access

Biostatistics (2017) 18(1) 165-179

DOI: 10.1093/biostatistics/kxw039

Abstract

With rapid advances in microarray and massively parallel sequencing technologies, data from multiple omics sources collected on large patient cohorts are now common in consortium studies. Effective integration of multi-level omics data has brought new statistical challenges. One important biological objective of such integrative analysis is to cluster patients in order to identify clinically relevant disease subtypes, which will form the basis for tailored treatment and personalized medicine. Several methods have been proposed in the literature for this purpose, including the popular iCluster method used in many cancer applications. When clustering high-dimensional omics data, effective feature selection is critical for better clustering accuracy and biological interpretation. It is also common for a portion of "scattered samples" to show patterns distinct from all major clusters; these samples should not be assigned to any cluster, as they may represent a rare disease subcategory or be in transition between disease subtypes. In this paper, we first propose to improve feature selection in the iCluster factor model by applying an overlapping sparse group lasso penalty to the omics features, using prior knowledge of inter-omics regulatory flows. We then perform regularization over samples to allow clustering with scattered samples and to generate tight clusters. The proposed group structured tight iCluster method is evaluated on two real breast cancer examples and in simulations to demonstrate its improved clustering accuracy, biological interpretation, and ability to generate coherent tight clusters.
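
To make the idea of an overlapping sparse group lasso penalty more concrete, the following is a minimal sketch of a generic textbook form of such a penalty, evaluated on one loading vector. It is not taken from the paper: the function name, the `lam` and `alpha` parameters, the square-root group-size weights, and the toy feature layout are all assumptions chosen for illustration, and the authors' exact formulation may differ.

```python
import numpy as np


def overlapping_sparse_group_lasso_penalty(w, groups, lam=1.0, alpha=0.5):
    """Evaluate a generic sparse group lasso penalty with overlapping groups.

    penalty = lam * (alpha * ||w||_1
                     + (1 - alpha) * sum_g sqrt(|g|) * ||w_g||_2)

    `groups` is a list of index tuples; an index may appear in several
    groups, which is what makes the groups "overlapping".
    This is an illustrative form, not the paper's exact penalty.
    """
    w = np.asarray(w, dtype=float)
    # Lasso part: encourages individual coefficients to be exactly zero.
    l1_term = np.abs(w).sum()
    # Group part: encourages whole groups of coefficients to vanish together.
    group_term = sum(
        np.sqrt(len(g)) * np.linalg.norm(w[list(g)]) for g in groups
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Hypothetical toy layout (assumption): features 0 and 1 are expression
# levels of two genes, feature 2 is a copy-number feature that regulates
# both genes, and feature 3 is a methylation feature for the second gene.
w = np.array([3.0, 0.0, 4.0, 0.0])
groups = [(0, 2), (1, 2, 3)]  # feature 2 belongs to both groups

penalty = overlapping_sparse_group_lasso_penalty(w, groups, lam=1.0, alpha=0.5)
print(penalty)
```

In this sketch the shared copy-number feature is penalized through both groups, which is one simple way that prior knowledge of regulatory relationships across omics levels could be encoded in the group structure.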

Cite (APA)

Kim, S., Oesterreich, S., Kim, S., Park, Y., & Tseng, G. C. (2017). Integrative clustering of multi-level omics data for disease subtype discovery using sequential double regularization. Biostatistics, 18(1), 165–179. https://doi.org/10.1093/biostatistics/kxw039
