Clustering in human microbiome sequencing data: A distance-based unsupervised learning model

Abstract

Modeling and analyzing the human microbiome allows the assessment of the microbial community and its impacts on human health. The composition of the microbiome can be quantified with 16S rRNA sequencing, which yields count data that are usually skewed and heavy-tailed with excess zeros. Clustering methods are useful in personalized medicine for identifying patient subgroups for stratification. However, there is currently no standardized clustering method for complex microbiome sequencing data. We propose a clustering algorithm with a specific beta diversity measure that addresses the presence–absence bias encountered in sparse count data and effectively measures between-sample distances for sample stratification. The distance measure is derived from a parametric mixture model that produces sample-specific distributions conditional on the observed operational taxonomic unit (OTU) counts and the estimated mixture weights. The method can provide accurate estimates of the true zero proportions and thus construct a precise beta diversity measure. Extensive simulation studies suggest that the proposed method achieves substantial clustering improvement over some widely used distance measures when a large proportion of zeros is present. The proposed algorithm was applied to a human gut microbiome study of Parkinson’s disease to identify distinct microbiome states with biological interpretations.
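To make the general workflow concrete, the following is a minimal sketch of distance-based clustering on sparse OTU counts. It is not the authors' method: it uses the Bray-Curtis dissimilarity as a stand-in for a commonly used beta diversity measure (the abstract does not name its comparators), average-linkage hierarchical clustering, and a toy zero-inflated negative binomial generator. The function names `simulate_counts` and `bray_curtis_clusters`, the abundance profiles, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster


def simulate_counts(n_samples, mean_profile, zero_prob, rng, dispersion=0.5):
    """Toy generator (assumption): zero-inflated negative binomial OTU counts."""
    mean_profile = np.asarray(mean_profile, dtype=float)
    # Choose p so that the negative binomial mean equals mean_profile.
    p = dispersion / (dispersion + mean_profile)
    counts = rng.negative_binomial(
        dispersion, p, size=(n_samples, mean_profile.size)
    )
    # Overwrite a random subset of entries with structural (excess) zeros.
    structural_zero = rng.random(counts.shape) < zero_prob
    counts[structural_zero] = 0
    return counts


def bray_curtis_clusters(counts, n_clusters):
    """Cluster samples (rows) by Bray-Curtis dissimilarity of relative abundances."""
    counts = np.asarray(counts, dtype=float)
    # Samples with no reads have no defined composition, so drop them.
    keep = counts.sum(axis=1) > 0
    rel = counts[keep] / counts[keep].sum(axis=1, keepdims=True)
    condensed = pdist(rel, metric="braycurtis")
    tree = linkage(condensed, method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, squareform(condensed), keep


# Two hypothetical community types that differ in which OTUs dominate.
rng = np.random.default_rng(0)
n_otus = 30
profile_a = np.full(n_otus, 5.0)
profile_a[:10] = 100.0
profile_b = np.full(n_otus, 5.0)
profile_b[20:] = 100.0
counts = np.vstack([
    simulate_counts(20, profile_a, zero_prob=0.6, rng=rng),
    simulate_counts(20, profile_b, zero_prob=0.6, rng=rng),
])

labels, dist, keep = bray_curtis_clusters(counts, n_clusters=2)
print(labels)
```

In this sketch every zero is treated as a true absence, which is the presence-absence bias the abstract refers to. The proposed method instead estimates the true zero proportions from a mixture model before computing sample distances.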

Citation (APA)

Yang, D., & Xu, W. (2020). Clustering in human microbiome sequencing data: A distance-based unsupervised learning model. Microorganisms, 8(10), 1–18. https://doi.org/10.3390/microorganisms8101612
