Motivation: Data clustering is indispensable for identifying biologically relevant molecular features in large-scale omics experiments with thousands of measurements across multiple conditions. Optimal clustering results yield groups of functionally related features, which may include genes, proteins and metabolites involved in biological processes and molecular networks. Omics experiments typically include replicated measurements of each feature within a given condition to statistically assess feature-specific variation. Current clustering approaches ignore this variation by averaging over replicates, which often leads to incorrect cluster assignments.

Results: We present VSClust, a clustering method that accounts for feature-specific variance. Based on an algorithm derived from fuzzy clustering, VSClust unifies statistical testing with pattern recognition to cluster the data into feature groups that more accurately reflect the underlying molecular and functional behavior. We apply VSClust to artificial and experimental datasets comprising hundreds to >80 000 features across 6-20 different conditions, including genomics, transcriptomics, proteomics and metabolomics experiments. VSClust avoids arbitrary averaging methods, outperforms standard fuzzy c-means clustering and simplifies the data analysis workflow in large-scale omics studies.
Schwämmle, V., & Jensen, O. N. (2018). VSClust: Feature-based variance-sensitive clustering of omics data. Bioinformatics, 34(17), 2965–2972. https://doi.org/10.1093/bioinformatics/bty224