Motivation: Functional somatic mutations in protein-coding sequences confer a growth advantage during pathogenesis. Most existing methods for identifying cancer-related mutations operate at the level of a single amino acid or an entire gene. However, gain-of-function mutations often cluster in specific protein regions rather than occurring independently along the amino acid sequence. Several approaches that identify mutation clusters from the mutation density along the amino acid chain have been proposed recently, but their performance in identifying mutation clusters leaves room for improvement.

Results: Here we present a Data-adaptive Mutation Clustering Method (DMCM), in which kernel density estimation (KDE) with a data-adaptive bandwidth is applied to estimate the mutation density and to find clusters of variable length on amino acid sequences. We apply this approach to the mutation data of 571 genes in over twenty cancer types from The Cancer Genome Atlas (TCGA). We compare DMCM with M2C, OncodriveCLUST and Pfam Domain and find that DMCM tends to identify more significant clusters. Cross-validation analysis shows that DMCM is robust, and cluster cancer type enrichment analysis shows that specific cancer types are enriched for specific mutation clusters.

Availability and implementation: DMCM is written in Python and the analysis methods of DMCM are written in R. Both are released online at https://github.com/XinguoLu/DMCM.

Supplementary information: Supplementary data are available at Bioinformatics online.
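
To make the idea of a data-adaptive bandwidth concrete, the following is a minimal sketch of a generic adaptive KDE over mutation positions. It is not taken from the DMCM repository. The square-root bandwidth rule (`alpha=0.5`), the pilot bandwidth of 5 residues, the density threshold, and the function names `adaptive_kde` and `clusters_from_density` are all assumptions chosen for illustration, and the actual DMCM procedure may differ.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density evaluated elementwise."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def adaptive_kde(positions, grid, pilot_bandwidth=5.0, alpha=0.5):
    """Estimate mutation density on `grid` with a per-mutation bandwidth.

    Hypothetical helper: a fixed-bandwidth pilot estimate is computed first,
    then each mutation gets a bandwidth that shrinks where the pilot density
    is high and widens where it is low.
    """
    positions = np.asarray(positions, dtype=float)
    grid = np.asarray(grid, dtype=float)

    # Pilot density at every mutation position (fixed bandwidth).
    diffs = (positions[:, None] - positions[None, :]) / pilot_bandwidth
    pilot = gaussian_kernel(diffs).mean(axis=1) / pilot_bandwidth

    # Geometric mean of the pilot density normalises the local factors.
    geo_mean = np.exp(np.mean(np.log(pilot)))
    local_bw = pilot_bandwidth * (pilot / geo_mean) ** (-alpha)

    # Final density: average of Gaussians, each with its own bandwidth.
    u = (grid[:, None] - positions[None, :]) / local_bw[None, :]
    density = (gaussian_kernel(u) / local_bw[None, :]).mean(axis=1)
    return density, local_bw


def clusters_from_density(grid, density, threshold):
    """Return (start, end) residue intervals where density exceeds threshold."""
    clusters = []
    start = None
    prev = None
    for pos, value in zip(grid, density):
        if value > threshold and start is None:
            start = pos  # a new cluster opens here
        elif value <= threshold and start is not None:
            clusters.append((int(start), int(prev)))  # cluster closed at prev
            start = None
        prev = pos
    if start is not None:
        clusters.append((int(start), int(prev)))
    return clusters


# Toy input (made up): two dense hotspots and two isolated mutations.
toy_positions = [10, 11, 12, 12, 13, 14, 100, 200, 201, 202, 203, 350]
toy_grid = np.arange(1, 401)
toy_density, toy_bw = adaptive_kde(toy_positions, toy_grid)
toy_clusters = clusters_from_density(toy_grid, toy_density, threshold=0.01)
```

In this sketch, mutations inside a hotspot receive a narrower kernel than isolated mutations, so dense regions are resolved sharply while sparse regions are smoothed. That is one way to obtain clusters of different lengths from a single density estimate.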
CITATION STYLE
Lu, X., Qian, X., Li, X., Miao, Q., & Peng, S. (2019). DMCM: A Data-adaptive Mutation Clustering Method to identify cancer-related mutation clusters. Bioinformatics, 35(3), 389–397. https://doi.org/10.1093/bioinformatics/bty624