Motivation: The accumulation of somatic mutations plays critical roles in cancer development and progression. However, the global patterns of somatic mutations, especially non-coding mutations, and their roles in defining molecular subtypes of cancer have not been well characterized due to the computational challenges in analysing the complex mutational patterns.

Results: Here, we develop a new algorithm, called MutSpace, to effectively extract patient-specific mutational features using an embedding framework for larger sequence context. Our method is motivated by the observation that the mutation rate at megabase scale and the local mutational patterns jointly contribute to distinguishing cancer subtypes, both of which can be simultaneously captured by MutSpace. Simulation evaluations show that MutSpace can effectively characterize mutational features from known patient subgroups and achieve superior performance compared with previous methods. As a proof-of-principle, we apply MutSpace to 560 breast cancer patient samples and demonstrate that our method achieves high accuracy in subtype identification. In addition, the learned embeddings from MutSpace reflect intrinsic patterns of breast cancer subtypes and other features of genome structure and function. MutSpace is a promising new framework to better understand cancer heterogeneity based on somatic mutations.

Availability and implementation: Source code of MutSpace can be accessed at: https://github.com/ma-compbio/MutSpace.
Zhang, Y., Xiao, Y., Yang, M., & Ma, J. (2020). Cancer mutational signatures representation by large-scale context embedding. Bioinformatics, 36, i309–i316. https://doi.org/10.1093/bioinformatics/btaa433