BACKGROUND: Tumorigenesis is an evolutionary process by which tumor cells acquire mutations through successive diversification and differentiation. There is much interest in reconstructing this process of evolution due to its relevance to identifying drivers of mutation and predicting future prognosis and drug response. Efforts are challenged by high tumor heterogeneity, though, both within and among patients. In prior work, we showed that this heterogeneity could be turned into an advantage by computationally reconstructing models of cell populations mixed to different degrees in distinct tumors. Such mixed membership model approaches, however, are still limited in their ability to dissect more than a few well-conserved cell populations across a tumor data set. RESULTS: We present a method to improve on current mixed membership model approaches by better accounting for conserved progression pathways between subsets of cancers, which imply a structure to the data that has not previously been exploited. We extend our prior methods, which use an interpretation of the mixture problem as that of reconstructing simple geometric objects called simplices, to instead search for structured unions of simplices called simplicial complexes that one would expect to emerge from mixture processes describing branches along an evolutionary tree. We further improve on the prior work with a novel objective function to better identify mixtures corresponding to parsimonious evolutionary tree models. We demonstrate that this approach improves on our ability to accurately resolve mixtures on simulated data sets and demonstrate its practical applicability on a large RNASeq tumor data set. CONCLUSIONS: Better exploiting the expected geometric structure for mixed membership models produced from common evolutionary trees allows us to quickly and accurately reconstruct models of cell populations sampled from those trees. In the process, we hope to develop a better understanding of tumor evolution as well as other biological problems that involve interpreting genomic data gathered from heterogeneous populations of cells.
Mendeley saves you time finding and organizing research
Choose a citation style from the tabs below