Identifying somatic variants from high-throughput sequencing reads is challenging because of tumor heterogeneity, sub-clonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using both real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets with the NA12878 cell line. The results showed that a simple consensus approach can significantly improve performance even with a limited number of callers and is more robust and stable than machine-learning-based ensemble approaches. To fully exploit multiple callers, we also developed a software package, SomaticCombiner, that combines multiple callers and integrates a new variant allelic frequency (VAF) adaptive majority voting approach, which maintains sensitive detection of variants with low VAFs.
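The idea of a VAF adaptive majority vote can be illustrated with a minimal sketch. This is not SomaticCombiner's actual implementation: the function name, the caller names, the 0.1 VAF cutoff, and the two-vote minimum for low-VAF variants are all hypothetical values chosen for illustration. The sketch assumes each caller reports a set of variants and that a VAF estimate is available per variant.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def adaptive_majority_vote(
    calls: Dict[str, Set[Variant]],
    vafs: Dict[Variant, float],
    low_vaf_cutoff: float = 0.1,   # assumption: illustrative cutoff
    low_vaf_min_votes: int = 2,    # assumption: illustrative relaxed threshold
) -> Set[Variant]:
    """Keep variants supported by a strict majority of callers, relaxing
    the required vote count for variants whose VAF is below the cutoff."""
    majority = len(calls) // 2 + 1

    # Count how many callers reported each variant.
    votes: Dict[Variant, int] = defaultdict(int)
    for variant_set in calls.values():
        for variant in variant_set:
            votes[variant] += 1

    accepted: Set[Variant] = set()
    for variant, n_votes in votes.items():
        required = majority
        # Variants with no VAF estimate fall back to the strict majority.
        if vafs.get(variant, 1.0) < low_vaf_cutoff:
            required = min(majority, low_vaf_min_votes)
        if n_votes >= required:
            accepted.add(variant)
    return accepted


# Toy example with five hypothetical callers.
high_3 = ("chr1", 100, "A", "T")   # VAF 0.40, 3 votes
low_2 = ("chr1", 200, "G", "C")    # VAF 0.05, 2 votes
high_2 = ("chr2", 300, "C", "T")   # VAF 0.40, 2 votes
low_1 = ("chr2", 400, "T", "G")    # VAF 0.03, 1 vote

calls = {
    "caller_a": {high_3, low_2, high_2, low_1},
    "caller_b": {high_3, low_2, high_2},
    "caller_c": {high_3},
    "caller_d": set(),
    "caller_e": set(),
}
vafs = {high_3: 0.40, low_2: 0.05, high_2: 0.40, low_1: 0.03}

consensus = adaptive_majority_vote(calls, vafs)
```

In this toy input, the strict majority is three of five callers, so the high-VAF variant with only two votes is dropped, while the low-VAF variant with two votes is retained under the relaxed threshold. A single vote is never sufficient in this sketch.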
Wang, M., Luo, W., Jones, K., Bian, X., Williams, R., Higson, H., … Zhu, B. (2020). SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach. Scientific Reports, 10(1). https://doi.org/10.1038/s41598-020-69772-8