BACKGROUND: Transcriptomic structural variants (TSVs)-large-scale transcriptome sequence change due to structural variation - are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. RESULTS: To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome arrangements that maximize the number of reads that are concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a ¼ -approximation algorithm for k = 1 and a ¾ -approximation algorithm for the diploid case ( k = 2 ) assuming an oracle for k = 1 . Combining these, we obtain a 3/16 -approximation algorithm for MCAP when k = 2 (without an oracle). We also present an integer linear programming formulation for general k. We characterize the conflict structures in the graph that require k > 1 alleles to satisfy read concordancy and show that such structures are prevalent. CONCLUSIONS: We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms have improved performance on TCGA cancer samples and cancer cell line samples compared to a TSV calling tool, SQUID. The software is available at https://github.com/Kingsford-Group/diploidsquid.
CITATION STYLE
Qiu, Y., Ma, C., Xie, H., & Kingsford, C. (2020). Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem. Algorithms for Molecular Biology, 15(1). https://doi.org/10.1186/s13015-020-00170-5
Mendeley helps you to discover research relevant for your work.