Abstract
Background: Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping are fundamental, given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have been developed recently.

Findings: We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which has the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs, and a Bayesian model precisely genotypes SVs based on their supporting evidence. This algorithm is integrated into the single-sample variant detector of the Next Generation Sequencing Experience Platform (NGSEP), which facilitates integration with other functionalities for genomic analysis. We performed multiple benchmark experiments, including simulated and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths.

Conclusion: The results show that our approach outperformed state-of-the-art tools on germline SV calling and genotyping, especially at low depths and in error-prone repetitive regions. We believe this work significantly contributes to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.
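The clustering and genotyping steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes deletion signatures reduced to a point (position, length), plain unscaled Euclidean distance, a textbook DBSCAN, clusters summarized by median position and length, and a simple binomial genotype model with an assumed error rate and flat priors. The names `Signature`, `distance`, `dbscan`, `clusters_to_calls`, and `genotype`, as well as the values of `eps`, `min_pts`, and `error`, are all hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import median

NOISE = -1


@dataclass(frozen=True)
class Signature:
    """Read-level evidence for one deletion (hypothetical, simplified fields)."""
    read_id: str
    position: int  # reference start coordinate of the deleted segment
    length: int    # deleted length in bp


def distance(a: Signature, b: Signature) -> float:
    # Treat each signature as a point (position, length); no axis scaling (assumption).
    return math.hypot(a.position - b.position, a.length - b.length)


def dbscan(points, eps, min_pts):
    """Textbook O(n^2) DBSCAN. Returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # The eps-neighborhood includes the point itself, as in standard DBSCAN.
        return [j for j in range(len(points)) if distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point and starts a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


def clusters_to_calls(points, labels):
    """Summarize each cluster as one candidate deletion (median position and length)."""
    groups = {}
    for point, label in zip(points, labels):
        if label != NOISE:
            groups.setdefault(label, []).append(point)
    calls = []
    for label in sorted(groups):
        members = groups[label]
        calls.append({
            "position": int(median(p.position for p in members)),
            "length": int(median(p.length for p in members)),
            "support": len({p.read_id for p in members}),  # distinct supporting reads
        })
    return calls


def genotype(alt_reads, total_reads, error=0.05, priors=None):
    """Posterior over diploid genotypes under an assumed binomial read model."""
    # Expected fraction of reads supporting the SV under each genotype (assumption).
    alt_fraction = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    if priors is None:
        priors = {g: 1.0 / 3.0 for g in alt_fraction}  # flat prior (assumption)
    ref_reads = total_reads - alt_reads
    # Log prior + binomial log-likelihood; the binomial coefficient is the same
    # for every genotype, so it cancels on normalization and is omitted.
    log_post = {
        g: math.log(priors[g]) + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for g, p in alt_fraction.items()
    }
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(weights.values())
    posterior = {g: w / total for g, w in weights.items()}
    return max(posterior, key=posterior.get), posterior


if __name__ == "__main__":
    sigs = [Signature(f"r{k}", pos, ln) for k, (pos, ln) in enumerate(
        [(10_000, 300), (10_010, 305), (9_995, 298), (80_000, 60)])]
    labels = dbscan(sigs, eps=50, min_pts=3)  # [0, 0, 0, -1]
    print(clusters_to_calls(sigs, labels))  # one call: position 10000, length 300, support 3
    print(genotype(alt_reads=3, total_reads=6)[0])  # 0/1
```

In the usage example, three nearby signatures form one cluster and the isolated one is labeled noise. With 3 of 6 reads supporting the call, the heterozygous genotype has the highest posterior. Because positions and lengths are mixed in a single unscaled distance, a real caller would likely need to weight the two axes and tune `eps`, `min_pts`, the error rate, and the priors to the sequencing technology and read depth.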
Gaitán, N., & Duitama, J. (2024). A graph clustering algorithm for detection and genotyping of structural variants from long reads. GigaScience, 13. https://doi.org/10.1093/gigascience/giad112