Abstract
A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more difficult computational problem, with a much smaller number of practical software tools currently available. We analyze two state-of-the-art sequence-to-graph mapping tools and reveal four key issues. We find that there is a pressing need to have a specialized, high-performance, scalable, and low-cost algorithm/hardware co-design that alleviates bottlenecks in both the seeding and alignment steps of sequence-to-graph mapping. Since sequence-to-sequence mapping can be treated as a special case of sequence-to-graph mapping, we aim to design an accelerator that is efficient for both linear and graph-based read mapping. To this end, we propose SeGraM, a universal algorithm/hardware co-designed genomic mapping accelerator that can effectively and efficiently support both sequence-to-graph mapping and sequence-to-sequence mapping, for both short and long reads. To our knowledge, SeGraM is the first algorithm/hardware co-design for accelerating sequence-to-graph mapping.
SeGraM consists of two main components: (1) MinSeed, the first minimizer-based seeding accelerator, which finds the candidate locations in a given genome graph; and (2) BitAlign, the first bitvector-based sequence-to-graph alignment accelerator, which performs alignment between a given read and the subgraph identified by MinSeed. We couple SeGraM with high-bandwidth memory to exploit low latency and highly-parallel memory access, which alleviates the memory bottleneck.
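For readers unfamiliar with minimizer-based seeding, the general idea can be sketched in software as follows. This is a minimal illustration of generic (w, k)-minimizers, not the MinSeed hardware design: the function names `minimizers` and `seed_hits` are hypothetical, the lexicographic k-mer ordering is an assumption (practical tools typically order k-mers by a hash), and a plain linear string stands in for the genome graph.

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs: the smallest k-mer in every
    window of w consecutive k-mers (lexicographic order, leftmost on ties)."""
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        kmer, pos = min(kmers[start:start + w])
        selected.add((pos, kmer))
    return sorted(selected)


def seed_hits(read, reference, k, w):
    """Return (read_pos, ref_pos) pairs where a read minimizer matches a
    reference minimizer. These pairs are the candidate mapping locations."""
    index = defaultdict(list)  # k-mer -> positions in the reference
    for pos, kmer in minimizers(reference, k, w):
        index[kmer].append(pos)
    return [(rpos, refpos)
            for rpos, kmer in minimizers(read, k, w)
            for refpos in index.get(kmer, [])]


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
    print(seed_hits("ACGT", "ACGTACGT", k=3, w=2))
```

Because only one k-mer per window is indexed, the index is much smaller than one that stores every k-mer, while a read and a reference region that share a long enough exact match are still guaranteed to select the same minimizer within it. In a graph-based setting, the index would record graph nodes and offsets rather than linear positions.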
Cite
Cali, D. S., Kanellopoulos, K., Lindegger, J., Bingöl, Z., Kalsi, G. S., Zuo, Z., … Mutlu, O. (2022). SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping. In Proceedings - International Symposium on Computer Architecture (pp. 638–655). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3470496.3527436