Many bioinformatics applications need the frequencies of all subsequences of length k (k-mers) that occur in the short reads produced by DNA sequencing. We present an efficient parallel algorithm for k-mer counting based on a nested bucket sort, in which the partition sizes and the number of buckets per partition are precomputed. The algorithm is designed for multicore architectures and combines MPI (OpenMPI) with POSIX threads to achieve very good performance. In our experiments it outperforms existing solutions in running time on the genome of Drosophila melanogaster (SRX040485).
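To make the idea concrete, the following is a minimal single-process sketch of k-mer counting with a two-level (nested) bucket partition whose bucket sizes are precomputed in a first pass. It is not the authors' implementation: the function names, the choice of k-mer prefixes as bucket keys, the parameters `outer_len` and `inner_len`, and the rule of skipping k-mers that contain symbols other than A, C, G, T are all assumptions made for illustration. The MPI and POSIX-thread parallelism described in the abstract is omitted; in a parallel version each bucket could be sorted and counted independently.

```python
from collections import Counter
from itertools import groupby

ALPHABET = set("ACGT")


def kmers(read, k):
    """Yield every length-k substring of a read that uses only A, C, G, T."""
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if set(kmer) <= ALPHABET:
            yield kmer


def bucket_key(kmer, outer_len, inner_len):
    """Hypothetical nested key: (partition prefix, bucket prefix inside it)."""
    return kmer[:outer_len], kmer[outer_len:outer_len + inner_len]


def count_kmers_bucketed(reads, k, outer_len=1, inner_len=1):
    """Count k-mers using a nested bucket sort (reads must be re-iterable)."""
    # Pass 1: precompute how many k-mers fall into each nested bucket.
    sizes = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            sizes[bucket_key(kmer, outer_len, inner_len)] += 1

    # Pass 2: scatter k-mers into buckets preallocated to their exact size.
    buckets = {key: [None] * n for key, n in sizes.items()}
    filled = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            key = bucket_key(kmer, outer_len, inner_len)
            buckets[key][filled[key]] = kmer
            filled[key] += 1

    # Sort each bucket on its own and count runs of equal k-mers.
    counts = {}
    for key in sorted(buckets):
        bucket = buckets[key]
        bucket.sort()
        for kmer, run in groupby(bucket):
            counts[kmer] = sum(1 for _ in run)
    return counts


example = count_kmers_bucketed(["ACGTACGT", "CGTA"], k=3)
print(example)  # {'ACG': 2, 'CGT': 3, 'GTA': 2, 'TAC': 1}
```

Because the first pass fixes every bucket's size, the second pass writes into preallocated storage without resizing, and the buckets never share k-mers, so they can be processed in any order or concurrently.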
Farkaš, T., Kubán, P., & Lucká, M. (2016). Effective parallel multicore-optimized K-mers counting algorithm. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9587, pp. 469–477). Springer Verlag. https://doi.org/10.1007/978-3-662-49192-8_38