Machine learning-driven noise separation in high variation genomics sequencing datasets

Abstract

Genomics studies increasingly have to deal with datasets containing high variation between the sequenced nucleotide chains. This is most common in metagenomics and polyploid studies, where the biological nature of the studied samples requires analysis of multiple variants of nearly identical sequences. The high variation makes it more difficult to determine the correct nucleotide sequences and to distinguish signal from noise, producing digital results with higher error rates than those achievable in samples with low variation. This paper presents an original, purely machine learning-based approach for detecting and potentially correcting those errors. It uses a generic machine learning-based model that can be applied to different types of sequencing data with minor modifications. As presented in a separate part of this work, these models can be combined with data-specific selection of error candidates, to which the models are then applied, for refined error discovery; as shown here, they can also be used independently.

Cite

Krachunov, M., Nisheva, M., & Vassilev, D. (2018). Machine learning-driven noise separation in high variation genomics sequencing datasets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11089 LNAI, pp. 173–185). Springer Verlag. https://doi.org/10.1007/978-3-319-99344-7_16
