High-throughput sequencing makes it possible to process samples containing multiple genomic sequences and then estimate their frequencies or even assemble them. Maximum likelihood estimation of the sequence frequencies from observed reads can be performed efficiently with the expectation-maximization (EM) method, assuming that the sequences present in the sample are known. Frequently, such knowledge is incomplete: in RNA-Seq not all isoforms are known, and in viral quasispecies sequencing the sequences themselves are unknown. We propose to enhance EM with a virtual string and incorporate it into frequency estimation tools for RNA-Seq and quasispecies sequencing. Our simulations show that EM enhanced with the virtual string estimates string frequencies more accurately than the original methods and that it can identify the reads from missing quasispecies, thus enabling their reconstruction. © 2011 Springer-Verlag.
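To make the idea concrete, the following is a minimal sketch of EM frequency estimation with an added virtual string. It is not the authors' implementation: the abstract does not specify how the virtual string is weighted or updated, so the function names (`em_frequencies`, `add_virtual_string`, `reads_from_virtual`), the read-to-string weight matrix, and the constant `virtual_weight` are all assumptions chosen for illustration.

```python
def em_frequencies(weights, n_iter=500, tol=1e-12):
    """Estimate string frequencies by EM.

    weights[r][s] is the (assumed given) likelihood that read r
    was emitted by string s; 0 means the read does not match.
    """
    n_strings = len(weights[0])
    freqs = [1.0 / n_strings] * n_strings
    for _ in range(n_iter):
        counts = [0.0] * n_strings
        # E-step: split each read among strings in proportion to freq * weight.
        for row in weights:
            total = sum(f * w for f, w in zip(freqs, row))
            if total == 0.0:
                continue  # read matches no string and is ignored
            for s, w in enumerate(row):
                counts[s] += freqs[s] * w / total
        assigned = sum(counts)
        if assigned == 0.0:
            return freqs
        # M-step: new frequencies are the normalized expected read counts.
        new_freqs = [c / assigned for c in counts]
        delta = max(abs(a - b) for a, b in zip(freqs, new_freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


def add_virtual_string(weights, virtual_weight=0.01):
    """Append a virtual-string column (hypothetical constant weight).

    Every read gets a small chance of coming from the virtual string,
    so reads that no known string explains are absorbed by it.
    """
    return [list(row) + [virtual_weight] for row in weights]


def reads_from_virtual(weights, freqs, threshold=0.5):
    """Indices of reads whose posterior on the last (virtual) string exceeds threshold."""
    hits = []
    for r, row in enumerate(weights):
        total = sum(f * w for f, w in zip(freqs, row))
        if total > 0.0 and freqs[-1] * row[-1] / total > threshold:
            hits.append(r)
    return hits


# Toy data: two known strings, four reads; the last read matches neither.
known = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
extended = add_virtual_string(known)
freqs = em_frequencies(extended)
unexplained = reads_from_virtual(extended, freqs)
```

In this toy example the unmatched read is ignored by plain EM, whereas with the virtual column it is assigned to the virtual string, which ends up with roughly a quarter of the frequency mass. The reads returned by `reads_from_virtual` are the candidates one would pass on to a separate assembly step when reconstructing a missing sequence.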
Mangul, S., Astrovskaya, I., Nicolae, M., Tork, B., Mandoiu, I., & Zelikovsky, A. (2011). Maximum likelihood estimation of incomplete genomic spectrum from HTS data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6833 LNBI, pp. 213–224). https://doi.org/10.1007/978-3-642-23038-7_19