In genome sequencing there is a trend not to complete the sequences of whole genomes. Motivated by this, Muñoz et al. recently studied the (one-sided) problem of filling an incomplete multichromosomal genome (or scaffold) H with respect to a complete target genome C such that the resulting genomic (or double-cut-and-join, DCJ for short) distance between H′ and C is minimized, where H′ is the corresponding filled scaffold. Jiang et al. recently extended this result to both the breakpoint distance and the DCJ distance, and to the (two-sided) case in which C also has some missing genes, and solved all of these problems in polynomial time. However, when H and C contain duplicated genes, the corresponding breakpoint distance problem becomes NP-complete, and there have been no efficient approximation or FPT algorithms for it. In this paper, we mainly consider the one-sided problem of filling scaffolds with gene repetitions so as to maximize the number of adjacencies between the two resulting sequences; namely, given an incomplete genome I and a complete genome G, both with gene repetitions, fill the missing genes into I to obtain I′ such that the number of adjacencies between I′ and G is maximized. We prove that this problem is also NP-complete and present an efficient 1.33-approximation for it. The hardness result also holds for the two-sided problem, for which a trivial factor-2 approximation exists. We also present FPT algorithms for some special cases of this problem. © 2011 Springer-Verlag.
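To make the one-sided objective concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes unsigned genes on a single linear sequence and counts common adjacencies as the size of the multiset intersection of unordered neighboring pairs; the paper's formal definition may differ in details such as end markers. The function names (`adjacencies`, `common_adjacencies`, `missing_genes`, `best_fill_bruteforce`) are hypothetical, and the search is an exponential brute force intended only for toy inputs, not the 1.33-approximation.

```python
from collections import Counter


def adjacencies(seq):
    """Multiset of unordered neighboring gene pairs in seq (assumed definition)."""
    return Counter(tuple(sorted(pair)) for pair in zip(seq, seq[1:]))


def common_adjacencies(a, b):
    """Size of the multiset intersection of the two adjacency multisets."""
    return sum((adjacencies(a) & adjacencies(b)).values())


def missing_genes(incomplete, complete):
    """Genes, with multiplicity, that occur in complete but not in incomplete."""
    return list((Counter(complete) - Counter(incomplete)).elements())


def _fills(seq, genes):
    """Yield every sequence obtained by inserting all of genes into seq."""
    if not genes:
        yield seq
        return
    first, rest = genes[0], genes[1:]
    for i in range(len(seq) + 1):
        yield from _fills(seq[:i] + [first] + seq[i:], rest)


def best_fill_bruteforce(incomplete, complete):
    """Return (score, filled) maximizing common adjacencies; toy sizes only."""
    genes = missing_genes(incomplete, complete)
    return max(
        (common_adjacencies(filled, complete), filled)
        for filled in _fills(list(incomplete), genes)
    )


if __name__ == "__main__":
    G = ["a", "b", "c", "a", "d"]   # complete genome with a repeated gene
    I = ["a", "c", "d"]             # scaffold missing one "a" and one "b"
    score, filled = best_fill_bruteforce(I, G)
    print(score, filled)
```

On this toy instance the scaffold can be filled to match G exactly, so the best score equals the number of adjacencies in G. Because the problem is NP-complete, exhaustive enumeration like this is useful only for checking small cases by hand.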
CITATION STYLE
Jiang, H., Zhong, F., & Zhu, B. (2011). Filling scaffolds with gene repetitions: Maximizing the number of adjacencies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6661 LNCS, pp. 55–64). https://doi.org/10.1007/978-3-642-21458-5_7