The problem of finding the coordinates of n points on a line such that the pairwise distances of the points form a given multiset of $\binom{n}{2}$ distances is known as the PARTIAL DIGEST problem; it occurs, for instance, in DNA physical mapping and de novo sequencing of proteins. Although PARTIAL DIGEST was already proposed as a combinatorial problem in the 1930s, its computational complexity is still unknown. To model real-life data, we introduce two optimization variations of PARTIAL DIGEST, each capturing a different type of error that occurs in such data. First, we study the computational complexity of a minimization version of PARTIAL DIGEST in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variation is NP-hard to solve exactly. This result answers an open question posed by Pevzner (2000). We then study a maximization version of PARTIAL DIGEST in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of $|D|^{1/2-\varepsilon}$ for any $\varepsilon > 0$, where $|D|$ is the number of input distances. This inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio. © Springer-Verlag Berlin Heidelberg 2003.
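As a small illustration of the definitions in the abstract, the following Python sketch checks whether a candidate point set realizes a given multiset of distances, and whether it is consistent with a noisy input in which distances are missing or extra. The function names and the sample data are hypothetical and do not come from the paper; the sketch only verifies a given candidate and does not solve PARTIAL DIGEST or either optimization variant.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Multiset of all n-choose-2 pairwise distances of points on a line."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def realizes_exactly(points, distances):
    """True if the points produce exactly the given multiset (error-free case)."""
    return pairwise_distances(points) == Counter(distances)


def covers_all_given(points, distances):
    """True if every given distance occurs among the points' pairwise distances.

    Assumed feasibility condition for the 'missing distances' setting.
    """
    # Counter subtraction keeps only positive counts, so an empty result
    # means no given distance is left unexplained.
    return not (Counter(distances) - pairwise_distances(points))


def uses_only_given(points, distances):
    """True if all pairwise distances of the points occur in the given multiset.

    Assumed feasibility condition for the 'extra distances' setting.
    """
    return not (pairwise_distances(points) - Counter(distances))


# Hypothetical example: four points and their six pairwise distances.
points = [0, 2, 7, 10]
full = [2, 3, 5, 7, 8, 10]

print(realizes_exactly(points, full))             # exact input
print(covers_all_given(points, [2, 5, 10]))       # some distances missing
print(uses_only_given(points, full + [4, 6]))     # some spurious distances
```

Using `Counter` keeps multiplicities, which matters because the same distance may occur between several pairs of points.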
Cieliebak, M., Eidenbenz, S., & Penna, P. (2003). Noisy data make the partial digest problem NP-hard. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2812, 111–123. https://doi.org/10.1007/978-3-540-39763-2_9