Approximate matching of run-length compressed strings

Veli Mäkinen; Gonzalo Navarro; Esko Ukkonen

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2001), vol. 2089, pp. 31-49

DOI: 10.1007/3-540-48194-x_3

Abstract

We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n, compressed to m′ and n′ runs. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving O(m′n+n′m) complexity. This approach also gives an algorithm for approximate searching of a pattern of m letters (m′ runs) in a text of n letters (n′ runs) in O(mm′n′) time, both for the LCS and Levenshtein models. We then propose improvements to a greedy algorithm for the LCS and conjecture that the improved algorithm has O(m′n′) expected-case complexity. Experimental results are provided to support the conjecture.
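To make the run-length setting concrete, the sketch below shows one way to compute the LCS in O(m′n + n′m) time by filling only the dynamic-programming cells that lie on run boundaries. It is our own illustration of this block-border idea, not the authors' code and not their Levenshtein extension; the names `rle` and `lcs_rle` are hypothetical. It relies on two standard properties of the LCS table. Inside a block where both runs use the same letter, values grow by exactly one along each diagonal. Inside a block of two different letters, each cell equals the larger of its projections onto the block's top row and left column.

```python
from itertools import groupby


def rle(s):
    """Run-length encode s as a list of (char, run_length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(s)]


def lcs_rle(a, b):
    """LCS length of a and b, computing only DP cells on run borders.

    Illustrative sketch: O(m'n + n'm) time, where m = len(a), n = len(b)
    and m', n' are the numbers of runs in a and b.
    """
    ra, rb = rle(a), rle(b)
    m, n = len(a), len(b)
    # rows[k][j] = L(I_k, j): the DP row at the end of the k-th run of a.
    rows = [[0] * (n + 1) for _ in range(len(ra) + 1)]
    # cols[l][i] = L(i, J_l): the DP column at the end of the l-th run of b.
    cols = [[0] * (m + 1) for _ in range(len(rb) + 1)]
    i0 = 0
    for k, (ca, p) in enumerate(ra, start=1):
        j0 = 0
        for l, (cb, q) in enumerate(rb, start=1):
            top = rows[k - 1][j0:j0 + q + 1]   # L(i0, j0..j0+q)
            left = cols[l - 1][i0:i0 + p + 1]  # L(i0..i0+p, j0)
            if ca == cb:
                # Match block: follow the diagonal back to the top row
                # or the left column, adding one per diagonal step.
                for t in range(1, q + 1):
                    rows[k][j0 + t] = top[t - p] + p if t >= p else left[p - t] + t
                for s in range(1, p + 1):
                    cols[l][i0 + s] = left[s - q] + q if s >= q else top[q - s] + s
            else:
                # Mismatch block: no matches inside, so each cell is the
                # max of the border cells straight above and to the left.
                for t in range(1, q + 1):
                    rows[k][j0 + t] = max(top[t], left[p])
                for s in range(1, p + 1):
                    cols[l][i0 + s] = max(left[s], top[q])
            j0 += q
        i0 += p
    return rows[len(ra)][n]


print(lcs_rle("aab", "ab"))  # 2
```

Each block formed by runs of lengths p and q costs O(p + q). Summing over all m′n′ blocks therefore gives m′n + n′m in total. This sketch stores every boundary row and column, so it also uses O(m′n + n′m) memory. A more careful version could keep only the current row of blocks.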

Cite (APA)

Mäkinen, V., Navarro, G., & Ukkonen, E. (2001). Approximate matching of run-length compressed strings. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2089, pp. 31–49). Springer Verlag. https://doi.org/10.1007/3-540-48194-x_3