Efficient regular expression matching on compressed strings

Yutong Han; Bin Wang; Xiaochun Yang; Huaijie Zhu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017), Vol. 10178 LNCS, pp. 219–234

DOI: 10.1007/978-3-319-55699-4_14

Abstract

Existing methods for regular expression matching on LZ78-compressed strings are inefficient. Moreover, LZ78 compression has shortcomings of its own, such as a high compression ratio and slower decompression than LZ77 (a related Lempel-Ziv scheme). In this paper, we study regular expression matching on LZ77-compressed strings. We propose an efficient algorithm, RELZ, that prunes candidates using the positive factors of the regular expression (a prefix and a suffix) and its negative factors (substrings that cannot appear in any answer). To locate both kinds of factors quickly on the compressed string without decompressing it, we design a variant of the suffix trie index, called SSLZ. In addition, we construct bitmaps for the factors of the regular expression to detect potential regions, and we propose block filtering to further reduce the candidates. Finally, we conduct a comprehensive performance evaluation on five real datasets to validate our ideas and the proposed algorithms. The experimental results show that RELZ significantly outperforms existing algorithms.
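
The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' RELZ algorithm: it runs on plain, uncompressed text, it does not use the SSLZ index, bitmaps, or block filtering, and the function names (`find_all`, `filtered_matches`) and the hand-supplied factor arguments are assumptions made for illustration only. It shows only the filter-then-verify pattern: locate the required prefix and suffix, discard windows that contain a negative factor, and run the full regular expression on what remains.

```python
import re


def find_all(text, factor):
    """Return every start offset of `factor` in `text` (overlaps included)."""
    hits = []
    pos = text.find(factor)
    while pos != -1:
        hits.append(pos)
        pos = text.find(factor, pos + 1)
    return hits


def filtered_matches(text, pattern, prefix, suffix, negatives):
    """Filter-then-verify on plain text (hypothetical helper).

    `prefix` and `suffix` are literal strings every answer must start and
    end with; `negatives` are literal strings no answer may contain.
    All three are supplied by hand here, not derived from `pattern`.
    """
    regex = re.compile(pattern)
    starts = find_all(text, prefix)
    ends = [p + len(suffix) for p in find_all(text, suffix)]
    results = []
    for s in starts:
        for e in ends:
            # The window must be long enough to hold the prefix and the suffix.
            if e - s < max(len(prefix), len(suffix)):
                continue
            window = text[s:e]
            # Negative-factor pruning: skip the regex check entirely.
            if any(neg in window for neg in negatives):
                continue
            # Verification: only surviving windows reach the regex engine.
            if regex.fullmatch(window):
                results.append((s, e))
    return results


if __name__ == "__main__":
    text = "abcdef abzef abef"
    # For ab[cd]*ef, "z" and " " can never occur inside an answer.
    print(filtered_matches(text, r"ab[cd]*ef", "ab", "ef", ["z", " "]))
```

In this toy setting the negative factors remove every window that spans a space or a `z` before the regular expression is evaluated, which is the effect the abstract attributes to negative factors, here without any of the compressed-domain machinery.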

Cite

Han, Y., Wang, B., Yang, X., & Zhu, H. (2017). Efficient regular expression matching on compressed strings. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10178 LNCS, pp. 219–234). Springer Verlag. https://doi.org/10.1007/978-3-319-55699-4_14