New perspectives on the prefix array

Abstract

In this paper we consider the prefix array π = π[1..n] of a string x = x[1..n], in which π[1] = n and, for i > 1, π[i] = k iff k is the largest integer such that x[1..k] = x[i..i+k−1]. The prefix array is closely related to the border array: an integer array β = β[1..n] such that β[i] = k iff the length of the longest border of x[1..i] is k. Border arrays or their variants are used in many string algorithms, and prefix arrays can be used directly for pattern-matching. It is well known that for regular strings β provides all the information that π does; we show, however, that for indeterminate strings (those containing entries that match a subset of the alphabet) π actually provides more information, in fact still enabling all the borders of every prefix of x to be specified. Since many of the entries of π are expected to be zero, it is natural to represent π in compressed form using integer arrays POS = POS[1..m] and LEN = LEN[1..m], where m is the number of nonzero entries in π, and POS[j] = i iff the j-th nonzero entry in π occurs in position i and takes the value LEN[j]. The expected value of m is n/σ − 1, where σ is the alphabet size. The straightforward way of computing POS/LEN requires computing π first, and therefore requires O(n) extra space. We describe two Θ(n)-time algorithms, PL1 and PL2, that compute POS/LEN for regular strings using only 8m bytes of storage in addition to the n bytes required for x. PL1 requires about one-third the time of the standard border array algorithm MP on English-language strings; PL2 executes faster than MP on both English-language strings and highly periodic strings on {a,b}. For indeterminate strings, we describe an extension IPL of PL1 that computes POS/LEN in O(n²) worst-case time (though generally much faster), still using only 8m bytes of additional storage. For both regular and indeterminate strings, the compressed form of π can be used for efficient pattern-matching. © 2009 Springer Berlin Heidelberg.
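
The definitions in the abstract can be illustrated with a minimal Python sketch for regular strings. It is not the paper's PL1, PL2, or IPL: it uses the textbook Z-algorithm for the prefix array, the textbook Morris-Pratt failure function for the border array, and the "straightforward" route to POS/LEN that builds the full prefix array first (the O(n) extra space the abstract mentions). The function names `prefix_array`, `border_array`, and `compress` are hypothetical, the returned lists are 0-indexed while POS holds 1-based positions as in the abstract, and including position 1 in POS is an assumption drawn from the literal wording "nonzero entries".

```python
def prefix_array(x):
    """Prefix array of a regular string x, as a 0-indexed list.

    pi[0] = len(x); for i > 0, pi[i] is the length of the longest
    substring starting at index i that matches a prefix of x.
    Textbook Z-algorithm, linear time. Not the paper's PL1/PL2.
    """
    n = len(x)
    if n == 0:
        return []
    pi = [0] * n
    pi[0] = n
    left = right = 0  # x[left:right] is the rightmost known prefix match
    for i in range(1, n):
        if i < right:
            # Reuse information from inside the known matching window.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match by direct comparison.
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def border_array(x):
    """Border array of a regular string x, as a 0-indexed list.

    beta[i] is the length of the longest border (proper prefix that is
    also a suffix) of x[0..i]. Textbook Morris-Pratt failure function.
    """
    n = len(x)
    beta = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and x[i] != x[k]:
            k = beta[k - 1]
        if x[i] == x[k]:
            k += 1
        beta[i] = k
    return beta


def compress(pi):
    """Straightforward POS/LEN: scan a fully built prefix array.

    POS holds 1-based positions of nonzero entries, LEN their values.
    Assumption: position 1 (where pi = n) is included.
    """
    pos, length = [], []
    for i, k in enumerate(pi, start=1):
        if k != 0:
            pos.append(i)
            length.append(k)
    return pos, length
```

For x = "abaababa", this sketch gives the prefix array [8, 0, 1, 3, 0, 3, 0, 1], the border array [0, 0, 1, 1, 2, 3, 2, 3], and the compressed form POS = [1, 3, 4, 6, 8], LEN = [8, 1, 3, 3, 1]. Indeterminate strings are not handled here, since letter matching is no longer transitive in that setting.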

Cite

Smyth, W. F., & Wang, S. (2008). New perspectives on the prefix array. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5280 LNCS, pp. 133–143). Springer Verlag. https://doi.org/10.1007/978-3-540-89097-3_14
