Inverted files versus suffix arrays for locating patterns in primary memory

Simon J. Puglisi; W. F. Smyth; Andrew Turpin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2006) 4209 LNCS 122-133

DOI: 10.1007/11880561_11

29 Citations

10 Readers

Abstract

Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we compare the resource requirements of compressed suffix array algorithms with those of compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams, thus allowing full pattern matching capabilities rather than simple word-based search, which makes its functionality equivalent to that of the compressed suffix array structures. When the two structures use equivalent memory, inverted files are faster at reporting the locations of patterns when the number of occurrences of the patterns is high. © Springer-Verlag Berlin Heidelberg 2006.
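As a rough illustration of how an inverted file over fixed-length substrings (q-grams) can locate arbitrary patterns rather than whole words, here is a minimal, uncompressed sketch in Python. It is not the implementation evaluated in the paper: the function names `build_qgram_index` and `locate`, the choice of which q-grams of the pattern to look up, and the plain-scan fallback for patterns shorter than q are all assumptions made for illustration, and no posting-list compression is attempted.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every q-gram of `text` to the sorted list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def locate(text, index, q, pattern):
    """Return all start positions of `pattern` in `text` using the q-gram index."""
    m = len(pattern)
    if m < q:
        # Assumption: patterns shorter than q fall back to a plain scan.
        return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]

    # Pick q-grams of the pattern that together cover every character:
    # non-overlapping ones from the left, plus the final q-gram for the tail.
    offsets = list(range(0, m - q + 1, q))
    if offsets[-1] != m - q:
        offsets.append(m - q)

    candidates = None
    for off in offsets:
        # Shift each posting by the q-gram's offset to get a candidate pattern start.
        starts = {p - off for p in index.get(pattern[off:off + q], ())}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates)


if __name__ == "__main__":
    text = "abracadabra"
    idx = build_qgram_index(text, 2)
    print(locate(text, idx, 2, "abra"))
```

Because the chosen q-grams cover the whole pattern, a start position that survives every intersection is a true occurrence, so no verification pass against the text is needed in this sketch. The cost of a query is driven by the lengths of the posting lists it touches.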

Cite

Puglisi, S. J., Smyth, W. F., & Turpin, A. (2006). Inverted files versus suffix arrays for locating patterns in primary memory. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4209 LNCS, pp. 122–133). Springer Verlag. https://doi.org/10.1007/11880561_11
