Tree contraction for compressed suffix arrays on modern processors

0Citations
Citations of this article
1Readers
Mendeley users who have this article in their library.
Get full text

Abstract

We propose a novel processor-aware compaction technique for pattern matching that is widely-used in databases, information retrieval, and text mining. As the amount of data increases, it is getting important to efficiently store data on memory. A compressed suffix array (CSA) is a compact data structure for in-memory pattern matching. However, CSA suffers from tremendous processor penalties, such as a flood of instructions and cache/TLB misses due to the lack of processor aware design. To mitigate these penalties, we propose a novel compaction technique for CSA, called suffix trie contraction (STC). The frequently accessed suffixes of CSA are transformed to a trie (e.g., a suffix trie), and then inter-connected nodes in the trie are repeatedly ’contracted’ to a single node, which enables lightweight sequential scans in a processor friendly way. In detail, STC consists of two contraction techniques: fixed length path contraction (FPC) and sub-tree contraction (SC). FPC is applied to the parts with a few branches in the trie, and SC is applied to the parts with many branches. Our experiment results indicate that FPC outperforms naive CSA by two orders of magnitude for short pattern queries and by three times for long pattern queries. As the number of branches inside the trie increases, SC gradually becomes superior to CSA and FPC for short pattern queries. Finally, the latency and throughput of STC are 7 times and 72 times better than those of CSA for the TREC test data set at the expense of additional 7.1 % space overhead.

Cite

CITATION STYLE

APA

Yamamuro, T., Onizuka, M., & Honjo, T. (2015). Tree contraction for compressed suffix arrays on modern processors. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9050, pp. 363–378). Springer Verlag. https://doi.org/10.1007/978-3-319-18123-3_22

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free