Trie methods for representing text

T. H. Merrett; Heping Shang

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 730 LNCS, pp. 130–145 (1993)

DOI: 10.1007/3-540-57301-1_9

Abstract

We propose a new trie organization for large text documents requiring secondary storage. Index size is critical in all trie representations of text, and our organization is smaller than all known methods. Access time is as good as the best known method. Tries can be constructed in good time. For an index of 100 million entries, our experiments show size factors of less than 3, as compared with 3.4 for the best previous method. Our measurements show expected access costs of 0.1 sec., and construction times of 18 to 55 hours, depending on the text characteristics. Our organization can also handle dynamic data, and we give new algorithms for inserting and deleting. It supports searches for general patterns, as well as a variety of special searches, such as proximity, range, longest repetitions and most frequent occurrences.
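
As a rough illustration of the general idea of indexing text with a trie, the following minimal Python sketch builds an in-memory trie over every suffix of a short string and looks up the start positions of a pattern. It is not the organization proposed in the paper: it ignores secondary storage, compaction, and dynamic updates, and it uses memory quadratic in the text length. The names `TrieNode`, `build_suffix_trie`, and `find` are hypothetical and chosen only for this example.

```python
class TrieNode:
    """One node of an uncompressed suffix trie (illustrative only)."""

    def __init__(self):
        self.children = {}   # maps a character to the child TrieNode
        self.positions = []  # start offsets of all suffixes passing through this node


def build_suffix_trie(text):
    """Insert every suffix of `text` into a trie.

    Assumption: `text` is small enough to index in memory; this naive
    construction takes time and space quadratic in len(text).
    """
    root = TrieNode()
    root.positions = list(range(len(text)))
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, TrieNode())
            node.positions.append(start)
    return root


def find(root, pattern):
    """Return the sorted start offsets at which `pattern` occurs in the text."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # no suffix begins with this pattern
    return node.positions


if __name__ == "__main__":
    root = build_suffix_trie("banana")
    print(find(root, "ana"))
    print(find(root, "x"))
```

Because every node records the offsets of the suffixes that pass through it, the length of `positions` at a node also gives the number of occurrences of the corresponding substring, which hints at how frequency-style queries could be answered from the same structure.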

Merrett, T. H., & Shang, H. (1993). Trie methods for representing text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 730 LNCS, pp. 130–145). Springer Verlag. https://doi.org/10.1007/3-540-57301-1_9