Similarity search in databases of chemical fingerprints is a fundamental task in discovering novel drug-like molecules. The multibit tree is a data structure that enables fast similarity searches of chemical fingerprints (Kristensen et al., WABI'09). However, a standard pointer-based representation of multibit trees consumes a large amount of memory when indexing large-scale fingerprint databases. To make matters worse, the original fingerprint databases must also be kept in memory to filter out false positives. Succinct data structures are compact yet still support fast operations; many have been proposed and applied in fields such as full-text indexing and genome mapping. We present compact representations of both multibit trees and fingerprint databases by applying these data structures. Experiments showed that our representations use much less memory than the standard pointer-based representation. Moreover, our representations enabled us to efficiently perform PubChem-scale similarity searches. © 2012 Springer-Verlag.
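To make the search task concrete, the following is a minimal sketch of a threshold similarity search over fingerprints, written as a plain linear scan. It is not the multibit tree or the succinct representation described above. It assumes the Tanimoto (Jaccard) coefficient as the similarity measure, which the abstract does not name, and it stores each fingerprint as a Python integer. The function names `tanimoto` and `search` are illustrative only. The scan uses the standard bit-count bound (similarity cannot exceed the ratio of the smaller to the larger number of set bits) to skip candidates before the exact comparison; index structures such as multibit trees aim to prune far more aggressively than this.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a non-negative int."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto (Jaccard) similarity: |a AND b| / |a OR b|."""
    union = popcount(a | b)
    if union == 0:
        return 1.0  # convention chosen here for two all-zero fingerprints
    return popcount(a & b) / union


def search(db: List[int], query: int, threshold: float) -> List[Tuple[int, float]]:
    """Return (index, similarity) for every fingerprint with similarity >= threshold."""
    q_bits = popcount(query)
    hits = []
    for idx, fp in enumerate(db):
        f_bits = popcount(fp)
        lo, hi = min(q_bits, f_bits), max(q_bits, f_bits)
        # Bit-count bound: |a AND b| <= lo and |a OR b| >= hi,
        # so the similarity is at most lo / hi. Skip hopeless candidates.
        if hi > 0 and lo / hi < threshold:
            continue
        # Exact check against the stored fingerprint.
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits


if __name__ == "__main__":
    # Toy 6-bit fingerprints (hypothetical data, for illustration only).
    db = [0b101100, 0b101101, 0b000011, 0b111111]
    print(search(db, 0b101101, 0.7))
```

In this toy run the last two fingerprints are rejected by the bit-count bound alone, so only the first two reach the exact comparison. The exact check still needs the stored fingerprints themselves, which is why the abstract notes that the original database must be held in memory.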
CITATION STYLE
Tabei, Y. (2012). Succinct multibit tree: Compact representation of multibit trees by using succinct data structures in chemical fingerprint searches. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7534 LNBI, pp. 201–213). https://doi.org/10.1007/978-3-642-33122-0_16