Statistical distribution of chemical fingerprints

Abstract

Binary fingerprints are binary vectors used to represent chemical molecules by recording the presence or absence of particular substructures, such as labeled paths in the 2D graph of bonds. Complete fingerprints are often reduced to a compressed format, of typical dimension n = 512 or n = 1024, by using a simple congruence operation. The statistical properties of complete or compressed fingerprint representations are important since fingerprints are used to rapidly search large databases and to develop statistical machine learning methods in chemoinformatics. Here we present an empirical and mathematical analysis of the distribution of complete and compressed fingerprints. In particular, we derive formulas that provide good approximations for the expected number of bits set to one in a compressed fingerprint, given its uncompressed version, and vice versa. © Springer-Verlag Berlin Heidelberg 2006.
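
The compression step and the bit-count relationship mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the "congruence operation" is index-modulo-n folding with OR semantics, and that on-bits land roughly uniformly at random among the n compressed positions. The closed form used below is the standard occupancy approximation under that assumption and is not necessarily the exact formula derived in the paper. The function names are hypothetical.

```python
import math


def fold_fingerprint(on_bits, n=1024):
    """Fold a sparse fingerprint (iterable of on-bit indices) into n bits.

    Assumption: compression maps index i to i mod n, and colliding bits
    are merged (logical OR), so the result is a set of indices in [0, n).
    """
    return {i % n for i in on_bits}


def expected_folded_bits(a, n=1024):
    """Approximate expected number of on-bits after folding.

    Assumption: each of the `a` uncompressed on-bits falls uniformly and
    independently into one of n positions (occupancy approximation).
    """
    return n * (1.0 - (1.0 - 1.0 / n) ** a)


def estimated_unfolded_bits(c, n=1024):
    """Invert the approximation: estimate uncompressed on-bits from c folded on-bits."""
    if not 0 <= c < n:
        raise ValueError("c must satisfy 0 <= c < n for the inverse to be defined")
    return math.log(1.0 - c / n) / math.log(1.0 - 1.0 / n)


if __name__ == "__main__":
    # Indices 1 and 1025 collide when n = 1024, so three on-bits fold to two.
    print(sorted(fold_fingerprint({1, 5, 1025}, n=1024)))
    print(expected_folded_bits(200, n=1024))
    print(estimated_unfolded_bits(expected_folded_bits(200, n=1024), n=1024))
```

Under these assumptions the expected compressed count is always at most the uncompressed count, and the gap grows as the fingerprint becomes denser relative to n, which is why an inverse estimate is useful when only compressed fingerprints are stored.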

Cite

Swamidass, S. J., & Baldi, P. (2006). Statistical distribution of chemical fingerprints. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3849 LNAI, pp. 11–18). https://doi.org/10.1007/11676935_2
