Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis

19Citations
Citations of this article
41Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: To improve the utility of PubChem, a public repository containing biological activities of small molecules, the PubChem3D project adds computationally-derived three-dimensional (3-D) descriptions to the small-molecule records contained in the PubChem Compound database and provides various search and analysis tools that exploit 3-D molecular similarity. Therefore, the efficient use of PubChem3D resources requires an understanding of the statistical and biological meaning of computed 3-D molecular similarity scores between molecules. Results: The present study investigated effects of employing multiple conformers per compound upon the 3-D similarity scores between ten thousand randomly selected biologically-tested compounds (10-K set) and between non-inactive compounds in a given biological assay (156-K set). When the "best-conformer-pair" approach, in which a 3-D similarity score between two compounds is represented by the greatest similarity score among all possible conformer pairs arising from a compound pair, was employed with ten diverse conformers per compound, the average 3-D similarity scores for the 10-K set increased by 0.11, 0.09, 0.15, 0.16, 0.07, and 0.18 for ST §ssup§ ST-opt §esup§, CT §ssup§ ST-opt §esup§, ComboT §ssup§ ST-opt §esup§, ST §ssup§ CT-opt §esup§, CT §ssup§ CT-opt §esup§, and ComboT §ssup§ CT-opt §esup§, respectively, relative to the corresponding averages computed using a single conformer per compound. Interestingly, the best-conformer-pair approach also increased the average 3-D similarity scores for the non-inactive-non-inactive (NN) pairs for a given assay, by comparable amounts to those for the random compound pairs, although some assays showed a pronounced increase in the per-assay NN-pair 3-D similarity scores, compared to the average increase for the random compound pairs. Conclusion: These results suggest that the use of ten diverse conformers per compound in PubChem bioassay data analysis using 3-D molecular similarity is not expected to increase the separation of non-inactive from random and inactive spaces "on average", although some assays show a noticeable separation between the non-inactive and random spaces when multiple conformers are used for each compound. The present study is a critical next step to understand effects of conformational diversity of the molecules upon the 3-D molecular similarity and its application to biological activity data analysis in PubChem. The results of this study may be helpful to build search and analysis tools that exploit 3-D molecular similarity between compounds archived in PubChem and other molecular libraries in a more efficient way. © 2012 Kim et al.; licensee Chemistry Central Ltd.

Cite

CITATION STYLE

APA

Kim, S., Bolton, E. E., & Bryant, S. H. (2012). Effects of multiple conformers per compound upon 3-D similarity search and bioassay data analysis. Journal of Cheminformatics, 4(1). https://doi.org/10.1186/1758-2946-4-28

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free