Over the last five years, virtual screening of ultralarge synthesis on-demand libraries has emerged as a powerful tool for hit identification in drug discovery programs. As these libraries have grown to tens of billions of molecules, we have reached a point where it is no longer cost-effective to screen every molecule virtually. To address these challenges, several groups have developed heuristic search methods to rapidly identify the best molecules on a virtual screen. This article describes the application of Thompson sampling (TS), an active learning approach that streamlines the virtual screening of large combinatorial libraries by performing a probabilistic search in the reagent space, thereby never requiring the full enumeration of the library. TS is a general technique that can be applied to various virtual screening modalities, including 2D and 3D similarity search, docking, and application of machine-learning models. In an illustrative example, we show that TS can identify more than half of the top 100 molecules from a docking-based virtual screen of 335 million molecules by evaluating 1% of the data set.
CITATION STYLE
Klarich, K., Goldman, B., Kramer, T., Riley, P., & Walters, W. P. (2024). Thompson Sampling─An Efficient Method for Searching Ultralarge Synthesis on Demand Databases. Journal of Chemical Information and Modeling, 64(4), 1158–1171. https://doi.org/10.1021/acs.jcim.3c01790
Mendeley helps you to discover research relevant for your work.