Exposing the Limitations of Molecular Machine Learning with Activity Cliffs

Abstract

Machine learning has become a crucial tool in drug discovery and in chemistry at large, for example to predict molecular properties such as bioactivity with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in structure but exhibit large differences in potency) have received limited attention for their effect on model performance. These edge cases are informative for molecule discovery and optimization, and models that can accurately predict the potency of activity cliff compounds have greater potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine learning and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
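To make the two central ideas of the abstract concrete, the following is a minimal sketch of (1) flagging activity cliff pairs and (2) computing an error metric restricted to cliff compounds. It is not the MoleculeACE API. The function names, the toy data, and both thresholds (Tanimoto similarity of at least 0.9 and a potency difference of at least 10-fold) are assumptions chosen for illustration, and plain Python sets of integers stand in for real molecular fingerprints.

```python
from itertools import combinations
from math import log10, sqrt


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of feature identifiers."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_cliff_pairs(features, potency, sim_threshold=0.9, fold_threshold=10.0):
    """Return index pairs that are structurally similar but differ strongly in potency.

    Both thresholds are illustrative assumptions. `potency` holds positive
    values on a linear scale (e.g. Ki in nM).
    """
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        similar = tanimoto(features[i], features[j]) >= sim_threshold
        high = max(potency[i], potency[j])
        low = min(potency[i], potency[j])
        if similar and high / low >= fold_threshold:
            pairs.append((i, j))
    return pairs


def rmse(y_true, y_pred, indices=None):
    """Root-mean-square error, optionally restricted to a subset of indices."""
    if indices is None:
        indices = range(len(y_true))
    indices = list(indices)
    squared = [(y_true[k] - y_pred[k]) ** 2 for k in indices]
    return sqrt(sum(squared) / len(squared))


# Toy data (hypothetical): four molecules described by sets of feature IDs.
features = [
    set(range(20)),            # molecule 0
    set(range(19)) | {99},     # molecule 1: near-identical to molecule 0
    set(range(20)) | {50},     # molecule 2: near-identical to molecule 0
    set(range(100, 120)),      # molecule 3: unrelated scaffold
]
potency_nm = [5.0, 800.0, 6.0, 4000.0]

cliff_pairs = find_cliff_pairs(features, potency_nm)
cliff_compounds = sorted({k for pair in cliff_pairs for k in pair})

# Evaluate hypothetical model predictions on a log10 scale.
y_true = [log10(p) for p in potency_nm]
y_pred = [1.5, 1.6, 0.8, 3.5]  # made-up predictions, not results from the study

rmse_all = rmse(y_true, y_pred)
rmse_cliff = rmse(y_true, y_pred, cliff_compounds)

print("cliff pairs:", cliff_pairs)
print("RMSE (all compounds):  ", round(rmse_all, 3))
print("RMSE (cliff compounds):", round(rmse_cliff, 3))
```

Reporting the cliff-restricted error next to the overall error is one simple way to realize an "activity-cliff-centered" metric: a model can look accurate on average while performing noticeably worse on the compounds that form cliffs, as the made-up predictions above are constructed to show.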

Van Tilborg, D., Alenicheva, A., & Grisoni, F. (2022). Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. Journal of Chemical Information and Modeling, 62(23), 5938–5951. https://doi.org/10.1021/acs.jcim.2c01073
