Cluster learning-assisted directed evolution

Yuchi Qiu; Jian Hu; Guo Wei Wei

Journal Article, Open Access

Nature Computational Science (2021) 1(12) 809-818

DOI: 10.1038/s43588-021-00168-y

Abstract

Directed evolution, a strategy for protein engineering, optimizes protein properties (that is, fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), which combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions via supervised learning models improve the final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library with five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, improved from the values of 18.6% and 7.2% obtained by random sampling-based MLDE.
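The two-stage idea in the abstract (cluster-guided sampling, then a supervised model for the final picks) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `toy_fitness` is a made-up stand-in for an assay (not GB1 or PhoQ data), `cluster_of` groups variants by their first residue instead of using hierarchical clustering on sequence encodings, the additive per-site model replaces the paper's supervised learners, and the split into four sampling batches plus one model-guided batch is a guess. Only the library size (20^4 = 160,000), the five batches, and the 480-variant budget (5 x 96) come from the abstract.

```python
import itertools
import random
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
N_SITES, N_BATCHES, BATCH_SIZE = 4, 5, 96  # 5 x 96 = 480 screened variants

# Full four-site combinatorial library: 20**4 = 160,000 variants.
library = ["".join(p) for p in itertools.product(AMINO_ACIDS, repeat=N_SITES)]


def toy_fitness(variant):
    """Hypothetical stand-in for an experimental assay; values lie in [0, 1]."""
    return sum((ord(aa) * (i + 3)) % 11 for i, aa in enumerate(variant)) / 40.0


def cluster_of(variant):
    """Hypothetical crude clustering: group variants by their first residue."""
    return variant[0]


def cluster_guided_sampling(rng, n_batches):
    """Screen batches, favouring clusters whose observed mean fitness is high."""
    clusters = defaultdict(list)
    for v in library:
        clusters[cluster_of(v)].append(v)
    names = sorted(clusters)
    screened = {}                 # variant -> measured fitness
    observed = defaultdict(list)  # cluster -> fitness values seen so far
    for _ in range(n_batches):
        # Unseen clusters get an optimistic weight; 0.01 keeps weights positive.
        weights = [
            (sum(observed[c]) / len(observed[c]) if observed[c] else 1.0) + 0.01
            for c in names
        ]
        picked = 0
        while picked < BATCH_SIZE:
            c = rng.choices(names, weights=weights)[0]
            v = rng.choice(clusters[c])
            if v in screened:
                continue  # never screen the same variant twice
            screened[v] = toy_fitness(v)
            observed[c].append(screened[v])
            picked += 1
    return screened


def fit_additive_model(screened):
    """Hypothetical supervised stage: sum of per-site mean fitness per residue."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, f in screened.items():
        for site, aa in enumerate(v):
            sums[(site, aa)] += f
            counts[(site, aa)] += 1
    baseline = sum(screened.values()) / len(screened)

    def predict(v):
        total = 0.0
        for site, aa in enumerate(v):
            key = (site, aa)
            # Fall back to the global mean for residues never observed at a site.
            total += sums[key] / counts[key] if key in counts else baseline
        return total

    return predict


def run(seed=0):
    rng = random.Random(seed)
    # Assumed split: four sampling batches, then one model-guided batch.
    screened = cluster_guided_sampling(rng, n_batches=N_BATCHES - 1)
    predict = fit_additive_model(screened)
    unscreened = [v for v in library if v not in screened]
    for v in sorted(unscreened, key=predict, reverse=True)[:BATCH_SIZE]:
        screened[v] = toy_fitness(v)
    return screened


if __name__ == "__main__":
    result = run()
    print(len(result), "of", len(library), "variants screened")
    print("best toy fitness found:", max(result.values()))
```

The sketch only shows the control flow: sampling effort shifts toward promising clusters as measurements accumulate, and the final batch is chosen by ranking unscreened variants with a model trained on everything screened so far. It says nothing about the hit rates reported in the abstract, which depend on the real datasets, encodings, and models.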

Cite

Qiu, Y., Hu, J., & Wei, G. W. (2021). Cluster learning-assisted directed evolution. Nature Computational Science, 1(12), 809–818. https://doi.org/10.1038/s43588-021-00168-y
