Machine learning-aided scoring of synthesis difficulties for designer chromosomes

Abstract

Designer chromosomes are artificially synthesized chromosomes with applications ranging from medical research to the development of biofuels. However, some chromosome fragments can interfere with the chemical synthesis of designer chromosomes and ultimately limit the widespread use of this technology. To address this issue, this study aimed to develop an interpretable machine learning framework to predict and quantify the synthesis difficulties of designer chromosomes in advance. Using this framework, six key sequence features leading to synthesis difficulties were identified, and an eXtreme Gradient Boosting (XGBoost) model was established to integrate these features. The predictive model achieved an AUC of 0.895 in cross-validation and an AUC of 0.885 on an independent test set. Based on these results, the synthesis difficulty index (S-index) was proposed as a means of scoring and interpreting the synthesis difficulties of chromosomes from prokaryotes to eukaryotes. The findings highlight the significant variability in synthesis difficulties between chromosomes and demonstrate the potential of the proposed model to predict and mitigate these difficulties through optimization of the synthesis process and genome rewriting.
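For readers unfamiliar with the AUC figures quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from binary labels and model scores. It is not the authors' evaluation code: the function name `roc_auc` is hypothetical, and the labels and scores are made-up toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = hard-to-synthesize fragment, 0 = easy fragment.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported values of 0.895 and 0.885 indicate that the model ranks difficult fragments above easy ones in most pairs.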

Zheng, Y., Song, K., Xie, Z. X., Han, M. Z., Guo, F., & Yuan, Y. J. (2023). Machine learning-aided scoring of synthesis difficulties for designer chromosomes. Science China Life Sciences, 66(7), 1615–1625. https://doi.org/10.1007/s11427-023-2306-x
