Accelerating Formulation Design via Machine Learning: Generating a High-throughput Shampoo Formulations Dataset


Abstract

Liquid formulations are ubiquitous, yet they have lengthy product development cycles because complex physical interactions between ingredients make it difficult to tune formulations to customer-defined property targets. Interpolative machine learning (ML) models can accelerate liquid formulation design, but they are typically trained on limited sets of ingredients and without any structural information, which limits their predictive capacity outside the training data. To address this challenge, we selected eighteen formulation ingredients covering a diverse chemical space to prepare an open experimental dataset for training ML models for rinse-off formulation development. The resulting design space has more than a 50-fold increase in dimensionality compared with our previous work. Here, we present a dataset of 812 formulations, including 294 stable samples, spanning the entire design space, with phase stability, turbidity, and high-fidelity rheology measurements generated on our semi-automated, ML-driven liquid formulations workflow. A distinctive feature of the dataset is its sample-specific uncertainty measurements, which can be used to train predictive surrogate models.
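The abstract highlights that each sample carries its own uncertainty estimate. One common way per-sample uncertainties can enter a surrogate model is inverse-variance weighting, where each measurement is weighted by 1/σ² so that noisy samples influence the fit less. The sketch below is a minimal, hypothetical illustration of that general idea, not the authors' modelling approach: the `Measurement` class, the `weighted_linear_fit` function, and the single-input linear model are assumptions chosen for brevity, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    x: float      # hypothetical single input, e.g. one ingredient's mass fraction
    y: float      # measured property, e.g. a rheology value
    sigma: float  # sample-specific standard deviation reported for y


def weighted_linear_fit(data: list[Measurement]) -> tuple[float, float]:
    """Fit y ~ a + b*x by inverse-variance weighted least squares.

    Each point gets weight w_i = 1 / sigma_i**2, so samples with large
    reported uncertainty pull the fitted line less than precise ones.
    """
    if len(data) < 2:
        raise ValueError("need at least two measurements")
    w = [1.0 / (m.sigma ** 2) for m in data]
    sw = sum(w)
    # Weighted means of x and y
    x_bar = sum(wi * m.x for wi, m in zip(w, data)) / sw
    y_bar = sum(wi * m.y for wi, m in zip(w, data)) / sw
    # Weighted covariance and variance (the shared 1/sw factor cancels in b)
    sxy = sum(wi * (m.x - x_bar) * (m.y - y_bar) for wi, m in zip(w, data))
    sxx = sum(wi * (m.x - x_bar) ** 2 for wi, m in zip(w, data))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


if __name__ == "__main__":
    # Three precise points on y = 1 + 2x plus one very uncertain outlier
    data = [
        Measurement(0.0, 1.0, 0.1),
        Measurement(1.0, 3.0, 0.1),
        Measurement(2.0, 5.0, 0.1),
        Measurement(3.0, 20.0, 100.0),
    ]
    print(weighted_linear_fit(data))  # close to (1.0, 2.0)
```

With the weights, the uncertain fourth point barely moves the fit away from y = 1 + 2x; giving all four points the same σ instead lets the outlier drag the slope up to 5.9. A real surrogate over an eighteen-ingredient design space would need a multivariate model (for example, a Gaussian process with per-sample noise), but the weighting principle is the same.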

Citation (APA)

Chitre, A., Querimit, R. C. M., Rihm, S. D., Karan, D., Zhu, B., Wang, K., … Lapkin, A. A. (2024). Accelerating Formulation Design via Machine Learning: Generating a High-throughput Shampoo Formulations Dataset. Scientific Data, 11(1). https://doi.org/10.1038/s41597-024-03573-w