Predicting the formation of disinfection by-products using multiple linear and machine learning regression

Abstract

Controlling the formation of disinfection byproducts (DBPs) requires prior knowledge of DBP formation potential. Mathematical models can accurately predict the formation of DBPs and reduce the need for laboratory tests and their associated costs. Researchers continue to develop new models for specific regions but rarely use external data sets to evaluate the predictive ability of previous models. Most models focus on total trihalomethanes (THMs), and predictive models for emerging DBPs (e.g., chloral hydrate (CH)) are lacking. Moreover, few studies compare linear and machine learning (ML) algorithms for predicting DBP formation. This study developed predictive models for CH, chloroform, THMs, dichloroacetic acid, trichloroacetic acid, and haloacetic acids based on stepwise multiple linear regression and ML regression, using easily monitored water quality parameters (i.e., pH, UV254, and total organic carbon (TOC)). Among these parameters, UV254 was the dominant predictor of target DBP formation and deserves more attention in future studies. The stepwise multiple linear regression model for CH was: ln(CH) = 8.945 + 0.558 × ln(UV254) − 2.37 × ln(pH) + 0.152 × ln(TOC). Support vector regression (MAPE = 2.578–5.798%, R² = 0.665–0.802) and random forest regression (MAPE = 2.867–5.346%, R² = 0.671–0.965) performed better than traditional stepwise linear regression (MAPE = 2.857–6.671%, R² = 0.602–0.770) on the training and testing sets. These results indicate that ML algorithms are viable alternatives to conventional linear regression in the management of DBPs.
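
As an illustration only, the CH regression equation and the two reported metrics (MAPE and R²) can be sketched in Python. The coefficients come from the abstract; everything else is an assumption. The abstract does not state the units of CH, UV254, or TOC, nor whether the metrics were computed on the log scale or the original scale, so the function names, argument names, and the metric definitions below (the standard textbook forms) are hypothetical and not the authors' implementation.

```python
import math

# Coefficients as reported in the abstract for the stepwise MLR model of CH.
INTERCEPT = 8.945
COEF_LN_UV254 = 0.558
COEF_LN_PH = -2.37
COEF_LN_TOC = 0.152


def predict_ln_ch(uv254, ph, toc):
    """Return ln(CH) from the reported equation.

    Units are NOT given in the abstract; inputs must use the same units
    as the original study for the result to be meaningful (assumption).
    """
    if min(uv254, ph, toc) <= 0:
        raise ValueError("all inputs must be positive to take logarithms")
    return (
        INTERCEPT
        + COEF_LN_UV254 * math.log(uv254)
        + COEF_LN_PH * math.log(ph)
        + COEF_LN_TOC * math.log(toc)
    )


def predict_ch(uv254, ph, toc):
    """Back-transform ln(CH) to CH on the original scale."""
    return math.exp(predict_ln_ch(uv254, ph, toc))


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (standard definition)."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Because the model is log-log, each coefficient acts as an elasticity: under this equation, doubling UV254 with pH and TOC held fixed multiplies the predicted CH by 2^0.558, or about 1.47, while the negative pH coefficient means a higher pH lowers the predicted CH.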

Cite

Peng, F., Lu, Y., Wang, Y., Yang, L., Yang, Z., & Li, H. (2023). Predicting the formation of disinfection by-products using multiple linear and machine learning regression. Journal of Environmental Chemical Engineering, 11(5). https://doi.org/10.1016/j.jece.2023.110612
