Ensuring safe and clean drinking water for communities is crucial, and necessitates effective tools to monitor and predict water quality due to challenges from population growth, industrial activities, and environmental pollution. This paper evaluates the performance of multiple linear regression (MLR) and nineteen machine learning (ML) models, including algorithms based on regression, decision tree, and boosting. Models include linear regression (LR), least angle regression (LAR), Bayesian ridge chain (BR), ridge regression (Ridge), k-nearest neighbor regression (K-NN), extra tree regression (ET), and extreme gradient boosting (XGBoost). The research’s objective is to estimate the surface water quality of Al-Seine Lake in Lattakia governorate using the MLR and ML models. We used water quality data from the drinking water lake of Lattakia City, Syria, during years 2021–2022 to determine the water quality index (WQI). The predictive performance of both the MLR and ML models was evaluated using statistical methods such as the coefficient of determination (R2) and the root mean square error (RMSE) to estimate their efficiency. The results indicated that the MLR model and three of the ML models, namely linear regression (LR), least angle regression (LAR), and Bayesian ridge chain (BR), performed well in predicting the WQI. The MLR model had an R2 of 0.999 and an RMSE of 0.149, while the three ML models had an R2 of 1.0 and an RMSE of approximately 0.0. These results support using both MLR and ML models for predicting the WQI with very high accuracy, which will contribute to improving water quality management.
CITATION STYLE
Jafar, R., Awad, A., Hatem, I., Jafar, K., Awad, E., & Shahrour, I. (2023). Multiple Linear Regression and Machine Learning for Predicting the Drinking Water Quality Index in Al-Seine Lake. Smart Cities, 6(5), 2807–2827. https://doi.org/10.3390/smartcities6050126
Mendeley helps you to discover research relevant for your work.