Water quality prediction: a data-driven approach exploiting advanced machine learning algorithms with data augmentation

K. Karthick; S. Krishnan; R. Manikandan

Journal Article · Open Access

Journal of Water and Climate Change (2024) 15(2) 431-452

DOI: 10.2166/wcc.2023.403

Abstract

Water quality assessment plays a crucial role in various aspects, including human health, environmental impact, agricultural productivity, and industrial processes. Machine learning (ML) algorithms offer the ability to automate water quality evaluation and allow for effective and rapid assessment of parameters associated with water quality. This article proposes an ML-based classification model for water quality prediction. The model was tested with 14 ML algorithms and considers 20 features that represent various substances present in water samples and their concentrations. The dataset used in the study comprises 7,996 samples, and the model development involves several stages, including data preprocessing, Yeo–Johnson transformation for data normalization, principal component analysis (PCA) for feature selection, and the application of the synthetic minority over-sampling technique (SMOTE) to address class imbalance. Performance metrics, such as accuracy, precision, recall, and F1 score, are provided for each algorithm with and without SMOTE. LightGBM, XGBoost, CatBoost, and Random Forest were identified as the best-performing algorithms. XGBoost achieved the highest accuracy of 96.31% without SMOTE and had a precision of 0.933. The application of SMOTE enhanced the performance of CatBoost. These findings provide valuable insights for ML-based water quality assessment, aiding researchers and professionals in decision-making and management.
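The abstract names SMOTE as the class-balancing step but does not show how it works. The following is a minimal, standard-library-only sketch of the core idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function names `smote` and `balance`, the toy data, and the default `k=5` are illustrative assumptions, not the authors' implementation, which the abstract does not describe.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature lists (minority class only).
    Illustrative sketch; not the implementation used in the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest to the base first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Random point on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until it matches the other samples in count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): six samples of class 0, two of class 1.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y, minority_label=1)
print(y_bal.count(0), y_bal.count(1))  # prints "6 6"
```

In practice, oversampling of this kind is normally applied to the training split only, so that evaluation metrics are computed on real samples rather than synthetic ones.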

Citation (APA)

Karthick, K., Krishnan, S., & Manikandan, R. (2024). Water quality prediction: a data-driven approach exploiting advanced machine learning algorithms with data augmentation. Journal of Water and Climate Change, 15(2), 431–452. https://doi.org/10.2166/wcc.2023.403
