Data division effect on machine learning performance for prediction of streamflow

Okan Mert KATİPOĞLU

Journal Article, Open Access

DÜMF Mühendislik Dergisi (2022) 653-660

DOI: 10.24012/dumf.1158748

Abstract

Accurate streamflow estimation plays an important role in water resources management, disaster preparedness and early warning, reservoir operation, and the sizing of water structures. In this study, the Extreme Gradient Boosting (XGBoost) and K-Nearest Neighbours (KNN) algorithms are used to estimate streamflow. To identify the most appropriate model, both the raw (unoptimized) models and models with optimized parameters were evaluated. Several training-test ratios were also tried to determine which data division gives the most effective results: the data were divided 60-40, 70-30, 80-20, and 90-10, and the model results were compared. The models were compared using statistical indicators such as root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R2). The analysis found that the most suitable model for monthly streamflow estimation was the optimized XGBoost algorithm with a 60-40% data division. The outputs constitute a vital resource for decision-makers in water resources planning and flood and drought management.
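The data-division comparison described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the study's implementation. The synthetic monthly flow series, the use of the two previous months as inputs, the chronological (unshuffled) split, the choice of k = 3, and all function names are assumptions made for the example. Only a hand-written KNN regressor is shown; XGBoost is left out to keep the sketch self-contained, and in practice a library model would take its place.

```python
import math


def split_series(X, y, train_fraction):
    """Chronological split: the first part trains, the remainder tests."""
    cut = int(round(len(X) * train_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]


def knn_predict(X_train, y_train, x, k=3):
    """Mean target of the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def compare_divisions(X, y, fractions=(0.6, 0.7, 0.8, 0.9), k=3):
    """Fit and score the same model under each training fraction."""
    results = {}
    for f in fractions:
        X_tr, y_tr, X_te, y_te = split_series(X, y, f)
        preds = [knn_predict(X_tr, y_tr, x, k) for x in X_te]
        results[f] = {
            "n_train": len(X_tr),
            "n_test": len(X_te),
            "RMSE": rmse(y_te, preds),
            "MAE": mae(y_te, preds),
            "R2": r2(y_te, preds),
        }
    return results


# Synthetic monthly flow (assumption): seasonal cycle over 10 years.
flow = [50 + 30 * math.sin(2 * math.pi * m / 12) + (m % 7) for m in range(120)]

# Inputs (assumption): flow of the previous two months; target: current month.
X = [[flow[t - 1], flow[t - 2]] for t in range(2, len(flow))]
y = [flow[t] for t in range(2, len(flow))]

results = compare_divisions(X, y)
for fraction, scores in results.items():
    print(fraction, scores)
```

Each entry of `results` holds the test-set scores for one data division, so the ratios can be ranked by RMSE, MAE, or R2. Because the split here is chronological, a larger training fraction also means a shorter and later test period, which is one reason scores can differ between divisions.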

Citation (APA)

KATİPOĞLU, O. M. (2022). Data division effect on machine learning performance for prediction of streamflow. DÜMF Mühendislik Dergisi, 653–660. https://doi.org/10.24012/dumf.1158748
