A Machine-Learning Framework for Modeling and Predicting Monthly Streamflow Time Series


Abstract

Complete hydrological time series are crucial for water-resources management and modeling, yet they are hard to obtain in data-scarce environments where gaps are widespread. Recurring gaps can cause loss of critical information, ineffective model calibration, inaccurate timing of peak flows, and biased statistical analysis. Predicting monthly streamflow is also difficult because it is tied to random dynamics and uncertain phenomena. This study introduces an ensemble machine-learning regression framework for modeling and predicting monthly streamflow time series with a high degree of accuracy. The framework uses historical data from multiple monthly streamflow datasets in the same region to predict missing monthly streamflow data. It selects the best features from all available combinations of gap-free monthly streamflow time series and identifies the optimal model from a pool of 12 machine-learning models, including random forest regression, gradient boosting regression, and extra trees regression. Model selection is based on cross-validation train-and-test set scores and the coefficient of determination. We modeled 26 monthly streamflow time series and found that the gradient boosting regressor combined with a bagging regressor produced the highest accuracy in 7 of the 26 instances. Across all instances, the models using this method exhibited an overall accuracy range of 0.9737 to 0.9968. Wrapping the tree-based and gradient-based models in either a bagging regressor or an AdaBoost regressor also improved them, and these combinations accounted for nearly 80% of the best models. Between January 1960 and December 2021, an average of 40% of the monthly streamflow data was missing across the 26 stations. Notably, two crucial stations in the economically significant lower Athabasca River Basin in Alberta, Canada, were missing approximately 70% of their monthly streamflow data. To address this, we used our framework to extend the records of all 26 stations by predicting the missing data. The extended records also allow further analysis, such as grouping stations with similar monthly streamflow behavior using Pearson correlation.
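The model-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and pandas, uses synthetic data in place of real station records, and tries only four candidate models rather than the pool of 12. The station names, the helper functions `make_synthetic_flows` and `select_best_model`, the hyperparameters, and the shuffled 5-fold split are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (
    BaggingRegressor,
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.model_selection import KFold, cross_validate


def make_synthetic_flows(n_months=240, seed=0):
    """Hypothetical stand-in data: three correlated monthly series."""
    rng = np.random.default_rng(seed)
    seasonal = 50 + 30 * np.sin(2 * np.pi * np.arange(n_months) / 12)
    return pd.DataFrame({
        "station_a": seasonal + rng.normal(0, 3, n_months),
        "station_b": 0.8 * seasonal + rng.normal(0, 3, n_months),
        "target": 1.2 * seasonal + rng.normal(0, 3, n_months),
    })


def select_best_model(X, y, candidates, n_splits=5):
    """Rank candidate regressors by mean cross-validated test R^2."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = {}
    for name, model in candidates.items():
        result = cross_validate(
            model, X, y, cv=cv, scoring="r2", return_train_score=True
        )
        scores[name] = {
            "train_r2": result["train_score"].mean(),
            "test_r2": result["test_score"].mean(),
        }
    best = max(scores, key=lambda name: scores[name]["test_r2"])
    return best, scores


flows = make_synthetic_flows()
features = ["station_a", "station_b"]  # gap-free neighbouring series

# Simulate a data gap in the target station (assumed gap position).
gap = flows.index[60:100]
flows.loc[gap, "target"] = np.nan
known = flows.dropna(subset=["target"])

# A small assumed candidate pool; the paper's pool has 12 models.
candidates = {
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(
        n_estimators=50, random_state=0
    ),
    "bagged_gradient_boosting": BaggingRegressor(
        GradientBoostingRegressor(n_estimators=50, random_state=0),
        n_estimators=5,
        random_state=0,
    ),
}

best_name, scores = select_best_model(
    known[features], known["target"], candidates
)

# Refit the winning model on all known months and fill the gap.
best_model = candidates[best_name].fit(known[features], known["target"])
filled = flows["target"].copy()
filled.loc[gap] = best_model.predict(flows.loc[gap, features])

# Pearson correlation between the gap-filled series and its neighbours.
correlation = flows.assign(target=filled).corr(method="pearson")
```

Shuffled K-fold is used here only for brevity. For real streamflow records, a split that respects temporal order may be a more cautious choice, and the feature-combination search mentioned in the abstract would wrap an outer loop around `select_best_model`.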



Cite

Dastour, H., & Hassan, Q. K. (2023). A Machine-Learning Framework for Modeling and Predicting Monthly Streamflow Time Series. Hydrology, 10(4). https://doi.org/10.3390/hydrology10040095
