This work investigates the added value of ensembles constructed from seventeen lumped hydrological models against their simple average counterparts. It is thus hypothesized that there is more information provided by all the outputs of these models than by their single aggregated predictors. For all available 1061 catchments, results showed that the mean continuous ranked probability score of the ensemble simulations were better than the mean average error of the aggregated simulations, confirming the added value of retaining all the components of the model outputs. Reliability of the simulation ensembles is also achieved for about 30% of the catchments, as assessed by rank histograms and reliability plots. Nonetheless this imperfection, the ensemble simulations were shown to have better skills than the deterministic simulations at discriminating between events and non-events, as confirmed by relative operating characteristic scores especially for larger streamflows. From 7 to 10 models are deemed sufficient to construct ensembles with improved performance, based on a genetic algorithm search optimizing the continuous ranked probability score. In fact, many model subsets were found improving the performance of the reference ensemble. This is thus not essential to implement as much as seventeen lumped hydrological models. The gain in performance of the optimized subsets is accompanied by some improvement of the ensemble reliability in most cases. Nonetheless, a calibration of the predictive distribution is still needed for many catchments. © 2010 Author(s).
Velázquez, J. A., Anctil, F., & Perrin, C. (2010). Performance and reliability of multimodel hydrological ensemble simulations based on seventeen lumped models and a thousand catchments. Hydrology and Earth System Sciences, 14(11), 2303–2317. https://doi.org/10.5194/hess-14-2303-2010