Spatially continuous information on seabed sediments is often required for a variety of activities, including seabed mapping and characterisation, prediction of marine biodiversity, and marine environmental planning and conservation. Because seabed sediment data are usually collected by point sampling, spatially continuous information must be predicted from the point data. The accuracy of these predictions is crucial to evidence-based decision making in marine environmental management and conservation. Improving predictive accuracy by identifying the most accurate methods is essential but also challenging, because accuracy is often data specific and affected by many factors. Owing to the high predictive accuracy of machine learning methods, especially Random Forest (RF), these methods were introduced into spatial statistics by combining them with existing spatial interpolation methods (SIMs), producing new hybrid methods with improved accuracy. This development opened an alternative source of methods for spatial prediction. These hybrid methods, especially the hybrids of RF with inverse distance weighting (IDW) or ordinary kriging (OK) (i.e. RFIDW or RFOK), have shown high predictive capacity. However, their application to spatial prediction of environmental variables is still uncommon. Model selection for RF and the hybrid methods is necessary and needs further testing. Furthermore, model averaging has been argued to improve predictive accuracy, but previous studies have not produced consistent findings. In this study, we aim to identify the most accurate methods for spatial prediction of seabed gravel content in the northwest Australian Exclusive Economic Zone. We experimentally examined: 1) whether input secondary variables affect the performance of RFOK and RFIDW; 2) whether the performances of RF, SIMs and their hybrid methods are data specific; and 3) whether model averaging improves the predictive accuracy of these methods.
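An RF-IDW hybrid of the kind described above can be sketched as follows. This assumes the common construction in which a trend model is fitted to the secondary variables and IDW then interpolates that model's residuals at the prediction location (an RFOK-style hybrid would use ordinary kriging for the residual step instead). To keep the sketch self-contained, `linear_trend` is a hypothetical stand-in for a fitted RF; all function names and the default IDW power of 2 are illustrative assumptions, not details taken from the study.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
# A trend model maps a vector of secondary variables to a prediction.
TrendModel = Callable[[Sequence[float]], float]


def idw(known_xy: Sequence[Point], known_values: Sequence[float],
        target: Point, power: float = 2.0) -> float:
    """Inverse distance weighted estimate of a value at `target`."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(known_xy, known_values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # IDW is an exact interpolator at sampled locations
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def hybrid_idw_predict(trend: TrendModel,
                       train_xy: Sequence[Point],
                       train_X: Sequence[Sequence[float]],
                       train_y: Sequence[float],
                       target_xy: Point,
                       target_X: Sequence[float],
                       power: float = 2.0) -> float:
    """Trend prediction plus IDW-interpolated residuals of the trend model."""
    # Residuals at the sampled locations: observed minus trend prediction.
    residuals = [yi - trend(xi) for xi, yi in zip(train_X, train_y)]
    # Final estimate: trend at the target plus the spatially interpolated residual.
    return trend(target_X) + idw(train_xy, residuals, target_xy, power)


# Hypothetical stand-in for a fitted RF: a fixed linear trend on one covariate.
def linear_trend(x: Sequence[float]) -> float:
    return 2.0 * x[0]
```

Replacing `linear_trend` with any callable of the same signature, such as a wrapper around a fitted RF model, leaves the residual-interpolation step unchanged; that separation is what makes the trend and interpolation components easy to swap.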
For RF and the hybrid methods, up to 21 variables were used as predictors. Predictive accuracy was assessed in terms of relative mean absolute error and relative root mean squared error, averaged over 100 iterations of 10-fold cross-validation. The findings of this study are:
• the predictive errors fluctuate with the input secondary variables;
• the presence of correlated variables can alter the results of model selection, leading to different models;
• the set of initial input variables affects the model selected;
• the most accurate model may be missed during model selection;
• RF, RFOK and RFIDW proved to be the most accurate methods in this study, with RFOK preferred;
• these methods are not data specific, but their models are, so the best model needs to be identified; and
• model averaging is clearly data specific.
In conclusion, model selection is essential for RF and the hybrid methods. The best model needs to be identified for each individual study, and the application of model averaging should be examined accordingly. RF and the hybrid methods have displayed substantial potential for predicting environmental properties and are recommended for further testing in spatial prediction in the environmental sciences and other relevant disciplines. This study provides suggestions and guidelines for improving spatial predictions of biophysical variables in both marine and terrestrial environments.
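The accuracy assessment described above can be sketched as follows. The sketch assumes that relative errors are expressed as a percentage of the mean observed value, and that within each repeat the held-out predictions from all folds are pooled before the errors are computed; both are assumptions about the procedure rather than details given in the abstract. `mean_model` is a hypothetical stand-in predictor used only to exercise the loop, and all function names are illustrative.

```python
import math
import random
from typing import Callable, Iterator, List, Sequence, Tuple

# fit_predict(train_indices, test_indices) -> predictions for test_indices
FitPredict = Callable[[List[int], List[int]], List[float]]


def relative_errors(observed: Sequence[float],
                    predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (RMAE, RRMSE) in percent, relative to the mean observed value."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * mae / mean_obs, 100.0 * rmse / mean_obs


def repeated_kfold(n: int, k: int = 10, repeats: int = 100,
                   seed: int = 0) -> Iterator[List[List[int]]]:
    """Yield one random partition of range(n) into k folds per repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield [idx[i::k] for i in range(k)]


def cv_relative_errors(y: Sequence[float], fit_predict: FitPredict,
                       k: int = 10, repeats: int = 100,
                       seed: int = 0) -> Tuple[float, float]:
    """Average RMAE and RRMSE over `repeats` runs of k-fold cross-validation."""
    rmaes, rrmses = [], []
    for folds in repeated_kfold(len(y), k, repeats, seed):
        predicted = [0.0] * len(y)
        for fold in folds:
            held_out = set(fold)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            # Each sample is predicted exactly once per repeat, by a model
            # fitted without it.
            for i, p in zip(fold, fit_predict(train_idx, fold)):
                predicted[i] = p
        rmae, rrmse = relative_errors(y, predicted)
        rmaes.append(rmae)
        rrmses.append(rrmse)
    return sum(rmaes) / repeats, sum(rrmses) / repeats


def mean_model(y: Sequence[float]) -> FitPredict:
    """Hypothetical stand-in model: predict the training-set mean."""
    def fit_predict(train_idx: List[int], test_idx: List[int]) -> List[float]:
        m = sum(y[i] for i in train_idx) / len(train_idx)
        return [m] * len(test_idx)
    return fit_predict
```

For example, `cv_relative_errors(y, mean_model(y))` runs 100 repeats of 10-fold cross-validation with the baseline predictor. A real comparison would pass in fit-and-predict callables for RF, the SIMs and the hybrids, evaluated on the same fold partitions via a shared `seed`.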
Li, J. (2013). Predicting the spatial distribution of seabed gravel content using random forest, spatial interpolation methods and their hybrid methods. In Proceedings - 20th International Congress on Modelling and Simulation, MODSIM 2013 (pp. 394–400). Modelling and Simulation Society of Australia and New Zealand Inc. (MSSANZ). https://doi.org/10.36334/modsim.2013.a9.li