
Sensitivity of predictive species distribution models to change in grain size

by Antoine Guisan, Catherine H Graham, Jane Elith, Falk Huettmann
Diversity and Distributions (2007) 13, 332–340

Abstract

Predictive species distribution modelling (SDM) has become an essential tool in biodiversity conservation and management. The choice of grain size (resolution) of environmental layers used in modelling is one important factor that may affect predictions. We applied 10 distinct modelling techniques to presence-only data for 50 species in five different regions, to test whether: (1) a 10-fold coarsening of resolution affects predictive performance of SDMs, and (2) any observed effects are dependent on the type of region, modelling technique, or species considered. Results show that a 10 times change in grain size does not severely affect predictions from species distribution models. The overall trend is towards degradation of model performance, but improvement can also be observed. Changing grain size does not equally affect models across regions, techniques, and species types. The strongest effect is on regions and species types, with tree species in the data sets (regions) with highest locational accuracy being most affected. Changing grain size had little influence on the ranking of techniques: boosted regression trees remain best at both resolutions. The number of occurrences used for model training had an important effect, with larger sample sizes resulting in better models, which tended to be more sensitive to grain. Effect of grain change was only noticeable for models reaching sufficient performance and/or with initial data that have an intrinsic error smaller than the coarser grain size.



DOI: 10.1111/j.1472-4642.2007.00342.x
© 2007 The Authors. Journal compilation © 2007 Blackwell Publishing Ltd. www.blackwellpublishing.com/ddi
Diversity and Distributions (Diversity Distrib.) (2007) 13, 332–340

BIODIVERSITY RESEARCH

Keywords

Environmental grain, niche-based modelling, natural history collections, presence-only data, resolution, spatial scale, sample size, species distribution modelling, model comparison, predictive performance.
Antoine Guisan 1*, Catherine H. Graham 2, Jane Elith 3, Falk Huettmann 4 and the NCEAS Species Distribution Modelling Group†

1 Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
2 Department of Ecology and Evolution, 650 Life Sciences Building, Stony Brook University, NY 11794, USA
3 School of Botany, The University of Melbourne, Parkville, Victoria, 3010 Australia
4 EWHALE Lab - Biology and Wildlife Department, Institute of Arctic Biology, University of Alaska Fairbanks, AK 99775-7000, USA

*Correspondence: Antoine Guisan, Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland. Fax: +41 21 692 42 65. E-mail: antoine.guisan@unil.ch

†Miro Dudik, Simon Ferrier, Robert Hijmans, Anthony Lehmann, Jin Li, Lúcia G. Lohmann, Bette Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob McC. Overton, A. Townsend Peterson, Steven J. Phillips, Karen Richardson, Ricardo Scachetti-Pereira, Robert E. Schapire, Stephen E. Williams, Mary S. Wisz, Niklaus E. Zimmermann

INTRODUCTION

Predictive species distribution models (SDMs; Guisan & Zimmermann, 2000) have become essential tools in biodiversity conservation and management (Côté & Reynolds, 2002). Fitting an SDM involves a series of steps, each requiring a number of choices and well-justified decisions (Ferrier et al., 2002b; Guisan & Thuiller, 2005). Grain (resolution) is one important factor that may affect predictions. It is, together with study extent, a component of spatial scale (Wiens, 2002) and a major feature in ecology (e.g. Holling, 1992) and ecological modelling (e.g. Huettmann & Diamond, 2006). Important questions therefore include: Is there an optimal grain for fitting SDMs? What is the effect of changing grain size on SDM performance?

Choosing a grain size for modelling is partly a technical issue. For instance, grain size is related to the grid cell size of available environmental data (Graham et al., 2004), characteristics of the species data (e.g. geographical accuracy, sample size, field survey constraints, or autocorrelation structure; Guisan & Hofer, 2003; Gottschalk et al., 2005; Linke et al., 2005; Huettmann & Diamond, 2006) or computer power (i.e. too many cells may demand excessive computer resources). Grain size is also a crucial ecological as well as management issue. Changing the grain size can influence the perception of a phenomenon, such as patterns of presence or abundance (Johnson et al., 2002; Tobalske, 2002; Wiens, 2002; Graham & Hijmans, 2006), or affect the relevance of the output for management applications (Araújo et al., 2005). Working at the wrong scale can be very inefficient.

Data from natural history collections often have significant error, making them difficult to use with fine-grained environmental data (Graham et al., 2004). However, current georeferencing initiatives (e.g. GBIF; MANIS error calculator, http://manisnet.org; OBIS, http://www.iobis.org/welcome.htm; and OBIS-SEAMAP, http://seamap.env.duke.edu) include an estimate of the maximum error distance (location uncertainty) with each occurrence. Hence, the modeller can determine what data are appropriate at a given grain size and can choose to increase the grain of environmental data. At larger grains, more occurrence records may be available because the limit on locational accuracy is relaxed to match the new grain size (see, e.g. Engler et al., 2004). Generally, more accurate predictions can be made with larger numbers of occurrences (Stockwell & Peterson, 2002; Hernandez et al., 2006), but also with more accurate occurrences (e.g. for plants, Engler et al., 2004). Hence, it is essential to evaluate the trade-off between the number of occurrence samples available for modelling and the grain size of environmental data.

If species records are inaccurate, a set of predictors available at too fine a grain may need to be aggregated to a coarser grain. For example, in recent projections of plant distributions in Europe (Thuiller et al., 2005), the grain of the input species data was 50 × 50 km and the grain of available explanatory climatic maps was around 16 × 16 km (10′ × 10′). Models were first fitted by aggregating climatic maps to the coarser grain and then re-projected to the finer climatic grain. Various other approaches have also been proposed to downscale atlas data using species distribution models (McPherson et al., 2006). Araújo et al. (2005) found a reasonable agreement of downscaled predictions with real patterns of occurrences obtained from fine-scale breeding birds atlas data, but they did not formally compare coarse- to fine-grain models.

There are many reasons why changing grain size could have an effect on the performance of SDMs.
For instance, at a fine grain, the risk that a wrong geographical location of a species record samples a cell representing a different habitat than the one where the species actually occurred increases, while the opposite will be observed when aggregating data towards coarser grains. However, when coarsening the grain, the risk of a forced-matching between environmental conditions that do not occur together but nearby in the field increases and can make the model identify spurious combinations of suitable environmental conditions for a species. This is likely most important for sessile organisms (Guisan & Thuiller, 2005).

One way to assess the importance and effects of grain on SDM performance is to conduct analyses where grain is changed, and the qualitative and quantitative effects on models are measured. Yet, surprisingly few examples exist (e.g. Boyce et al., 2003), suggesting that the effect of coarsening the grain of predictor variables on SDM performance has not been sufficiently tested. By changing grain size when fitting logistic models for the green woodpecker, Tobalske (2002) observed a clear model improvement at their coarse (400 ha) compared to their fine-grain (100 ha) predictions. Ferrier & Watson (1997) tested the effect of grain size on SDMs on 55 species using four techniques and found, on the contrary, degraded model performances at coarser grain.

A study was started in 2003 to compare a large range of existing techniques for predicting species distributions from presence-only data and assess the effects of several intrinsic and extrinsic factors on model performance (Elith et al., 2006). Here, we analyse (1) the effect of a 10-fold coarsening of grain size on the performance of SDMs and (2) whether any observed effects are dependent on the type of region, modelling technique, or organism considered.
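
As an illustration of what a 10-fold coarsening of grain means for a gridded predictor, the following minimal Python sketch averages non-overlapping 10 × 10 blocks of fine cells into single coarse cells. The text does not state how the environmental layers were aggregated, so block averaging, the dropping of incomplete edge blocks, and the function name `coarsen_grid` are assumptions of this sketch rather than the study's procedure.

```python
from statistics import fmean


def coarsen_grid(layer, factor=10):
    """Aggregate a 2-D grid (list of equal-length rows) by block averaging.

    Assumption of this sketch: each coarse cell is the mean of a
    factor x factor block of fine cells; edge cells that do not fill
    a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_rows = len(layer) // factor
    n_cols = len(layer[0]) // factor if layer else 0
    coarse = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Collect every fine cell that falls inside coarse cell (i, j).
            block = [
                layer[r][c]
                for r in range(i * factor, (i + 1) * factor)
                for c in range(j * factor, (j + 1) * factor)
            ]
            row.append(fmean(block))
        coarse.append(row)
    return coarse


# Hypothetical 20 x 20 fine-grained layer (values are arbitrary).
fine = [[r * 20 + c for c in range(20)] for r in range(20)]
coarse = coarsen_grid(fine, factor=10)
print(coarse)  # [[94.5, 104.5], [294.5, 304.5]]
```

Averaging smooths out within-block variation, which is one way the forced matching of nearby but distinct environmental conditions described above can arise. Other aggregation rules (for example, the modal class for categorical layers) would be equally plausible choices.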
In the absence of any well-documented effect, a 10-fold change was considered a realistic assumption, large enough for most studies and beyond what is otherwise used for aggregations (Huettmann & Diamond, 2006).

METHODS

Species and environmental data

The data used for this modelling study are a subset of the data described in Elith et al. (2006). Species and environment data sets, including climatic, topographical, and soil data, are briefly described, and the detailed names and characteristics are provided in Appendices S1 and S2 in Supplementary Material. These are: 10 bird species and 11 GIS environmental predictors from Ontario (CAN); 6 plant species and 4 vertebrate species and 13 GIS environmental predictors from New South Wales (NSW, Australia); 10 plant species (shrubs and trees) and 13 GIS environmental predictors from New Zealand (NZ); 10 plant species and 10 GIS environmental predictors from South America (SA); and 10 tree species and 13 GIS environmental predictors from Switzerland (SWI). Ten species were selected from each larger data set following three broad criteria: (1) a range of sample sizes, (2) a range of geographical distributions, and, where possible, (3) a range of biological groups or life-forms.

Model fitting

Ten predictive techniques were used to fit the species distribution models. These were: (1) the DIVA-GIS implementation of BIOCLIM (Busby, 1991), (2) DOMAIN (Carpenter et al., 1993), (3) GLM (generalized linear model; McCullagh & Nelder, 1989), (4) GAM (generalized additive model; Hastie & Tibshirani, 1986), (5) BRUTO (Hastie et al., 1994), (6) MARS (multivariate adaptive regression splines; Friedman, 1991), (7) BRT (boosted regression tree; Friedman, 2001), (8) OM-GARP (genetic algorithm for rule-based predictions; a new but as yet unreleased version of the one developed by Stockwell & Peters, 1999), (9) GDMSS (the single species version of the generalized dissimilarity model of Ferrier et al., 2002a), and (10) MAXENT-T (maximum entropy; Phillips et al., 2004, 2006). The first two are profile techniques, the next four belong to the large family of generalized regression techniques, and the last four are all very distinct approaches. All modelling techniques and specific fitting details are described in Elith et al. (2006). The models were fitted by experienced modellers, usually by those among the authors who knew the technique best. As the training sets only contained presence data, pseudo-absences were generated for those techniques which required them (all except BIOCLIM and DOMAIN), by taking a random sample of 10,000 sites (Elith et al., 2006). All techniques were implemented as described in Elith et al. (2006).

Model evaluation

An independent presence–absence data set was available for each study area, which usually had greater locational accuracy than the presence data used to train the models and allowed for a
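
The pseudo-absence step described under Model fitting (a random sample of 10,000 sites) can be sketched in Python as follows. The text gives only the sample size, so several details here are assumptions of this illustration: sites are treated as cells of a rectangular grid, cells are drawn uniformly without replacement, presence cells are not excluded, and the function name `sample_pseudo_absences` and the grid dimensions are hypothetical.

```python
import random


def sample_pseudo_absences(n_rows, n_cols, n_samples=10_000, seed=0):
    """Draw distinct random grid cells to serve as pseudo-absence sites.

    Assumptions of this sketch: uniform sampling over all cells of an
    n_rows x n_cols grid, without replacement, and with no attempt to
    exclude cells that contain presence records.
    """
    n_cells = n_rows * n_cols
    if n_samples > n_cells:
        raise ValueError("cannot sample more sites than there are cells")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    flat_indices = rng.sample(range(n_cells), n_samples)
    # Convert each flat index back to a (row, column) pair.
    return [divmod(index, n_cols) for index in flat_indices]


# Hypothetical study region of 200 x 300 cells.
sites = sample_pseudo_absences(n_rows=200, n_cols=300)
print(len(sites))  # 10000
```

Under the stated assumptions, coarsening the grain 10-fold reduces the number of cells by a factor of 100, so a fixed sample of 10,000 sites covers a much larger share of a coarse grid than of a fine one.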

