Novel methods improve prediction of species’ distributions from occurrence data

by Jane Elith*, Catherine H Graham*, Robert P Anderson, Miroslav Dudík, Simon Ferrier, Antoine Guisan, Robert J Hijmans, Falk Huettmann, John R Leathwick, Anthony Lehmann, Jin Li, Lucia G Lohmann, Bette A Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob McC M Overton, A Townsend Peterson, Steven J Phillips, Karen Richardson, Ricardo Scachetti-Pereira, Robert E Schapire, Jorge Soberón, Stephen Williams, Mary S Wisz, Niklaus E Zimmermann
Ecography (2006)

Abstract

Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models, GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.

Elith, J., Graham, C. H., Anderson, R. P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R. J., Huettmann, F., Leathwick, J. R., Lehmann, A., Li, J., Lohmann, L. G., Loiselle, B. A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J. McC., Peterson, A. T., Phillips, S. J., Richardson, K. S., Scachetti-Pereira, R., Schapire, R. E., Soberón, J., Williams, S., Wisz, M. S. and Zimmermann, N. E. 2006. Novel methods improve prediction of species’ distributions from occurrence data. – Ecography 29: 129–151.
J. Elith (j.elith@unimelb.edu.au), School of Botany, Univ. of Melbourne, Parkville, Victoria, 3010 Australia. – C. H. Graham, Dept of Ecology and Evolution, 650 Life Sciences Building, Stony Brook Univ., NY 11794, USA. – R. P. Anderson, City College of the City Univ. of New York, NY, USA. – M. Dudík, Princeton Univ., Princeton, NJ, USA. – S. Ferrier, Dept of Environment and Conservation, Armidale, NSW, Australia. – A. Guisan, Univ. of Lausanne, Switzerland. – R. J. Hijmans, The Univ. of California, Berkeley, CA, USA. – F. Huettmann, Univ. of Alaska Fairbanks, AK, USA. – J. R. Leathwick, NIWA, Hamilton, NZ. – A. Lehmann, Swiss Centre for Faunal Cartography (CSCF), Neuchâtel, Switzerland. – J. Li, CSIRO Atherton, Queensland, Australia. – L. Lohmann, Univ. de São Paulo, Brasil. – B. A. Loiselle, Univ. of Missouri, St. Louis, USA. – G. Manion, Dept of Environment and Conservation, NSW, Australia. – C. Moritz, The Univ. of California, Berkeley, USA. – M. Nakamura, Centro de Invest. en Matemáticas (CIMAT), México. – Y. Nakazawa, The Univ. of Kansas, Lawrence, KS, USA. – J. McC. Overton, Landcare Research, Hamilton, NZ. – A. T. Peterson, The Univ. of Kansas, Lawrence, KS, USA. – S. J. Phillips, AT&T Labs-Research, Florham Park, NJ, USA. – K. S. Richardson, McGill Univ., QC, Canada. – R. Scachetti-Pereira, Centro de Referência em Informação Ambiental, Brazil. – R. E. Schapire, Princeton Univ., Princeton, NJ, USA. – J. Soberón, The Univ. of Kansas, Lawrence, KS, USA. – S. E. Williams, James Cook Univ., Queensland, Australia. – M. S. Wisz, National Environmental Research Inst., Denmark. – N. E. Zimmermann, Swiss Federal Research Inst. WSL, Birmensdorf, Switzerland.
*The first two authors have contributed equally to this paper.
Accepted 25 January 2006
Copyright © ECOGRAPHY 2006
ECOGRAPHY 29: 129–151, 2006
ECOGRAPHY 29:2 (2006)
Detailed knowledge of species’ ecological and geographic distributions is fundamental for conservation planning and forecasting (Ferrier 2002, Funk and Richardson 2002, Rushton et al. 2004), and for understanding ecological and evolutionary determinants of spatial patterns of biodiversity (Rosenzweig 1995, Brown and Lomolino 1998, Ricklefs 2004, Graham et al. 2006). However, occurrence data for the vast majority of species are sparse, resulting in information about species distributions that is inadequate for many applications. Species distribution models attempt to provide detailed predictions of distributions by relating presence or abundance of species to environmental predictors. As such, distribution models have provided researchers with an innovative tool to explore diverse questions in ecology, evolution, and conservation. For example, they have been used to study relationships between environmental parameters and species richness (Mac Nally and Fleishman 2004), characteristics and spatial configuration of habitats that allow persistence of species in landscapes (Araújo and Williams 2000, Ferrier et al. 2002a, Scotts and Drielsma 2003), invasive potential of non-native species (Peterson 2003, Goolsby 2004), species’ distributions in past (Hugall et al. 2002, Peterson et al. 2004) or future climates (Bakkenes et al. 2002, Skov and Svenning 2004, Araújo et al. 2004, Thomas et al. 2004, Thuiller et al. 2005), and ecological and geographic differentiation of the distributions of closely-related species (Cicero 2004, Graham et al. 2004b).
Most research on development of distribution modelling techniques has focused on creating models using presence-absence or abundance data, where regions of interest have been sampled systematically (Austin and Cunningham 1981, Hirzel and Guisan 2002, Cawsey et al. 2002). However, occurrence data for most species have been recorded without planned sampling schemes, and the great majority of these data consist of presence-only records from museum or herbarium collections that are increasingly accessible electronically (Graham et al. 2004a, Huettmann 2005, Soberón and Peterson 2005). The main problem with such occurrence data is that the intent and methods of collecting are rarely known, so that absences cannot be inferred with certainty. These data also have errors and biases associated with them, reflecting the frequently haphazard manner in which samples were accumulated (Hijmans et al. 2000, Reese et al. 2005). Thus, the considerable potential of occurrence data for analysis of biodiversity patterns will only be realised if we can use them critically. Simultaneous with increasing accessibility of species’ occurrence data, environmental data layers of high spatial resolution, such as those derived from satellite images (Turner et al. 2003) and through sophisticated interpolation of climate data (Thornton et al. 1997, Hijmans et al. 2005), are now much more abundant and available. In parallel, development of methods for efficient exploration and summary of patterns in large databases has accelerated in other disciplines (Hastie et al. 2001), but only a few of these have been applied in ecological studies. Given the widespread use of distribution modelling, and the synergy of advances in data availability and modelling methods, a clear need exists for broad synthetic analyses of the predictive ability and accuracy of species’ distribution modelling methods for presence-only data.
There is now a plethora of methods for modelling species’ distributions that vary in how they model the distribution of the response, select relevant predictor variables, define fitted functions for each variable, weight variable contributions, allow for interactions, and predict geographic patterns of occurrence (Guisan and Zimmermann 2000, Burgman et al. 2005, Wintle and Bardos in press). Initial attempts to analyze presence-only data used methods developed specifically for that purpose, based either on calculations of envelopes or distance-based measures (Gómez Pompa and Nevling 1970, Rapoport 1982, Silverman 1986, Busby 1991, Walker and Cocks 1991, Carpenter et al. 1993). Attention then turned to adapting presence-absence methods (i.e. those that model a binomial response) to model presence-only data, using samples of the background environment (random points throughout the study area), or of areas designated as “non-use” or “pseudo-absence” (Stockwell and Peters 1999, Boyce et al. 2002, Ferrier et al. 2002a, Zaniewski et al. 2002, Keating and Cherry 2004, Pearce and Boyce in press).
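The idea of adapting a binomial-response method to presence-only data by pairing the presences with a random background sample can be illustrated with a minimal sketch. It is not one of the methods compared in this study: the data are synthetic, the two environmental predictors (scaled to 0–1) are hypothetical, and the plain logistic regression fitted by gradient descent stands in for whatever presence-absence method one might actually use.

```python
import math
import random

def predict(x, w, b):
    """Logistic response for one site; the linear score is clamped to avoid overflow."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit a logistic regression by batch gradient descent; returns (weights, bias)."""
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(xi, w, b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic presence records (assumption): the species favours warm sites
# (predictor 0 near 0.8) and is indifferent to predictor 1.
presences = [[min(1.0, max(0.0, rng.gauss(0.8, 0.05))), rng.random()]
             for _ in range(50)]

# Background sample: random points throughout the (hypothetical) study area.
# These are not true absences; they only describe the available environment.
background = [[rng.random(), rng.random()] for _ in range(200)]

X = presences + background
y = [1] * len(presences) + [0] * len(background)
w, b = fit_logistic(X, y)

# Outputs are relative suitability scores, not probabilities of presence,
# because the ratio of presences to background points is arbitrary.
suit_warm = predict([0.9, 0.5], w, b)
suit_cool = predict([0.1, 0.5], w, b)
print(suit_warm, suit_cool)
```

Because the background points may include sites where the species does occur, the fitted values are best read as a ranking of sites rather than as calibrated occurrence probabilities.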
More recently, several novel modelling methods have been proposed that have foundations in ecological and/or statistical research, and that may perform well for distribution modelling with noisy data, such as presence-only records. Some of these methods use information on the presences of other members of the community to supplement information for the species being modelled. Community methods are promising, especially for rare species, because the additional information carried by the wider community may help to inform the modelled relationships. Further, extensive research in the machine-learning and statistical disciplines has produced methods that are able to capture complex responses, even with noisy input data. These have received very little exposure in distribution modelling, though the work that has been done is promising (Phillips et al. 2006, Leathwick et al. in press).
Regardless of the modelling method chosen, a major problem in evaluation is that species’ distributions are not known exactly. In many instances evaluation focuses on predictive performance: some known occurrences are withheld from model development (either by splitting the data set, k-fold partitioning, or bootstrapping; Fielding and Bell 1997, Hastie et al. 2001, Araújo et al. 2005), and accuracy is assessed based on how well models predict the withheld data (Boyce et al. 2002,