Measuring the accuracy of species distribution models: a review

  • Liu C
  • White M
  • Newell G

Species distribution models (SDMs) are empirical models that relate species occurrence to environmental variables using statistical or other response surfaces. Species distribution modeling can be used to address many theoretical and applied ecological and environmental problems, including testing biogeographical, ecological and evolutionary hypotheses, assessing species invasions and climate change impacts, and supporting conservation planning and reserve selection. Using an SDM in real-world applications requires knowledge of the model’s accuracy. Accuracy has two aspects: discrimination capacity and reliability. The former is the power of the model to differentiate presences from absences; the latter is the capability of the predicted probabilities to reflect the observed proportion of sites occupied by the subject species. Similar methodology has been used for model accuracy assessment in other fields, including medical diagnostic testing, weather forecasting and machine learning. Some accuracy measures are used in all fields, e.g. overall accuracy and the area under the receiver operating characteristic curve. Others are largely restricted to specific fields, e.g. the F-measure is used mainly in machine learning, or go by different names in different fields, e.g. the “true skill statistic” of atmospheric science is called “Youden’s J” in medical diagnostics. In this paper we review the accuracy measures typically used in ecology. Generally, the measures can be divided into two groups: threshold-dependent and threshold-independent. Measures in the first group are used for binary predictions, and those in the second group for continuous predictions. Continuous predictions can be transformed into binary ones by applying a specific threshold, in which case the threshold-dependent accuracy measures can also be used.
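As an illustration of how a threshold turns continuous predictions into binary ones, the following minimal Python sketch computes a few of the threshold-dependent measures named above from the resulting confusion matrix. The function names, the 0.5 threshold and the toy data are assumptions made for this example only; they do not come from the paper. The formulas are the standard textbook definitions.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Threshold the predictions and count TP, FP, FN, TN.

    observed: 1 for presence, 0 for absence.
    threshold: illustrative default, not a recommendation.
    """
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and obs == 1:
            tp += 1
        elif pred == 1 and obs == 0:
            fp += 1
        elif pred == 0 and obs == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_measures(tp, fp, fn, tn):
    """Standard threshold-dependent measures from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)        # proportion of presences predicted present
    specificity = tn / (tn + fp)        # proportion of absences predicted absent
    precision = tp / (tp + fp)          # positive predictive value
    return {
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # True skill statistic, i.e. Youden's J
        "tss": sensitivity + specificity - 1,
        # F-measure: harmonic mean of precision and sensitivity
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Toy data (hypothetical): eight sites with observed status and predicted probability.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

counts = confusion_counts(observed, predicted_prob, threshold=0.5)
measures = threshold_measures(*counts)
```

The sketch does not guard against empty rows or columns in the confusion matrix (for example, no predicted presences), which would cause a division by zero.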
The threshold-dependent indices used in or introduced to the SDM field include overall accuracy, sensitivity, specificity, positive predictive value, negative predictive value, odds ratio, true skill statistic, F-measure, Cohen’s kappa, and normalized mutual information (NMI). However, since NMI only measures the agreement between two patterns, it cannot differentiate worse-than-random models from better-than-random models, which reduces its utility as an accuracy measure. The threshold-independent indices used in or introduced to the SDM field include the area under the receiver operating characteristic curve (AUC), the Gini index, and the point biserial correlation coefficient. The proportion of explained deviance D² and its adjusted form have also been introduced into the SDM field. However, this adjusted form has no theoretical foundation in the context of generalized linear modeling. We therefore provide another adjusted form, proposed by H. V. Houwelingen, which is based on the asymptotic χ² distribution of the log-likelihood statistics. Previous simulation studies have found it superior to other related measures. We also provide an analogous measure, the coefficient of determination R², which has a long history in weather forecast verification and has also been recommended for use in medical diagnosis. Though D² and R² are routinely used to evaluate generalized linear models (GLMs), we argue that nothing prevents them from being applied to other GLM-like models. In SDM accuracy assessment, discrimination capacity is often considered, but model reliability is frequently ignored. The primary reason is that no reliability measure has been introduced into the ecological literature. To meet this need, we suggest that root mean square error be used as a reliability measure. Its squared form, mean square error, has long been used in meteorology, where it is called Brier’s score.
We also discuss the prevalence dependence of accuracy measures and the precision of accuracy estimates.
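To make the idea of prevalence dependence concrete, the short Python sketch below holds sensitivity and specificity fixed and varies prevalence. Under that assumption, positive predictive value changes with prevalence (by Bayes' rule), while the true skill statistic, which depends only on sensitivity and specificity, does not. The function names and the values 0.8, 0.5 and 0.1 are hypothetical and chosen for illustration; the sketch does not represent the paper's own analysis.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a predicted presence is a true presence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def true_skill_statistic(sensitivity, specificity):
    """TSS (Youden's J); prevalence does not enter the formula."""
    return sensitivity + specificity - 1.0


# Same hypothetical model quality, two different prevalences.
ppv_common = positive_predictive_value(0.8, 0.8, prevalence=0.5)
ppv_rare = positive_predictive_value(0.8, 0.8, prevalence=0.1)
tss = true_skill_statistic(0.8, 0.8)
```

With these inputs the positive predictive value is about 0.8 for the common species and about 0.31 for the rare one, while the true skill statistic stays at about 0.6 in both cases.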

Author-supplied keywords

  • accuracy measure
  • performance
  • prediction
  • prevalence
  • species distribution
