Ecology, 87(6), 2006, pp. 1458–1464
© 2006 by the Ecological Society of America

GIS-BASED NICHE MODELING FOR MAPPING SPECIES’ HABITAT

JOHN T. ROTENBERRY,1,3 KRISTINE L. PRESTON,1 AND STEVEN T. KNICK2

1 Department of Biology and Center for Conservation Biology, University of California, Riverside, California 92521 USA
2 Snake River Field Station, USGS Forest and Rangeland Ecosystem Science Center, Boise, Idaho 82706 USA

Abstract. Ecological “niche modeling” using presence-only locality data and large-scale environmental variables provides a powerful tool for identifying and mapping suitable habitat for species over large spatial extents. We describe a niche modeling approach that identifies a minimum (rather than an optimum) set of basic habitat requirements for a species, based on the assumption that constant environmental relationships in a species’ distribution (i.e., variables that maintain a consistent value where the species occurs) are most likely to be associated with limiting factors. Environmental variables that take on a wide range of values where a species occurs are less informative because they do not limit a species’ distribution, at least over the range of variation sampled. This approach is operationalized by partitioning Mahalanobis D² (the standardized difference between the values of a set of environmental variables for any point and the mean values for those same variables calculated from all points at which a species was detected) into independent components. The component associated with the smallest eigenvalue represents the linear combination of variables with minimum variance; components with increasingly larger eigenvalues represent larger variances and are increasingly less limiting. We illustrate this approach using the California Gnatcatcher (Polioptila californica Brewster) and provide SAS code to implement it.
Key words: California Gnatcatcher; geographical information systems; GIS; habitat relationships; Mahalanobis D²; niche modeling; Polioptila californica; principal-components analysis.

INTRODUCTION

Spatially explicit habitat suitability models provide powerful tools for ecologists and conservation biologists (Scott et al. 2002). Improved Geographical Information Systems (GIS) software and digital environmental layers permit development of new modeling techniques that create multivariate species’ “niche models” encompassing large geographic areas. These regional niche models incorporate hypotheses about a species’ occurrence relative to various environmental variables that are available as GIS spatial layers. Digital environmental layers such as elevation, slope aspect, precipitation, temperature, soil type, land use, and especially vegetation type (often used as a surrogate for “habitat type”) may be incorporated into regional niche models.

Such models can have direct relevance to the ecology and conservation of targeted species. First, most modeling techniques identify the relative “importance” of individual variables (or combinations of variables) in influencing the distribution of a species. Although these “importances” are often more statistical than biological, they nonetheless serve as working hypotheses that can guide further, perhaps more experimental, investigation, as well as assist in implementing and evaluating adaptive management decisions. Second, they provide a spatially explicit assessment of habitat suitability. It is one thing to know what variables are important; knowing where the appropriate combination of variables occurs can be equally valuable. Third, if the model is robust, predictions about habitat suitability can be extended into areas where there is currently no information about the occurrence of a particular species.
Such predictions may help to focus additional survey effort or guide the design of more efficient species’ preserves (e.g., Raxworthy et al. 2003).

Ecological modelers are faced with several challenges in producing useful predictive models. For example, habitat suitability models typically are created using abundance, density, or presence–absence data collected during surveys for the species of interest (Guisan and Zimmerman 2000, Brotons et al. 2004). However, creation of models encompassing large geographic areas (such as a county, state, or even larger area) generally requires using multiple sources of data, often collected with different survey methodologies. Although large-scale databases for sensitive plant and animal species are available (e.g., government-supported endangered species databases, various regional- or state-based natural diversity databases, and museum collections’ databases), these typically provide information on the presence of a target species at a point, but rarely document the absence of a species from a surveyed area. To further complicate matters, obtaining “true absence” data even with focused surveys can be problematic, especially for species that are rare or difficult to detect (Knick and Rotenberry 1998, Dunn and Duncan 2000, Hirzel et al. 2002, Rotenberry et al. 2002).

Manuscript received 26 August 2005; revised 9 December 2005; accepted 20 December 2005. Corresponding Editor: G. M. Henebry.
3 E-mail: john.rotenberry@ucr.edu
1458 STATISTICAL REPORTS

Another challenge to modeling is to predict a species’ occurrence outside of the original study area or in a situation where the environment is undergoing change. In such cases, the particular combination of habitat characteristics present where the original data were collected may not exist (Knick and Rotenberry 1998, Rotenberry et al. 2002). To meet these challenges, new modeling techniques have been developed to create regional models that predict habitat suitability based solely on locations where a species is present, and that are relatively robust to the inadvertent inclusion of nonrelevant environmental variation (Clark et al. 1993, Knick and Rotenberry 1998, Dettmers and Bart 1999, Stockwell and Peters 1999, Dunn and Duncan 2000, Hirzel et al. 2002, Petersen et al. 2002, Rotenberry et al. 2002). Our objective is to make one of these techniques widely available and easily implementable.

Mahalanobis D²

Concisely, Mahalanobis D² is simply the standardized difference between the values of a set of environmental variables for any point (or rasterized cell or pixel in a GIS layer) and the mean values for those same variables calculated from all points at which a species was detected (Clark et al. 1993, Dunn and Duncan 2000, Rotenberry et al. 2002, Browning et al. 2005). Thus, the more similar in environmental conditions a point is to the species’ mean, the smaller the D² and the more “suitable” the habitat at that point:

$$D^2(y) = (y - \mu)' \, \Sigma^{-1} (y - \mu) \qquad (1)$$

where H is “occupied habitat,” an n × p matrix of p variables measured at the n points where a species was detected; μ is the p × 1 vector of means based on H (i.e., the centroid); and y is the p × 1 vector of measurements on any point (it may or may not be taken from H).
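Eq. 1 can be evaluated for many points at once, which is how it would be applied to every cell of a stack of GIS layers. The following is a minimal Python/NumPy sketch, not the authors’ SAS implementation; the function name `mahalanobis_d2` and the convention of passing H as an n × p array and the points to score as an m × p array are our own assumptions. It estimates μ and Σ from H (NumPy’s `np.cov` uses the n − 1 divisor) and solves a linear system instead of forming Σ⁻¹ explicitly.

```python
import numpy as np


def mahalanobis_d2(H, Y):
    """Return D^2 (Eq. 1) for each row of Y relative to occupied habitat H.

    H: (n, p) values of p environmental variables at the n points where the
       species was detected.
    Y: (m, p) points to score, e.g., raster cells flattened to rows.
    """
    H = np.asarray(H, dtype=float)
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    mu = H.mean(axis=0)                               # centroid of H, shape (p,)
    sigma = np.atleast_2d(np.cov(H, rowvar=False))    # p x p covariance of H
    dev = Y - mu                                      # each row is (y - mu)'
    z = np.linalg.solve(sigma, dev.T)                 # Sigma^-1 (y - mu), shape (p, m)
    return np.einsum("ij,ji->i", dev, z)              # (y - mu)' Sigma^-1 (y - mu) per row


# Toy example: four detections on a square, so mu = (1, 1) and Sigma = diag(4/3, 4/3).
H = [[0, 0], [2, 0], [0, 2], [2, 2]]
# D^2 is 0 at the centroid and 2**2 / (4/3) = 3 for the point (3, 1).
print(mahalanobis_d2(H, [[1, 1], [3, 1]]))
```

For a hypothetical raster stack `layers` of shape (rows, cols, p), every cell could be scored with `mahalanobis_d2(H, layers.reshape(-1, p)).reshape(rows, cols)`; cells with missing values would need to be masked beforehand.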
Here, y − μ is the vector of deviations of a point from the species’ mean vector; Σ is the p × p variance–covariance matrix based on H; and D² is a squared scalar distance, standardized in the Σ metric. Because D²(y) approximately follows a χ² distribution with p degrees of freedom under multinormal assumptions, it can be rescaled to range from 0 to 1 (called a “p-value”). This rescaling is desirable, as D² values can otherwise range from near zero to infinity. These p-values may be interpreted as analogous to a posterior probability resulting from a Bayes discriminant function or logistic regression (Dunn and Duncan 2000).

Use of D² to characterize a species’ habitat relationship assumes that the original sample reflects the optimal habitat distribution of the animals in the sampled landscape. As a corollary, it assumes that the selection response has been fully characterized (at least in the vicinity of the mean), or in other words, that μ and Σ fully characterize the species’ response to habitat. This implies two additional assumptions: (1) the sampled area contains the full range of habitat variation to which the species responds, and (2) we have identified and measured the appropriate variables (i.e., we have not left out any that are important, and we have not included any that are irrelevant). These assumptions are not always justified. Although D² performs quite well in many circumstances (e.g., Clark et al. 1993, Knick and Dyer 1997), it may perform poorly when applied to areas not included in the original sample or to landscapes that exhibit nonstationarity in space or time, such as those that are prone to disturbance (whether natural or anthropogenic) or are undergoing restoration or succession (Knick and Rotenberry 1998, Rotenberry et al. 2002).
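The rescaling of D² to a 0–1 “p-value” described above can be read as the upper-tail probability of a χ² distribution with p degrees of freedom, P(χ²ₚ ≥ D²), so a point at the centroid scores 1 and increasingly dissimilar points approach 0; treating the p-value as this upper tail is our interpretation. In practice one would usually call a statistics library (for example, SciPy’s `scipy.stats.chi2.sf`). The sketch below instead uses only Python’s standard library and the closed forms of the χ² survival function for integer degrees of freedom; the names `chi2_sf` and `d2_to_pvalue` are hypothetical.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X >= x) for X ~ chi-square with k (integer) df.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(k/2, x/2), which exist for integer k, so only the standard library is needed.
    """
    if not isinstance(k, int) or k < 1:
        raise ValueError("k must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        # Even k: Q(n, y) = exp(-y) * sum_{i=0}^{n-1} y**i / i!, with n = k / 2.
        term = total = 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd k: Q(1/2, y) = erfc(sqrt(y)); then step a -> a + 1 using
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)  # term for a = 1/2
    for i in range(1, (k + 1) // 2):
        total += term
        term *= y / (i + 0.5)  # advance to the term for a = i + 1/2
    return total


def d2_to_pvalue(d2, p):
    """Rescale a D^2 value computed from p variables to the 0-1 'p-value'."""
    return chi2_sf(d2, p)
```

For example, with p = 2 variables a point at the centroid (D² = 0) gets a p-value of 1, while D² ≈ 5.99 corresponds to a p-value of about 0.05.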
Partitioning Mahalanobis D²

Modeling techniques based on dissimilarity to an optimum configuration may not be ideal for predicting animal occurrence because of the uncertainty associated with defining a biological optimum from distributional data. Instead, identifying a minimum set of basic habitat requirements for a species may be more appropriate for predicting potential animal use in changing environments (Dunn and Duncan 2000, Rotenberry et al. 2002). The performance of D² is improved by “partitioning” it into separate components, each representing an independent set of relationships between a species’ distribution and environmental variables (Dunn and Duncan 2000, Rotenberry et al. 2002). Partitioned D² for any point y is given as

$$D^2(y) = \sum_{j=1}^{p} d_j^2 / \lambda_j \qquad (2a)$$

where λ₁, …, λₖ, …, λₚ are the eigenvalues of Σ, and dⱼ = (y − μ)′aⱼ, where y and μ are as previously defined and aⱼ is the eigenvector associated with λⱼ. This result arises from the spectral decomposition of Σ (e.g., Seber 1984; see Rotenberry et al. 2002). Alternatively,

$$D^2(y) = d_1^2/\lambda_1 + \cdots + d_k^2/\lambda_k + \cdots + d_p^2/\lambda_p. \qquad (2b)$$

These distance partitions are additive, and each is associated with an eigenvalue and eigenvector arising from a principal-components analysis (PCA) of the data set H containing the values of the environmental variables from the points at which the species occurred. Unlike regular PCA, however, biological significance is attached to those components with the smallest, rather than the largest, eigenvalues (which in PCA are measures of variance). Dunn and Duncan (2000) and Rotenberry et al. (2002) show the relationship between the partition with the smallest eigenvalue and Pearson’s “plane of closest fit,” that plane for which the sum of squares of the perpendiculars from a set of points to the plane is a minimum (Pearson 1901).
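The partition in Eq. 2 can be checked numerically. Because Σ has the spectral decomposition Σ = AΛA′, with the orthonormal eigenvectors aⱼ as the columns of A and the eigenvalues λⱼ on the diagonal of Λ, its inverse is Σ⁻¹ = AΛ⁻¹A′; substituting into Eq. 1 gives D² = (A′(y − μ))′ Λ⁻¹ (A′(y − μ)) = ∑ⱼ dⱼ²/λⱼ, which is Eq. 2a. The Python/NumPy sketch below is our own illustration, not the authors’ SAS code, and the function name `partitioned_d2` is hypothetical. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so column 0 holds the component with the smallest eigenvalue; the paper’s own numbering of λ₁ … λₚ may run in the opposite direction.

```python
import numpy as np


def partitioned_d2(H, Y):
    """Split D^2 into the additive components d_j**2 / lambda_j of Eq. 2a.

    H: (n, p) occupied-habitat matrix; Y: (m, p) points to score.
    Returns (components, lam, A): components is (m, p) with column j holding
    d_j**2 / lambda_j; lam holds the eigenvalues of Sigma in ascending order;
    the columns of A are the matching unit eigenvectors a_j.
    """
    H = np.asarray(H, dtype=float)
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    mu = H.mean(axis=0)
    sigma = np.atleast_2d(np.cov(H, rowvar=False))
    lam, A = np.linalg.eigh(sigma)   # symmetric eigendecomposition, ascending order
    d = (Y - mu) @ A                 # d[i, j] = (y_i - mu)' a_j
    return d ** 2 / lam, lam, A      # broadcasting divides column j by lambda_j


# Hypothetical data in which the third variable barely varies where the species occurs.
rng = np.random.default_rng(0)
H = rng.normal(size=(200, 3)) * [1.0, 1.0, 0.1]
comps, lam, A = partitioned_d2(H, [[0.5, -0.5, 0.3]])
print(comps.sum(axis=1))  # the components sum to D^2 from Eq. 1
print(A[:, 0])            # minimum-variance combination; expected to load mainly on variable 3
```

The variance of the projections of H onto the first column of `A` equals the smallest eigenvalue, which is the minimum-variance (closest-fit) direction the text associates with the most limiting combination of variables.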
June 2006 1459 MAPPING SPECIES’ HABITAT STATISTICAL REPORTS
The variance of the projections of the points onto a vector normal to such a plane will be a minimum, the same as the variance of