Predicting species spatial distributions using presence-only data: a case study of native New Zealand ferns
Identification of areas containing high biological diversity (`hotspots') from species presence-only data has become increasingly important in species and ecosystem management when presence/absence data is unavailable. However, because presence-only data sets lack any information on absences and suffer from many biases associated with ad hoc or non-stratified sampling, they are often assumed to be problematic and inadequate for most statistical modeling methods. In this paper, this supposition is investigated by comparing generalized additive models (GAM) fitted with presence/absence data for 43 native New Zealand fern species, obtained from a survey of 19,875 forested plots, to GAM models and ecological niche factor analysis (ENFA) models fitted with identical presence data and, in the case of GAM models, computer-generated `pseudo' absences. By using the same presence data for all models, absence data is isolated as the varying factor, allowing different techniques for generating the `pseudo' absences used in the GAM models to be analyzed and compared over three principal levels of investigation. GAM models fitted with an environmentally weighted distribution of `pseudo' absences and ENFA models selected environmental variables more similar to those of the GAM presence/absence models than did the GAM models fitted with randomly distributed `pseudo' absences. Average contributions for the GAM presence/absence model showed mean annual temperature and mean annual solar radiation to be the most important factors, followed by lithology. Comparisons of prediction results show GAM models incorporating an environmentally weighted distribution of `pseudo' absences to be more closely correlated with the GAM presence/absence models than either the GAM models fitted with randomly selected `pseudo' absences or the ENFA models. ENFA models were found to be the least correlated with the GAM presence/absence models.
These latter models were also shown to give the most optimistic predictions overall; however, as ENFA predicts habitat suitability rather than probability of presence, this was expected. Summation of species predictions, used as a measure of potential species biodiversity `hotspots', also shows ENFA models to give the highest and most widespread potential biodiversity. Additionally, GAM models incorporating `pseudo' absences were more highly correlated with the GAM presence/absence model than was ENFA. However, ENFA identified more areas of potential biodiversity `hotspots' similar to those of the GAM presence/absence model than did either GAM model incorporating `pseudo' absences.
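
The two comparisons described above (correlation between summed predictions, and agreement on `hotspot' areas) can give different rankings of the same models. The following is a minimal Python sketch of how the two measures might be computed, not the procedure used in the study. The function names, the toy prediction values, and the choice of defining a `hotspot' as the top fraction of cells by summed prediction are all assumptions made for illustration.

```python
import numpy as np

def richness_map(predictions):
    """Sum per-species predictions (rows = species, columns = cells)
    into one potential-biodiversity value per cell."""
    return np.asarray(predictions, dtype=float).sum(axis=0)

def map_correlation(map_a, map_b):
    """Pearson correlation between two maps defined over the same cells."""
    return float(np.corrcoef(map_a, map_b)[0, 1])

def top_cells(summed, fraction=0.1):
    """Indices of the cells in the top `fraction` of summed predictions.
    Assumption: this threshold rule stands in for a 'hotspot' definition."""
    summed = np.asarray(summed)
    k = max(1, int(round(fraction * summed.size)))
    return set(np.argsort(summed)[-k:].tolist())

# Hypothetical predictions for 3 species over 4 cells (illustrative values only).
reference = np.array([[0.9, 0.1, 0.5, 0.2],   # e.g. a presence/absence model
                      [0.8, 0.2, 0.6, 0.1],
                      [0.7, 0.3, 0.4, 0.0]])
candidate = np.array([[0.8, 0.2, 0.6, 0.3],   # e.g. a presence-only model
                      [0.9, 0.1, 0.5, 0.2],
                      [0.6, 0.4, 0.5, 0.1]])

ref_sum = richness_map(reference)
cand_sum = richness_map(candidate)

r = map_correlation(ref_sum, cand_sum)                                # overall agreement
overlap = top_cells(ref_sum, 0.25) & top_cells(cand_sum, 0.25)        # shared hotspot cells
```

Correlation is computed over every cell, whereas hotspot overlap depends only on the highest-valued cells, so a model can score lower on the former and still match more hotspot areas.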