elapid: Species distribution modeling tools for Python

Christopher B. Anderson

Journal Article (Open Access)

Journal of Open Source Software (2023) 8(84) 4930

DOI: 10.21105/joss.04930

Abstract

Species distribution modeling (SDM) is based on the Grinnellian niche concept: the environmental conditions that allow individuals of a species to survive and reproduce will constrain the distributions of those species over space and time (Grinnell, 1917; Wiens et al., 2009). The inputs to these models are typically spatially explicit species occurrence records and a series of environmental covariates, which might include information on climate, topography, land cover, or hydrology (Booth et al., 2014). While many modeling methods have been developed to quantify and map these species-environment interactions, few software systems include both a) the appropriate statistical modeling routines and b) support for the full suite of geospatial analyses required to prepare data and to fit, apply, and summarize these models.

elapid is both a geospatial analysis package and a species distribution modeling package. It provides an interface between vector and raster data for selecting random point samples, annotating point locations with coincident raster data, and summarizing raster values inside a polygon with zonal statistics. It provides a series of covariate transformation routines for increasing feature dimensionality, quantifying interaction terms, and normalizing unit scales. It provides a Python implementation of the popular Maxent SDM (Phillips et al., 2017) using infinitely weighted logistic regression (Fithian & Hastie, 2013), as well as a standard Niche Envelope Model (Nix, 1986). Both models were written to match the software design patterns of modern machine learning packages like sklearn (Grisel et al., 2022). It also allows users to add spatial context to any model by providing methods for spatially splitting train/test data and computing geographically explicit sample weights. elapid was designed as a contemporary SDM package, built on best practices from the past and aspiring to support the next generation of biodiversity modeling workflows.
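The niche envelope idea mentioned above can be illustrated with a minimal NumPy sketch. This is not elapid's implementation: the function names `fit_envelope` and `predict_envelope`, the `percentile_range` parameter, and the sample covariate values are all hypothetical and chosen only for illustration. The sketch assumes an envelope defined by per-covariate percentile bounds computed from presence records, with a location counted as suitable only when every covariate falls inside its bounds.

```python
import numpy as np


def fit_envelope(presence_covariates, percentile_range=(2.5, 97.5)):
    """Return per-covariate (lower, upper) bounds computed from presence samples."""
    x = np.asarray(presence_covariates, dtype=float)
    # np.percentile with two percentiles returns an array of shape (2, n_covariates)
    lower, upper = np.percentile(x, percentile_range, axis=0)
    return lower, upper


def predict_envelope(covariates, lower, upper):
    """Return 1.0 where every covariate lies inside the envelope, else 0.0."""
    x = np.asarray(covariates, dtype=float)
    inside = (x >= lower) & (x <= upper)
    return inside.all(axis=1).astype(float)


# Hypothetical covariates at presence points: temperature (C), precipitation (mm)
presence = np.array(
    [[12.0, 800.0], [14.0, 950.0], [13.0, 870.0], [15.0, 1020.0], [11.5, 780.0]]
)

# Use the full observed range (min to max) so the bounds are easy to check by eye
lower, upper = fit_envelope(presence, percentile_range=(0, 100))

# Hypothetical candidate locations: one inside the envelope, two outside it
candidates = np.array([[13.5, 900.0], [20.0, 900.0], [13.5, 300.0]])
suitability = predict_envelope(candidates, lower, upper)
```

Here the first candidate falls within both the temperature and precipitation bounds, while the other two each violate one bound. Splitting the logic into a fit step and a predict step loosely mirrors the estimator pattern of packages like sklearn that the abstract refers to.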
Statement of need

Species occurrence data (georeferenced point locations where a species has been observed and identified) are an important resource for understanding the environmental conditions that predict habitat suitability for that species. These data are now abundant thanks to the proliferation of institutional open data policies, large-scale collaborations among research groups, and advances in the quality and popularity of citizen science applications (GBIF, 2022; iNaturalist, 2022). Tools for working with these data haven't necessarily kept pace, however, especially ones that support modern geospatial data formats and machine learning workflows.

elapid builds on a suite of well-known statistical modeling tools commonly used by biogeographers, extending them to add novel features, to work with cloud-hosted data, and to save and share models. It provides methods for managing the full lifecycle of modeling data: generating background point data, extracting raster values for each point (i.e., point annotation), splitting train/test data, fitting models, and applying predictions to rasters. It provides a very high degree of control for model design, which is important for several reasons.

Cite

Anderson, C. B. (2023). elapid: Species distribution modeling tools for Python. Journal of Open Source Software, 8(84), 4930. https://doi.org/10.21105/joss.04930
