Sampling Biodiversity Sampling

  • Ariño A
  • Otegui J

Abstract

The world’s vast biodiversity data repository, the Global Biodiversity Information Facility (GBIF), federates taxonomical databases that, presumably, encompass most of the taxonomically and geographically oriented, openly accessible occurrence data. GBIF can therefore be considered the best sampling of biodiversity information available in the world. However, concluding that this very vastness (more than 145 million records) should account a priori for high significance and confidence may be misleading. For a sample to represent the universe being sampled, a number of statistical assumptions must be met. Most importantly, the samples should ideally be random samples from across any potential pattern of interest. If, on the other hand, the samples have a hidden intrinsic pattern, any resulting observed pattern might reflect not the universe being sampled but the sampling bias. Thus, using this mass of available data requires knowing or estimating which inferences can be drawn now and which cannot (yet) be ascertained. Some of the biases in GBIF data are well known or easy to discover. For instance, the well-known geographical concentration of available data from the Western hemisphere, incommensurate with the distribution of biodiversity hotspots, can be readily observed in the map of all geolocated records already available at the GBIF portal. Other patterns, however, may be suspected but are less easily assessed or even detected. For instance, taxonomical patterns in the data, such as taxon concentration in geographical areas, may require cross-examination of each data provider’s own geographical and taxonomical bias. An obvious practical problem for such studies is the difficulty of bringing all relevant records into play. As many data providers and databases are thematic, patterns that cut across or combine space, time, and taxa might require the examination of very large tracts of available data.
Queries performed on indexes, such as the ones that GBIF supplies, may not answer certain questions. In the most extreme case, anyone wishing to analyze certain patterns in the data should, ideally, have immediate access to the full dataset. This could amount to downloading all datasets from all data providers, which is impractical for most users. In this work, GBIF data were queried using sampling techniques adapted in part from ecological studies, and inferences drawn from these smaller samples were compared to inferences drawn from the full dataset. Samples were also used to search for specific patterns in GBIF data that may, among other things, (a) influence the use of these data as samples for ecological studies, and (b) very significantly, allow for the discovery of knowledge gaps by identifying data regions worth exploring, such as underrepresented taxa or neglected areas. Some of these patterns had already been discovered by newly developed exploratory procedures in data from our own database, Zootron 4, during a previous series of analyses presented at TDWG-2006 and TDWG-2007. Other patterns were conceptually new, as they pertain to the multi-dataset provenance of GBIF data. Performing these analyses to detect patterns in GBIF data did not place an especially heavy load on GBIF servers, as only standard web queries were performed, requiring no complex table interaction. This is particularly relevant in view of the performance of the database engines under large size asymmetry (see Holetscheck, this issue).
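The comparison between sample-based and full-dataset inferences can be illustrated with a minimal sketch. It is not the authors' procedure: the records are synthetic (not GBIF data), and the function names, region labels, taxon weights, and sample sizes are all assumptions chosen for illustration. It contrasts a simple random sample with a sample that carries a hidden geographical pattern.

```python
import random
from collections import Counter

TAXA = ["Aves", "Insecta", "Plantae"]

def make_synthetic_records(seed=0):
    """Build a toy occurrence table. Synthetic data, NOT real GBIF records."""
    rng = random.Random(seed)
    records = []
    # Hypothetical heavily sampled region, dominated by bird records.
    for _ in range(9000):
        taxon = rng.choices(TAXA, weights=[6, 1, 3])[0]
        records.append({"region": "north", "taxon": taxon})
    # Hypothetical sparsely sampled region, dominated by insect records.
    for _ in range(1000):
        taxon = rng.choices(TAXA, weights=[1, 7, 2])[0]
        records.append({"region": "south", "taxon": taxon})
    return records

def taxon_proportions(records):
    """Return the share of records belonging to each taxon."""
    counts = Counter(r["taxon"] for r in records)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def max_abs_difference(p, q):
    """Largest absolute gap between two proportion tables."""
    taxa = set(p) | set(q)
    return max(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in taxa)

def simple_random_sample(records, n, seed=0):
    """Draw n records without replacement, each equally likely."""
    return random.Random(seed).sample(records, n)

records = make_synthetic_records()
full = taxon_proportions(records)

# A random sample should track the full dataset closely.
random_gap = max_abs_difference(
    full, taxon_proportions(simple_random_sample(records, 500, seed=1))
)

# A sample drawn from one region only carries that region's bias.
south_only = [r for r in records if r["region"] == "south"][:500]
biased_gap = max_abs_difference(full, taxon_proportions(south_only))

print(f"random sample gap: {random_gap:.3f}")
print(f"biased sample gap: {biased_gap:.3f}")
```

In this toy setting the random sample reproduces the taxon proportions of the full table far more closely than the single-region sample does, which is the kind of check that would indicate whether a cheap sample can stand in for the full dataset.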

Cite

Ariño, A. H., & Otegui, J. (2008). Sampling Biodiversity Sampling. In A. L. Weitzman & L. Belbin (Eds.), Proceedings of TDWG (2008) (p. 107). Fremantle, AU: Biodiversity Information Standards (TDWG). Retrieved from http://www.tdwg.org/proceedings/article/view/413
