PS 87-176
Evaluation of the quality of species distribution data and the corresponding predictive accuracy
Species distribution models (SDMs) are widely used to estimate the potential geographic distributions of species across a range of spatial and temporal extents, and they continue to support the ecology, biogeography and biology communities in pursuing conservation goals. Modelers can easily and efficiently obtain species distribution data from global biodiversity information databases such as the Global Biodiversity Information Facility (GBIF), which provides species occurrence records at both global and regional scales. However, we found that only a limited number of these occurrence records carry high-resolution geographic locations (e.g., latitude and longitude). This raises a serious concern about the accuracy of model predictions. Occurrence records are usually the only input to the model, and information on the survey method is generally missing, so the bias inherent in these data is difficult to address appropriately during modeling and the credibility of the predictions is often compromised. In this study, we created a metric to quantitatively evaluate the quality of species occurrence data. It is based on two underlying assumptions that statistical models must meet to be consistent with ecological niche theory: “representativeness” of the entire niche and “equilibrium” along the environmental gradient of the niche. Although these two assumptions can never be fully satisfied in practice, our metric measures how far the data deviate from these idealized scenarios. The objective was to develop a metric based on the spatial pattern of the occurrences, so that modelers can use it to estimate how reliable and accurate their predictions will be before undertaking any actual modeling.
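The abstract does not give a formula for the metric. Purely as a hypothetical illustration of how the two assumptions might be quantified along a single environmental gradient, the sketch below divides the gradient into equal-width bins over the background (available) environment. It scores “representativeness” as the share of available bins that contain at least one occurrence, and “equilibrium” as the Shannon entropy of occurrence counts normalized by its maximum over the available bins (a Pielou-style evenness). The function names, the binning scheme and the evenness index are all assumptions, not the authors' method.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins spanning [lo, hi] (hypothetical helper)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp edge values into the first/last bin


def occurrence_quality(background, occurrences, n_bins=10):
    """Hypothetical two-part quality score for occurrences along ONE environmental gradient.

    representativeness: share of background-occupied bins holding at least one occurrence.
    equilibrium: Shannon entropy of occurrence counts divided by its maximum, log(#available bins);
                 1.0 means occurrences are spread evenly over every available bin.
    """
    lo, hi = min(background), max(background)
    available = {bin_index(v, lo, hi, n_bins) for v in background}
    counts = Counter(bin_index(v, lo, hi, n_bins) for v in occurrences)
    # Keep only occurrences that fall in bins the background actually covers.
    counts = {b: c for b, c in counts.items() if b in available}
    representativeness = len(counts) / len(available)

    total = sum(counts.values())
    if total == 0 or len(available) < 2:
        return representativeness, 0.0  # evenness is undefined here; report 0 by convention
    entropy = sum(-(c / total) * math.log(c / total) for c in counts.values())
    equilibrium = entropy / math.log(len(available))
    return representativeness, equilibrium


background = [i / 100 for i in range(101)]           # e.g. a scaled gradient from 0 to 1
even = [0.05 + 0.1 * k for k in range(10)]            # one record near the middle of each bin
clustered = [0.01, 0.02, 0.03, 0.04]                  # all records at one end of the gradient
print(occurrence_quality(background, even))
print(occurrence_quality(background, clustered))
```

The evenly spread records score close to 1 on both components, while the clustered records score low on both. A real metric would also have to combine several gradients and account for spatial sampling bias, which this sketch ignores.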
Results/Conclusions
The results indicated that the metric can evaluate and distinguish the quality of species occurrence data through explicit thresholds derived from it, and that it also provides useful insight into model predictive accuracy in terms of both calibration and discrimination.
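Calibration and discrimination are the two usual axes of predictive accuracy for SDMs. As a hedged illustration only (the abstract does not state which statistics the study used), discrimination is often summarized by the area under the ROC curve (AUC), computed here in its rank form as the probability that a random presence scores higher than a random absence. Calibration is summarized here by a sample-weighted gap between mean predicted probability and observed presence rate within equal-width probability bins, similar to an expected calibration error. The function names and the default bin count are assumptions.

```python
def auc(scores, labels):
    """Discrimination: P(random presence outscores random absence); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence (1) and one absence (0)")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_error(scores, labels, n_bins=5):
    """Calibration: sample-weighted mean |mean predicted - observed rate| over probability bins."""
    bins = {}
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes into the last bin
        bins.setdefault(b, []).append((s, y))
    total = len(scores)
    err = 0.0
    for members in bins.values():
        mean_pred = sum(s for s, _ in members) / len(members)
        obs_rate = sum(y for _, y in members) / len(members)
        err += len(members) / total * abs(mean_pred - obs_rate)
    return err


preds = [0.9, 0.8, 0.3, 0.1]
obs = [1, 1, 0, 0]
print(auc(preds, obs), calibration_error(preds, obs))
```

An AUC of 0.5 means the model ranks sites no better than chance, and a calibration error of 0 means predicted probabilities match observed presence rates within every bin. With presence-only data such as GBIF records, absences are usually replaced by background or pseudo-absence points, which changes how both statistics should be interpreted.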