Background/Question/Methods

Species abundances and distributions lie at the core of ecology and conservation biology. Being able to predict current and future distributions is invaluable for understanding the processes that underlie distributions and for conservation management. Much progress has been made in the field of distribution modeling in the last decade. Yet the question of how best to evaluate distribution models is still largely open. Evaluations that use the same data for fitting and testing the model (resubstitution) may be overly optimistic, particularly when the model is overfit (i.e., overly well adapted to the given dataset and therefore likely to predict poorly onto new datasets). Holding out data from model fitting for testing is a sounder approach, but spatial autocorrelation prevents randomly held-out data from being truly independent of the data used to fit the models. This remaining dependence means that models are indirectly, via the fitting data, also fit to the testing data, so the danger of overfitting and of overly optimistic evaluations of model performance remains. I developed a method based on leave-one-out cross-validation that tests distribution models on data that are not only held out from model fitting but also spatially independent of the fitting data. I used simulated and real data to investigate the performance of the new method.
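The spatial-independence idea can be illustrated with a spatially buffered leave-one-out scheme: each observation is predicted by a model fit only to observations lying beyond a fixed exclusion radius around it. The sketch below is a minimal illustration under assumed details, not the abstract's actual implementation; the function name spatial_loo_predictions, the Euclidean-distance buffer, and the logistic regression standing in for a distribution model are all illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def spatial_loo_predictions(X, y, coords, buffer_radius, model_factory):
    """Buffered leave-one-out cross-validation (illustrative sketch).

    For each observation i, every point within `buffer_radius` of i
    (including i itself) is excluded from training, so the held-out
    point is spatially independent of the data used to fit the model.
    """
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        # Euclidean distance from the test point to all points.
        dists = np.linalg.norm(coords - coords[i], axis=1)
        train = dists > buffer_radius  # drops point i and its spatial neighbours
        model = model_factory()
        model.fit(X[train], y[train])
        preds[i] = model.predict_proba(X[i : i + 1])[0, 1]
    return preds

# Hypothetical usage with simulated presence/absence data.
rng = np.random.default_rng(seed=1)
coords = rng.uniform(0.0, 100.0, size=(200, 2))   # site coordinates
X = rng.normal(size=(200, 3))                     # environmental covariates
y = rng.integers(0, 2, size=200)                  # presence (1) / absence (0)
preds = spatial_loo_predictions(
    X, y, coords, buffer_radius=10.0,
    model_factory=lambda: LogisticRegression(),
)
```

In practice the exclusion radius would presumably be tied to the range of spatial autocorrelation in the data, for example as estimated from a variogram, rather than fixed arbitrarily as here.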
Results/Conclusions
The new distribution model evaluation method showed promising results. It reflected true error rates in simulated data more accurately than the traditional methods based on resubstitution or randomly held-out data. Applied to real data, it showed that the traditional evaluation methods are overly optimistic. With this new technique, far more realistic assessments of a model's ability to predict into new areas and conditions can be achieved, which is critical to its responsible application in conservation management.