COS 44-5
New computational methods for modeling species potential distributions
Modeling species distributions is central to many ecological studies. The accuracy of modeled distributions has increased greatly with recent advances in methodology. However, some obstacles remain: (i) some methods have ill-posed modeling objectives; (ii) most methods for species distribution modeling are sensitive to sampling assumptions of independence and equilibrium, which are often violated, especially by expanding populations; and (iii) many methods require data for parameterization that are difficult or impossible to obtain, such as verified absence of a species within a study site (the “presence-only problem” in species distribution modeling). In this study, I introduce three new methods aimed at overcoming these obstacles and compare them with other, better-known methods. First, a “plug-and-play” approach allows the substitution of alternative estimators to measure a quantity proportional to the conditional probability of occurrence given the environment, which I refer to as environmental suitability. Second, I study a little-known algorithm for classification (LOBAG) and introduce a variant (LOBAG-OC) for modeling species distributions from presence-only data. Third, I introduce a novel ensemble algorithm, range-bagging, for presence-only modeling.
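The abstract names range-bagging without describing its mechanics. As a rough illustration only, the sketch below assumes that each ensemble member votes on whether a site falls within the observed range of a random subsample of presence points along one randomly chosen environmental variable, and that the score is the fraction of favorable votes. The function names, the one-dimensional ranges, and the parameters n_votes and sample_frac are assumptions made for this example, not details taken from the study.

```python
import numpy as np

def range_bagging_fit(presence, n_votes=100, sample_frac=0.5, rng=None):
    """Build an ensemble of one-dimensional ranges from presence-only data.

    presence: array of shape (n_sites, n_variables).
    Returns a list of (variable index, low, high) tuples, one per vote.
    """
    rng = np.random.default_rng(rng)
    n, p = presence.shape
    m = min(n, max(2, int(sample_frac * n)))  # subsample size per vote
    votes = []
    for _ in range(n_votes):
        j = rng.integers(p)                          # random variable
        rows = rng.choice(n, size=m, replace=False)  # random presences
        vals = presence[rows, j]
        votes.append((j, vals.min(), vals.max()))
    return votes

def range_bagging_score(votes, x):
    """Fraction of votes whose range contains each row of x."""
    x = np.atleast_2d(x)
    inside = np.zeros(x.shape[0])
    for j, low, high in votes:
        inside += (x[:, j] >= low) & (x[:, j] <= high)
    return inside / len(votes)
```

Under these assumptions, a site whose conditions lie near the center of the observed presences scores close to 1, and a site outside every observed range scores 0. No absence or background data are needed.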
Results/Conclusions
For the plug-and-play approach, a special case (the regularized multivariate Gaussian) is shown to be numerically stable and to provide good performance at low computational cost. Additionally, this method is relatively insensitive to noise in the form of irrelevant variables. Indeed, I show through simulation that this method performs well even when informative variables are in the minority. LOBAG, LOBAG-OC, and range-bagging were all found to perform well in comparison to other methods when the environmental range of observations was stationary. These methods performed poorly when species were expanding into new environmental conditions. Results of this study yield two general insights. First, ensemble learning considerably reduces the variance in estimated models. Second, discriminative performance of some presence-only methods is almost as good as leading methods for presence-background modeling of species distributions. These results suggest that species distribution modeling need not be impeded by widely recognized obstacles.
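To make the plug-and-play idea concrete: by Bayes' rule, the conditional probability of occurrence given the environment is proportional to the ratio of the density of environmental conditions at presence sites to the density of conditions across the study region. The sketch below plugs a multivariate Gaussian into both densities. The abstract does not state which regularizer was used, so the shrinkage of the covariance toward its diagonal, the shrink parameter, and all function names here are assumptions for illustration.

```python
import numpy as np

def fit_gaussian(x, shrink=0.1):
    """Estimate a mean and a shrunken covariance from rows of x."""
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # Assumed regularizer: shrink off-diagonal terms toward zero,
    # which keeps the matrix well conditioned.
    cov = (1 - shrink) * cov + shrink * np.diag(np.diag(cov))
    return mean, cov

def log_density(x, mean, cov):
    """Log density of a multivariate Gaussian at each row of x."""
    x = np.atleast_2d(x)
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(d * np.linalg.solve(cov, d.T).T, axis=1)
    k = mean.shape[0]
    return -0.5 * (k * np.log(2 * np.pi) + logdet + maha)

def suitability(x, presence, background, shrink=0.1):
    """Density ratio f_presence(x) / f_background(x), up to a constant."""
    m1, c1 = fit_gaussian(presence, shrink)
    m0, c0 = fit_gaussian(background, shrink)
    return np.exp(log_density(x, m1, c1) - log_density(x, m0, c0))
```

Any other density estimator could replace fit_gaussian and log_density without changing suitability, which is the sense in which the approach is "plug-and-play". The Gaussian case needs only means, covariances, and one linear solve per density, consistent with the low computational cost reported above.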