Species distribution models are an important tool for guiding our understanding of ecological systems and their management. A recently introduced method merges two popular but previously distinct classes of species distribution models: site occupancy models (OD) and boosted regression trees (BRT). The new method (OD-BRT) addresses two major challenges in estimating the distribution of a species from data at the same time: 1) many species are detected imperfectly on surveys, producing systematic false negatives in the data (addressed by OD, but not BRT); and 2) ecological systems are extremely complex, so it may be difficult to specify in advance which predictor variables are relevant and how they relate to the distribution (addressed by BRT, but not OD). We use synthetic data to compare OD-BRT with its predecessor methods (OD and BRT), examining sensitivity to tuning parameters and the effects of the amount of training data, species occupancy rate, and detection probability. We evaluate performance by each model's ability to recover the true covariate relationships and by predictive AUC, computed against both the raw observations and the true occupancy status.
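The simulation design described above can be sketched in a few lines. The following is a minimal, hypothetical Python illustration and not the authors' code or the API of their R package. It assumes one uniformly distributed covariate `x` with a nonlinear (quadratic on the logit scale) effect on occupancy, a constant per-visit detection probability, and repeat visits to every site. It shows the zero-inflated likelihood that distinguishes an occupancy (OD) model from a model fit directly to detections, and a rank-based AUC that can be scored against either the true occupancy status `z` or the raw observations. The names `simulate_occupancy`, `od_neg_log_lik`, and `auc` are illustrative only.

```python
import numpy as np

def simulate_occupancy(n_sites=500, n_visits=3, p_detect=0.4, seed=0):
    """Hypothetical synthetic design: one covariate with a nonlinear effect on occupancy."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2, 2, size=n_sites)
    # Assumed quadratic occupancy response on the logit scale (for illustration only)
    psi = 1.0 / (1.0 + np.exp(-(1.0 - 1.5 * x**2)))
    z = rng.binomial(1, psi)  # true (latent) occupancy status of each site
    # Detections are possible only at occupied sites, on each visit with prob. p_detect
    y = rng.binomial(1, p_detect, size=(n_sites, n_visits)) * z[:, None]
    return x, psi, z, y

def od_neg_log_lik(psi, p, y):
    """Negative log-likelihood of a single-season occupancy model (zero-inflated binomial)."""
    k = y.sum(axis=1)   # number of visits with a detection at each site
    n_visits = y.shape[1]
    # Occupied and detected k times, or (only if k == 0) unoccupied
    lik = psi * p**k * (1 - p) ** (n_visits - k) + (1 - psi) * (k == 0)
    return -np.sum(np.log(lik))

def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC; assumes no tied scores."""
    ranks = np.argsort(np.argsort(scores)) + 1
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

x, psi, z, y = simulate_occupancy()
observed = y.max(axis=1)  # 1 if the species was detected on any visit
print("AUC vs true occupancy:", auc(psi, z))
print("AUC vs raw observations:", auc(psi, observed))
print("OD negative log-likelihood at the true values:", od_neg_log_lik(psi, 0.4, y))
```

Because `y` is multiplied by `z`, every detection comes from an occupied site, so the raw observations can only under-report occupancy; the OD likelihood accounts for this explicitly through the `(1 - psi) * (k == 0)` term.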
Results/Conclusions
In accordance with common practice in species distribution modeling, BRT treats observations as occupancy; thus, when detection probabilities are less than one, BRT predictions are systematically biased below true occupancy. Predictions from OD and OD-BRT do not show this bias. Additionally, comparing partial dependence plots with the true covariate relationships used to generate the synthetic distributions shows that, in the presence of nonlinearities and interactions, OD-BRT outperforms OD without requiring prior knowledge of the form of those relationships. We describe a process for selecting the OD-BRT tuning parameters (tree depth, number of iterations, and step length) that is nearly identical to the one used in standard applications of BRT. Finally, the OD model has been shown to produce biased estimates when data are insufficient, whether because of low occupancy rates, low detection probabilities, too few sites, or too few visits per site. We show that OD-BRT exhibits similar bias, and we explore how the bias depends on each of these contributing factors for both models. In addition to simulation results comparing OD, BRT, and OD-BRT, this poster provides tutorial material on the OD-BRT method and its accompanying R package.
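The direction of the BRT bias follows from simple arithmetic. If occupancy probability is psi, per-visit detection probability is p, and each site is visited K times, a model that treats "detected at least once" as occupancy is estimating psi * (1 - (1 - p)^K), which is below psi whenever p < 1. The sketch below is a hypothetical Python illustration under those assumptions (constant psi and p, no covariates). As a stand-in for the OD side it fits a covariate-free occupancy model by grid-search maximum likelihood; this is not OD-BRT or the authors' R package, and the function names are illustrative only.

```python
import numpy as np

def naive_detection_rate(psi, p, n_visits):
    """Expected share of sites with >= 1 detection: the quantity a model that treats
    observations as occupancy (e.g., BRT on collapsed detections) ends up estimating."""
    return psi * (1.0 - (1.0 - p) ** n_visits)

def fit_constant_od(y, grid=np.linspace(0.01, 0.99, 99)):
    """Grid-search maximum-likelihood fit of a covariate-free occupancy model (psi, p)."""
    n_visits = y.shape[1]
    counts = np.bincount(y.sum(axis=1), minlength=n_visits + 1)  # sites with k detections
    kk = np.arange(n_visits + 1)
    psi_g, p_g = np.meshgrid(grid, grid, indexing="ij")
    psi_g, p_g = psi_g[..., None], p_g[..., None]  # broadcast over k = 0..n_visits
    # Site-level likelihood of each possible detection count at every grid point
    lik = psi_g * p_g**kk * (1 - p_g) ** (n_visits - kk) + (1 - psi_g) * (kk == 0)
    nll = -(counts * np.log(lik)).sum(axis=-1)
    i, j = np.unravel_index(np.argmin(nll), nll.shape)
    return grid[i], grid[j]

rng = np.random.default_rng(1)
psi_true, p_true, n_sites, n_visits = 0.6, 0.3, 2000, 3
z = rng.binomial(1, psi_true, size=n_sites)                        # true occupancy
y = rng.binomial(1, p_true, size=(n_sites, n_visits)) * z[:, None]  # detection histories

print(naive_detection_rate(psi_true, p_true, n_visits))  # approximately 0.394, well below 0.6
print(y.max(axis=1).mean())                               # empirical naive occupancy rate
print(fit_constant_od(y))                                 # (psi_hat, p_hat) from the OD likelihood
```

With these assumed values the naive target is about 0.394 against a true occupancy of 0.6, while the OD fit targets psi itself. Shrinking `n_sites`, `n_visits`, `p_true`, or `psi_true` in this sketch is one way to see the small-sample bias of the OD estimator that the abstract describes.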