Combining modeling techniques, large data sets, and streamlined visualization tools to explore background point selection for cheatgrass models: VisTrails and the Software for Assisted Habitat Modeling (SAHM)
Cheatgrass (Bromus tectorum) is an invasive annual grass in the western United States that is most problematic in the Great Basin and Columbia Plateau. The species invades perennial sagebrush shrublands and increases fire frequency. Recent research highlighting the difficulty of modeling this generalist species suggests evaluating multiple modeling techniques to better understand the species. VisTrails is an open-source provenance management and scientific workflow system designed to combine the strengths of scientific workflow and scientific visualization systems. A distinguishing feature of VisTrails is its comprehensive provenance infrastructure, which maintains a detailed history of the steps followed and the data derived during an exploratory task. We have incorporated five modeling techniques into a VisTrails package called the Software for Assisted Habitat Modeling (SAHM) and are using these techniques to organize and explore the modeling of cheatgrass. With over 18,000 presence points but fewer than 5,000 absence points, we have focused our initial efforts on identifying the best techniques for selecting background (pseudo-absence) points for these cheatgrass models.
How do different methods for background point selection affect model results?
Five statistical models were run with each of four background selection approaches. The five modeling techniques are Maxent, Boosted Regression Trees, Logistic Regression, Multivariate Adaptive Regression Splines, and Random Forest. The four methods we explored for creating pseudo-absences were: random selection within a box around the presence locations; selection within a minimum convex polygon (MCP) calculated from 100% of the presence points; random selection within a binary kernel density estimator (KDE) surface derived from 99% of the presence points and an ad hoc optimization method; and selection within a continuous KDE surface generated from 99% of the presence points.
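The four background-selection strategies can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SAHM's implementation: points are planar (x, y) coordinates, the KDE is an isotropic Gaussian with a user-chosen bandwidth `h`, and the binary 99% KDE surface is approximated by keeping locations whose density is at least the value exceeded at 99% of presence points (a volume-based 99% isopleth would differ slightly). The ad hoc optimization step is not reproduced, and all function names are hypothetical.

```python
import math
import random

# Hypothetical helpers illustrating four background (pseudo-absence) strategies.
# Assumptions: planar (x, y) coordinates and an isotropic Gaussian KDE with bandwidth h.


def sample_box(presence, n, rng, pad=0.0):
    """Uniform random points in the presences' bounding box, optionally padded."""
    xs = [x for x, _ in presence]
    ys = [y for _, y in presence]
    return [(rng.uniform(min(xs) - pad, max(xs) + pad),
             rng.uniform(min(ys) - pad, max(ys) + pad)) for _ in range(n)]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, p):
    """True if p is inside or on a counter-clockwise convex polygon."""
    for i in range(len(hull)):
        (ax, ay), (bx, by) = hull[i], hull[(i + 1) % len(hull)]
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < 0:
            return False
    return True


def sample_mcp(presence, n, rng):
    """Rejection-sample the bounding box, keeping points inside the 100% MCP."""
    hull, out = convex_hull(presence), []
    while len(out) < n:
        p = sample_box(presence, 1, rng)[0]
        if in_hull(hull, p):
            out.append(p)
    return out


def kde(presence, h):
    """Return a function evaluating an isotropic Gaussian KDE with bandwidth h."""
    norm = 1.0 / (len(presence) * 2.0 * math.pi * h * h)

    def density(q):
        return norm * sum(math.exp(-((q[0] - x) ** 2 + (q[1] - y) ** 2) / (2 * h * h))
                          for x, y in presence)
    return density


def sample_binary_kde(presence, n, h, rng, keep=0.99):
    """Uniform points where the KDE is at least the density found at `keep` of presences."""
    density = kde(presence, h)
    at_presence = sorted(density(p) for p in presence)
    threshold = at_presence[int((1.0 - keep) * len(at_presence))]
    out = []
    while len(out) < n:
        p = sample_box(presence, 1, rng, pad=3 * h)[0]  # pad by 3h to limit clipping of the mask
        if density(p) >= threshold:
            out.append(p)
    return out, threshold


def sample_continuous_kde(presence, n, h, rng):
    """Draw points with probability proportional to the KDE (Gaussian mixture sampling)."""
    out = []
    for _ in range(n):
        x, y = rng.choice(presence)  # pick a kernel centre uniformly
        out.append((rng.gauss(x, h), rng.gauss(y, h)))  # then draw from that kernel
    return out


# Toy usage on synthetic presences (not cheatgrass data).
rng = random.Random(0)
presence = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.5)) for _ in range(200)]
box_bg = sample_box(presence, 100, rng)
mcp_bg = sample_mcp(presence, 100, rng)
binary_bg, cutoff = sample_binary_kde(presence, 100, 0.3, rng)
continuous_bg = sample_continuous_kde(presence, 100, 0.3, rng)
```

Sampling the continuous surface by picking a presence point and adding Gaussian noise draws exactly from the KDE, which avoids building a raster; a gridded implementation would instead weight cells by their density.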
The VisTrails workflow allows us to easily run, compare, and visualize the results of all five models under all four background selection techniques. The model assessment metrics are similar, but visual comparison shows differences among the models. The continuous KDE produces the most restrictive model calibration region, and it also produces the most extrapolation within the western United States. Boosted regression tree models appear to be the most sensitive to background selection. We can use these results to determine which models and background selection techniques may work best for modeling a generalist species such as cheatgrass.
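Why a restrictive calibration region can go with more extrapolation can be illustrated with a simple range check: the narrower the covariate values seen during calibration, the more prediction cells fall outside them. This sketch assumes that a cell counts as extrapolated when any covariate lies outside the minimum-maximum range of the calibration sample; it is a simplified stand-in chosen for illustration, not necessarily how SAHM measures extrapolation, and the function names and values are hypothetical.

```python
# Hypothetical range-based extrapolation check: a simplified stand-in, not SAHM's own measure.


def covariate_ranges(calibration_rows):
    """Per-covariate (min, max) over the calibration sample (presence + background)."""
    return [(min(col), max(col)) for col in zip(*calibration_rows)]


def extrapolation_fraction(calibration_rows, prediction_rows):
    """Share of prediction cells with at least one covariate outside its calibration range."""
    ranges = covariate_ranges(calibration_rows)
    outside = sum(
        any(not (lo <= v <= hi) for v, (lo, hi) in zip(row, ranges))
        for row in prediction_rows
    )
    return outside / len(prediction_rows)


# Toy covariate values (two covariates per row); not cheatgrass data.
wide_calibration = [(0.0, 0.0), (10.0, 10.0)]
narrow_calibration = [(4.0, 4.0), (6.0, 6.0)]
cells = [(1.0, 1.0), (5.0, 5.0), (9.0, 2.0), (5.0, 12.0)]
print(extrapolation_fraction(wide_calibration, cells))    # 0.25
print(extrapolation_fraction(narrow_calibration, cells))  # 0.75
```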