COS 185-5 - All for one or one for all? Mapping many species individually vs. simultaneously with random forest

Friday, August 10, 2012: 9:20 AM
E142, Oregon Convention Center
Emilie B. Henderson, Institute for Natural Resources, Oregon State University, Portland, OR, Janet Ohmann, Pacific Northwest Research Station, USDA Forest Service, Matt Gregory, Forest Science, Oregon State University, Corvallis, OR and Heather M. Roberts, Department of Forest Ecosystems and Society, Oregon State University, Corvallis, OR
Background/Question/Methods

Many problems in landscape management and conservation planning require information on plant species and communities over large regions. Much research has been devoted to mapping species individually using species distribution models (SDMs) and to mapping discrete community types, but techniques for mapping multiple species simultaneously are less well known. How do maps of community composition differ when they are assembled by overlaying many individual species maps, vs. when an intact community of many species is imputed to each map unit? We mapped the distributions and abundances of 27 tree species over 96,000 sq km in the western USA using three approaches. The first two were SDMs built with a machine-learning method (random forest) to yield binary (RF_b) and continuous (RF_c) predictions. The third mapped all species simultaneously using nearest-neighbor imputation based on the random forest algorithm (NN), which can yield both binary and continuous predictions for a suite of species from a single model. Response variables were species occurrences or abundances on 1,468 field plots. Spatial predictors were derived from Landsat imagery and from climate and topography layers. We evaluated map quality for species presence/absence (kappa), species abundance (root mean square difference, RMSD; RF_c and NN only), and overall composition (Bray-Curtis distance between observed and predicted communities).
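The key difference between the approaches is what gets assigned to a map unit. In nearest-neighbor imputation, each map unit receives the complete species vector of one reference plot, so binary and continuous predictions for every species come from a single match, and absences are the real zeros observed on that plot. The minimal Python sketch below illustrates this idea together with the Bray-Curtis distance used to score composition. It is not the authors' implementation: random-forest-based NN imputation typically measures similarity by how often observations share terminal nodes across the forest's trees, whereas this sketch swaps in plain Euclidean distance on the predictors. The function names (`impute_nearest_plot`, `bray_curtis`) and all data values are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def impute_nearest_plot(plot_predictors, plot_species, target_predictors):
    """For each target map unit, copy the full species vector of the nearest reference plot.

    Hypothetical helper: similarity here is Euclidean distance in predictor space,
    a stand-in for the random-forest-based similarity used in NN imputation.
    """
    imputed = []
    for x in target_predictors:
        # Index of the reference plot closest to this map unit in predictor space.
        nearest = min(range(len(plot_predictors)),
                      key=lambda i: dist(x, plot_predictors[i]))
        # Copy the whole community from that one plot, zeros included.
        imputed.append(list(plot_species[nearest]))
    return imputed


def bray_curtis(u, v):
    """Bray-Curtis distance between two nonnegative abundance vectors (0 = identical)."""
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 0.0  # two empty communities are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical reference plots: two scaled predictors each, abundances of three species.
plot_predictors = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
plot_species = [[0.6, 0.0, 0.1], [0.0, 0.7, 0.0], [0.3, 0.2, 0.0]]

# Hypothetical map units (pixels) described by the same two predictors.
pixels = [[0.15, 0.85], [0.75, 0.25]]

abundance_map = impute_nearest_plot(plot_predictors, plot_species, pixels)
# The binary map comes from the same match: present wherever the copied abundance is nonzero.
presence_map = [[a > 0 for a in row] for row in abundance_map]

# Score composition for the first pixel against a hypothetical observed community there.
observed_pixel0 = [0.5, 0.0, 0.1]
print(abundance_map)   # [[0.6, 0.0, 0.1], [0.0, 0.7, 0.0]]
print(presence_map)    # [[True, False, True], [False, True, False]]
print(round(bray_curtis(observed_pixel0, abundance_map[0]), 3))  # 0.077
```

Because an entire observed community is copied, the abundance and presence maps can never disagree with each other, which is the property the abstract contrasts with overlaying separately fitted species models.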

Results/Conclusions

Kappa across the 27 species was nearly always highest for the RF_b models (mean 0.52), compared with NN (0.33) and RF_c (0.06). RMSDs were slightly better (lower) for RF_c (0.26) than for NN (0.30). RF_c’s strength in RMSD and weakness in kappa reflect its tendency to predict very small values instead of zeros, which yields good maps of species abundance but poor maps of species absence. The mean Bray-Curtis distance for binary predictions was better for RF_b (0.33) than for NN (0.50), but the ranks were reversed for continuous predictions (NN 0.24 vs. RF_c 0.45). These findings suggest that assembled maps are more accurate than imputed maps when evaluated on one dimension at a time (presence/absence or abundance). Imputed maps, however, successfully represent the abundances of many species while simultaneously delineating species ranges, and they provide the most accurate representation of community composition.
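The trade-off described for RF_c can be reproduced with a few numbers. The Python sketch below uses hypothetical abundances for one species on ten plots and a hypothetical continuous model whose predictions track abundance closely but never reach exactly zero. If any nonzero prediction counts as a presence, every plot is predicted present, so kappa collapses to zero even though RMSD stays small. The `0.05` cutoff at the end is an assumption added purely for illustration, showing that the kappa penalty comes from near-zero values being read as presences; it is not part of the study design described here.

```python
from math import sqrt


def cohen_kappa(obs_present, pred_present):
    """Cohen's kappa for two equal-length sequences of presence (True) / absence (False) calls."""
    n = len(obs_present)
    both_present = sum(1 for o, p in zip(obs_present, pred_present) if o and p)
    both_absent = sum(1 for o, p in zip(obs_present, pred_present) if not o and not p)
    p_obs = (both_present + both_absent) / n            # observed agreement
    f_obs = sum(1 for o in obs_present if o) / n         # prevalence in observations
    f_pred = sum(1 for p in pred_present if p) / n       # prevalence in predictions
    p_exp = f_obs * f_pred + (1 - f_obs) * (1 - f_pred)  # agreement expected by chance
    if p_exp == 1:
        return float("nan")  # kappa is undefined when chance agreement is perfect
    return (p_obs - p_exp) / (1 - p_exp)


def rmsd(obs, pred):
    """Root mean square difference between observed and predicted abundances."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Hypothetical abundances of one species on 10 plots; it occurs on only 2 of them.
observed = [0.8, 0.6, 0, 0, 0, 0, 0, 0, 0, 0]
# Hypothetical continuous predictions: close to the observations, but never exactly zero.
continuous = [0.7, 0.5, 0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.01, 0.03]

obs_present = [a > 0 for a in observed]
cont_present = [a > 0 for a in continuous]  # every tiny value counts as a presence

kappa_cont = cohen_kappa(obs_present, cont_present)
rmsd_cont = rmsd(observed, continuous)
print(f"kappa = {kappa_cont:.2f}, RMSD = {rmsd_cont:.3f}")  # kappa = 0.00, RMSD = 0.048

# Illustrative assumption: treat predictions at or below 0.05 as absences.
thresholded = [a > 0.05 for a in continuous]
print(f"thresholded kappa = {cohen_kappa(obs_present, thresholded):.2f}")  # 1.00
```

The abundance errors are tiny, so RMSD rewards the continuous model, while kappa, which only sees presence vs. absence, treats every near-zero prediction as a false presence.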