SYMP 4-3 - Hierarchical statistical models for ecological data: Combining explanation and prediction

Tuesday, August 7, 2012: 8:55 AM
Portland Blrm 251, Oregon Convention Center
Andrew M. Latimer, Plant Sciences, University of California Davis, Davis, CA, Cory Merow, Quantitative Ecology Group, Smithsonian Environmental Research Center, Edgewater, MD and Adam M. Wilson, Ecology & Evolutionary Biology, Yale University, New Haven, CT
Background/Question/Methods

Ecologists increasingly use hierarchical statistical modeling to incorporate ecological processes into phenomenological statistical models. These models range from primarily phenomenological ones that use conditional probability to pull apart distinct processes affecting observed patterns (e.g. modeling presence or absence, then detection given presence), to dynamic approaches that fit process models to data (e.g. assimilating time series data into dynamic population models). Ideally, these models can embody the best of the “two cultures” of explanation and prediction. On the other hand, some have criticized them as overly complex, cumbersome and data-hungry. In an era of big data sets, are hierarchical statistical models an important tool? Can they predict better than traditional statistical methods (e.g. logistic regression) while remaining more explanatory than algorithmic methods (e.g. MAXENT)? Should the answer depend entirely on how well the models predict phenomena outside the range of the input data? As a basis for answering these questions, we present results from two case studies using hierarchical Bayesian models to analyze two kinds of “big data”: 1) large-scale plant species distributions, including abundance and presence-only data; and 2) population-level responses to environmental conditions, using time series data sets from weather stations and soil moisture probes.
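
The presence-then-detection example mentioned above can be made concrete with a minimal sketch of a site-occupancy likelihood. This is an illustration only, not the models used in the case studies: the detection histories are made-up data, the names `site_likelihood`, `log_likelihood`, `naive_occupancy` and `grid_mle` are hypothetical, and a coarse grid search stands in for the Bayesian fitting an actual analysis would likely use. Occupancy probability is `psi`, and detection probability per visit, given presence, is `p`.

```python
import math

# Hypothetical detection histories: one list per site, one 0/1 entry per visit.
histories = [
    [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
]

def site_likelihood(history, psi, p):
    """Marginal probability of one site's detection history."""
    # Probability of this history if the species is present at the site.
    given_present = 1.0
    for y in history:
        given_present *= p if y == 1 else (1.0 - p)
    # If the species is absent, only an all-zero history is possible.
    given_absent = 0.0 if any(history) else 1.0
    return psi * given_present + (1.0 - psi) * given_absent

def log_likelihood(data, psi, p):
    """Sum of log marginal likelihoods over sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in data)

def naive_occupancy(data):
    """Fraction of sites with at least one detection (ignores missed detections)."""
    return sum(1 for h in data if any(h)) / len(data)

def grid_mle(data, steps=99):
    """Coarse grid search for (psi, p) on the open interval (0, 1)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    best = max((log_likelihood(data, psi, p), psi, p) for psi in grid for p in grid)
    return best[1], best[2]

psi_hat, p_hat = grid_mle(histories)
print("naive occupancy:", naive_occupancy(histories))
print("estimated psi, p:", psi_hat, p_hat)
```

Because sites with no detections may still be occupied, the estimated `psi` in this sketch exceeds the naive fraction of sites with a detection. Separating the two processes in this way is what the conditional structure provides.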

Results/Conclusions

If predictive power is the basis for model evaluation, then for some problems algorithmic approaches are clearly attractive. Usually their key advantage is their ability to rapidly search through a high-dimensional model space, as in the case of predicting species distribution patterns. Nonetheless, we show that hierarchical statistical models can in some cases predict species distributions comparably well. More importantly, most very large data sets in the “data rich era” are spatiotemporal data sets whose analysis seems to require some form of explanatory model. For example, we present hierarchical statistical models that relate weather data to ecosystem properties such as soil moisture, and then to population responses. Building such models requires some understanding of the system, plus some basis for deciding which summaries of the data to include as candidate explanatory variables. We show that hierarchical statistical models can usefully integrate such spatiotemporal data sets by using them to predict soil moisture patterns and plant population responses in California oak savannah and South African shrubland.