COS 33-1
From quadrats to continents: predicting species composition with a multiscale model
Many areas of ecology depend on knowing where species occur and where they could occur. We often rely on statistical methods, such as species distribution models (SDMs), to make predictions about areas that cannot be sampled directly. There is often a mismatch, however, between SDMs, which tend to focus on one species and one spatial scale at a time, and the multi-species, multi-scale data on which they are based. Reducing the complexity of the data (e.g. by modeling one species at a time or by lumping data from nearby quadrats together) can make SDMs more tractable, but this approach sacrifices their ability to address co-occurrence patterns among species and spatial patterns in species turnover.
To address these kinds of questions, we need a model that explicitly accommodates complex correlation structure among species and among sites. Here, I present a probabilistic graphical model from the machine learning literature called a Markov random field (MRF), which is able to capture the complexities of noisy, nonindependent data. I have adapted this approach to ecological contexts with hundreds of co-occurring species and nested sampling effort (e.g. nested quadrats), where it can help ecologists address a number of important questions that would not be accessible with simpler models.
Results/Conclusions
The MRF model produces a joint probability distribution over multiple species assemblages in a region. This allows it to recognize, for instance, that while two species of interest are likely to occur in the same general area, they are unlikely to occur in the same quadrat. The model performs these complex inferences using a set of latent (unobserved) variables, each of which plays a role similar to that of a random effect in a multilevel regression model. Since these latent variables respond to patterns at multiple spatial scales, they can help us make inferences about the scales at which ecological processes occur. The model can also help identify spatio-temporal "guilds" of species that co-occur more often than would be expected by chance, which can lead to novel hypotheses about community assembly. Finally, the MRF can help us make better use of partial samples (e.g. if sampling is interrupted partway through a transect), since it can use whatever data have been collected to infer what species might have been observed if sampling had been completed.
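As a rough illustration of the two ideas above (a joint distribution over presence/absence, and inference from a partial sample), the following is a minimal sketch of a generic pairwise MRF over three species in a single quadrat. It is not the model described in this abstract: it has no latent variables and no nested spatial structure, the species names and parameter values are hypothetical, and it normalizes by brute-force enumeration, which is feasible only for a handful of species.

```python
import itertools
import math

# Hypothetical species and parameters (illustrative values, not fitted to data).
SPECIES = ["A", "B", "C"]
BIAS = {"A": 0.0, "B": 0.0, "C": -1.0}        # per-species baseline terms
PAIR = {("A", "B"): -2.0, ("A", "C"): 1.5}    # negative: tend to exclude; positive: tend to co-occur


def score(state):
    """Unnormalized log-probability of one presence/absence assignment."""
    total = sum(BIAS[sp] * state[sp] for sp in SPECIES)
    total += sum(w * state[i] * state[j] for (i, j), w in PAIR.items())
    return total


def joint():
    """Joint distribution over all 2^n assemblages, by exhaustive enumeration."""
    states = [dict(zip(SPECIES, bits))
              for bits in itertools.product([0, 1], repeat=len(SPECIES))]
    weights = [math.exp(score(s)) for s in states]
    z = sum(weights)  # normalizing constant (partition function)
    return [(s, w / z) for s, w in zip(states, weights)]


def conditional(target, observed):
    """P(target present | observed species), e.g. for an interrupted sample."""
    dist = joint()

    def matches(s):
        return all(s[k] == v for k, v in observed.items())

    den = sum(p for s, p in dist if matches(s))
    num = sum(p for s, p in dist if matches(s) and s[target] == 1)
    return num / den


if __name__ == "__main__":
    # Having recorded species A, how likely are the unsampled species B and C?
    print("P(B | A present):", round(conditional("B", {"A": 1}), 3))
    print("P(C | A present):", round(conditional("C", {"A": 1}), 3))
```

In this toy setting, observing species A lowers the inferred probability of B and raises that of C, which mirrors the kind of partial-sample inference described above. With hundreds of species, exact enumeration is intractable and approximate inference would be needed.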