Patterns of community composition are causally related to the traits and evolutionary histories of the species involved. This fact has recently inspired a variety of research questions in community ecology; for example, are phylogenetically related species more likely to co-occur under particular environmental conditions? Methodologically, such questions require three data matrices: (1) community composition across sites, (2) species characteristics, and (3) site characteristics. However, although statistical ecology has well-established frameworks for relating two data matrices, there is less agreement on how to relate three, and consequently much recent work has gone into developing three-matrix statistical methods. Much of this work focuses on hypothesis testing and pattern detection, with little attention to prediction. Yet the successful application of basic ecological research to real-world problems depends largely on whether that research yields reliable predictions and forecasts. We therefore searched the statistics and machine learning literature for methods that simultaneously analyze three data matrices, one of which is treated as a response to be predicted from the other two, and we analyzed urban bird community data to explore the utility of the methods we found.
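The three matrices are linked through shared dimensions. The community matrix is sites by species, the site-characteristics matrix has one row per site, and the species-characteristics matrix has one row per species. The following is a minimal sketch, using NumPy with a hypothetical `to_long` helper and toy values (not the study's data). It aligns the three matrices into one long table with a row for each site-species pair, a layout commonly used when fitting three-matrix regressions.

```python
import numpy as np


def to_long(Y, X, T):
    """Align a sites-by-species matrix Y with site covariates X and species traits T.

    Hypothetical helper: returns one row per (site, species) pair,
    in row-major order of Y (all species for site 0, then site 1, ...).
    """
    n, m = Y.shape
    if X.shape[0] != n:
        raise ValueError("X needs one row per site (row of Y)")
    if T.shape[0] != m:
        raise ValueError("T needs one row per species (column of Y)")
    site = np.repeat(np.arange(n), m)   # site index: 0,0,...,1,1,...
    species = np.tile(np.arange(m), n)  # species index: 0,1,...,0,1,...
    return site, species, Y.ravel(), X[site], T[species]


# Toy example: 2 sites, 3 species, 1 environmental variable, 2 traits.
Y = np.array([[1, 0, 1],
              [0, 0, 1]])
X = np.array([[0.5], [1.5]])
T = np.array([[10.0, 0.0], [20.0, 1.0], [15.0, 1.0]])
site, species, y_long, X_long, T_long = to_long(Y, X, T)
```

Each long-format row carries the response for one site-species pair together with that site's environment and that species' traits. Any model relating all three matrices can be written against this layout.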
Results/Conclusions
We identified an extension of generalized linear models (GLMs), called generalized bilinear models (GBMs), that allows predictive three-matrix analyses. Traditional GLMs characterize species-specific responses to site characteristics; GBMs instead have trait-specific parameters. Therefore, when there are fewer traits than species, GBMs are more parsimonious than GLMs and consequently tend to predict better (GBMs of the urban bird data had lower AIC values than traditional GLMs). This improvement results from a shift in inferential focus from species to traits: each species becomes a replicate for inferring trait effects, just as each site is a replicate for inferring environmental effects, so the multivariate nature of community data becomes an asset rather than a difficulty. The price for this improved prediction is that model specification is more difficult. For example, the chosen traits must govern how species occurrences relate to the chosen environmental variables. A mismatch between traits and environmental variables can lead to clear violations of model assumptions. Such violations can nonetheless be informative, because they highlight species that are not well characterized by the measured traits. Such mismatches occurred in our bird community models.
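The following is a minimal sketch of the parsimony argument. It assumes one common form of the GBM linear predictor for presence/absence data, logit(p_ij) = x_i' B t_j, where x_i holds site i's environmental variables and t_j holds species j's traits, each with a leading 1 for an intercept. The traditional species-by-species GLM instead fits logit(p_ij) = x_i' b_j with a separate coefficient vector b_j for every species. With measured traits held fixed, the bilinear term is linear in B, so the sketch fits the GBM as an ordinary logistic regression on products of site and species covariates. This shortcut, the simulated data, and all names (`fit_logistic`, `B_true`, and so on) are illustrative assumptions, not the study's analysis or the urban bird data.

```python
import numpy as np

# Hypothetical simulated data, not the urban bird data from the study.
rng = np.random.default_rng(1)
n, m, p, q = 200, 20, 2, 2  # sites, species, environmental variables, traits

# Site-by-environment (X) and species-by-trait (T) matrices, each with an intercept column.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
T = np.column_stack([np.ones(m), rng.normal(size=(m, q))])

# Simulate presence/absence from a bilinear predictor eta_ij = X[i] @ B @ T[j].
B_true = rng.normal(scale=0.4, size=(p + 1, q + 1))
eta_true = X @ B_true @ T.T
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))
y = Y.ravel()  # long format: row i*m + j is (site i, species j)

# GBM design: column (k, l) is X[i, k] * T[j, l]; one parameter per
# environment-by-trait pair, shared by all species.
Z_gbm = np.einsum("ik,jl->ijkl", X, T).reshape(n * m, (p + 1) * (q + 1))

# Species-specific GLM design: each species gets its own intercept and slopes.
Z_glm = np.einsum("ik,jl->ijlk", X, np.eye(m)).reshape(n * m, m * (p + 1))


def fit_logistic(Z, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (y - mu)                              # score vector
        hess = Z.T @ (Z * (mu * (1.0 - mu))[:, None])      # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))  # Bernoulli log-likelihood
    return beta, loglik


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2 * k - 2 * loglik


beta_gbm, ll_gbm = fit_logistic(Z_gbm, y)
beta_glm, ll_glm = fit_logistic(Z_glm, y)
k_gbm, k_glm = Z_gbm.shape[1], Z_glm.shape[1]
aic_gbm, aic_glm = aic(ll_gbm, k_gbm), aic(ll_glm, k_glm)
print(f"GBM: {k_gbm} parameters, AIC = {aic_gbm:.1f}")
print(f"GLM: {k_glm} parameters, AIC = {aic_glm:.1f}")
```

Setting b_j = B t_j recovers the GBM, so the species-specific GLM always fits the training data at least as well. The comparison therefore turns on whether that gain justifies m(p + 1) parameters rather than (p + 1)(q + 1), which is the trade-off AIC measures. In this setup the GBM is the more parsimonious model whenever q + 1 < m.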