How to be happy when your data are SAD

Rominger, Andrew J.

Background/Question/Methods

The species abundance distribution (SAD) is a fundamental pattern of biodiversity; any theory of biodiversity, in addition to whatever other patterns it may predict, should also make meaningful and accurate predictions of the SAD. While the SAD is likely insufficient to differentiate mechanistic hypotheses of community assembly, its universality begs prediction and indeed can be used to reject some, but not all, theories of biodiversity. Reviews of theories predicting SADs have been presented ad nauseam. Here we sketch an outline of these theories for the sake of tradition but go one step further and use arguments from information theory, probability theory, simulation and empirical data to make explicit judgments about which theories should be abandoned and which warrant further consideration and investigation. We also use an extensive simulation experiment to investigate the most robust means of analyzing species abundance data.

Results/Conclusions

Information theoretic methods for comparing models of the SAD are the most robust means available; this needs no validation by simulation. Using these methods we find the strongest support for the negative binomial distribution. To directly evaluate the fit of a single SAD model to data, we find that a z-score based on the log likelihood is the most efficient method, i.e. z = (logLik_obs - E[logLik_theoretical]) / SD[logLik_theoretical], where logLik_obs is the observed log likelihood and logLik_theoretical is the log likelihood under the theoretical SAD, with its expectation and standard deviation calculated analytically or by simulation. Additionally, randomly subsampling SAD data has no meaningful effect on the identification of the most supported model given the data. Conversely, binning species abundance data severely distorts the ability to estimate models correctly. Taking these results together, we find that the log-normal distribution with its associated veil line argument is invalid and that the recently proposed gambin model can never be realistically estimated from data; both should be abandoned. The negative binomial has a strong tradition in ecology and should be re-invigorated. Using real data and the sketch of a probabilistic, non-neutral community assembly model, we conclude by exploring the causes and consequences of the wide support for the negative binomial.
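
As an illustration of the log-likelihood z-score described above, the following Python sketch estimates E[logLik_theoretical] and SD[logLik_theoretical] by simulation for a negative binomial SAD. It is a minimal sketch under stated assumptions, not the implementation used in this study: the function names (nbinom_logpmf, nbinom_sample, loglik, loglik_z) and the parameters (size, mu, nsim, seed) are hypothetical, the size parameter is restricted to integers so that sampling needs only the standard library, and zero truncation of observed abundances is ignored for simplicity.

```python
import math
import random
import statistics


def nbinom_logpmf(x, size, mu):
    """Log probability of abundance x under a negative binomial
    with dispersion `size` and mean `mu` (no zero truncation)."""
    p = size / (size + mu)  # per-trial success probability
    return (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
            + size * math.log(p) + x * math.log(1.0 - p))


def nbinom_sample(n, size, mu, rng):
    """Draw n abundances. Assumes integer `size`, so each draw is a
    sum of `size` geometric variates (failures before a success)."""
    p = size / (size + mu)
    log_q = math.log(1.0 - p)

    def geometric():
        # Inverse-transform sampling; 1 - random() lies in (0, 1].
        return int(math.log(1.0 - rng.random()) / log_q)

    return [sum(geometric() for _ in range(size)) for _ in range(n)]


def loglik(abund, size, mu):
    """Total log likelihood of a vector of species abundances."""
    return sum(nbinom_logpmf(x, size, mu) for x in abund)


def loglik_z(abund, size, mu, nsim=500, seed=1):
    """z = (logLik_obs - E[logLik_sim]) / SD[logLik_sim], with the
    expectation and SD estimated from `nsim` simulated communities
    of the same species richness as the observed data."""
    rng = random.Random(seed)
    obs = loglik(abund, size, mu)
    sims = [loglik(nbinom_sample(len(abund), size, mu, rng), size, mu)
            for _ in range(nsim)]
    return (obs - statistics.mean(sims)) / statistics.stdev(sims)
```

For example, loglik_z(abund, size=2, mu=5.0) returns the z-score for a list of integer abundances abund under that parameterization. Values near zero indicate an observed log likelihood typical of data generated by the model, while strongly negative values indicate a poor fit. In practice size and mu would be estimated from the data first (e.g. by maximum likelihood), a step this sketch omits.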

Meeting Information

Additional Information

COS 80-6 - How to be happy when your data are SAD