The assembly of ecological communities depends in part on the evolutionary histories of their members. Knowledge of this history, in the form of phylogenetic information, is now directly incorporated into analyses of community structure. These analyses, however, have not treated species identities and phylogenies as parameter estimates, and so have neither acknowledged nor incorporated the statistical uncertainty in them. This runs counter to the prevailing trend in systematics: recent decades have seen a dramatic increase in statistical inference of phylogenies, objective species delimitation methods, and rigorous quantification of uncertainty. Here we address this shortcoming of phylogenetic community structure analyses as currently practiced. We developed a Bayesian approach to quantifying community structure that models the uncertainty in both the identification of species and their phylogeny, and carries that uncertainty through to the analysis of community structure. We estimate the joint posterior probability of species identities (under a mixed Yule-coalescent model) and the tree (under a relaxed-clock model) by Markov chain Monte Carlo simulation in R (to be released as an R extension). We then conduct significance testing of metrics of phylogenetic community structure, such as mean pairwise distance (MPD) and mean nearest taxon distance (MNTD), under null models for a set of samples from the Markov chain, producing a posterior distribution of Z scores. We apply this method to three types of environmental sequence data obtained from the fluid of the pale pitcher plant Sarracenia alata across its range: COI sequences from arthropods (both prey and commensals), 25S sequences from yeasts, and 16S sequences from bacteria. We compare the results to those of the traditional method using point estimates.
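The final step of this workflow, turning a set of posterior samples into a posterior distribution of Z scores, can be sketched as follows. This is a minimal illustration in Python, not the R implementation described above. It assumes that each posterior sample has already been reduced to a pairwise distance matrix among the delimited species plus the list of species present in the community, and it assumes one simple null model (random draws of equal richness from the species pool). All function names and the toy distances are hypothetical.

```python
import random
import statistics
from itertools import combinations


def mpd(dist, taxa):
    """Mean pairwise phylogenetic distance among the taxa in a community."""
    pairs = list(combinations(taxa, 2))
    return sum(dist[a][b] for a, b in pairs) / len(pairs)


def mpd_z_score(dist, community, n_null, rng):
    """Z score of the observed MPD against a null distribution.

    Assumed null model: communities of the same richness drawn at random
    from all taxa in the distance matrix.
    """
    pool = list(dist)
    observed = mpd(dist, community)
    null = [mpd(dist, rng.sample(pool, len(community))) for _ in range(n_null)]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


def posterior_z_scores(samples, n_null=999, seed=0):
    """One Z score per posterior sample.

    Each sample is a (distance matrix, community) pair, because both the
    tree and the species delimitation may differ between MCMC samples.
    """
    rng = random.Random(seed)
    return [mpd_z_score(dist, community, n_null, rng) for dist, community in samples]


def clade_distances(within, between):
    """Toy distance matrix for six taxa in two clades (illustration only)."""
    clade = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
    return {
        a: {
            b: 0.0 if a == b else (within if clade[a] == clade[b] else between)
            for b in clade
        }
        for a in clade
    }


# Two hypothetical posterior samples that differ in their branch lengths.
samples = [
    (clade_distances(2.0, 10.0), ["A", "D", "E"]),
    (clade_distances(3.0, 9.0), ["A", "D", "E"]),
]
z_scores = posterior_z_scores(samples, n_null=199, seed=1)
print(z_scores)
```

Under this sign convention a positive Z score means the observed MPD exceeds the null expectation (overdispersion), and a negative one indicates clustering. Summarizing the resulting list, for example by the fraction of samples beyond a significance threshold, is one way to express how robust a conclusion is to uncertainty in the tree and the species identities.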
Results/Conclusions
Traditional measures of phylogenetic community structure, which rely on point estimates of species identity and phylogenetic relatedness, indicate a generalized pattern of overdispersion at the range-wide scale. Initial estimates suggest that the uncertainty in trees and species identities is substantial. For example, the COI dataset, which contains the longest sequences and the most phylogenetic information, has a point estimate of 95 species, but only 77 of those are identified with >95% probability. At the time of abstract submission the significance tests were still being computed, so the effect of incorporating this uncertainty into metrics of community structure is as yet unclear. The uncertainty is, however, likely to influence conclusions that would otherwise be drawn.