Microbial communities are typically large, diverse, and complex, and identifying the processes structuring them is a key challenge. While taxonomic diversity has long been used to explore competing hypotheses in ecology, the explosion in the availability of phylogenetic data provides a potentially more powerful framework than taxonomic diversity alone. However, we currently lack a general theoretical framework for understanding patterns of phylogenetic diversity. Sampling theory is a case in point: if we consider a local community of co-occurring organisms as a sample from a larger regional pool, taxonomic sampling theory provides us with an analytical framework to predict local patterns of diversity. In contrast, to compare local phylogenetic diversity with (for example) a null hypothesis of random sampling, we typically take random samples directly from a metacommunity tree. This 'brute force' approach scales extremely poorly with tree size, and the process must be repeated for each new hypothesis. Here, we address two questions: (1) Can we develop a computationally efficient, analytical phylogenetic sampling theory? (2) Applying this sampling theory to human microbiome data, what is the impact of metacommunity size on distinguishing different community assembly hypotheses, and does this impact vary across different body habitats? We adapt mathematical methods from taxonomic sampling theory, and at the center of our approach is a novel phylogenetic analogue of the Species Abundance Distribution, which we term the Edge-length Abundance Distribution (EAD). Having computed the EAD for a given metacommunity tree, we can make predictions for local phylogenetic diversity under multiple hypotheses, including environmental filtering (consistent with clustered sampling), local competitive exclusion (consistent with overdispersed sampling), and random sampling.
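A minimal Python sketch of the idea, under stated assumptions. Here the EAD is taken to be the total edge length S(k) subtended by exactly k metacommunity tips, and the random-sampling prediction uses the standard hypergeometric rarefaction argument: an edge with k descendant tips appears in a random sample of m out of N tips with probability 1 - C(N-k, m)/C(N, m). The function names, the dict-based tree encoding, and the toy tree are hypothetical, and phylogenetic diversity is measured as rooted total branch length. This is an illustration, not the authors' implementation.

```python
from collections import defaultdict
from math import comb

def edge_length_abundance(parent, length):
    """Return (EAD, number of tips) for a rooted tree.

    parent: maps each non-root node to its parent (hypothetical encoding).
    length: maps each non-root node to the length of the edge above it.
    EAD[k] is the summed length of edges with exactly k descendant tips
    (the definition assumed for this sketch).
    """
    internal = set(parent.values())
    tips = [node for node in parent if node not in internal]

    # Count descendant tips for the edge above every non-root node.
    n_desc = defaultdict(int)
    for tip in tips:
        node = tip
        while node in parent:  # stops at the root, which has no edge above it
            n_desc[node] += 1
            node = parent[node]

    ead = defaultdict(float)
    for node, k in n_desc.items():
        ead[k] += length[node]
    return dict(ead), len(tips)

def expected_pd_random(ead, n_tips, m):
    """Expected rooted PD of m tips drawn at random without replacement."""
    total = comb(n_tips, m)
    # An edge with k descendant tips is missed only if all m sampled tips
    # come from the other n_tips - k tips.
    return sum(s * (1 - comb(n_tips - k, m) / total) for k, s in ead.items())

# Toy metacommunity tree (hypothetical): root -> A, B; A -> t1, t2; B -> t3, t4.
parent = {"A": "root", "B": "root", "t1": "A", "t2": "A", "t3": "B", "t4": "B"}
length = {"A": 1.0, "B": 2.0, "t1": 1.0, "t2": 1.0, "t3": 0.5, "t4": 0.5}

ead, n_tips = edge_length_abundance(parent, length)
for m in range(1, n_tips + 1):
    print(m, expected_pd_random(ead, n_tips, m))
```

Because the prediction depends on the tree only through the EAD, the tree is traversed once, and any local sample size can then be evaluated without drawing repeated random samples.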
Results/Conclusions
We present our new phylogenetic sampling theory, and find that the Edge-length Abundance Distribution takes a power-law form across multiple, distinct microbial communities. Focusing on publicly available human microbiome data, we use our sampling theory to test how the choice of metacommunity and local community affects which community assembly hypotheses are supported. For some habitats we find that local diversity is consistent with overdispersed sampling for one choice of reference community, but consistent with clustered sampling when using a larger reference metacommunity, demonstrating a clear impact of metacommunity scale. Overall, our approach provides a new starting point to explore multiple community assembly hypotheses for large microbial communities.