COS 33-9 - The utility of C-score analysis for examining bacterial co-occurrence patterns in large sequencing datasets

Tuesday, August 7, 2012: 10:50 AM
F151, Oregon Convention Center
Lee F. Stanish, Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, Teresa M. Legg, INSTAAR, Dept. Ecology & Evolutionary Biology, University of Colorado, Boulder, CO, Diana R. Nemergut, INSTAAR, Environmental Studies, University of Colorado, Boulder, CO, Sean P. O'Neill, INSTAAR, Ecology and Evolutionary Biology, University of Colorado at Boulder, Boulder, CO and Antonio Gonzalez-Pena, Chemistry, University of Colorado, Boulder, CO
Background/Question/Methods

The recent development of high-throughput gene sequencing techniques has stimulated research into the ecological processes that structure microbial community assembly. Using theoretical frameworks established for macrobial communities, such as taxon co-occurrence analysis, microbial ecologists have found analogous patterns in microbial communities. As in macrobial communities, checkerboard score (C-score) analysis has shown that microbial communities tend to exhibit less taxon co-occurrence than expected by chance, indicating a shared pattern in community dynamics. Yet the analytical tools for evaluating taxon co-occurrence were designed for macrobial datasets with well-defined species distinctions and sampling schemes, whereas there is less consensus over species definitions and sampling protocols in microbial ecology. Thus, it is not clear whether the classical co-occurrence models can be applied directly to large microbial sequencing datasets.

The goal of this research was to investigate the ubiquity of non-random co-occurrence patterns, as measured by the C-score, in large bacterial pyrosequencing datasets. In addition, we investigated the relationship between C-scores and sampling depth for datasets spanning a range of community diversity. We used bacterial 16S rRNA gene pyrosequence datasets comprising five or more samples per site, together with synthetic communities. Together these datasets represented a gradient in co-occurrence levels and spanned local, regional, and global geographic scales. After filtering the datasets to retain operational taxonomic units (OTUs) with a minimum count of 10 and a minimum sample occurrence of three, we calculated C-scores using two commonly used null models implemented in the vegan package for R.
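
The filtering step and the C-score itself can be illustrated with a minimal Python sketch. This is not the vegan code used in the study. It assumes the standard Stone and Roberts definition of the C-score (the mean over all OTU pairs of (r_i - S_ij)(r_j - S_ij), where r_i is the number of samples containing OTU i and S_ij is the number of samples the two OTUs share), and it assumes that "minimum count of 10" refers to an OTU's total count summed across samples. The function names `filter_otus` and `c_score` and the example table are hypothetical.

```python
from itertools import combinations


def filter_otus(table, min_count=10, min_samples=3):
    """Keep OTUs whose total count is >= min_count (assumed to mean the sum
    across samples) and that occur in >= min_samples samples."""
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts) >= min_count
        and sum(c > 0 for c in counts) >= min_samples
    }


def c_score(table):
    """Mean number of checkerboard units over all OTU pairs."""
    # Convert counts to presence/absence, one row per OTU.
    presence = [[c > 0 for c in counts] for counts in table.values()]
    pairs = list(combinations(presence, 2))
    if not pairs:
        raise ValueError("need at least two OTUs")
    total = 0
    for a, b in pairs:
        shared = sum(x and y for x, y in zip(a, b))
        total += (sum(a) - shared) * (sum(b) - shared)
    return total / len(pairs)


# Hypothetical OTU table: OTU name -> counts in six samples.
example = {
    "otu_a": [5, 5, 5, 0, 0, 0],
    "otu_b": [0, 0, 0, 4, 4, 4],
    "otu_c": [3, 3, 3, 3, 0, 0],
    "otu_d": [20, 0, 0, 0, 0, 0],  # dropped: present in only one sample
    "otu_e": [1, 1, 1, 0, 0, 0],   # dropped: total count below 10
}
kept = filter_otus(example)
print(sorted(kept))   # ['otu_a', 'otu_b', 'otu_c']
print(c_score(kept))  # 5.0
```

In this toy table, otu_a and otu_b never co-occur and contribute 9 checkerboard units, otu_a and otu_c contribute 0, and otu_b and otu_c contribute 6, giving a mean of 5.0.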

Results/Conclusions

For the majority of our datasets, C-scores differed significantly from those of the simulated null models, indicating non-random co-occurrence patterns. However, whether the C-score was significantly higher or lower than the null expectation depended on the choice of null model and on sequencing depth. In general, the standardized effect size (SES) values calculated under the equiprobable null assumption increased with increasing sequencing depth, whereas SES values under the sequential swap model varied with sequencing depth in a dataset-specific manner. Results for the global dataset were more robust to sampling depth than those for the local or regional datasets, indicating that caution should be used when applying the C-score to local or regional datasets. Our results therefore highlight the importance of choosing an appropriate null model when evaluating taxon co-occurrence patterns in bacterial communities using large sequencing datasets.
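
To show how the choice of null model enters the SES, the following Python sketch computes SES = (observed C-score - mean of simulated C-scores) / standard deviation of simulated C-scores under two simplified null models. It is an illustration under stated assumptions, not the vegan implementation used in the study. The equiprobable model is assumed to preserve only the total number of presences, and the swap model here restarts from the observed matrix for each simulation rather than running one sequential chain. All function names, the number of swap attempts, and the number of simulations are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, stdev


def c_score(matrix):
    """Mean checkerboard units over all row (OTU) pairs of a 0/1 matrix."""
    pairs = list(combinations(matrix, 2))
    total = 0
    for a, b in pairs:
        shared = sum(x & y for x, y in zip(a, b))
        total += (sum(a) - shared) * (sum(b) - shared)
    return total / len(pairs)


def equiprobable_null(matrix, rng):
    """Place the observed number of presences uniformly at random over all
    cells (assumption: only the matrix fill is preserved)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    fill = sum(map(sum, matrix))
    cells = set(rng.sample(range(n_rows * n_cols), fill))
    return [[int(r * n_cols + c in cells) for c in range(n_cols)]
            for r in range(n_rows)]


def swap_null(matrix, rng, attempts=1000):
    """Flip random 2x2 checkerboard submatrices; row and column totals are
    preserved. Simplified: each call starts from the observed matrix."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(attempts):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        if (m[r1][c1] == m[r2][c2] and m[r1][c2] == m[r2][c1]
                and m[r1][c1] != m[r1][c2]):
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m


def ses(matrix, null_model, n_sim=200, seed=0):
    """Standardized effect size of the observed C-score under a null model."""
    rng = random.Random(seed)
    observed = c_score(matrix)
    sims = [c_score(null_model(matrix, rng)) for _ in range(n_sim)]
    sd = stdev(sims)
    if sd == 0:
        raise ValueError("null C-scores have zero variance; SES undefined")
    return (observed - mean(sims)) / sd
```

Calling `ses(matrix, equiprobable_null)` and `ses(matrix, swap_null)` on the same presence/absence matrix can give different values, and even different signs, because the two models constrain the randomized matrices differently.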