PS 42-92
Genome size correlations with ecological traits in bacteria
In animals, body size correlates with life span, metabolism, population, range, and ecological niche. However, little is known about whether such correlations extend to microbes. We examine whether bacterial genome size, as a proxy for body size, similarly relates to ecological function. The Integrated Microbial Genomes (IMG) database includes genome sequences, gene counts, and associated ecological traits for 5,009 species of bacteria. Unlike eukaryotic genomes, bacterial genomes contain little non-coding DNA, so genome size and gene number are linearly related. If more genes allow more complex functions to be encoded, then genome size should correlate linearly with ecological and metabolic complexity in bacteria. We aimed to test this connection between functional complexity and genome size using the ecological metadata in the IMG database. We used machine learning techniques for exploratory analysis of the IMG data and of the relationships between phenotypic features and bacterial genome sizes.
Results/Conclusions
Genome sizes exhibited a heavy-tailed distribution, reminiscent of the canonical body size distribution observed in mammals, birds, and other major animal groups. Our initial results indicate that larger bacterial genomes do exhibit greater metabolic diversity, suggesting a pattern of environmental generalism that is also observed in animals. For example, we found that aerobic bacteria have a heavier-tailed genome size distribution than facultative and anaerobic species; all of the genomes that contained more than 8,000 genes were aerobic. This result is consistent with previous research showing that free-living bacteria tend to have the largest, most complex genomes. In addition, our initial results showed different genome size distributions in pathogenic and non-pathogenic bacteria. These results raise many additional questions about the applicability of ecological theory to the microbiome. We continue to develop a set of machine learning tools to impute missing data and to predict ecological traits for newly discovered bacterial strains when the existing data set is sparse.
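
The group-wise tail comparison described above can be made concrete with a short sketch. This is an illustration under stated assumptions, not the analysis used in this study: the function name `tail_summary`, the record layout of (group, gene count) pairs, and the toy gene counts are all hypothetical and are not drawn from the IMG database. The 8,000-gene threshold is the one quoted in the text.

```python
from collections import defaultdict
from statistics import median


def tail_summary(records, threshold=8000):
    """Summarize the upper tail of gene counts per oxygen-requirement group.

    records: iterable of (group_label, gene_count) pairs (hypothetical layout).
    Returns {group: {"n", "median", "max", "frac_above"}}.
    """
    by_group = defaultdict(list)
    for group, gene_count in records:
        by_group[group].append(gene_count)

    summary = {}
    for group, counts in by_group.items():
        # Count genomes whose gene count exceeds the threshold.
        above = sum(1 for c in counts if c > threshold)
        summary[group] = {
            "n": len(counts),
            "median": median(counts),
            "max": max(counts),
            "frac_above": above / len(counts),
        }
    return summary


# Toy values for illustration only; these are NOT taken from the IMG database.
toy_records = [
    ("aerobic", 2100), ("aerobic", 4300), ("aerobic", 6200), ("aerobic", 9100),
    ("facultative", 1900), ("facultative", 3800), ("facultative", 5200),
    ("anaerobic", 1500), ("anaerobic", 2700), ("anaerobic", 4100),
]

summary = tail_summary(toy_records)
for group, stats in sorted(summary.items()):
    print(group, stats)
```

The fraction of genomes above a fixed threshold is only a coarse descriptor of tail weight; a formal comparison on real data would need a more careful statistical treatment.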