PS 18-41 - New genetic tools for estimating long term changes in forest composition

Tuesday, August 9, 2011
Exhibit Hall 3, Austin Convention Center
Theresa C. Brenberg (1), Alexandria N. Colaco (1), Scott J. Emrich (2), Shawn T. O'Neil (3) and Jason S. McLachlan (4). (1) Biological Sciences, University of Notre Dame; (2) Computer Science and Engineering, University of Notre Dame; (3) Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR; (4) Department of Biology, University of Notre Dame, Notre Dame, IN
Background/Question/Methods

Since the Last Glacial Maximum, forest species composition across North America has shifted in response to climate change. Previously, identification of pollen from lake sediment cores served as the primary basis for reconstructing these past forest communities. Pollen data, however, are an inexact indicator of species presence because pollen often disperses over a wide range and some species' morphotypes are indistinguishable by physical assessment alone. Extracting ancient DNA (aDNA) from Holocene sediments allows tree species to be identified with more spatial certainty by ensuring that the species identified are local to the area of sedimentation. This method also mitigates the problem of differentiating between closely related species. However, correctly determining haplotypes from the mixed DNA assemblages in bulk lake sediments can be a challenge, especially as recovered aDNA fragments are often damaged and short. We investigated whether new bioinformatics tools could help us identify the suite of DNA haplotypes in genetic assemblages extracted from ancient sediments.

Results/Conclusions

We sequenced and aligned DNA from various tree species found in New England using a chloroplast DNA barcoding region (matK) that is both conserved and polymorphic across most plant phyla. To analyze the accuracy with which these sequences can be used to differentiate species, we simulated the sort of sequence mixtures we find in aDNA assemblages from sediments. We created artificial constructs of short sequence fragments based on the composition of forests surrounding New England lakes and added the sort of sequence error introduced by aDNA damage and by next-generation sequencing. Hapler is a machine learning algorithm developed to identify haplotype mixtures in population-sampled genetic data. As coverage (the number of sequenced fragments overlapping a region) increased, Hapler's ability to correctly identify haplotypes also increased. We found that an overlap of even 2 fragments dramatically reduced the misidentification error, to approximately 20%. Coverage of four or more overlapping fragments led to an identification accuracy of 90% or greater. We estimate that DNA barcode region amplifications with next-generation sequencing of aDNA samples will have an average coverage of 6 fragments, which would yield near 90% accuracy. With new bioinformatics algorithms such as Hapler, even relatively low coverage of aDNA can give a good representation of species presence, providing a valuable tool for the improved reconstruction of past forest compositions.
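
The simulation step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not call Hapler. It only shows one plausible way to draw short fragments from a weighted mixture of haplotypes, add substitution errors, and compute mean coverage. The sequences, species labels, mixture proportions, fragment length, and error rate are all hypothetical placeholders, and real aDNA damage is more structured than the uniform substitution model assumed here.

```python
import random

BASES = "ACGT"

def simulate_fragments(haplotypes, proportions, n_fragments, frag_len, error_rate, rng):
    """Draw short fragments from a weighted haplotype mixture and add substitution errors.

    Returns a list of (source_name, start_position, fragment_sequence) tuples.
    """
    names = list(haplotypes)
    weights = [proportions[name] for name in names]
    fragments = []
    for _ in range(n_fragments):
        # Pick the source haplotype according to its (assumed) share of the forest.
        source = rng.choices(names, weights=weights, k=1)[0]
        seq = haplotypes[source]
        start = rng.randrange(len(seq) - frag_len + 1)
        read = []
        for base in seq[start:start + frag_len]:
            # Uniform substitution error: a simplification of damage plus sequencing error.
            if rng.random() < error_rate:
                base = rng.choice([b for b in BASES if b != base])
            read.append(base)
        fragments.append((source, start, "".join(read)))
    return fragments

def mean_coverage(fragments, region_len):
    """Average number of fragments overlapping each position of the region."""
    depth = [0] * region_len
    for _, start, read in fragments:
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return sum(depth) / region_len

rng = random.Random(0)
region_len = 200

# Toy reference haplotypes (hypothetical, not real matK sequences).
hap_a = "".join(rng.choice(BASES) for _ in range(region_len))
hap_b = list(hap_a)
for pos in (20, 75, 130, 180):
    # Introduce a few differences so the two haplotypes are distinguishable.
    hap_b[pos] = "A" if hap_a[pos] != "A" else "C"
hap_b = "".join(hap_b)

haplotypes = {"species_A": hap_a, "species_B": hap_b}
proportions = {"species_A": 0.7, "species_B": 0.3}

fragments = simulate_fragments(haplotypes, proportions, 24, 50, 0.02, rng)
print(mean_coverage(fragments, region_len))  # 24 fragments * 50 bases / 200 positions = 6.0
```

In this sketch, mean coverage is fixed by the number and length of fragments relative to the region, so 24 fragments of 50 bases over a 200-base region correspond to the average coverage of 6 mentioned in the abstract. Haplotype assignment itself would be handled by a separate tool.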
