SYMP 23-5
Linking big data across scales to forecast plant community dynamics
“Big data” has been used to study vegetation distributions and dynamics for decades; thirty years ago, geographers were already coping with the challenge of developing Geographic Information Systems that could handle the anticipated data stream from NASA’s MODIS mission. Remotely sensed data have been “big” since the dawn of the Earth-resources remote sensing era. But not all big data in plant ecology (pixels vs. rasters vs. plots vs. pressed plants) are the same. Understanding how global change drives plant distributions and vegetation dynamics requires integrating environmental data from many sources: biodiversity records, vegetation plots, environmental maps, and species demographic parameters and other traits. We will discuss some of the challenges we have encountered in using these data types and in linking them in an integrated modeling framework to predict the impacts of global change on plant communities.
Results/Conclusions
Biodiversity records of species localities are aggregated into regional and global databases. A community of researchers has formed to address common issues of locational uncertainty, geocoding errors, and taxonomic errors and revisions. The developers of these databases focus on interoperability and support a philosophy of flagging and exposing errors rather than deleting data. While users should pay careful attention to quality issues in data drawn from multiple sources, an open data infrastructure like GBIF provides sufficient information for careful data screening. Environmental maps, ranging from historical and future climate grids to digital terrain models and remotely sensed data products, tend to have sufficient metadata. Although these datasets are large, every observation (grid cell or pixel) within a dataset comes from the same source or procedure, with well-characterized properties. In contrast, we have found that some databases aggregating other types of plant data, from vegetation plots to population parameters, are hampered by very uneven geographical and taxonomic coverage and by difficulties in both pulling data from them and pushing data to them. This may be because these databases were developed to meet a particular research objective, or because they may lack the institutional support needed to maintain an easily accessible data archive in perpetuity.
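The flag-rather-than-delete approach to screening occurrence records can be illustrated with a minimal Python sketch. It assumes a simplified record whose coordinate fields loosely follow the Darwin Core terms decimalLatitude, decimalLongitude, and coordinateUncertaintyInMeters, plus an `accepted_name` produced by some upstream taxonomic matching step. The flag labels, the 10 km uncertainty threshold, and the helper names `quality_flags`, `screen`, and `select` are hypothetical illustrations, not GBIF's actual issue vocabulary or any database's schema, and the example records are invented.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Tuple

# Hypothetical threshold chosen for illustration only (metres).
MAX_UNCERTAINTY_M = 10_000.0


@dataclass(frozen=True)
class Occurrence:
    """One occurrence record; coordinate fields loosely follow Darwin Core terms."""
    verbatim_name: str
    lat: Optional[float]            # decimalLatitude
    lon: Optional[float]            # decimalLongitude
    uncertainty_m: Optional[float]  # coordinateUncertaintyInMeters
    accepted_name: Optional[str]    # None if (assumed) taxonomic matching failed
    flags: Tuple[str, ...] = ()


def quality_flags(rec: Occurrence) -> Tuple[str, ...]:
    """Return illustrative quality flags for a record without altering it."""
    flags = []
    if rec.lat is None or rec.lon is None:
        flags.append("MISSING_COORDINATES")
    else:
        if not (-90.0 <= rec.lat <= 90.0 and -180.0 <= rec.lon <= 180.0):
            flags.append("COORDINATES_OUT_OF_RANGE")
        if rec.lat == 0.0 and rec.lon == 0.0:
            # (0, 0) is a common placeholder or geocoding error.
            flags.append("ZERO_COORDINATES")
    if rec.uncertainty_m is not None and rec.uncertainty_m > MAX_UNCERTAINTY_M:
        flags.append("HIGH_COORDINATE_UNCERTAINTY")
    if rec.accepted_name is None:
        flags.append("TAXON_NOT_MATCHED")
    return tuple(flags)


def screen(records: Iterable[Occurrence]) -> List[Occurrence]:
    """Attach flags to every record; no record is dropped."""
    return [replace(r, flags=quality_flags(r)) for r in records]


def select(records: Iterable[Occurrence], tolerate: Iterable[str] = ()) -> List[Occurrence]:
    """Analysis-specific filter: keep records whose flags are all tolerated."""
    ok = set(tolerate)
    return [r for r in records if set(r.flags) <= ok]


# Invented example records (the third has a misspelled name and coarse coordinates).
records = [
    Occurrence("Quercus lobata", 36.5, -119.8, 50.0, "Quercus lobata"),
    Occurrence("Quercus lobata", 0.0, 0.0, None, "Quercus lobata"),
    Occurrence("Quercus lobta", 37.1, -120.2, 25_000.0, None),
]
screened = screen(records)
print(len(screened))                 # 3: nothing is deleted
print([r.flags for r in screened])
print(len(select(screened)))         # 1: strict analysis keeps only unflagged records
print(len(select(screened, tolerate={"HIGH_COORDINATE_UNCERTAINTY", "TAXON_NOT_MATCHED"})))  # 2
```

Keeping screening separate from selection means the same flagged dataset can serve analyses with different tolerances, for example a coarse-resolution climate-envelope model that accepts high coordinate uncertainty versus a fine-scale habitat model that does not.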