COS 133-8 - Supporting data synthesis in ecology: The Environmental Data Initiative (EDI)

Thursday, August 10, 2017: 10:30 AM
B114, Oregon Convention Center
Robert Waide (1), James W. Brunt (1), Duane Costa (1), Corinna Gries (2), Paul C. Hanson (2), Margaret C. O'Brien (3), Mark S. Servilla (1), Colin A. Smith (2) and Kristin L. Vanderbilt (1); (1) Biology, University of New Mexico, Albuquerque, NM, (2) Center for Limnology, University of Wisconsin, Madison, WI, (3) Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA

Ecology has become an increasingly data-driven discipline as researchers address questions across multiple scales and domains. Finding and then synthesizing all the data they need can be challenging for ecologists. To help address this challenge, the Environmental Data Initiative (EDI) was funded by the National Science Foundation in 2016 to 1) provide an open data repository for NSF-funded programs including Long Term Research in Environmental Biology (LTREB), Macrosystems Biology (MSB), Organization of Biological Field Stations (OBFS), and Long Term Ecological Research (LTER); 2) support and train researchers in those communities to archive high-quality data and metadata; and 3) develop best practices for data formatting and documentation that will make data easier to discover and integrate. The EDI Data Repository is an extension of the Provenance Aware Synthesis Tracking Architecture (PASTA), originally developed to house LTER data. Data and metadata undergo many quality checks before being uploaded into PASTA, and data contributors are encouraged to evaluate their data before finalizing an archive. All data are documented with the Ecological Metadata Language standard and are available either through a web browser or directly through web services.


EDI now curates more than 42,000 data packages in the repository, with DOIs assigned to each package and registered with DataCite. These data are also federated and discoverable through DataONE. Some researchers have preferred that EDI staff generate the structured metadata required for ingestion into the repository, while others have been trained by EDI to use the R statistical environment to create metadata themselves. New repository extensions, developed by EDI, support archiving data from other ecological communities, and EDI is now developing tools to streamline data documentation. EDI has collaborated with ecologists and data specialists to define best practices for creating standard formats and metadata content for archiving population and community data sets. EDI’s focus on documenting and teaching information management best practices has increased both the number of ecological data packages archived and the number of ecologists with basic information management skills. EDI is also developing a forum to support the exchange of technical skills learned from years of “hands-on” information management experience. Adoption of these best practices and skills, descriptions of which are freely accessible on the EDI website, will accelerate the pace of data synthesis by making these data more compatible and easier to integrate.