DataONE: A virtual data center for biology, ecology, and the environmental sciences

Wednesday, August 5, 2009

PS 49-93: DataONE: A virtual data center for biology, ecology, and the environmental sciences

William Michener¹, Suzie Allard², Paul Allen³, Peter Buneman⁴, Randy Butler⁵, John Cobb⁶, Robert Cook⁶, Patricia Cruse⁷, Ewa Deelman⁸, David DeRoure⁹, Cliff Duke¹⁰, Mike Frame¹¹, Carole Goble¹², Stephanie Hampton¹³, Donald Hobern¹⁴, Peter Honeyman¹⁵, Jeffery Horsburgh¹⁶, Viv Hutchison¹⁷, Matt Jones¹³, Steve Kelling¹⁸, Jeremy Kranowitz¹⁹, John Kunze⁷, Bertram Ludaescher²⁰, Maribeth Manoff², Ricardo Pereira²¹, Line Pouchard⁶, Robert Sandusky²², Ryan Scherle²³, Mark S. Servilla¹, Kathleen Smith²³, Carol Tenopir², Dave Vieglais²⁴, Von Welch⁵, Jake Weltzin²⁵, and Bruce Wilson⁶. (1) University of New Mexico, (2) University of Tennessee, (3) Cornell University, (4) University of Edinburgh, (5) University of Illinois - Urbana-Champaign, (6) Oak Ridge National Laboratory, (7) University of California - California Digital Library, (8) University of Southern California, (9) University of Southampton, (10) Ecological Society of America, (11) U.S. Geological Survey - National Biological Information Infrastructure, (12) University of Manchester, (13) National Center for Ecological Analysis and Synthesis, (14) Atlas of Living Australia, (15) University of Michigan, (16) Utah State University, (17) U.S. Geological Survey, (18) Cornell Lab of Ornithology, (19) The Keystone Center, (20) University of California - Davis, (21) Taxonomic Databases Working Group (Campinas, Brazil), (22) University of Illinois - Chicago, (23) National Evolutionary Synthesis Center, (24) University of Kansas, (25) USA National Phenology Network

Background/Question/Methods

Data about life on Earth and the environment are often unavailable or unusable for numerous reasons. The data that are available are widely dispersed and can be difficult to discover and use. Because multiple data and metadata standards are in use, integration and analysis have been difficult to achieve. Even when analyses are completed, sharing and replicating the workflows and results poses a further challenge.

DataONE is being designed and constructed to address four key challenges:

1. Data loss: by preserving at-risk (orphaned) biological, ecological, and environmental data from individual scientists

2. Scattered data sources: by facilitating discovery of and access to data through a single, easy-to-use portal

3. Data deluge: by providing a toolbox that empowers scientists and organizations to manage, analyze, and synthesize data more easily and effectively

4. Poor data practices: by creating an informatics-literate workforce through innovative outreach and training efforts (e.g., best-practice videos, podcasts, online certificate programs, downloadable best-practice guides, and exemplary data management plans)

Results/Conclusions

DataONE will enable new science and knowledge creation through universal access to data about life on Earth and the environment that sustains it.

The system is designed around a nucleus of three existing data centers (coordinating nodes) and a broad array of data holdings, such as those maintained by libraries, research networks, and academic and governmental organizations (member nodes). The cyberinfrastructure promotes discovery of and access to data by providing one-stop shopping for data and metadata (information about the data that enables its use) about Earth’s biota and environments. DataONE provides tools (e.g., metadata management and scientific visualization tools as part of an “investigator’s toolbox”), training, and outreach to scientists and students in a concerted effort to enable and promote data preservation, stewardship, and sharing. Through a series of working group meetings, computer and information scientists are developing and promulgating ontologies that will facilitate data integration and simplify the creation of complex scientific workflows. The DataONE portal simplifies acquiring and using scientific workflow software such as Kepler and Taverna, as well as publishing and sharing new workflows through mechanisms such as myExperiment, which allows workflows to be reused and adapted for other purposes.
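The division of labor between coordinating nodes and member nodes can be illustrated with a toy model. The sketch below is a minimal, hypothetical illustration and not DataONE software or its API: the class names `MemberNode` and `CoordinatingNode`, the methods `harvest`, `sync_from`, and `search`, and the metadata fields are all assumptions made for this example. It assumes that member nodes hold data sets with their metadata, that coordinating nodes keep a copied metadata index, and that the coordinating nodes share that index, so a single keyword query can reach holdings from many institutions ("one-stop shopping").

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical member node: an institution holding data sets and their metadata."""
    name: str
    # Maps a data set identifier to its metadata record (a plain dict here).
    holdings: dict = field(default_factory=dict)

    def add(self, identifier: str, metadata: dict) -> None:
        self.holdings[identifier] = metadata


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: keeps a copy of member-node metadata for search."""
    name: str
    # Maps identifier -> (member node name, metadata); an assumed shared index.
    index: dict = field(default_factory=dict)

    def harvest(self, member: MemberNode) -> None:
        # Copy every metadata record from the member node into this node's index.
        for identifier, metadata in member.holdings.items():
            self.index[identifier] = (member.name, dict(metadata))

    def sync_from(self, other: "CoordinatingNode") -> None:
        # Assumed replication step so that any coordinating node can answer a query.
        self.index.update(other.index)

    def search(self, keyword: str) -> list:
        # Case-insensitive match on title and keywords; returns (identifier, holder) pairs.
        keyword = keyword.lower()
        hits = []
        for identifier, (holder, metadata) in sorted(self.index.items()):
            text = " ".join([metadata.get("title", "")] + metadata.get("keywords", []))
            if keyword in text.lower():
                hits.append((identifier, holder))
        return hits


# Usage: two hypothetical member nodes and three coordinating nodes.
library = MemberNode("university-library")
library.add("ds-001", {"title": "Stream temperature 2005-2008", "keywords": ["hydrology"]})
network = MemberNode("research-network")
network.add("ds-002", {"title": "Bird counts", "keywords": ["ornithology", "phenology"]})

cn_1, cn_2, cn_3 = (CoordinatingNode(n) for n in ("cn-1", "cn-2", "cn-3"))
cn_1.harvest(library)
cn_1.harvest(network)
cn_2.sync_from(cn_1)
cn_3.sync_from(cn_1)

print(cn_3.search("phenology"))  # [('ds-002', 'research-network')]
```

In this sketch the data themselves stay at the member nodes and only metadata is copied to the coordinating nodes. That is one plausible way to realize a single discovery portal over dispersed holdings, chosen here for simplicity rather than taken from the abstract.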

PS 49: Ecoinformatics (Posters), The 94th ESA Annual Meeting (August 2-7, 2009)