PS 49-93 - DataONE: A virtual data center for biology, ecology, and the environmental sciences

Wednesday, August 5, 2009
Exhibit Hall NE & SE, Albuquerque Convention Center
William Michener (1), Suzie Allard (2), Paul Allen (3), Peter Buneman (4), Randy Butler (5), John Cobb (6), Robert Cook (7), Patricia Cruse (8), Ewa Deelman (9), David DeRoure (10), Cliff Duke (11), Mike Frame (12), Carole Goble (13), Stephanie Hampton (14), Donald Hobern (15), Peter Honeyman (16), Jeffery Horsburgh (17), Viv Hutchison (18), Matt Jones (14), Steve Kelling (19), Jeremy Kranowitz (20), John Kunze (8), Bertram Ludaescher (21), Maribeth Manoff (2), Ricardo Pereira (22), Line Pouchard (6), Robert Sandusky (23), Ryan Scherle (24), Mark S. Servilla (25), Kathleen Smith (24), Carol Tenopir (2), Dave Vieglais (26), Von Welch (5), Jake Weltzin (27) and Bruce Wilson (28)
(1) DataONE, University of New Mexico, Albuquerque, NM; (2) University of Tennessee; (3) Cornell University; (4) University of Edinburgh; (5) University of Illinois - Urbana Champaign; (6) Oak Ridge National Laboratory; (7) Environmental Sciences Division & Climate Change Science Institute, Oak Ridge National Laboratory, Oak Ridge, TN; (8) University of California - California Digital Library; (9) University of Southern California; (10) University of Southampton; (11) Ecological Society of America; (12) U.S. Geological Survey - National Biological Information Infrastructure; (13) University of Manchester; (14) National Center for Ecological Analysis and Synthesis, Santa Barbara, CA; (15) Atlas of Living Australia; (16) University of Michigan; (17) Utah State University; (18) Core Science Systems, US Geological Survey, Denver, CO; (19) Information Science, Cornell Lab of Ornithology, Ithaca, NY; (20) The Keystone Center; (21) University of California - Davis; (22) Taxonomic Databases Working Group (Campinas, Brazil); (23) University of Illinois - Chicago; (24) National Evolutionary Synthesis Center; (25) Biology MSC03 2020, University of New Mexico, Albuquerque, NM; (26) University of Kansas; (27) US Geological Survey, Tucson, AZ; (28) University of Minnesota
Background/Question/Methods

Data about life on Earth and the environment are often unavailable or unusable. Those data that are available are widely dispersed and can be difficult to discover and use. Because multiple data and metadata standards are in use, integration and analysis have been difficult to achieve. Even when analyses are completed, sharing and replicating workflows and results remains a challenge.

DataONE is being designed and constructed to address four key challenges:

1. Data loss: by preserving at-risk (orphaned) biological, ecological, and environmental data from individual scientists

2. Scattered data sources: by facilitating discovery of and access to data through a single easy-to-use portal

3. Data deluge: by providing a toolbox that empowers scientists and organizations to manage, analyze, and synthesize data more easily and effectively

4. Poor data practices: by creating an informatics-literate workforce through innovative outreach and training efforts (e.g., best-practice videos, podcasts, online certificate programs, downloadable best-practice guides, and exemplars of data management plans)

Results/Conclusions

DataONE will enable new science and knowledge creation through universal access to data about life on Earth and the environment that sustains it.

The system is designed around a nucleus of three existing data centers (coordinating nodes) and a broad array of data holdings such as those maintained by libraries, research networks, and academic and governmental organizations (member nodes). The cyberinfrastructure promotes discovery of and access to data by providing one-stop shopping for data and metadata (information about the data that enables its use) about Earth’s biota and environments. DataONE provides tools (e.g., metadata management and scientific visualization tools as part of an “investigator’s toolbox”), training, and outreach to scientists and students in a concerted effort to enable and promote data preservation, data stewardship, and data sharing. Through a series of working group meetings, computer and information scientists are developing and promulgating ontologies that will facilitate data integration and simplify the creation of complex scientific workflows. The DataONE portal simplifies the process of acquiring and using appropriate scientific workflow software such as Kepler and Taverna, as well as publishing and sharing new workflows via mechanisms such as myExperiment, which allows workflows to be re-used and possibly adapted for other uses.
