Data about life on earth and the environment are often unavailable or unusable for numerous reasons. Those data that are available are broadly dispersed and can be difficult to discover and use. Because multiple data and metadata standards are employed, integration and analysis have been difficult to achieve. Moreover, once analyses are completed, sharing and replicating workflows and results pose a further challenge.
DataONE is being designed and constructed to address four key challenges:
1. Data loss: by preserving at-risk (orphaned) biological/ecological/environmental data from individual scientists
2. Scattered data sources: by facilitating discovery and access of data through a single easy-to-use portal
3. Data deluge: by providing a toolbox that empowers scientists and organizations to more easily and effectively manage, analyze, and synthesize data
4. Poor data practices: by creating an informatics-literate workforce through innovative outreach and training efforts (e.g., best-practice videos, podcasts, online certificate programs, downloadable best-practice guides, and exemplars of data management plans)
Results/Conclusions
DataONE will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it.
The system is designed around a nucleus of three existing data centers (coordinating nodes) and a broad array of data holdings such as those maintained by libraries, research networks, and academic and governmental organizations (member nodes). The cyberinfrastructure promotes the discovery and access of data by providing one-stop shopping for data and metadata (information about the data that enables its use) about Earth’s biota and environments. DataONE provides tools (e.g., metadata management and scientific visualization tools as part of an “investigator’s toolbox”), training, and outreach to scientists and students in a concerted effort to enable and promote data preservation, data stewardship, and data sharing. Through a series of working group meetings, computer and information scientists are engaged in developing and promulgating ontologies that will facilitate data integration and simplify the creation of complex scientific workflows. The DataONE portal simplifies the process of acquiring and using appropriate scientific workflow software such as Kepler and Taverna, as well as publishing and sharing new workflows via mechanisms such as myExperiment, which allows workflows to be reused and possibly adopted for other purposes.