Towards data integration: Access and sharing of biodiversity, ecological, and environmental data for science and decision-making

Elizabeth Martín, Core Science Analytics, Synthesis, & Libraries (CSASL), United States Geological Survey (USGS), Gainesville, FL
Annie Simpson, Core Science Analytics, Synthesis, & Libraries (CSASL), United States Geological Survey (USGS), Reston, VA

Use of data from multiple sources is becoming more prevalent because many of today's environmental issues are increasingly complex and require interdisciplinary approaches. Before data can be integrated, they must be discovered, accessed, and retrieved in user-friendly formats. Government-wide efforts in the United States, such as the Open Data Initiative and the Big Data Research and Development Initiative, aim to increase the discoverability and accessibility of the large volume of data maintained by federal agencies. One open data activity of relevance to ecologists is the U.S. Ecoinformatics-based Open Resources and Machine Accessibility (EcoINFORMA) initiative <http://ecosystems.data.gov>. EcoINFORMA is expanding the availability and interoperability of federal and non-federal biodiversity, ecosystems, and ecosystem services data via focused communities of practice organized into the following data resource hubs: Biodiversity Hub - Biodiversity Information Serving Our Nation (BISON) <http://bison.usgs.ornl.gov>; Ecosystem Services Hub - EnviroAtlas <http://enviroatlas.epa.gov/enviroatlas>; and Land Cover Dynamics Hub - the Multi-Resolution Land Characteristics Consortium (MRLC) <http://www.mrlc.gov>. Each hub provides interoperable data, standards, tools, and applications for data reuse. Data provided by these and other existing national and global biodiversity informatics programs are available to ecologists for their use.


These data resources are available because of the development and implementation of standards (vocabularies, frameworks, processes) and new technologies that facilitate data integration. Some of these standards were developed or implemented because of a need for data to inform decision-making. Examples of standards relevant to sharing ecological data include: the Darwin Core (DwC) schema <http://rs.tdwg.org/dwc>, a standard reference terminology with definitions for sharing biodiversity datasets; the Integrated Taxonomic Information System (ITIS) <http://www.itis.gov>, a standardized nomenclature of species (or other taxa) names and a hierarchical taxonomic classification for plants, animals, fungi, and microbes of North America and the world; the U.S. National Vegetation Classification (NVC) <http://usnvc.org>, a standardized hierarchical classification of vegetation types for the United States; and the Military Grid Reference System (MGRS) and U.S. National Grid (USNG) <https://griffingroups.com/groups/profile/39935/mgrs-and-usng-grid-standards>, global and national grid reference systems, respectively, for representing and sharing geographic locations on the Earth hierarchically. Finally, the use of web and map services to deliver machine-readable data is enabling near-real-time data mashups, visualizations, and analyses, and is offering new opportunities to use these data resources in research and decision-making. These freely available federal and non-federal applications provide valuable data integration tools for ecological research.
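
As an illustration of how a shared vocabulary such as Darwin Core supports integration, the following minimal Python sketch renames fields from two differently structured occurrence datasets to common Darwin Core terms and combines them into one list. The source names (`survey_a`, `museum_b`), their column names, and the sample records are hypothetical and are not drawn from BISON or any other program named above; only the Darwin Core term names (`scientificName`, `decimalLatitude`, `decimalLongitude`, `eventDate`, `datasetName`) come from the standard.

```python
# Hypothetical mappings from each source's own column names to Darwin Core terms.
DWC_MAPPINGS = {
    "survey_a": {
        "species": "scientificName",
        "lat": "decimalLatitude",
        "lon": "decimalLongitude",
        "date": "eventDate",
    },
    "museum_b": {
        "taxon_name": "scientificName",
        "latitude": "decimalLatitude",
        "longitude": "decimalLongitude",
        "collected_on": "eventDate",
    },
}


def to_darwin_core(record, source):
    """Rename the fields of one source record to Darwin Core terms."""
    mapping = DWC_MAPPINGS[source]
    return {dwc: record[field] for field, dwc in mapping.items() if field in record}


def integrate(sources):
    """Combine records from several sources into one Darwin Core-keyed list."""
    merged = []
    for source, records in sources.items():
        for record in records:
            row = to_darwin_core(record, source)
            row["datasetName"] = source  # keep track of where each record came from
            merged.append(row)
    return merged


# Made-up sample records, for illustration only.
sample_sources = {
    "survey_a": [
        {"species": "Quercus alba", "lat": 38.95, "lon": -77.35, "date": "2012-06-01"},
    ],
    "museum_b": [
        {"taxon_name": "Quercus alba", "latitude": 29.65, "longitude": -82.32,
         "collected_on": "1998-04-17"},
    ],
}

combined = integrate(sample_sources)
```

Once both sources share the same term names, records can be filtered, mapped, or summarized together without source-specific handling.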