OPS 2-14
Informatics at NEON: Ensuring a standardized and interoperable information ecosystem
The National Ecological Observatory Network (NEON) will collect ecological data at an unprecedented scale from a variety of heterogeneous data streams. These include hundreds of different sensor measurement streams, observational and organismal sampling, hyperspectrometry and lidar data, and higher-level synthetic data products. NEON faces the challenge of standardizing data across all of these vastly different types. Beyond presenting information to our diverse consumers in a standardized way, we must ensure that our data products are interoperable with each other and with existing community standards. To meet these challenges, we use a diverse array of modern data synthesis tools and approaches that may be unfamiliar to most ecologists. These include a systems engineering approach to generating standardized requirements, adopting standards and building ontologies, and technical tools such as document databases, network databases, and SPARQL endpoints.
Results/Conclusions
We have adopted a wide variety of community data and metadata standards to share information with our broader user base. Sensor data and hyperspectrometry products will be shared in common formats such as HDF5 and NetCDF4. Metadata will be rendered for users as HTML and will also be downloadable as XML conforming to the ISO 19115-2 standard. Observational data products that resemble typical ecological observations (e.g. breeding bird surveys) will be shared in ASCII formats, with metadata provided in Ecological Metadata Language (EML) to match other data sources such as the NCEAS Knowledge Network for Biocomplexity (KNB). Beyond sharing data and metadata in community standards, we are developing an internal NEON controlled vocabulary (and later an ontology) that will facilitate interoperability across all NEON data products (e.g. in any NEON data product that has a "sampleID", the field will mean the same thing). We are managing these standards with technologies such as Neo4j to store ontological relationships and CouchDB to serve as a document repository for the definitions of all fields related to each data product. This will allow us to generate human-readable web pages for all of the data products we offer, as well as provide a RESTful API for machine-to-machine sharing of information about the data available from NEON.
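The controlled-vocabulary idea can be illustrated with a minimal Python sketch. It is not NEON's implementation: the term definitions, the product names, and the helper functions are all hypothetical placeholders, and plain in-memory dictionaries stand in for the document repository. It only shows how resolving every field name against one shared vocabulary guarantees that "sampleID" carries the same definition in every data product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One controlled-vocabulary entry (hypothetical structure)."""
    name: str
    definition: str


# Hypothetical shared vocabulary: exactly one definition per field name.
# The definitions are placeholders, not NEON's actual wording.
VOCABULARY = {
    "sampleID": Term("sampleID", "Identifier assigned to a physical sample."),
    "siteID": Term("siteID", "Identifier of the site where data were collected."),
}

# Hypothetical data products, each listing the field names it exposes.
DATA_PRODUCTS = {
    "bird_survey": ["siteID", "sampleID"],
    "soil_chemistry": ["siteID", "sampleID", "pH"],
}


def describe_product(product, vocabulary=VOCABULARY, products=DATA_PRODUCTS):
    """Return {field: definition} for a product.

    Every field is resolved against the shared vocabulary, so a given
    field name has the same meaning in every product that uses it.
    """
    return {
        field: vocabulary[field].definition
        for field in products[product]
        if field in vocabulary
    }


def undefined_fields(product, vocabulary=VOCABULARY, products=DATA_PRODUCTS):
    """List fields a product uses that the vocabulary does not yet define."""
    return [field for field in products[product] if field not in vocabulary]
```

In this sketch, `describe_product("bird_survey")` and `describe_product("soil_chemistry")` return the same text for "sampleID", while `undefined_fields("soil_chemistry")` flags "pH" as a field that still needs a vocabulary entry before the product can be documented consistently.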