Informatics serving network science: Data standardization and interoperability at NEON
Large research networks depend critically on data accessibility and usability; without them, networks risk becoming storehouses of unmined data. The National Ecological Observatory Network (NEON) faces the major challenge of standardizing an extremely heterogeneous range of data, including observational and organismal sampling, high-frequency sensor measurements, and remotely sensed hyperspectrometry and LiDAR. Interoperability across this range requires both consistency with existing standards in specialized fields and internal consistency among the different data types within NEON. A major goal of NEON and other networks is to facilitate integrative science that incorporates data from a wide variety of scales and sources, and informatics strategies are critical to achieving that goal.
Different parts of NEON’s heterogeneous data are at different maturity levels: remote sensing data are currently made available in the community-standard HDF5 format, while observational and in situ sensor data are made available in ASCII formats. Near-term goals include delivery of sensor data in HDF5, association of observational data with standardized metadata in Ecological Metadata Language (EML), and development of standardized spatial metadata. NEON is also establishing an internal controlled vocabulary that will ensure that each term (e.g., plotID, PAR) has a standardized meaning and unit definition across the observatory. Taxonomic information is derived where possible from existing databases, such as the USDA PLANTS database. For unit definitions, NEON follows the standards and unit names established by EML and the Long Term Ecological Research (LTER) Network. The controlled vocabulary will be made available to the community gradually, concurrent with its internal development.
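
As a rough illustration of how a controlled vocabulary can enforce one meaning and one unit per term, the following minimal Python sketch checks the fields of a data product against a small term table. It is not NEON's implementation: the `Term` class, the `VOCABULARY` table, the `check_fields` function, the definitions, and the unit string are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Term:
    """One controlled-vocabulary entry (hypothetical structure)."""
    name: str
    definition: str
    unit: Optional[str]  # None for terms that carry no unit, such as identifiers


# Illustrative entries only; definitions and the unit name are placeholders,
# not values copied from NEON's vocabulary or from the EML unit dictionary.
VOCABULARY: Dict[str, Term] = {
    t.name: t
    for t in (
        Term("plotID", "Identifier of a sampling plot", None),
        Term("PAR", "Photosynthetically active radiation",
             "micromolePerMeterSquaredPerSecond"),
    )
}


def check_fields(fields: Dict[str, Optional[str]]) -> List[str]:
    """Return the problems found in a {term name: unit} mapping."""
    problems = []
    for name, unit in fields.items():
        term = VOCABULARY.get(name)
        if term is None:
            problems.append(f"{name}: not in controlled vocabulary")
        elif term.unit != unit:
            problems.append(
                f"{name}: expected unit {term.unit!r}, got {unit!r}"
            )
    return problems
```

Under these assumptions, a data product that declares `PAR` with any other unit, or that uses a term missing from the table, would be reported before publication, which is the kind of cross-product consistency the vocabulary is meant to provide.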