Emery R. Boose1, Aaron M. Ellison1, Leon J. Osterweil2, Lori Clarke2, Rodion Podorozhny3, Alexander Wise2, Julian L. Hadley1, and David R. Foster1. (1) Harvard University, (2) University of Massachusetts, (3) Texas State University
Advances in sensor network technology promise a paradigm shift in environmental research. The ability to conduct simultaneous measurements over broad areas at high sampling rates, and to process such measurements in real time, will facilitate environmental modeling and forecasting. But significant challenges remain for analyzing, documenting, and managing streaming data. Various strategies can enhance the quality of real-time data and metadata. The sensor network can be designed to minimize missing or questionable data through the use of duplicate sensors or complementary measurements (so that critical values can be both measured and modeled). The data processing system can be designed to support real-time quality control, modeling, and gap filling, as well as critical post-processing tasks such as correction for sensor drift. More sophisticated methods are required to ensure that the resulting datasets are reproducible. We are developing cyberinfrastructure tools that support precise description and execution of the “scientific process” used to create a dataset, based on a formal process definition called an “analytic web.” This approach guarantees dataset reproducibility by providing (1) a complete audit trail of all artifacts used or created in the process, and (2) detailed process metadata that precisely describes all sub-processes. It also supports rigorous testing for logical and statistical errors and propagation of measurement errors. Application of these tools is illustrated in the design of a sensor network that provides real-time integration of meteorological, hydrological, eddy flux, and tree physiological measurements to study the movement of water through a forest ecosystem.
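The quality-control strategy described above, in which duplicate sensors cross-check one another and a modeled value fills gaps, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, tolerance parameter, and provenance flags are hypothetical.

```python
from statistics import median

def qc_and_fill(primary, duplicate, model_estimate, tolerance=2.0):
    """Hypothetical real-time QC step for one critical value.

    Cross-checks duplicate sensors and falls back to a model estimate
    when measurements are missing or disagree. Returns (value, flag),
    where the flag records provenance for the dataset's metadata."""
    readings = [r for r in (primary, duplicate) if r is not None]
    if not readings:
        # Gap fill: no measurement available, so use the modeled value.
        return model_estimate, "modeled"
    if len(readings) == 2 and abs(primary - duplicate) > tolerance:
        # Sensors disagree beyond tolerance: flag as questionable and
        # report the reading closest to the complementary model estimate.
        best = min(readings, key=lambda r: abs(r - model_estimate))
        return best, "questionable"
    # Normal case: agreeing (or single) measurements are used directly.
    return median(readings), "measured"
```

Recording the returned flag alongside each value is one way such a system could preserve, in real time, the measured-versus-modeled distinction that later reproducibility checks depend on.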