OOS 87-10
Breaking communication gaps: Models talking with ecologists, the data, and each other

Friday, August 14, 2015: 11:10 AM
327, Baltimore Convention Center
Michael Dietze, Earth and Environment, Boston University, Boston, MA

Models play a critical role in synthesizing our understanding of ecosystems and making forward projections into novel conditions. Increasingly, models are being used as a scaffold for data-driven synthesis and are central to ecological forecasting. However, models remain inaccessible to most ecologists, in large part due to the informatics challenges of managing the flows of information into and out of such models. Managing the communication between models and data involves three distinct challenges: dealing with the volume of Big Data; processing unstructured and uncurated ‘long tail’ data; and capturing a range of uncertainties in model-data comparisons and formal data-model assimilation. Finally, model development has long been an academic cottage industry, with different models lacking compatible formats for inputs, outputs, and settings. This has led to massive redundancies and minimal reproducibility. As a result, the pace of model improvement has been glacial.


PEcAn (pecanproject.org), a toolbox for model-data ecoinformatics, tackles many of these communication gaps. Users interact with all models through an intuitive, web-based, Google Maps-based interface, a single API, and standardized file formats. Standardization allows the development of common, reusable tools for processing inputs, visualizing outputs, and automating analyses. PEcAn includes state-of-the-art Hierarchical Bayes tools for model parameterization, data assimilation, and uncertainty analysis, and leverages Brown Dog tools for processing uncurated data. A PostGIS database tracks all inputs, outputs, and model runs, greatly increasing reproducibility and reliability. Finally, database syncs and file sharing across PEcAn allow models to talk to each other and enable the community to effectively analyze many models distributed across a global network.