SYMP 11-1
Statistical software and ecology: Paths and impediments for scientific progress and thought

Wednesday, August 13, 2014: 8:00 AM
Gardenia, Sheraton Hotel
Aaron Ellison, Harvard Forest, Harvard University, Petersham, MA
Background/Question/Methods

Ecologists routinely address many challenging questions that relate to the accelerating pace at which humans are transforming the biosphere. Such questions include: How many individuals make up a functioning population, and how many species are present? Where do these individuals and species occur in space, how do they interact, and how do their distribution and abundance change through time? How do energy and nutrients flow through ecosystems? Do energy and nutrient fluxes alter the distribution and abundance of organisms, or vice versa? What “services” do humans derive from ecosystems, and how are human activities altering ecosystem processes and affecting the provisioning of these services? Do actions and activities at local scales have effects and impacts at larger (regional, continental, even global) scales? The data needed to address these questions are “big”: they come from a wide range of sources, in many heterogeneous forms, and, increasingly, from remotely operated sensors. With appropriate statistical methods and software tools, we can use all of these streams of Big Data to address these questions, advance ecological science, and provide realistic forecasts of environmental change that can inform policy decisions.

Results/Conclusions

Development of new statistical theory and methods is proceeding rapidly, and new methods are now routinely accompanied by scripts, compiled code, or complete software packages that facilitate their use and adoption by ecologists. However, most ecologists are not programmers, computer scientists, or statisticians, and it is rarely easy for us to choose among similar or competing scripts or packages. In addition, regular updates to operating systems and software, together with routine or sweeping changes to code, pose real challenges for producing reproducible findings. Descriptive metadata and scientific workflow tools are helping to address some of these challenges, but important gaps remain. These gaps include deliberate efforts to maintain backward compatibility of database management systems and executable software; robust tools for reconstructing derived datasets from high-frequency, remotely sensed, streaming data; reliable solutions for version control; and interoperability among platforms and tools. Filling these (and other) gaps will require not only refinement of existing software tools but also research into, and development of, new software.
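One concrete, low-cost practice behind the reproducibility challenge described above is recording the exact software environment alongside an analysis, so that later software updates do not silently change results. A minimal sketch in Python (the package names queried here, such as numpy, are illustrative assumptions, not part of the abstract):

```python
# Minimal sketch: snapshot the software environment used for an analysis,
# so a reader can later reconstruct (or at least diagnose) version drift.
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Return a dict recording the Python version and the installed
    version of each named package (None if not installed)."""
    snapshot = {"python": platform.python_version(), "packages": {}}
    for name in packages:
        try:
            snapshot["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            snapshot["packages"][name] = None  # package not installed
    return snapshot


if __name__ == "__main__":
    # Hypothetical analysis dependencies; save this JSON with the results.
    print(json.dumps(environment_snapshot(["numpy", "pandas"]), indent=2))
```

Archiving such a snapshot with a derived dataset is no substitute for full back-compatibility or workflow tooling, but it makes the provenance of a result auditable.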