COS 81-1 - Reproducible science case studies in ecology: Lessons learned

Wednesday, August 9, 2017: 8:00 AM
C120-121, Oregon Convention Center
Corinna Gries, Center for Limnology, University of Wisconsin, Madison, WI, Matthew B. Jones, National Center for Ecological Analysis and Synthesis, Santa Barbara, CA, Patricia A. Soranno, Fisheries and Wildlife, Michigan State University, East Lansing, MI and Scott L. Collins, Department of Biology, University of New Mexico, Albuquerque, NM

Some say science is in a reproducibility crisis, meaning that it is almost impossible to reproduce published research results. Complete reproducibility is a lofty goal, but a continuum of practices provides achievable steps towards it. Clearly, most ecological field sampling cannot be reproduced exactly, because sites vary and environmental conditions change over time. Hence, steps towards reproducibility generally start with published data, which can be improved with documentation and with published data processing and analytical code. This culminates in a complete workflow (raw data to published results) being publicly available for each scientific publication. We present three pioneering case studies of ecological research and introduce tools for documenting a complete data analysis workflow. The projects are (1) a small group project using publicly available raw data, (2) a larger project that set out with no plans for reproducibility, and (3) a multi-investigator Macrosystems project that developed effective open collaboration policies at the outset and tested tools and approaches for documenting all data manipulations.


While tools supporting steps towards reproducible science are in their infancy, all three projects found that training and leadership are important to change the culture of collaboration and to increase awareness and willingness to open the research process to public scrutiny. Data are never perfect, and authors usually do not consider their scientific code ready for publishing. Not all contributors of raw data to a synthesis project are willing to release control over their data, and measures have to be taken to respect and accommodate that reluctance. Planning for reproducibility from the beginning leads to better approaches to requesting data and to better overall documentation of the process. Not everyone can write open-source data processing code, and many commercial and/or proprietary programs are used during data analysis, rendering the workflow non-reproducible. Currently, even making a workflow understandable is considered extra work without payback. However, those who have embraced open science to some degree report the time-saving benefits of re-using cleaned and harmonized datasets, facilitated by a welcoming and extremely helpful open-source coding community.

We acknowledge the more than 30 members of the respective research teams who shared their experiences while working through these new and unfamiliar open science approaches.