IGN 12-5 - How to replicate a data analysis

Tuesday, August 8, 2017
C124, Oregon Convention Center
Emery R. Boose (1), Aaron M. Ellison (1), Elizabeth Fong (2), Matthew K. Lau (1), Barbara S. Lerner (2), Thomas Pasquier (3) and Margo Seltzer (3); (1) Harvard Forest, Harvard University, Petersham, MA; (2) Computer Science, Mt. Holyoke College, South Hadley, MA; (3) Computer Science, Harvard University, Cambridge, MA
Nearly all ecological studies involve analyzing data with a computer. To replicate an analysis we need access to the original data and software. But that may not be enough. For an analysis done in R, for example, we may be unable to understand or execute the original script (perhaps because of poor documentation or code rot), or unable to replicate inputs that were generated at runtime (e.g., data downloaded from the web). Data provenance tools help solve this problem by capturing inputs, outputs, intermediate values, and individual steps as a script executes, enhancing both transparency and reproducibility.
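To make the idea of provenance capture concrete, here is a minimal sketch in Python (not R, and not the authors' tool). It assumes a hypothetical `ProvenanceRecorder` class whose `step` decorator logs each function call's name, a snapshot of its inputs, its output, a short hash of that output, and timing. The step functions and the temperature values below are illustrative stand-ins. In particular, `fetch_temperatures` plays the role of a runtime input such as a web download.

```python
import copy
import hashlib
import json
import time
from functools import wraps


class ProvenanceRecorder:
    """Hypothetical recorder: logs each step's inputs, output, and timing."""

    def __init__(self):
        self.records = []  # one dict per executed step, in execution order

    @staticmethod
    def _fingerprint(value):
        # Hash a JSON rendering so outputs can be compared across runs
        text = json.dumps(value, sort_keys=True, default=repr)
        return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

    def step(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            result = func(*args, **kwargs)
            self.records.append({
                "step": func.__name__,
                # Deep copies snapshot the values as they were at this step,
                # so later mutation of the data does not alter the record
                "inputs": {"args": copy.deepcopy(args),
                           "kwargs": copy.deepcopy(kwargs)},
                "output": copy.deepcopy(result),
                "output_hash": self._fingerprint(result),
                "started": start,
                "seconds": time.time() - start,
            })
            return result
        return wrapper

    def save(self, path):
        # Write the provenance record to disk as JSON for later inspection
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.records, fh, indent=2, default=repr)


prov = ProvenanceRecorder()


@prov.step
def fetch_temperatures():
    # Hypothetical stand-in for data downloaded at runtime; a real script
    # might fetch this from the web, so the recorded value is what matters
    return [12.0, 13.5, 11.5, 14.0]


@prov.step
def mean(values):
    return sum(values) / len(values)


temps = fetch_temperatures()
avg = mean(temps)
for rec in prov.records:
    print(rec["step"], rec["output_hash"])
```

Because the recorder stores the downloaded values and their hash, a later rerun can verify whether its runtime inputs match the original ones, even if the web source has since changed. Real provenance tools capture much finer detail (individual statements, files, and environment), but the principle is the same.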