PS 57-166 - Connecting theory and data demands reevaluation of experimental design

Wednesday, August 8, 2012
Exhibit Hall, Oregon Convention Center
Brian D. Inouye, Rocky Mountain Biological Laboratory, Crested Butte, CO
Background/Question/Methods

An increased emphasis on the role of experimental manipulations in ecology developed while statistical methods were dominated by linear statistical models and computational power put severe limits on the range of feasible analyses. More recently, increased computational power, new algorithms, and shifts in the influence of alternative statistical paradigms have all expanded the range of statistical inferences that are possible. Ecologists have adopted many new approaches to analysis, including complex models based on maximum likelihood and Bayesian methods. However, there has been comparatively little discussion of changing the ways in which ecologists collect data in order to take full advantage of the strengths of these analyses. While some authors have advocated a shift towards 'regression-based' approaches and away from 'ANOVA-based' approaches, this advice does not answer the necessary questions of how to allocate effort among treatments, treatment combinations, replicates, or blocks, nor of how many treatments and treatment combinations to use. Answering these questions will allow better connections between experiments and relevant ecological theory.
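The allocation question can be made concrete as a fixed-budget trade-off. The following is a minimal illustrative sketch (not part of the abstract, and the budget of 60 units is an arbitrary assumption): for a given total number of experimental units, each candidate design trades the number of treatment levels against replication within each level.

```python
# Illustrative sketch (hypothetical numbers): enumerate candidate designs under
# a fixed budget of experimental units, trading the number of treatment levels
# against replication within each level.

TOTAL_UNITS = 60  # assumed total experimental effort

def candidate_designs(total_units):
    """Yield (n_treatment_levels, n_replicates) pairs that use the full budget."""
    for n_levels in range(2, total_units + 1):
        if total_units % n_levels == 0:
            yield n_levels, total_units // n_levels

for n_levels, n_reps in candidate_designs(TOTAL_UNITS):
    print(f"{n_levels:3d} treatment levels x {n_reps:3d} replicates per level")
```

Choosing among these candidates (and among evenly versus unevenly spaced levels) is exactly the question the simulations below address.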

Results/Conclusions

I evaluated the performance of a wide range of experimental designs, based on analyses of simulated datasets in which the underlying mechanisms that generated the data were known. Experimental designs vary greatly in their ability to distinguish among multiple alternative hypotheses for the shapes of functions relating two quantities (e.g., population densities and traits, physiological traits and individual performance, or nutrient levels and community or ecosystem responses). An experimental design was rated highly if it consistently identified the correct patterns in the data with a high degree of support (based on likelihood differences), and rated poorly if it was unable to distinguish among hypotheses or consistently misidentified patterns. Given constraints on total experimental effort, the designs that were rated highly on average contained some replication within each treatment, an intermediate number of treatments, and unevenly spaced treatments along axes. Exact rankings depended on how stochasticity was incorporated and on the range of hypotheses being compared. Identifying a single best type of experimental design often required prior knowledge of the approximate range of responses and/or certainty that the 'true' mechanism was contained within the set of hypotheses being tested. This suggests the importance of collecting pilot data before conducting large experiments.
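The evaluation loop can be sketched as follows. This is a hedged illustration only, not the analysis reported here: the 'true' saturating mechanism, the linear alternative, the noise level, the treatment levels, and the two-log-likelihood-unit support threshold are all assumptions chosen for the example.

```python
# Illustrative sketch (assumed models and parameters, not the reported analysis):
# rate a design by how often maximum-likelihood fits of competing functional
# forms recover the true (saturating) mechanism from data simulated under it.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

def true_response(x):
    # Assumed "true" mechanism: a saturating (Michaelis-Menten-like) response
    return 5.0 * x / (1.0 + x)

def neg_loglik(params, x, y, model):
    mu = model(x, *params[:-1])
    sigma = np.exp(params[-1])          # log-sigma keeps sigma positive
    return -norm.logpdf(y, mu, sigma).sum()

# Candidate hypotheses: each entry is (mean function, starting parameter values)
models = {
    "linear":     (lambda x, a, b: a + b * x,                [0.0, 1.0, 0.0]),
    "saturating": (lambda x, a, b: a * x / (np.exp(b) + x),  [1.0, 0.0, 0.0]),
}

def evaluate_design(levels, n_rep, n_sim=200, threshold=2.0):
    """Fraction of simulations in which the true (saturating) model is
    favored by at least `threshold` log-likelihood units."""
    x = np.repeat(levels, n_rep)
    wins = 0
    for _ in range(n_sim):
        y = true_response(x) + rng.normal(0.0, 0.5, size=x.size)
        loglik = {}
        for name, (model, start) in models.items():
            fit = minimize(neg_loglik, start, args=(x, y, model),
                           method="Nelder-Mead")
            loglik[name] = -fit.fun
        if loglik["saturating"] - loglik["linear"] >= threshold:
            wins += 1
    return wins / n_sim

# Compare an evenly spaced design with one that concentrates levels where
# curvature distinguishes the hypotheses (both use 12 experimental units).
even   = evaluate_design(np.linspace(0.5, 6.0, 6), n_rep=2)
uneven = evaluate_design(np.array([0.25, 0.5, 1.0, 2.0, 4.0, 6.0]), n_rep=2)
print(f"evenly spaced levels:   {even:.2f}")
print(f"unevenly spaced levels: {uneven:.2f}")
```

Replicating such a loop over many candidate designs, noise structures, and hypothesis sets is the kind of exercise that produces the rankings summarized above.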