SYMP 9-4 - The statistical evaluation of ecological theory – A continuum of refutation/confirmation

Tuesday, August 9, 2016: 3:10 PM
Grand Floridian Blrm D, Ft Lauderdale Convention Center
Ken Aho, Department of Biological Sciences, Idaho State University, Pocatello, ID, Dewayne Derryberry, Mathematics and Statistics, Idaho State University, Pocatello, ID and Teri Peterson, Assistant Professor, Management

Overarching perspectives in statistical hypothesis testing and model evaluation (frequentist, information-theoretic, Bayesian) have led to methods that vary widely in their underlying philosophy and intended purpose.  Unfortunately, these distinctions are often poorly understood by ecologists, leading to model misapplication and misinterpretation.  For example, many theoretical frameworks in ecology attempt to mathematically depict particular natural conditions, for instance "default" or "no effect" patterns.  This is done because an explicit effect (often 0) can be specified for H0, whereas HA can define only "some effect" distinct from H0.  As a result, many ecologists have quantified the validity of such models/predictions by setting them as null hypotheses in frequentist significance tests.  Frequentist significance tests, however, do not allow empirical confirmation of null (or alternative) hypotheses.  Instead, under the conventional severe-falsificationist framework, we "reject" or provisionally "fail to reject" H0.

We can compare the perspectives of widely disparate statistical methods using the likelihood ratio test statistic, X² (two times the difference in the HA and H0 log-likelihoods).  Under a conventional significance testing perspective (α = 0.05, one-parameter difference), the line of demarcation between H0 and HA is the critical value X² = 1.96² ≈ 3.84.  By contrast, for AIC and BIC this line is X² = 2 and X² = log(n), respectively.  For practical and comparative purposes, I. J. Good's "Bayes/non-Bayes compromise" has the demarcation line X² = [Φ⁻¹{1 − 0.25/√n}]², where Φ⁻¹(p) denotes the probit function at probability p.  These lines can be used to graphically demonstrate the divergence of the aforementioned methods as n increases.
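As an illustration (not part of the abstract), these four demarcation lines can be computed directly for a one-parameter difference between H0 and HA; the helper below uses only the Python standard library.

```python
from math import log, sqrt
from statistics import NormalDist

def demarcation_lines(n):
    """X^2 (likelihood ratio) thresholds separating H0 from HA
    for a one-parameter difference, at sample size n."""
    # Probit at 1 - 0.25/sqrt(n), per Good's Bayes/non-Bayes compromise
    z = NormalDist().inv_cdf(1 - 0.25 / sqrt(n))
    return {
        "significance (alpha = 0.05)": 1.96 ** 2,  # fixed critical value
        "AIC": 2.0,                                # fixed penalty of 2 per parameter
        "BIC": log(n),                             # penalty grows with log(n)
        "Good's compromise": z ** 2,               # also grows with n
    }

for n in (10, 100, 10000):
    print(n, demarcation_lines(n))
```

Evaluating the thresholds over a grid of n values in this way reproduces the divergence described above: the significance-test and AIC lines stay flat while the BIC and Good's-compromise lines rise.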

By considering effect size, and thus defining the distribution of X² under HA, our approach can also be used to intuitively demonstrate the strong consistency of BIC and Good's compromise in model selection.  Strong consistency requires that, as sample size approaches infinity, the true model will be selected from a group of candidate models with probability one.  Our approach can also be extended to consider the behavior of metrics for models with widely differing numbers of parameters.
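A minimal simulation sketch of the consistency argument (not from the abstract; the one-sample normal mean test with known variance is an assumed toy setting): when H0 is true, a criterion with a fixed threshold (AIC) retains H0 at a rate bounded away from 1, while BIC's growing threshold retains it with probability approaching 1.

```python
import random
from math import log

def lr_statistic(sample):
    """X^2 = 2*(logLik_HA - logLik_H0) for testing H0: mu = 0
    with N(mu, 1) data; with known variance this reduces to n * xbar^2."""
    n = len(sample)
    xbar = sum(sample) / n
    return n * xbar * xbar

def selection_rates(n, reps=1000, seed=1):
    """Fraction of replicates (data truly generated under H0)
    in which each criterion correctly retains H0."""
    rng = random.Random(seed)
    keep_aic = keep_bic = 0
    for _ in range(reps):
        x2 = lr_statistic([rng.gauss(0, 1) for _ in range(n)])
        keep_aic += x2 <= 2.0     # AIC threshold: fixed
        keep_bic += x2 <= log(n)  # BIC threshold: grows with n
    return keep_aic / reps, keep_bic / reps
```

Because P(χ²₁ ≤ 2) ≈ 0.84 regardless of n, the AIC retention rate plateaus near 0.84, whereas the BIC rate climbs toward 1 as n increases, which is the strong-consistency behavior described above.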

The demarcation lines/surfaces for significance testing, AIC, BIC, and Good's compromise represent locations along a conceptual continuum of hypothesis refutation/confirmation.  The thresholds for AIC and significance testing do not grow with sample size; as a result, given any fixed nonzero effect, these methods will reject H0 with probability approaching 1 as n grows large.  These methods are therefore strongly refutative.  On the other hand, BIC and Good's compromise demand more evidence against H0 for rejection as sample size increases.  These methods were intended to confirm whichever hypothesis, H0 or HA, is correct.