Background/Question/Methods
The statistical literature is full of lengthy and often heated debate between Bayesian and frequentist statisticians on many topics, but perhaps no concept is both as controversial and as widely used as the p-value. The widespread use (and well-documented misuse) of p-values is one of the most frequently cited arguments for the use of Bayesian statistics. The p-value describes the probability of observing data at least as extreme as those actually observed when a certain hypothesis (the null) is true, rather than the probability that the hypothesis is true given the data. When that same hypothesis is not true, however, the p-value may or may not give useful information about how probable a hypothesis is. This can give rise to Lindley’s paradox, in which the p-value indicates that the null hypothesis should be rejected (i.e., p < 0.05) when in fact the null hypothesis is more probable than the alternative given the data. Our objective was to determine how often Lindley’s paradox arises in practice and to assess the overall severity of this problem. To do this, we randomly sampled p-values and sample sizes reported in the journal “Ecology” in 2009 and used this information to estimate the posterior probabilities of the null and alternative hypotheses, assuming both were equally likely before observing the data.
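The conversion from a reported p-value and sample size to a probability of the null hypothesis can be illustrated with a minimal sketch. The abstract does not state which calculation was used, so everything below is an assumption made for illustration: a two-sided z-test of a point null, a unit-information normal prior on the effect under the alternative, and equal prior probabilities for the two hypotheses. The function name `posterior_null` is hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def posterior_null(p_value: float, n: int) -> float:
    """Approximate P(H0 | data) from a two-sided p-value and sample size n.

    Assumptions (not taken from the abstract): a z-test of a point null,
    a unit-information normal prior on the effect under H1, and equal
    prior probabilities for H0 and H1.
    """
    if not 0.0 < p_value < 1.0 or n < 1:
        raise ValueError("need 0 < p_value < 1 and n >= 1")
    # Recover the absolute z statistic implied by the two-sided p-value.
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    # Bayes factor in favor of H0 under the unit-information prior.
    bf01 = sqrt(1.0 + n) * exp(-0.5 * z * z * n / (n + 1.0))
    # With equal prior odds, posterior odds equal the Bayes factor.
    return bf01 / (1.0 + bf01)


if __name__ == "__main__":
    # Same p-value, increasing sample size.
    for n in (10, 100, 1000, 10000):
        print(n, round(posterior_null(0.05, n), 3))
```

Under these assumptions, a fixed p = 0.05 corresponds to a posterior probability of the null below 0.5 at n = 10 but above 0.5 at n = 1000, which is the pattern Lindley’s paradox describes. A different prior on the effect size would give different numbers.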
Results/Conclusions
We found that for approximately 10% of significant (i.e., p < 0.05) p-values reported in 2009, the null hypothesis was more probable than the alternative. Moreover, while the average of all significant p-values reported was 0.0083 ± 0.015 (mean ± standard deviation), the average probability that the null was true was nearly 20 times higher (0.160 ± 0.214). The difference between the p-value and the probability of a true null hypothesis increased as sample size increased. In addition, non-significant p-values (p > 0.05) were frequently used as evidence in favor of the null hypothesis, but these values likewise did not indicate the probability that the null hypothesis was true. These results indicate that either 1) a large number of null hypotheses rejected by frequentist methods may in fact be true, or 2) researchers are strongly biased toward testing null hypotheses that are likely to be rejected. We conclude that p-values should be interpreted with caution in ecology, especially in studies with large sample sizes.