To study abundance and its environmental determinants, one generally needs to characterize the shape of its statistical distribution, for instance with a distribution model. When inference is based on a limited amount of data, having an a priori idea of the data distribution is particularly helpful.
We seek a distribution model that can be applied generally to freshwater fish abundance data. This model should reflect the main properties of the data: discreteness, overdispersion, and a high proportion of zeros. We also study the influence of various factors (e.g. species, site, mean abundance, sample size) on the performance of distribution models, so as to discuss how environment and behaviour might influence the distribution of fish. Moreover, we illustrate a few consequences of using an appropriate model for abundance count data, rather than a normal approximation, by examining confidence intervals around the estimate of mean abundance.
We study the distribution of 12 freshwater fish species of the Rhône basin, using a large dataset consisting of repeated samples (each consisting of 20 to 180 counts) collected by electrofishing between 1985 and 2007. We fit four different models to each of our 2258 samples: a zero-inflated Poisson, a negative binomial, a zero-inflated negative binomial, and a two-part Pareto distribution. For each sample, we fit these four models by maximum likelihood and select one model according to the Bayesian information criterion (BIC). We carry out logistic ANOVAs to assess the influence of factors such as species, site, and mean abundance on which of the four models is selected. We calculate confidence intervals around mean abundance using either our best-performing distribution model or the normal approximation, and compare the results of the two methods.
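As a minimal sketch of the per-sample fitting and selection step, the Python code below fits two of the four candidate models (negative binomial and zero-inflated Poisson) by maximum likelihood and compares them by BIC. It is illustrative only and is not the code used in this study: the function names, the search bounds on the dispersion parameter k, and the example counts are hypothetical, and the sketch assumes that the sample contains at least one non-zero count.

```python
import math

def nb_loglik(counts, mu, k):
    """Negative binomial log-likelihood with mean mu and dispersion (size) k."""
    return sum(
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu))
        for y in counts
    )

def fit_nb(counts, k_bounds=(1e-3, 1e3)):
    """ML fit: the estimate of mu is the sample mean; k maximises the
    profile log-likelihood, found by golden-section search on log(k).
    The bounds on k are an assumption made for this sketch."""
    mu = sum(counts) / len(counts)
    f = lambda t: nb_loglik(counts, mu, math.exp(t))
    lo, hi = math.log(k_bounds[0]), math.log(k_bounds[1])
    g = (math.sqrt(5) - 1) / 2
    for _ in range(100):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    k = math.exp((lo + hi) / 2)
    return mu, k, nb_loglik(counts, mu, k)

def zip_loglik(counts, pi, lam):
    """Zero-inflated Poisson log-likelihood (pi = probability of an extra zero)."""
    ll = 0.0
    for y in counts:
        if y == 0:
            ll += math.log(pi + (1 - pi) * math.exp(-lam))
        else:
            ll += math.log(1 - pi) - lam + y * math.log(lam) - math.lgamma(y + 1)
    return ll

def fit_zip(counts):
    """ML fit: lam solves lam / (1 - exp(-lam)) = mean of the positive counts
    (bisection); pi then follows from the overall mean. If that pi is
    negative, fall back to the boundary case pi = 0 (plain Poisson)."""
    n = len(counts)
    mean_all = sum(counts) / n
    positives = [y for y in counts if y > 0]
    mean_pos = sum(positives) / len(positives)
    lo, hi = 1e-9, mean_pos
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean_pos:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    pi = 1 - mean_all / lam
    if pi < 0:
        pi, lam = 0.0, mean_all
    return pi, lam, zip_loglik(counts, pi, lam)

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower is better."""
    return n_params * math.log(n_obs) - 2 * loglik

def select_model(counts):
    """Return the name of the model with the lowest BIC, and all BIC scores."""
    n = len(counts)
    scores = {
        "negative binomial": bic(fit_nb(counts)[2], 2, n),
        "zero-inflated Poisson": bic(fit_zip(counts)[2], 2, n),
    }
    return min(scores, key=scores.get), scores

# Hypothetical sample of 20 point counts (not data from this study).
example_counts = [0] * 12 + [1, 1, 2, 3, 5, 8, 13, 21]
best_model, bic_scores = select_model(example_counts)
print(best_model, bic_scores)
```

Both models in this sketch have two parameters, so the BIC penalty is identical and the comparison reduces to the log-likelihoods. The penalty starts to matter once a model with a different number of parameters, such as the three-parameter zero-inflated negative binomial, is added to the comparison.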
Results/Conclusions
Overall, the negative binomial is the most often selected distribution model (46% of samples). However, the goodness of fit depends strongly on sample features such as mean abundance: in particular, the zero-inflated Poisson model is selected in 56% of the samples whose mean abundance is below 0.6 individuals per point. Negative binomial-based confidence intervals are generally wider than confidence intervals based on a Gaussian distributional assumption (92% of samples), in particular when there are very few non-zero counts. They are also asymmetric and extend much further on their right side, reflecting that mean abundance might actually be much higher than the observed mean, given the shape of the distribution.
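The asymmetry can be illustrated with a short Python sketch that compares a normal-approximation interval with a likelihood-ratio interval for the mean of a negative binomial. This is one possible construction, chosen here for illustration; the abstract does not specify how the model-based intervals were computed. The dispersion parameter k is held fixed at a hypothetical value for simplicity, and the function names and example counts are likewise hypothetical.

```python
import math
import statistics

def nb_loglik(counts, mu, k):
    """Negative binomial log-likelihood with mean mu and dispersion (size) k."""
    return sum(
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu))
        for y in counts
    )

def normal_ci(counts, z=1.96):
    """Symmetric 95% interval: mean +/- z * standard error."""
    m = statistics.mean(counts)
    se = statistics.stdev(counts) / math.sqrt(len(counts))
    return m - z * se, m + z * se

def nb_likelihood_ci(counts, k, drop=1.92):
    """Approximate 95% likelihood-ratio interval for mu, with k held fixed
    (a simplifying assumption). The interval contains every mu whose
    log-likelihood lies within `drop` (half the 95% chi-square quantile
    with 1 degree of freedom) of the maximum, reached at the sample mean."""
    mu_hat = statistics.mean(counts)
    target = nb_loglik(counts, mu_hat, k) - drop

    def boundary(inside, outside):
        # Bisection between a point inside the interval and one outside it.
        for _ in range(200):
            mid = (inside + outside) / 2
            if nb_loglik(counts, mid, k) >= target:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2

    return boundary(mu_hat, mu_hat * 1e-6), boundary(mu_hat, mu_hat * 1e3)

# Hypothetical sample of 20 point counts (not data from this study).
example_counts = [0] * 12 + [1, 1, 2, 3, 5, 8, 13, 21]
observed_mean = statistics.mean(example_counts)
normal_low, normal_high = normal_ci(example_counts)
nb_low, nb_high = nb_likelihood_ci(example_counts, k=0.25)  # k is assumed
print("observed mean:", observed_mean)
print("normal approximation:", (normal_low, normal_high))
print("negative binomial (likelihood ratio):", (nb_low, nb_high))
```

In this sketch the normal interval is symmetric around the observed mean by construction, whereas the likelihood-based interval is free to extend further on one side, which is the behaviour described above.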