When conducting community research, ecologists must collect a sample of individuals large enough to represent the complete community. However, beyond some point, increasing the sample size yields diminishing returns: additional individuals do little to sharpen the patterns revealed by the data. If smaller samples produce the same community patterns as larger ones, researchers can save time and money. Our goal is to determine the minimum sample size required for abundance-based community research employing multivariate statistical techniques, and to examine how sensitive estimates of community differentiation at various subsample sizes are to richness and evenness.
In addition to 25 real datasets, we subsampled two types of artificial datasets. First, we assigned abundance values to create datasets crossing three factors: 5 or 10 samples; richness of 10, 20, or 50 taxa; and median evenness of 0.57 or 0.94. Within each of these 12 (2 × 3 × 2) parameter combinations, we varied rank abundance structure to create 141 assigned datasets. Second, we randomly generated 210 simulated datasets along a hypothetical environmental gradient, varying the numbers of samples and taxa. Together, these two types of artificial datasets allow a systematic examination of the effects of richness and evenness.
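The factorial design of the assigned datasets can be sketched in Python. The abstract does not name its evenness index, so this sketch assumes Pielou's J (Shannon entropy divided by the natural log of richness). The helper `geometric_abundances` and its `ratio` parameter are hypothetical: they only illustrate how changing rank abundance structure shifts evenness, and are not the procedure used to build the 141 assigned datasets.

```python
import itertools
import math

# Factor levels reported for the assigned datasets (2 x 3 x 2 = 12 combinations).
N_SAMPLES = (5, 10)
RICHNESS = (10, 20, 50)
MEDIAN_EVENNESS = (0.57, 0.94)


def pielou_evenness(abundances):
    """Pielou's J = H' / ln(S) (assumed index), where H' is Shannon entropy
    with natural logs and S is the number of taxa with nonzero abundance."""
    counts = [a for a in abundances if a > 0]
    s = len(counts)
    if s < 2:
        raise ValueError("evenness is undefined for fewer than two taxa")
    total = sum(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts)
    return h / math.log(s)


def geometric_abundances(richness, ratio, top=1000):
    """Hypothetical rank abundance vector: each taxon is `ratio` times as
    abundant as the taxon ranked above it (rounded, minimum of 1)."""
    return [max(1, round(top * ratio ** k)) for k in range(richness)]


grid = list(itertools.product(N_SAMPLES, RICHNESS, MEDIAN_EVENNESS))
print(len(grid))  # 12

# A steep geometric series is uneven; a shallow one is nearly even.
for ratio in (0.5, 0.8, 0.95):
    print(ratio, round(pielou_evenness(geometric_abundances(20, ratio)), 3))
```

In this sketch, lowering `ratio` concentrates individuals in the top-ranked taxa and lowers J, which is one way to span low- and high-evenness treatments at fixed richness.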
Results/Conclusions
Each sample within each dataset was randomly subsampled 1000 times to a series of incrementally smaller proportions (50% down to 2.5%) of the original median sample size; the smallest subsamples contained 20 to 40 individuals. Each of the 1000 subsampled datasets was correlated with its corresponding complete dataset using three multivariate methods: the Mantel test, correlations of non-metric multidimensional scaling (NMDS) axis scores, and Procrustes randomization tests (PROTEST) of NMDS ordinations. With the exception of a few outliers, the 210 simulated and 25 real datasets all had goodness-of-fit statistics (Mantel test R statistics, NMDS correlation r values, and PROTEST m² values) above 0.80 for all sample sizes greater than 50 individuals. Below 50 individuals, the goodness-of-fit statistics decreased, plummeting at sample sizes below 25 individuals. Evenness strongly influenced goodness-of-fit statistics within the 141 assigned datasets: datasets with a median evenness of 0.94 correlated more poorly with their complete datasets than those with a median evenness of 0.57, suggesting that high-evenness datasets require larger sample sizes. Among the simulated datasets, those with more taxa showed less variation in goodness-of-fit statistics; however, datasets with only a few taxa still had goodness-of-fit statistics above 0.80. Thus, for datasets with moderate to low evenness, sample sizes above 50 individuals produce essentially the same community patterns as larger sample sizes, indicating that smaller samples are sufficient when multivariate techniques are used to compare abundance-based communities.
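The subsampling and Mantel comparison can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy, Bray-Curtis dissimilarity (the abstract does not name the distance measure), and the Pearson correlation of the distance matrices' upper triangles as the Mantel statistic. The function names, the toy gradient community, and the rule that samples smaller than the target are kept whole are all assumptions; the NMDS and PROTEST comparisons are not shown.

```python
import numpy as np


def bray_curtis_matrix(counts):
    """Pairwise Bray-Curtis dissimilarities between samples (rows)."""
    x = np.asarray(counts, dtype=float)
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    total = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return diff / total


def subsample(counts, proportion, rng):
    """Draw individuals without replacement from every sample so each holds
    `proportion` of the median sample size. Samples already smaller than
    that target are kept whole (an assumption)."""
    counts = np.asarray(counts, dtype=np.int64)
    target = max(1, int(round(proportion * np.median(counts.sum(axis=1)))))
    rows = [rng.multivariate_hypergeometric(row, int(min(target, row.sum())))
            for row in counts]
    return np.vstack(rows)


def mantel_r(d1, d2):
    """Pearson correlation between the upper triangles of two distance matrices."""
    iu = np.triu_indices_from(d1, k=1)
    return np.corrcoef(d1[iu], d2[iu])[0, 1]


def mantel_test(d1, d2, n_perm=999, rng=None):
    """Mantel r with a one-sided permutation p-value (samples of d2 shuffled)."""
    rng = np.random.default_rng() if rng is None else rng
    r_obs = mantel_r(d1, d2)
    n = d1.shape[0]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        if mantel_r(d1, d2[np.ix_(p, p)]) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


def subsample_fit(counts, proportion, n_reps=1000, seed=0):
    """Mantel r between the complete dataset and n_reps subsampled copies."""
    rng = np.random.default_rng(seed)
    d_full = bray_curtis_matrix(counts)
    return np.array([
        mantel_r(d_full, bray_curtis_matrix(subsample(counts, proportion, rng)))
        for _ in range(n_reps)
    ])


# Hypothetical community: 10 samples x 20 taxa with Gaussian responses to a
# single environmental gradient; counts are Poisson draws.
rng = np.random.default_rng(42)
gradient = np.linspace(0, 1, 10)
optima = np.linspace(0, 1, 20)
expected = 50 * np.exp(-((gradient[:, None] - optima[None, :]) ** 2) / (2 * 0.2 ** 2))
community = rng.poisson(expected)

# Mean Mantel r and the share of replicates above 0.80 at each proportion.
for proportion in (0.5, 0.25, 0.1, 0.05, 0.025):
    fits = subsample_fit(community, proportion)
    print(proportion, round(fits.mean(), 3), (fits > 0.8).mean())
```

Counting the fraction of replicate r values above 0.80 mirrors the threshold used above. `mantel_test` adds a permutation p-value by shuffling sample labels in one matrix, which is the usual way significance is assessed for a Mantel statistic.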