PS 43-128 - Avoiding errors in error analyses: How to propagate uncertainty in regression models

Friday, August 12, 2016
ESA Exhibit Hall, Ft Lauderdale Convention Center
Ruth D. Yanai, Forest and Natural Resources Management, SUNY College of Environmental Science and Forestry, Syracuse, NY; Hannah L. Buckley, Department of Ecology, Lincoln University, Canterbury, New Zealand; Bradley S. Case, Department of Informatics and Enabling Technologies, Lincoln University, Canterbury, New Zealand; and Richard C. Woollons, School of Forestry, University of Canterbury, New Zealand

Quantifying uncertainty is important to establishing the significance of comparisons, to making predictions with known confidence, and to identifying priorities for improvement. Calculations of forest biomass and elemental content require many measurements and models, each contributing uncertainty to the final estimates, and thus the overall uncertainty is difficult to quantify correctly. While sampling error is commonly reported, error due to uncertainty in the regression used to estimate biomass from tree diameter is usually not quantified. Some published estimates purporting to include uncertainty due to the regression models have reported uncertainty in forest carbon stores based on the uncertainty in the prediction of individuals, ignoring uncertainty in the mean, while others have propagated error in the mean while ignoring individual variation.


Using the simple case of the calcium concentration of sugar maple leaves, we illustrate the difference between the variation among individuals (the standard deviation) and the uncertainty in the mean (the standard error), and show that the variation among individuals becomes less important as the number of individuals increases. For allometric models, the parallel statistics are the prediction interval (which includes the residual variation in the model fit) and the confidence interval (which describes the uncertainty in the best-fit model). The effect of propagating these two sources of error is illustrated using the mass of sugar maple foliage, which can be multiplied by calcium concentration to obtain the calcium content of leaf litter. The uncertainty in individual tree biomass was large for plots with few trees; for plots with 30 trees or more, the uncertainty in individuals was less important than the uncertainty in the mean, for the data sets used in this example. The most rigorous analysis takes both sources of uncertainty into account, but for practical purposes, country-level reports of uncertainty in carbon stocks, as required by the IPCC, can ignore the uncertainty in individuals. Ignoring the uncertainty in the mean, by contrast, will lead to overstated confidence in estimates of biomass and of carbon and nutrient contents.
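
The contrast between the two sources of uncertainty can be sketched for the simple-mean case. This is an illustration only, not the authors' analysis: the function name, the standard deviation, and the sample sizes are hypothetical, and it assumes independent individuals and a standard deviation treated as known. The uncertainty in the estimated mean is sd/sqrt(n_sample) and does not shrink as more individuals are predicted, whereas the contribution of individual variation to a plot mean is sd/sqrt(n_plot) and does.

```python
import math

def plot_mean_uncertainty(sd, n_sample, n_plot):
    """Uncertainty in the mean of n_plot individuals predicted from a sample mean.

    sd       : standard deviation among individuals (hypothetical input)
    n_sample : number of individuals used to estimate the mean
    n_plot   : number of individuals whose mean is being predicted
    Returns (individual_term, mean_term, combined), all in the units of sd.
    """
    # Standard error of the estimated mean: fixed by the original sample size.
    mean_term = sd / math.sqrt(n_sample)
    # Individual variation, averaged over the n_plot predicted individuals.
    individual_term = sd / math.sqrt(n_plot)
    # Independent sources of error add in quadrature.
    combined = math.sqrt(mean_term ** 2 + individual_term ** 2)
    return individual_term, mean_term, combined

# Hypothetical values: sd = 2.0 concentration units, mean estimated from 25 leaves.
for n_plot in (1, 5, 25, 100, 1000):
    indiv, mean_se, total = plot_mean_uncertainty(2.0, 25, n_plot)
    print(f"n_plot={n_plot:5d}  individual={indiv:.3f}  mean={mean_se:.3f}  combined={total:.3f}")
```

In this sketch the individual term dominates for small plots, equals the mean term when n_plot equals n_sample, and becomes negligible for large plots, while the combined uncertainty never falls below the uncertainty in the mean. The regression case is analogous, since the variance of a prediction for an individual is the variance of the fitted mean plus the residual variance.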