Uncertainty due to gap-filling in long-term hydrologic datasets
Most long-term datasets contain gaps, which we define as missing or unusable data. In hydrologic datasets, gaps arise from loss or contamination of samples, routine maintenance, equipment malfunction, extreme weather, and breaks in the funding cycle. When calculating precipitation inputs or streamflow outputs of water or solutes, missing values cannot simply be omitted. Methods for filling gaps vary widely, and the uncertainty associated with filling gaps is not commonly reported. For five long-term hydrologic studies, we described the frequency and causes of gaps in the volume and solute concentrations of precipitation and streamflow. To quantify the uncertainty associated with different gap-filling methods, we created a series of artificial gaps in measured records and compared the gap-filled estimates with the measured values. Our case studies include datasets from four LTER sites in the US (Hubbard Brook, Coweeta, HJ Andrews, and Sevilleta) and from Gomadansan Experimental Forest in Japan.
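The artificial-gap experiment can be sketched as follows. The idea is to take a complete measured record, remove values, fill them with a candidate method, and compare the filled total with the measured total. This is a minimal sketch under stated assumptions. Linear interpolation stands in for whichever gap-filling method is being evaluated. The gap lengths, the number of trials, and the synthetic series are illustrative. All function names are hypothetical and do not come from the study.

```python
import math
import random
import statistics
from typing import List, Set


def synthetic_daily_flow(n_days: int = 365, seed: int = 1) -> List[float]:
    """Hypothetical 'measured' daily series: a seasonal cycle plus noise (always > 0)."""
    rng = random.Random(seed)
    return [
        5.0 + 3.0 * math.sin(2 * math.pi * day / 365) + rng.uniform(0.0, 1.0)
        for day in range(n_days)
    ]


def make_artificial_gaps(n: int, gap_len: int, n_gaps: int, seed: int = 0) -> Set[int]:
    """Choose n_gaps random runs of gap_len consecutive indices to treat as missing.

    The first and last values are never removed, so every gap has an observed
    value on each side (runs may overlap and merge into longer gaps)."""
    rng = random.Random(seed)
    missing: Set[int] = set()
    for _ in range(n_gaps):
        start = rng.randint(1, n - gap_len - 1)
        missing.update(range(start, start + gap_len))
    return missing


def fill_linear(values: List[float], missing: Set[int]) -> List[float]:
    """Fill each missing index by linear interpolation between the nearest
    observed values before and after the gap (stand-in gap-filling method)."""
    filled = list(values)
    for i in sorted(missing):
        lo = i - 1
        while lo in missing:
            lo -= 1
        hi = i + 1
        while hi in missing:
            hi += 1
        frac = (i - lo) / (hi - lo)
        filled[i] = values[lo] + frac * (values[hi] - values[lo])
    return filled


def percent_error_of_total(measured: List[float], filled: List[float]) -> float:
    """Signed error of the gap-filled total relative to the measured total, in %."""
    total = sum(measured)
    return 100.0 * (sum(filled) - total) / total


if __name__ == "__main__":
    measured = synthetic_daily_flow()
    errors = []
    for trial in range(200):
        missing = make_artificial_gaps(len(measured), gap_len=7, n_gaps=3, seed=trial)
        filled = fill_linear(measured, missing)
        errors.append(percent_error_of_total(measured, filled))
    print(f"mean |error| of annual total: {statistics.mean(abs(e) for e in errors):.2f}%")
    print(f"largest |error|: {max(abs(e) for e in errors):.2f}%")
```

Repeating the experiment with each candidate method makes it possible to compare their error distributions directly at a given site.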
Specific causes of missing data varied among our study sites, but some broad generalizations can be made. For streamflow, equipment failure or maintenance accounted for 75% of missing flow values at Hubbard Brook and Coweeta, and gaps in chemistry were less common than gaps in flow. Conversely, in precipitation datasets, chemistry gaps were much more common than gaps in volume; over 60% of chemistry gaps in rain gauge data were due to sample contamination. Based on 95% prediction intervals, filling volume and chemistry gaps with regression models introduced an uncertainty of 2–8% in annual nitrate input estimates for a rain gauge at Sevilleta. Filling streamflow gaps introduced an uncertainty of up to 8% at Gomadansan. The uncertainty associated with gap-filling should be reported when precipitation and streamflow datasets are used to calculate nutrient budgets. We recommend that researchers perform this relatively simple analysis to quantify the uncertainty associated with their missing data; doing so will also allow them to select the gap-filling method that minimizes uncertainty at their site.
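Here is one way to attach a 95% prediction interval to regression-based gap filling and carry it through to a period total. It is a minimal sketch that assumes simple linear regression of the gappy gauge on a nearby complete gauge. The abstract does not say which predictors or regression forms the authors used. The function names and example values are hypothetical. A normal quantile replaces Student's t, which is reasonable only when there are many calibration pairs.

```python
import math
from statistics import NormalDist, mean
from typing import List, Optional, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float, float, float, float]:
    """Ordinary least squares fit y = a + b*x.

    Returns (a, b, s, x_mean, sxx), where s is the residual standard error."""
    n = len(x)
    x_mean, y_mean = mean(x), mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    a = y_mean - b * x_mean
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s, x_mean, sxx


def fill_with_interval(
    target: List[Optional[float]], neighbour: List[float], level: float = 0.95
) -> Tuple[List[float], List[float], List[float]]:
    """Fill None entries of `target` from a regression on a complete `neighbour` series.

    Returns (point, lower, upper). Observed values are copied unchanged into all
    three lists; each filled value gets a prediction interval at `level`.
    Assumption: a normal quantile replaces Student's t (many calibration pairs)."""
    pairs = [(xn, yt) for xn, yt in zip(neighbour, target) if yt is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    n = len(xs)
    a, b, s, x_mean, sxx = fit_line(xs, ys)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    point, lower, upper = [], [], []
    for xn, yt in zip(neighbour, target):
        if yt is not None:
            yhat, half = yt, 0.0
        else:
            yhat = a + b * xn
            # Prediction (not confidence) interval: the leading "1 +" adds residual scatter.
            half = z * s * math.sqrt(1 + 1 / n + (xn - x_mean) ** 2 / sxx)
        point.append(yhat)
        lower.append(yhat - half)
        upper.append(yhat + half)
    return point, lower, upper


def total_half_width_percent(point: List[float], lower: List[float], upper: List[float]) -> float:
    """Half-width of the interval on the summed total, as % of the point total.

    Summing per-value bounds assumes fill errors are perfectly correlated,
    which makes this a conservative (wide) bound."""
    return 100.0 * (sum(upper) - sum(lower)) / 2 / sum(point)


if __name__ == "__main__":
    # Hypothetical daily volumes (mm): a gauge with two gaps and a nearby complete gauge.
    neighbour = [0.0, 4.2, 10.5, 0.0, 2.1, 7.8, 15.0, 0.5, 3.3, 6.0]
    target = [0.1, 4.0, None, 0.0, 2.4, None, 14.1, 0.6, 3.0, 6.2]
    point, lower, upper = fill_with_interval(target, neighbour)
    pct = total_half_width_percent(point, lower, upper)
    print(f"period total: {sum(point):.1f} mm ± {pct:.1f}%")
```

With only a few calibration pairs, `scipy.stats.t.ppf(0.5 + level / 2, n - 2)` would be the more appropriate quantile. For solute inputs such as nitrate, volume and concentration would each be filled and then multiplied. How to combine their two intervals is a further modeling choice that this sketch does not address. Lower bounds can also fall below zero for precipitation, so clipping them at zero is a reasonable refinement.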