Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water quality measures. The IVs used for these analyses are traditionally measured at the same time as the water quality sample. We investigated how much empirical modeling performance improves when the IVs are temporally synchronized with the FIB response variable. We first examined the univariate relationship between multiple “aspects” of each IV and the response variable to find the single aspect of each IV most strongly related to the response. An aspect is defined by the temporal window and lag (relative to when the response is measured) over which the IV is averaged. Models were then formed using the “best” aspect of each IV. Employing iterative cross-validation, we examined the average improvement in the mean squared error of prediction (MSEP) for a testing dataset after applying our temporal synchronization technique to the training data. We compared the MSEP values of three methodologies: predictions made using unsynchronized IVs (UNS), predictions made using synchronized IVs whose aspects were chosen using the Pearson correlation coefficient (PCC), and predictions made using synchronized IVs whose aspects were chosen using the PRESS (predicted residual sum of squares) statistic (PRS).
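The aspect search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`aspect_values`, `press`, `best_aspect`), the integer time-step indexing, the candidate windows and lags, and the synthetic data are all assumptions made for illustration. An aspect is taken here to be the mean of the IV over `window` time steps ending `lag` steps before the response is sampled, and the best aspect is the one with the largest absolute Pearson correlation or the smallest PRESS in a univariate linear fit.

```python
import numpy as np

def aspect_values(iv, sample_idx, window, lag):
    """Mean of `iv` over `window` steps, ending `lag` steps before each sample time."""
    out = np.full(len(sample_idx), np.nan)
    for k, t in enumerate(sample_idx):
        end = t - lag + 1            # exclusive end of the averaging window
        start = end - window
        if start >= 0:               # leave NaN when the history is too short
            out[k] = iv[start:end].mean()
    return out

def press(x, y):
    """PRESS for the univariate fit y ~ a + b*x, using the leverage shortcut."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    xtx_inv = np.linalg.pinv(X.T @ X)
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)   # diagonal of the hat matrix
    return float(np.sum((resid / (1.0 - leverage)) ** 2))

def best_aspect(iv, y, sample_idx, windows, lags, criterion="pcc"):
    """Return the (window, lag) pair most strongly related to the response."""
    sample_idx = np.asarray(sample_idx)
    y = np.asarray(y, dtype=float)
    # Keep only samples with enough history for every candidate aspect,
    # so that all aspects are scored on the same observations.
    keep = sample_idx >= max(windows) + max(lags) - 1
    sample_idx, y = sample_idx[keep], y[keep]
    best, best_score = None, -np.inf
    for w in windows:
        for l in lags:
            x = aspect_values(iv, sample_idx, w, l)
            if criterion == "pcc":
                score = abs(np.corrcoef(x, y)[0, 1])   # larger is better
            else:
                score = -press(x, y)                   # smaller PRESS is better
            if score > best_score:
                best, best_score = (w, l), score
    return best

# Synthetic example (assumed data): the response depends on the IV
# averaged over 3 steps, ending 2 steps before each sample.
rng = np.random.default_rng(0)
iv = rng.normal(size=400)
sample_idx = np.arange(20, 400, 4)
y = 2.0 * aspect_values(iv, sample_idx, 3, 2) + rng.normal(scale=0.05, size=len(sample_idx))

pcc_choice = best_aspect(iv, y, sample_idx, windows=[1, 3, 6], lags=[0, 2, 5], criterion="pcc")
press_choice = best_aspect(iv, y, sample_idx, windows=[1, 3, 6], lags=[0, 2, 5], criterion="press")
```

In this sketch the aspect with `window=1` and `lag=0` corresponds to the unsynchronized IV, that is, the value measured at the same time as the water quality sample.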
Results/Conclusions
Averaged over 500 randomly generated testing datasets, the MSEP values of the PRS technique were 50% lower (p < 0.001) than those of the UNS technique, and the MSEP values of the PCC technique were 26% lower (p < 0.001) than those of the UNS technique. We conclude that temporal synchronization can significantly improve predictive models of FIB levels in recreational swimming waters.
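The comparison of MSEP values across repeated random testing datasets can be sketched as follows. This is an illustrative example under stated assumptions, not the study's procedure or data: the helper `msep_over_splits`, the 30% test fraction, and the synthetic predictors are hypothetical, and the synchronized predictor is fixed in advance rather than selected on each training set.

```python
import numpy as np

def msep_over_splits(X, y, n_splits=500, test_frac=0.3, seed=0):
    """Test-set mean squared error of prediction for each random train/test split."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    A = np.column_stack([np.ones(n), X])          # design matrix with intercept
    errs = np.empty(n_splits)
    for i in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        errs[i] = np.mean((y[test] - A[test] @ beta) ** 2)
    return errs

# Synthetic data (assumed): the response is driven by the synchronized
# predictor, and the unsynchronized predictor is only a noisy proxy for it.
rng = np.random.default_rng(1)
n = 200
x_sync = rng.normal(size=n)
x_unsync = 0.5 * x_sync + rng.normal(size=n)
y = 1.0 + 2.0 * x_sync + rng.normal(scale=0.5, size=n)

# The same seed gives both models identical splits, so the comparison is paired.
msep_uns = msep_over_splits(x_unsync, y)
msep_syn = msep_over_splits(x_sync, y)
reduction_pct = 100.0 * (1.0 - msep_syn.mean() / msep_uns.mean())
```

The value of `reduction_pct` reflects only the synthetic data generated here and should not be compared with the percentages reported above.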