The water quality of the Nation’s estuaries is attracting increasing scrutiny in light of burgeoning coastal population growth and enhanced delivery of nutrients via riverine flux. The USEPA has evaluated water quality in US estuaries in the National Coastal Assessment (NCA) and National Aquatic Resource Surveys (NARS) programs. Here we report on a Random Forest (RF) modelling investigation of the survey data to identify the dominant predictor variables affecting surface chlorophyll concentrations in designated regions, paying particular attention to the nutrient measures employed (TN and TP versus DIN and DIP) and the regional scale used in the assessment. We also examine model results for indications of change in chlorophyll concentrations over time.
The estuarine water quality data used for RF modelling were collected at over 7800 randomly selected sites surveyed from 2000 to 2006 (NCA) and in 2010 and 2015 (NARS). The sites were sampled once during the summer period using consistent collection and assessment methods. Water quality measures included temperature, salinity, pH, Secchi depth, and concentrations of dissolved oxygen, chlorophyll a, dissolved inorganic nitrogen (DIN) and dissolved inorganic phosphorus (DIP). Total nitrogen (TN) and total phosphorus (TP) were measured nationwide in 2005 and later years.
To date, the survey programs have evaluated and reported conditions nationally and in four large-scale regions. For our study, we identified 26 sub-regions to investigate the effect of region size. The RF regression modelling technique used here is a machine learning algorithm that produces unbiased estimates of error and robust measures of variable importance by repeatedly bootstrapping the observations, considering a random subset of predictor variables at each split, and testing each tree on the data left out of its bootstrap sample.
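The bootstrap and held-out-data mechanism described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names (`build_tree`, `fit_forest`, `oob_mse`), the settings (50 trees, a minimum leaf size of 5, one third of the predictors tried per split) and the synthetic data are all assumptions chosen for illustration. Each tree is grown on rows sampled with replacement, and the rows that a tree never saw (its "out-of-bag" rows) serve as its test set.

```python
import random

def build_tree(X, y, idx, n_try, rng, min_leaf=5):
    """Grow a regression tree on rows idx, trying n_try random features per split."""
    best = None  # (score, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_try):
        order = sorted(idx, key=lambda i: X[i][f])
        total = sum(y[i] for i in order)
        left_sum = 0.0
        for k in range(1, len(order)):
            left_sum += y[order[k - 1]]
            a, b = X[order[k - 1]][f], X[order[k]][f]
            if a == b or k < min_leaf or len(order) - k < min_leaf:
                continue
            # Maximising this score is equivalent to minimising squared error.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (len(order) - k)
            if best is None or score > best[0]:
                best = (score, f, a)
    if best is None:  # no admissible split: leaf holding the mean response
        return sum(y[i] for i in idx) / len(idx)
    _, f, thr = best
    left = [i for i in idx if X[i][f] <= thr]
    right = [i for i in idx if X[i][f] > thr]
    return (f, thr,
            build_tree(X, y, left, n_try, rng, min_leaf),
            build_tree(X, y, right, n_try, rng, min_leaf))

def predict(tree, row):
    """Walk from the root to a leaf; leaves are floats, internal nodes are tuples."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

def fit_forest(X, y, n_trees=50, seed=0):
    """Fit n_trees trees, each on a bootstrap sample; remember each tree's OOB rows."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_try = max(1, p // 3)  # assumed setting: one third of the predictors per split
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        forest.append((build_tree(X, y, boot, n_try, rng), oob))
    return forest

def oob_mse(forest, X, y):
    """Mean squared error using, for each row, only trees that did not train on it."""
    sums, counts = [0.0] * len(X), [0] * len(X)
    for tree, oob in forest:
        for i in oob:
            sums[i] += predict(tree, X[i])
            counts[i] += 1
    errs = [(sums[i] / counts[i] - y[i]) ** 2 for i in range(len(X)) if counts[i]]
    return sum(errs) / len(errs)

# Synthetic data (NOT survey data): two informative predictors and one noise column.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * r[0] + r[1] + data_rng.gauss(0, 0.3) for r in X]

forest = fit_forest(X, y)
mse = oob_mse(forest, X, y)
mean_y = sum(y) / len(y)
var_y = sum((t - mean_y) ** 2 for t in y) / len(y)
print(f"OOB MSE = {mse:.3f}, variance of y = {var_y:.3f}")
```

Because every prediction in `oob_mse` comes from trees that never trained on that row, the error estimate needs no separate validation set, which is the sense in which the out-of-bag rows act as test data.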
The RF model results are remarkably robust (typical model statistics: mean squared error = 0.22 and adjusted R² = 0.685). Based on the percent mean decrease in accuracy, TN, TP, and the TN/TP ratio are important predictor variables for chlorophyll concentration, while DIN and DIP are relatively unimportant. Likewise, the finer-scale sub-regions are significant predictors, while the four large-scale regions are not. Additionally, chlorophyll levels are significantly greater in 2010 and 2015 than in earlier years. These results can be used to improve assessment and reporting methods for estuarine surveys. Future work will investigate how chlorophyll concentrations in sub-regions respond to changes in nutrient levels.
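The idea behind an importance measure based on decrease in accuracy can be sketched as follows: shuffle one predictor column, which breaks its link to the response, and record how much the prediction error grows. The sketch is a simplified, model-agnostic variant and not necessarily the exact computation behind the values reported here. The names `permutation_importance` and `stand_in_model`, the column labels, and the synthetic data are hypothetical.

```python
import random

def mse(predict, X, y):
    """Mean squared error of a prediction function over rows X."""
    return sum((predict(r) - t) ** 2 for r, t in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Percent increase in MSE when each column is shuffled, averaged over repeats."""
    rng = random.Random(seed)
    base = mse(predict, X, y)
    result = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)  # break the link between column j and the response
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            increases.append(mse(predict, X_perm, y) - base)
        result.append(100.0 * (sum(increases) / n_repeats) / base)
    return result

# Synthetic data (NOT survey data): column 0 stands in for a predictor that
# drives the response, column 1 for a predictor that does not.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(300)]
y = [2.0 * r[0] + data_rng.gauss(0, 0.3) for r in X]

def stand_in_model(row):
    """Hypothetical fitted model; a real analysis would use the trained forest."""
    return 2.0 * row[0]

labels = ["informative predictor", "uninformative predictor"]
importance = permutation_importance(stand_in_model, X, y)
for name, value in zip(labels, importance):
    print(f"{name}: {value:.1f}% increase in MSE")
```

A predictor that the model relies on produces a large increase in error when shuffled, while one the model ignores produces none. This pattern is what separates important from unimportant predictors in a ranking of this kind.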