PS 47-64
An R package to simplify analysis of long term vegetation monitoring data at the Shenandoah National Park

Wednesday, August 12, 2015
Exhibit Hall, Baltimore Convention Center
M. H. H. Stevens, Department of Biology, Miami University, Oxford, OH
Wendy B. Cass, Shenandoah National Park, National Park Service, E. Luray, VA
Abigail R. Hyduke, Shenandoah National Park, National Park Service, E. Luray, VA
Wendy W. Hochstedler, Shenandoah National Park, National Park Service, E. Luray, VA
Jing Zhang, Department of Statistics, Miami University, Oxford, OH
Alan B. Williams, National Park Service, Luray, VA

Descriptive community data, such as long-term monitoring data, can be challenging to summarize, analyze, and display. Logistical difficulties include differing needs for database management vs. analysis, missing data elements in large data sets, data entry errors, duplicate records, and periodically updated data sets. Summary and display difficulties include the exploratory and unanticipated uses of the data, display scales appropriate for both very rare and very abundant taxa, and conflicts between best practices and convention. Analysis difficulties include spatial and temporal pattern and dependence, responses and predictors collected at a variety of spatial and temporal scales, and a wide variety of response variables with non-standard distributions, such as zero-inflated, count, and categorical data. Our goal was to create an R package, npsr, to help manage, analyze, and display long-term vegetation monitoring data for Shenandoah National Park (SHEN). The package is intended for analysts and technicians with varied skill levels in statistics and programming. It takes advantage of RStudio and requires the Gibbs sampler JAGS and several other packages, most notably dplyr, ggplot2, R2jags, survey, and tweedie.


Our R package, npsr, includes functions for data management, processing or cleaning, and documentation, as well as for data display, summary, and analysis, covering the wide variety of data types in the long-term vegetation monitoring data from SHEN. Data processing functions help find duplicate and missing tag numbers and organize data by selected taxonomic levels. Data display functions simplify plotting raw data and statistical summaries. Data analysis functions support design-based analyses for simple summaries as well as more complex model-based inference using Gibbs sampling (a Bayesian approach) for projection and for generating credible intervals under highly non-normal distributions. Several vignettes describe data analysis for standard reports and for assessment of ecological integrity.
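
As a rough illustration of the kind of check the data processing functions perform, the sketch below flags missing and duplicate tag numbers in a small made-up table. It is not the npsr API: the data frame `trees`, its column names, and the result names `missing_tags` and `duplicate_tags` are all hypothetical, and base R is used so the example runs without additional packages.

```r
# Hypothetical monitoring records: one row per tagged stem per survey year.
# All values and column names are invented for illustration.
trees <- data.frame(
  plot    = c("P1", "P1", "P1", "P1", "P2", "P2"),
  year    = c(2010, 2010, 2010, 2010, 2010, 2010),
  tag     = c(101, 102, 102, NA, 201, 202),
  species = c("ACRU", "QURU", "QURU", "LITU", "ACRU", "NYSY"),
  stringsAsFactors = FALSE
)

# Records with no tag number recorded.
missing_tags <- trees[is.na(trees$tag), ]

# Among tagged records, find every row whose plot/year/tag combination
# occurs more than once. Scanning from both ends marks all copies,
# not only the second and later ones.
tagged <- trees[!is.na(trees$tag), ]
key <- tagged[, c("plot", "year", "tag")]
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
duplicate_tags <- tagged[is_dup, ]

print(missing_tags)
print(duplicate_tags)
```

Treating plot, year, and tag together as the record key is an assumption made here; a real monitoring database may define uniqueness differently, for example by tag alone across all years.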