A solution to the problem of separation in site-occupancy models
Site-occupancy models (SOMs) use repeated-survey data to disentangle probabilities of species detection and occurrence so that ‘true’ site occupancy, as well as covariate effects on occurrence and detection probabilities, can be estimated. A common problem with standard SOMs is that maximum-likelihood estimation (MLE) fails to converge if a key covariate predicts detection/non-detection data perfectly or near-perfectly; when such ‘separation’ occurs, the maximum-likelihood estimate for that covariate does not exist, precluding further inference. A penalized score equation within the likelihood function is often used to overcome this problem in ordinary logistic regression, but we lack an established statistical protocol for MLE-based SOMs. Here, we illustrate how a penalized likelihood can be integrated into a single-species SOM in the R package ‘unmarked’ to improve inferences about covariate effects on occurrence and detection probabilities. To explore whether a ‘penalized SOM’ can yield unbiased probability estimates, we simulated true occupancy under two scenarios: (i) highly aggregated site-covariate characteristics, followed by imperfect detection to generate observed incidences, and (ii) highly aggregated detection-covariate characteristics to generate detected incidences. We then compared occupancy and detection probability estimates derived from (i) a standard single-species SOM and (ii) a penalized SOM.
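The separation problem can be made concrete with a minimal Python sketch. It is not the ‘unmarked’ implementation and does not reproduce the penalty used in this study: the toy detection histories, the fixed values `B0` and `P`, the ridge-type penalty and its weight `LAM` are all illustrative assumptions. Every site with covariate value 1 has at least one detection, so the unpenalized log-likelihood keeps rising as the covariate effect `b1` grows, whereas the penalized version peaks at a finite value.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def occupancy_loglik(b0, b1, p, x, y):
    """Log-likelihood of a single-species occupancy model with
    logit(psi_i) = b0 + b1 * x_i and constant detection probability p."""
    total = 0.0
    for xi, hist in zip(x, y):
        psi = logistic(b0 + b1 * xi)
        d, n = sum(hist), len(hist)
        lik_if_occupied = psi * p**d * (1.0 - p)**(n - d)
        if d > 0:
            # At least one detection: the site must be occupied.
            total += math.log(lik_if_occupied)
        else:
            # Never detected: occupied but missed, or truly unoccupied.
            total += math.log(lik_if_occupied + (1.0 - psi))
    return total

# Hypothetical toy data: 10 sites, 3 visits each (not from the study).
x = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0],  # x = 1: all detected
     [0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0]]  # x = 0: mixed

B0, P, LAM = -0.4, 0.4, 0.5  # assumed fixed values, chosen for illustration only

# Profile the log-likelihood over b1 on a grid from 0 to 15.
grid = [i * 0.5 for i in range(31)]
plain = [occupancy_loglik(B0, b1, P, x, y) for b1 in grid]
# Ridge-type penalty on the slope (a stand-in for a penalized likelihood).
penalized = [ll - 0.5 * LAM * b1**2 for ll, b1 in zip(plain, grid)]

best_plain = grid[plain.index(max(plain))]
best_penalized = grid[penalized.index(max(penalized))]
print("unpenalized maximum on grid at b1 =", best_plain)
print("penalized maximum on grid at b1 =", best_penalized)
```

In this sketch the unpenalized maximum sits on the upper edge of the grid however far the grid is extended, which is the symptom of a non-existent maximum-likelihood estimate. A full analysis would optimize all parameters jointly rather than fixing `B0` and `P`.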
A penalized SOM substantially improved our ability to estimate occupancy and detection probabilities and reduced variation around the estimates under high covariate separation. Models that included separated covariates and other important environmental covariates converged successfully under a penalized SOM and yielded more accurate occupancy and detection probability estimates than a standard SOM. The utility of penalized likelihood was sensitive to the extent of covariate separation (complete or quasi-complete) and to the size of the study (number of sites and number of observations per site). The penalized-likelihood site-occupancy model performed well with both categorical and continuous covariates. Our results demonstrate the importance of considering the distribution of observed covariates when including them in site-occupancy models, and illustrate the inappropriateness of reporting estimates derived from traditional methods when data are separated. Although we integrated penalized likelihood into only the simplest SOM, we see potential to incorporate this correction into more complex models, from dynamic occupancy and abundance models to species distribution and niche models. Overall, our study shows that using an occupancy model with penalized likelihood can improve inference about key covariates affecting site occupancy.