Random walks are ubiquitous in theoretical ecology and feature prominently in theoretical work on population genetics, population dynamics and animal movement. We have been studying the performance of model-selection criteria, specifically different bootstrap variants of AIC, for multivariate random walk models with hidden states. These are state-space models in which the underlying random walks are observed with error (and thus are hidden) and in which the distribution of the observation error is unknown. In models without hidden states, the over-fitting bias correction for the Kullback-Leibler distance can be approximated by the number of estimated parameters K (or some simple function of K to adjust for sample size), and thus the classic form of AIC is -2(maximum log-likelihood) + 2K. In multivariate random walk models with hidden states, the number of estimated parameters is not directly related to the over-fitting bias, because the number and assumed structure of the hidden random walks affect the ability of the model to over-fit the data, and this in turn determines the bias correction term. Another problem with model selection for random walk models with hidden states is that the number of estimated parameters can quickly dwarf the number of data points; small-sample criteria designed for state-space time-series models are therefore required.
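As a minimal sketch of the model class described above, the following simulates hidden random walks observed with error and computes the classic AIC from a maximized log-likelihood. The function names, the 0/1 site-to-walk matrix `z`, and the Gaussian process and observation errors are illustrative assumptions and are not taken from the study; in particular, the study treats the observation-error distribution as unknown.

```python
import numpy as np


def simulate_observed_walks(z, n_steps, process_sd, obs_sd, seed=0):
    """Simulate hidden random walks and noisy observations of them.

    z is an (n_sites, n_states) 0/1 matrix assigning each monitoring
    site to exactly one hidden random walk (hypothetical encoding).
    Gaussian errors are an assumption made only for this sketch.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    n_sites, n_states = z.shape
    # Hidden states: cumulative sums of independent Gaussian steps.
    steps = rng.normal(0.0, process_sd, size=(n_steps, n_states))
    hidden = np.cumsum(steps, axis=0)
    # Observations: each site sees its hidden walk plus observation error.
    noise = rng.normal(0.0, obs_sd, size=(n_steps, n_sites))
    observed = hidden @ z.T + noise
    return hidden, observed


def classic_aic(max_log_lik, k):
    """Classic AIC: -2(maximum log-likelihood) + 2K."""
    return -2.0 * max_log_lik + 2.0 * k


# Three sites: the first two share one hidden walk, the third has its own.
z = [[1, 0], [1, 0], [0, 1]]
hidden, observed = simulate_observed_walks(z, n_steps=50, process_sd=0.1, obs_sd=0.05)
print(hidden.shape, observed.shape)
```

The penalty 2K in `classic_aic` depends only on the parameter count, which is exactly the term that stops tracking the over-fitting bias once the number and structure of the hidden walks can vary between candidate models.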
We studied the statistical properties of two small-sample variants of AIC, bootstrap AIC (AICb) and improved AIC (AICi), that have been developed for state-space time-series models. We used simulations of multivariate interacting random walks to compare these criteria with the true Kullback-Leibler distances, which AICb and AICi are designed to estimate. The original papers on these criteria studied their performance for univariate hidden random walks; our work extends this to multivariate hidden random walks with complex relationships among the hidden walks. We also performed simulation studies on the ability of these state-space AIC estimators to select the "true" model out of a diverse set of candidate models. These simulation studies focused on the ability to robustly estimate the underlying spatial population structure (for example, panmictic versus multiple independent subpopulations) using data sets from multiple spatially distinct monitoring stations. We found that when the observation errors are low (similar to what we see in many mammalian data sets), AICb was able to consistently identify the correct hidden spatial structure. However, not surprisingly, as observation error increased, panmictic structures were incorrectly chosen over the correct models with subpopulation structure.
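One way to picture the candidate spatial structures is as a matrix that assigns each monitoring station to a hidden random walk. The sketch below builds such matrices for a panmictic population, two subpopulations, and fully independent subpopulations. The helper `structure_matrix`, the four-station layout, and the group labels are hypothetical and serve only as illustration; they are not the candidate set used in the simulations.

```python
import numpy as np


def structure_matrix(labels):
    """Return an (n_sites, n_states) 0/1 matrix from per-site group labels.

    Sites with the same label are assumed to observe the same hidden walk.
    """
    groups = sorted(set(labels))
    column = {g: j for j, g in enumerate(groups)}
    z = np.zeros((len(labels), len(groups)), dtype=int)
    for i, g in enumerate(labels):
        z[i, column[g]] = 1
    return z


# Hypothetical candidate structures for four monitoring stations.
candidates = {
    "panmictic": structure_matrix(["all", "all", "all", "all"]),
    "two_subpopulations": structure_matrix(["north", "north", "south", "south"]),
    "independent": structure_matrix(["s1", "s2", "s3", "s4"]),
}

for name, z in candidates.items():
    print(name, "hidden walks:", z.shape[1])
```

Under this encoding, model selection amounts to choosing among matrices with different numbers of columns, so the candidates differ in the number of hidden walks and not only in the number of estimated parameters.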