COS 106-6 - Inferring epidemiological parameters and dynamics in structured populations from sequence data

Wednesday, August 8, 2012: 3:20 PM
D138, Oregon Convention Center
David A. Rasmussen, Biology Department, Duke University, Durham, NC, Erik M. Volz, Department of Epidemiology, University of Michigan, Ann Arbor, MI and Katia Koelle, Biology, Duke University, Durham, NC
Background/Question/Methods

Models with some form of population structure commonly arise in epidemiology and population ecology. Well-known examples in epidemiology include spatially structured and age-structured models. The key feature of these models is that host populations are structured into different states, and susceptibility and transmissibility may vary across states. Host population structure can therefore create heterogeneities in pathogen transmission rates, making the study of disease dynamics difficult. These difficulties can be compounded by the fact that researchers rarely have reliable observational data on each subpopulation of interest, making statistical model fitting especially challenging. One possible way forward is to use new statistical methods that allow traditional epidemiological models to be fit to pathogen sequence data. If a genealogy can be reconstructed from sequence data, epidemiological models can be fit to genealogies via a coalescent model that links the underlying disease dynamics to the genealogy. If sequences are sampled from different subpopulations, structured coalescent models can be used to incorporate population structure. However, structured coalescent models have rarely been applied to populations undergoing complex dynamics, and it remains an open question whether and when it is possible to infer parameters relating to population structure and population dynamics from a genealogy.
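
To make the idea of a structured model concrete, the following is a minimal sketch and not the framework described in this abstract. It assumes a deterministic SIR model with frequency-dependent transmission, a hypothetical transmission matrix `beta` whose entry `beta[a][b]` is the rate at which infected hosts in subpopulation `b` infect susceptible hosts in subpopulation `a`, and simple Euler integration. The function names, parameter values, and step size are illustrative assumptions. The helper `pairwise_coalescent_rate` shows the unstructured simplification in which two lineages coalesce at rate 2f/I^2, where f is the incidence and I the number of infected hosts; structured coalescent models additionally track the state of each lineage, which is not attempted here.

```python
def simulate_structured_sir(beta, gamma, s0, i0, dt=0.01, steps=5000):
    """Euler-integrate a deterministic SIR model with k host subpopulations.

    beta[a][b]: assumed rate at which an infected host in subpopulation b
                infects a susceptible host in subpopulation a.
    gamma:      recovery rate, assumed equal in all subpopulations.
    Returns a list of (susceptible, infected, incidence) tuples, one per step.
    """
    k = len(s0)
    s, i = list(s0), list(i0)
    # Closed subpopulations; no recovered hosts at time zero (an assumption).
    n = [s0[a] + i0[a] for a in range(k)]
    history = []
    for _ in range(steps):
        # New infections per unit time among susceptible hosts in each subpopulation.
        incidence = [
            s[a] / n[a] * sum(beta[a][b] * i[b] for b in range(k))
            for a in range(k)
        ]
        history.append((list(s), list(i), incidence))
        # All incidences were computed from the old state, so updating in place is safe.
        for a in range(k):
            recovered = gamma * i[a]
            s[a] -= dt * incidence[a]
            i[a] += dt * (incidence[a] - recovered)
    return history


def pairwise_coalescent_rate(incidence, infected):
    """Rate at which two lineages coalesce in an unstructured population: 2f / I^2."""
    return 2.0 * incidence / infected ** 2


if __name__ == "__main__":
    # Two subpopulations, weak coupling, infection seeded only in the first.
    beta = [[0.5, 0.05], [0.05, 0.5]]
    history = simulate_structured_sir(beta, gamma=0.2, s0=[999.0, 1000.0], i0=[1.0, 0.0])
    s, i, f = history[-1]
    print("final infected per subpopulation:", i)
    print("pooled pairwise coalescent rate:", pairwise_coalescent_rate(sum(f), sum(i)))
```

Setting the off-diagonal entries of `beta` to zero decouples the subpopulations, which is one quick way to see how coupling alone shapes the dynamics in each state.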

Results/Conclusions

Using recently developed structured coalescent models that can accommodate more complex population dynamics, we have developed a general statistical framework for fitting structured epidemiological models to genealogies. Specifically, by combining simulation-based statistical methods with Bayesian MCMC methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. From simulated data sets, we are able to accurately infer parameters relating to the structure and dynamics of pathogens circulating within and among different host populations. We also discuss the limits of inferring population structure in non-stationary populations from genealogies and what kinds of sequence data are necessary for accurate estimates. We then show how sequence data can be combined with more traditional sources of data such as time series to achieve more reliable estimates when sequence data alone are insufficient for the inference task at hand.