Tuesday, August 3, 2010: 2:10 PM
306-307, David L Lawrence Convention Center
Steve Kelling, Information Science, Cornell Lab of Ornithology, Ithaca, NY
Background/Question/Methods
Increasing public engagement in volunteer science, through data collection or processing, can both raise the public’s awareness of science and gather useful scientific information. While the payoffs of citizen science are potentially large, achieving them requires new approaches to data gathering, management, and analysis that can only result from strong cross-disciplinary collaborations. Here we present one such collaboration between outreach and informatics specialists, ecologists, computer scientists, and statisticians that has resulted in accurate models of bird distributions based on habitat associations across much of North America. A distinctive feature of this work is our ability to describe the dynamics of seasonal bird distributions and associated patterns of habitat use.
Our source of bird distribution data is eBird (http://www.ebird.org), an online checklist program that currently gathers more than 100,000 checklists monthly from a hemisphere-wide network of contributors. For this analysis, data from 320,000 eBird locations were linked to 500 environmental variables (e.g., land cover, geographic, climatic, and human demographic survey data). We used a “data-driven” semi-parametric modeling methodology that combines the strengths of traditional parametric modeling (effective latent process modeling) with those of non-parametric models developed in machine learning (support for exploratory analysis of poorly understood phenomena using large data sets).
Results/Conclusions
As a result, we can (1) accurately describe seasonal changes in species distributions, (2) identify regional differences in organisms’ migratory movements, and (3) discover seasonal differences in habitat associations. Quantitative measures of our model’s predictive performance yield accuracy scores of 83% and 88% and AUC scores of 0.88 and 0.84. These measures, combined with expert opinion, confirm the accuracy of habitat-based distribution maps for many species of birds. Additionally, we identify population-level patterns of within-season variation in habitat use and across-season changes in habitat preference for many species.
While the potential of citizen contribution to science is large, achieving it requires new approaches to data management and analysis. Our results were made possible through a data-intensive scientific workflow that uses the Internet and cyberinfrastructure to gather and carefully vet data, which are then processed through novel analytical methods merged from machine learning and statistics. The resulting estimates of arrival and departure dates and habitat associations of bird populations will facilitate our understanding of how organisms respond to broad-scale environmental variation, such as changing biotic environments or variation in weather and climate. Our approach to data collection, synthesis, analysis, and visualization will serve as a model for future research initiatives, with broad applicability across many scientific domains.