Evidence for the historical development of and recent trends in data-intensive ecology
Like many fields of science, ecology matured during a period (the twentieth century) whose technologies and methodologies were very different from today’s. Many of ecology’s established practices and norms developed within the constraints of that time, including a) the high cost of publishing, b) the ability to sample only a few organisms or ecosystems, which resulted in small sample sizes and the need for statistics to account for them, c) the need to differentiate ecology from its largely descriptive beginnings, d) a culture of limited or no data sharing, and e) a lack of computing resources. Not surprisingly, the current research landscape related to these factors is vastly different, even from just 20-30 years ago. And yet, it appears that research norms related to publication and funding, statistics and inference, and data analysis and sharing are lagging behind the changing realities of doing science in a data-intensive era. In this study, we ask: is ecology becoming increasingly data-intensive? And, if so, is there evidence for the increasing use of data-intensive methods related to methodological integration, iteration, exploratory analyses, and non-hypothetico-deductive approaches?
We answered these questions by analyzing trends in the literature published in Ecology, Journal of Ecology, and The American Naturalist. We studied three historical time periods starting in 1925, as well as more recent research from 1989 to 2014. We used an automated text-mining approach to collect data such as numbers of samples and specific words related to the above topics. In addition, we used an unsupervised machine-learning approach called topic modeling to uncover topical relationships among articles and to quantify trends related to data processing, analysis, and inference. In general, we found that sample sizes are increasing in ecology. In addition, although we found some changes related to data analysis and inference, hypothetico-deductive approaches still dominate the field. Strictly hypothetico-deductive approaches can present problems for studying the often more complex questions that require large datasets. Therefore, we recommend that many established norms and practices related to data, analysis, and inference be revisited as ecology transforms into an increasingly data-intensive discipline.
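To make the text-mining step described above more concrete, the following is a minimal sketch of how reported sample sizes might be pulled from article text and summarized by year. It is an illustration only, not the procedure used in this study: the regular expression, the function names `extract_sample_sizes` and `median_sample_size_by_year`, and the assumption that sample sizes are reported in the form "n = 24" are all hypothetical.

```python
import re
from statistics import median

# Hypothetical pattern (an assumption, not the study's method): matches
# integer reports such as "n = 24" or "N=1,250" and skips decimals ("n = 3.5").
SAMPLE_SIZE = re.compile(r"\b[nN]\s*=\s*(\d{1,3}(?:,\d{3})+|\d+)(?!\d|\.\d)")


def extract_sample_sizes(text):
    """Return every integer sample size reported as 'n = <integer>' in text."""
    return [int(m.replace(",", "")) for m in SAMPLE_SIZE.findall(text)]


def median_sample_size_by_year(articles):
    """Summarize reported sample sizes per publication year.

    articles: iterable of (year, full_text) pairs.
    Returns a dict mapping year to the median reported sample size;
    years with no detected sample sizes are omitted.
    """
    by_year = {}
    for year, text in articles:
        by_year.setdefault(year, []).extend(extract_sample_sizes(text))
    return {year: median(ns) for year, ns in sorted(by_year.items()) if ns}


if __name__ == "__main__":
    # Invented example sentences, used only to exercise the functions.
    corpus = [
        (1990, "We sampled plots (n = 12) and ponds (N=8)."),
        (2010, "Records (n = 1,250) and sites (n = 40); mean n = 3.5."),
    ]
    print(median_sample_size_by_year(corpus))
```

A pattern this simple would miss sample sizes written out in words or reported in tables, so any real analysis would need a broader set of patterns and manual validation against a subset of articles.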