Big data initiatives for agroecosystems
The United States Department of Agriculture invests roughly $2 billion annually in scientific research. This intramural and extramural effort creates a substantial amount of digital scientific data in fields ranging from genomics to hydrology to agronomy to social and economic analysis. These data are required to be open and machine-readable under federal mandates. Infrastructure is needed to comply with these mandates and to facilitate the large-scale, multi-institutional, and often long-term projects currently underway. At the same time, the data must be well curated and effectively managed to support repurposing for future, often multidisciplinary, research. In this talk we describe several initiatives being undertaken at the National Agricultural Library (NAL) in support of agricultural research, in collaboration with partners inside and outside the Department.
NAL has developed a workspace for research groups associated with the i5k initiative, which aims to sequence the genomes of all insect species known to be important to worldwide agriculture, food safety, medicine, and energy production; all those used as models in biology; the most abundant in world ecosystems; and representatives of every branch of insect phylogeny. The LCA Commons provides open-access life cycle assessment (LCA) datasets and tools for researchers studying sustainable methods in crop and livestock production. In support of the Long-Term Agroecosystem Research (LTAR) Initiative, NAL has begun to work with the LTAR network to design and prototype methods for providing integrated access both to decades of legacy data from the 18 selected sites and to planned future data, much of which will be streaming. Finally, NAL has prototyped the Ag Data Commons, a general catalog and repository for agricultural data that can promote effective discovery of, and add value to, datasets that are often widely distributed and seemingly disparate. Ag Data Commons makes a special effort to link to PubAg, NAL's growing repository of open agricultural literature, and to leverage its tools. These efforts still face challenges such as a lack of standards and scarce resources. However, early efforts to link to existing databases and repositories, including those serving related fields, and to apply best practices indicate that the vision is feasible. Developing specialized resources and interconnecting them using existing and future technology addresses the need to cost-effectively support future big-data research in agroecosystems.