OOS 79-6
TraitBank: An open digital repository for organism traits

Thursday, August 13, 2015: 3:20 PM
337, Baltimore Convention Center
Katja Schulz, Encyclopedia of Life, Smithsonian Institution National Museum of Natural History, Washington, DC
Jennifer Hammock, Encyclopedia of Life, Smithsonian Institution National Museum of Natural History, Washington, DC
Cynthia Parr, National Agricultural Library, USDA
Background/Question/Methods: Easy access to large amounts of data about the distribution, ecology, life history, physiology and morphology of species has the potential to transform biodiversity research. However, most of the data generated so far are not easily integrated or repurposed due to a lack of standardization in how scientists name the characteristics of organisms, how they describe the context of their observations, and how they document the methods with which the data were collected. TraitBank (eol.org/traitbank) addresses this impediment by linking information aggregated from diverse sources to community-developed ontologies and controlled vocabularies. These post hoc semantic annotations make the data easier to discover and query and provide interoperability with other semantic resources. TraitBank collects information about the characteristics of animals, plants, fungi, and microbes. It covers many different topics and includes species traits that have been identified as Essential Biodiversity Variables by the Group on Earth Observations Biodiversity Observation Network (GEO BON), e.g., measures of body size, phenology, migratory behavior, and physiological traits like thermal tolerance and metabolic rate. Data can be downloaded as CSV files or through a JSON-LD service. Reuse and redistribution of data with attribution to the original sources is encouraged.
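
As a rough illustration of why linking records to shared vocabulary terms helps integration, the sketch below groups rows of a CSV export by trait URI rather than by free-text label, so that measurements labeled differently by different sources end up together. The column names, URIs, values, and source names are hypothetical and chosen only for illustration; they are not TraitBank's actual export schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export: column names, URIs, and values are illustrative only
# and do not reflect TraitBank's actual schema or data.
SAMPLE_CSV = """scientific_name,trait_label,trait_uri,value,unit,source
Panthera leo,body mass,http://example.org/trait/body_mass,190000,g,Source A
Panthera leo,adult weight,http://example.org/trait/body_mass,175000,g,Source B
Apis mellifera,body length,http://example.org/trait/body_length,12,mm,Source C
"""


def group_by_trait_uri(csv_text):
    """Group records by trait URI so differently labeled measurements line up."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["trait_uri"]].append(
            # Keep the source with each value so attribution is preserved.
            (row["scientific_name"], float(row["value"]), row["unit"], row["source"])
        )
    return dict(grouped)


records = group_by_trait_uri(SAMPLE_CSV)
for uri, rows in records.items():
    print(uri, len(rows))
```

In this toy input, the "body mass" and "adult weight" rows share one URI and therefore fall into the same group, which a comparison of the free-text labels alone would miss.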

Results/Conclusions: TraitBank currently serves over 11 million measurements and facts for more than 1.7 million taxa. These data are mobilized from major biodiversity information systems (e.g., International Union for Conservation of Nature, Ocean Biogeographic Information System, Paleobiology Database), literature supplements (e.g., Dryad Digital Repository, Ecological Archives, Pangaea), label data from natural history collections, and legacy/unpublished data sets. Each record is accompanied by available metadata on provenance, measurement methods, sampling parameters, etc. TraitBank organizes distributed knowledge from heterogeneous sources into a lightweight, scalable semantic framework that supports retrieval and reuse for a variety of applications, ranging from large-scale synthetic analyses of biodiversity to linked data products and hands-on data science in the classroom. It complements taxon- or subject-specific knowledge management systems by filling gaps (in both taxonomic and trait space), by recruiting new types of data (e.g., from text mining, citizen science, and specimen data digitization efforts), and by integrating knowledge across the entire tree of life and multiple scientific domains. The emerging semantic framework will facilitate data discovery, support queries across data sets, and advance data integration and exchange among projects, thus making more biodiversity data available for use in scientific and policy-oriented applications.