Friday, August 7, 2009

PS 78-25: The Biodiversity Heritage Library: An expanding international collaboration

Constance A. Rinaldo, Harvard University, and Catherine Norton, Marine Biological Laboratory/Woods Hole Oceanographic Institution.

Background/Question/Methods

The Biodiversity Heritage Library (BHL; http://www.biodiversitylibrary.org/), one of the cornerstones of the Encyclopedia of Life (eol.org), now contains nearly 13 million digitized pages from 12,000 titles, comprising 32,000 volumes of the published literature of biodiversity held in the collections of major natural history libraries. The BHL has made this literature available for open access and responsible use as part of a global “biodiversity commons.” The BHL partnership is working with the global taxonomic community, publishers, organizations such as JSTOR and BioOne, and the Internet Archive to ensure that the biodiversity literature is available to all, from students to scientists with diverse interests.

This poster will describe the BHL, with particular focus on the taxonomic tools available and on the development of international partnerships and expanded collaborations with scientists. BHL-Europe has now formed, and participation by other countries and projects will augment the available literature and provide redundant repositories and mirror sites. New tools such as the PDF-generator, article repository, updated search interface, and social networking tools will be reviewed.

Results/Conclusions

The BHL Portal is a transformative research environment for scientific inquiry. The tools and information in the BHL have accelerated research in the life sciences. Users reach the BHL through a free, service-based portal formed by coupling existing databases with digitized, searchable images and OCR text. The array of tools for taxonomically intelligent services has expanded. These tools are designed to overcome the problems of common versus scientific names and of name changes over time, and to provide easy access to the available literature on a particular organism. This scientific reference system for investigating literature offers a model that reflects, and also amplifies, scientists' use of the natural history literature. In 2008, name-finding statistics showed that 30 million name string occurrences were extracted from the BHL, 4.4 million of them unique. Of the 30 million occurrences, 23.7 million have been verified by NameBank. Enhancing OCR tools or correcting text manually is crucial to the further development of data mining. Social networking tools may be used to extend OCR text correction to users and to support the tagging of maps and illustrations. Scientists are providing bibliographies for taxonomic groups that can be incorporated into the selection process. Organizations have offered already-digitized articles for deposit in the BHL. Hence, an article repository has been developed, together with tools to extract pages of text from BHL content (the PDF-generator).