Monday, August 6, 2007 - 4:00 PM

COS 19-8: Defining and assessing data quality in online ecological information systems

James W. Brunt, Mark S. Servilla, Inigo San Gil, and Duane Costa. University of New Mexico

The cost of making inferences and decisions based on poor-quality data is
high. Consequently, with the proliferation of online sources of ecological data,
management of data quality and of the quality of associated data management
processes has become critically important. Data quality is reported in the
literature as a multi-dimensional concept. Many proposals have been made for
characterizing these dimensions for general application. It is difficult,
however, to define a set of data quality dimensions suitable for every context.
In fact, the importance of knowledge about different data quality dimensions
varies by the role of the stakeholder. For example, scientists as data
collectors are primarily concerned with accuracy and completeness, data
managers more so with timeliness and accessibility, while scientists as data
consumers are primarily concerned with relevancy. In this paper we first define
a conceptual framework for
ecological data quality that takes into account the differences between
collection, management, storage, and use. We propose a set of data quality
dimensions for ecology aimed at capturing the most important aspects of data
quality in each of these areas for the ecologist. Second, we evaluate the two
most widely accepted metadata standards, Ecological Metadata Language (EML)
and the Biological Data Profile (BDP), for their usefulness in assessing data
quality. Lastly, we present the results of analyses
of several online information sources for a subset of the proposed data quality
dimensions.