Tuesday, August 3, 2010

PS 42-137: Ecoinformatics: Integrating software applications for diversity analysis

Vahe A. Ohanyan, Ohio University and Brian C. McCarthy, Ohio University.

Background/Question/Methods

Ecoinformatics is considered a sub-discipline of bioinformatics. Its goal is to develop algorithms that improve data communication and management, and to create and expand software tools for managing, storing, distributing, and presenting ecological information. Applications developed in bioinformatics can be quite useful in ecological data analysis, particularly in vegetation ecology. Vegetation ecologists are familiar with the challenges of analyzing large data sets, and ecoinformatics tools can accelerate the problem-solving process. Moreover, they make it possible to automate much of the data analysis step and to introduce higher levels of quality control. Software applications such as R and Kepler are well suited for these purposes. Both are open-source applications and may be used either alone or in an integrated fashion. R is already used by many biologists for statistical analysis, while the lesser-known Kepler is designed to automate the dataflow process. Both are user-friendly and may be easily interfaced. These applications have been used for certain types of diversity and community data management projects. Here we present an example of models developed for specific types of data analysis in vegetation ecology.

Results/Conclusions

For diversity analysis, existing packages and code could not be used, so new code was developed that integrates R and Kepler. To use the code developed for this project, one need only supply a standard data set in spreadsheet format as input. The application automatically generates output, which may be in different formats. Two models were developed for different types of plant community data. “Model 1” utilizes percent cover data and produces two tables, one histogram, and a rank-abundance plot. The first table contains frequency, relative frequency, cover, relative cover, importance value, and relative importance value, while the second table includes calculated diversity indices (i.e., species richness, Simpson’s index, and the Shannon-Wiener and Brillouin indices). “Model 2” was developed to handle data collected with the point-centered quarter (PCQ) sampling method; the data supplied include diameter at breast height (DBH) and the distance of each plant from the sampling point. This model produces two tables and a graph. The first table contains calculated diversity indices (as above), while the second table contains density, relative density, basal area, relative basal area, frequency, relative frequency, importance value, and relative importance value. We expect a broad range of future applications of these models in both research and teaching.
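
As an illustration of the diversity indices named above, the following minimal sketch computes species richness, Simpson's index, the Shannon-Wiener index, and the Brillouin index from per-species counts. It is not the authors' R/Kepler code. The function name `diversity_indices`, the use of Python, the choice of the 1 - D form of Simpson's index, and the use of natural logarithms are all assumptions made for this example; the Brillouin index is computed here from integer counts, and how the original models handled percent cover data is not stated.

```python
import math


def diversity_indices(counts):
    """Return common diversity indices for per-species integer counts.

    Hypothetical helper for illustration only; not the authors' implementation.
    """
    counts = [c for c in counts if c > 0]  # species with zero count are ignored
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one species must have a positive count")

    proportions = [c / total for c in counts]

    # Species richness: number of species present.
    richness = len(counts)

    # Simpson's index, taken here as 1 - sum(p_i^2) (an assumed variant).
    simpson = 1.0 - sum(p * p for p in proportions)

    # Shannon-Wiener index with natural logarithms: -sum(p_i * ln p_i).
    shannon = -sum(p * math.log(p) for p in proportions)

    # Brillouin index: (ln N! - sum(ln n_i!)) / N, using lgamma(n + 1) = ln n!.
    brillouin = (
        math.lgamma(total + 1) - sum(math.lgamma(c + 1) for c in counts)
    ) / total

    return {
        "richness": richness,
        "simpson": simpson,
        "shannon": shannon,
        "brillouin": brillouin,
    }


if __name__ == "__main__":
    # Made-up counts for four equally abundant species.
    for name, value in diversity_indices([10, 10, 10, 10]).items():
        print(f"{name}: {value:.4f}")
```

For four equally abundant species, the Shannon-Wiener index equals ln 4 and the 1 - D form of Simpson's index equals 0.75, which gives a quick sanity check on any implementation.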