Caroline Brophy (1), David J. Gibson (2), and John Connolly (1). (1) University College Dublin, (2) Southern Illinois University
Background/Question/Methods Several statistical issues can arise in the analysis of reproductive data. The relationship between reproductive output and plant biomass commonly follows a log–log allometric regression. However, some plants do not reproduce, and their zero reproductive values violate the assumptions of this standard regression methodology. Truncated regression allows zero values to be incorporated appropriately in the allometric relationship, except when there are outlier zeros that do not follow this relationship, such as those from large, non-reproducing plants. Here, we present a framework for dealing with these issues. The methodology combines a truncated regression model and a logistic regression model in a mixture model framework. The truncated regression component accommodates non-reproducing plants that follow the same allometric relationship as reproducing plants, and the logistic regression component allows for the inclusion of the outliers. First, we illustrate the methodology with a dataset examining reproductive allocation in the annual plant Sinapis arvensis. We then assess the applicability of the methods to a range of datasets in which reproductive output was measured in one of several ways for a variety of plants. Finally, we generalize the framework to deal with peculiarities that occurred across the datasets and uniquely within individual datasets.
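As a rough illustration of how the two components could be combined, the sketch below writes a per-plant log-likelihood for one possible form of such a mixture. It is not the authors' implementation. It assumes that log reproductive output is normally distributed around a line in log biomass, that a zero from the allometric component arises when the latent log output falls below a fixed threshold, and that the probability of being an outlier non-reproducer is logistic in log biomass. The function name `mixture_loglik`, the parameter layout, and the `log_threshold` argument are all hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def mixture_loglik(params, log_biomass, repro, log_threshold):
    """Log-likelihood of an assumed two-component mixture.

    params = (a, b, log_sigma, g0, g1)   # hypothetical parameter layout
      a, b       : intercept and slope of the log-log allometric line
      log_sigma  : log of the residual standard deviation
      g0, g1     : logistic coefficients for the outlier probability
    log_threshold: assumed log-scale value below which output is recorded as 0
    """
    a, b, log_sigma, g0, g1 = params
    sigma = math.exp(log_sigma)
    total = 0.0
    for x, r in zip(log_biomass, repro):
        mu = a + b * x                   # allometric mean on the log scale
        p_out = logistic(g0 + g1 * x)    # probability of an outlier zero
        if r > 0:
            # Reproducing plant: must come from the allometric component.
            total += math.log(1.0 - p_out) + normal_logpdf(math.log(r), mu, sigma)
        else:
            # Zero: either an outlier, or an allometric value below the threshold.
            p_below = normal_cdf((log_threshold - mu) / sigma)
            total += math.log(p_out + (1.0 - p_out) * p_below)
    return total
```

Under these assumptions, the negated function could be handed to a general-purpose numerical optimizer to estimate the five parameters. Fixing the outlier probability near zero reduces the sketch to the allometric component alone, which gives a natural comparison for judging whether an outlier group is needed.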
Results/Conclusions We developed our methodology to assess S. arvensis plants growing in competition at varying density and CO2 levels. We found strong evidence for a group of non-reproducing individuals that did not follow the allometric relationship relating reproductive allocation to total biomass and CO2. The datasets from other reproductive output experiments contained between 6% and 66% non-reproducing individuals per population. We adapted our methodology to deal with issues that arose in applying the framework to each dataset. In six datasets, reproductive output was measured as a count (e.g., flower number); for these we changed the distributional assumptions of the response, because the normality assumptions associated with continuous responses (e.g., seed biomass) were no longer suitable. Six datasets contained repeated measurements; for these we either included additional error terms in the model to allow for correlation between the repeated responses, or analysed each time point separately. We found evidence of a group of outlier, non-reproducing individuals in five datasets. In one dataset, measurements were taken at the stand level, and our framework handled this without modification. Overall, we found that the framework was robust in handling the wide range of peculiarities that arise with reproductive data.