Tip dating of phylogenetic trees is a growing discipline that uses sequence data sampled at different points in time to co-estimate the timing of evolutionary events. Phylogenetic inference on such time-structured sequence data is a powerful tool for evolutionary and ecological studies. However, such inferences are valid only if the data contain sufficient and consistent temporal signal. It is therefore crucial to test for temporal signal, and for its consistency among samples, before any tip-dating inference (Murray, 2015). Two tests are available to evaluate the reliability of Bayesian inferences from time-structured data: the Date-Randomization Test (DRT), which assesses the temporal signal in the dataset, and Leave-One-Out Cross-Validation (LOOCV), which tests for consistency between independently calibrated sequences. Here, we introduce TipDatingBeast, an R package built to assist the implementation of phylogenetic tip-dating tests using BEAST. We apply its functions to an empirical dataset and supply practical guidance for interpreting the results. We applied both the DRT and the LOOCV to the “influenza” dataset provided with the BEAST tutorials, an alignment of 21 Influenza A virus sequences, 1,698 nucleotides in length, sampled between 1997 and 2004.
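The date-randomization step underlying the DRT can be illustrated with a minimal Python sketch. This is not the TipDatingBeast implementation (the package is written in R and works on BEAST input files); it only shows the idea of reassigning the sampling dates at random among the sequences. The function name, taxon names, and dates below are hypothetical.

```python
import random


def randomize_tip_dates(tip_dates, n_replicates, seed=None):
    """Return n_replicates copies of tip_dates with dates shuffled among taxa.

    tip_dates maps a taxon name to its sampling date. Each replicate keeps
    the same taxa and the same set of dates, but breaks the association
    between a sequence and the date at which it was sampled.
    """
    rng = random.Random(seed)
    taxa = list(tip_dates)
    dates = [tip_dates[taxon] for taxon in taxa]
    replicates = []
    for _ in range(n_replicates):
        shuffled = dates[:]          # copy so the original order is untouched
        rng.shuffle(shuffled)        # random permutation of the sampling dates
        replicates.append(dict(zip(taxa, shuffled)))
    return replicates


# Hypothetical taxa and sampling years, for illustration only.
example_dates = {
    "taxon_A": 1997.0,
    "taxon_B": 1999.5,
    "taxon_C": 2002.0,
    "taxon_D": 2004.0,
}
replicates = randomize_tip_dates(example_dates, n_replicates=3, seed=1)
```

Each replicate would then be analysed with the same model as the original dataset, so that the rate estimates can be compared.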
Results/Conclusions
The 95% highest posterior density (HPD) interval of the substitution rate estimated from the original dataset does not overlap with the intervals estimated from the date-randomized datasets, which were generated with the TipDatingBeast package and run in BEAST 1.8. The “influenza” dataset thus passes the DRT. In the LOOCV analysis, likewise generated with TipDatingBeast and run in BEAST 1.8, the 95% HPD interval of the estimated age contains the original sampling date for 18 of the 21 samples. For the remaining three samples, the original sampling date falls slightly outside the estimated 95% HPD interval. These results confirm that a tip-dating approach is appropriate for the influenza dataset. Both analyses are described step by step, and all necessary scripts are provided, in the TipDatingBeast package tutorial. Given the rapid increase in the availability and use of heterochronous datasets, the functionality provided by this R package can help BEAST users establish rigorous tip-dating practices and procedures. The TipDatingBeast package will be updated as newer versions of BEAST are released, and new functions may be added as new needs arise.
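The two decision criteria used above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used by TipDatingBeast or BEAST: the HPD interval is approximated as the narrowest interval containing the requested fraction of posterior samples (reasonable for unimodal posteriors), and all function names are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def passes_drt(original_hpd, randomized_hpds):
    """DRT criterion used here: no randomized rate HPD overlaps the original."""
    return not any(intervals_overlap(original_hpd, r) for r in randomized_hpds)


def loocv_hits(age_hpds, sampling_dates):
    """Taxa whose original sampling date lies inside their estimated age HPD."""
    return [
        taxon
        for taxon, (low, high) in age_hpds.items()
        if low <= sampling_dates[taxon] <= high
    ]
```

With posterior samples read from the BEAST log files, `passes_drt` would be applied to the rate intervals and `loocv_hits` to the per-sample age intervals. How many LOOCV misses are tolerable remains a judgment call for the analyst.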