Machine learning offers a promising approach to ecological niche modeling (ENM). In particular, we propose that the ecological problems of “presence-only” and “presence-absence” modeling map onto the machine learning tasks of one-class and two-class classification, respectively. We exploit this equivalence to propose new methods for ENM. We estimate the performance of these methods and compare the accuracy of one-class and two-class methods using a detailed dataset of 106 tree species from 550 sample points, comprising occurrence records and associated environmental data, from a region in Switzerland. Classification algorithms were implemented in MATLAB (R2009a) using the PR Tools and DD Tools toolboxes; accuracy assessments were completed in MATLAB and statistical analysis in R (2.11.1). We analyzed 17 algorithms: 10 one-class methods and 7 two-class methods. Model accuracy was assessed using the area under the receiver operating characteristic curve (AUC), sensitivity, and precision.
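The mapping described above can be illustrated with a minimal sketch in Python. This is not the MATLAB implementation used in the study: the two centroid-based scorers, the function names, and the toy environmental data are all hypothetical stand-ins, chosen only to show how a one-class model (trained on presences alone) and a two-class model (trained on presences and absences) can be scored with the same AUC, sensitivity, and precision measures.

```python
from math import dist  # Euclidean distance, Python 3.8+


def auc(scores, labels):
    """Rank-based AUC: chance that a presence outscores an absence (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_precision(predicted, labels):
    """Sensitivity = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    sens = tp / (tp + fn) if tp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    return sens, prec


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def one_class_scores(train_presence, test_points):
    """Presence-only: score is closeness to the presence centroid (hypothetical model)."""
    c = centroid(train_presence)
    return [-dist(c, x) for x in test_points]


def two_class_scores(train_presence, train_absence, test_points):
    """Presence-absence: positive when a point is nearer the presence centroid."""
    cp, ca = centroid(train_presence), centroid(train_absence)
    return [dist(ca, x) - dist(cp, x) for x in test_points]


# Hypothetical standardized environmental variables, e.g. (temperature, precipitation).
presence = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (1.1, 1.1)]
absence = [(-1.0, -0.8), (-0.9, -1.2), (-1.1, -1.0), (0.0, -0.5)]
test_x = [(0.9, 1.0), (1.3, 1.2), (0.2, 0.1), (-1.0, -1.0), (-0.5, -0.9), (1.0, -1.0)]
test_y = [1, 1, 1, 0, 0, 0]  # 1 = species present, 0 = absent

s1 = one_class_scores(presence, test_x)           # absences never seen in training
s2 = two_class_scores(presence, absence, test_x)  # absences used in training
pred2 = [1 if s > 0 else 0 for s in s2]           # threshold at the midline between centroids

print("one-class AUC:", auc(s1, test_y))
print("two-class AUC:", auc(s2, test_y))
print("two-class sensitivity, precision:", sensitivity_precision(pred2, test_y))
```

The one-class scorer never sees the absence points during training, which is the defining feature of the presence-only setting; both schemes are nevertheless evaluated against the same labeled test points, which is what allows their accuracy to be compared directly.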
Results/Conclusions
The number of occurrence records in the training set varied widely (minimum of 5, maximum of 155), and its distribution was positively skewed. Visual inspection of AUC plotted against the number of training points showed highly variable AUC values at small sample sizes, regardless of algorithm or classification scheme. In general, one-class methods had much higher sensitivity and less spread in precision than two-class methods. Linear mixed-effects models showed a strong effect of sample size that depended more on algorithm than on classification scheme. Our results indicate that although two-class classifiers are given more information, accuracy does not differ greatly between classification schemes, and some one-class methods are superior. A potential biological explanation for this result is the prevalence of sink habitats in natural metapopulation systems.