The use of niche-based distributional modeling has been hampered for several critical applications (e.g., invasive species, effects of climatic change) by questions of transferability. Transferability is the ability of a model to make accurate predictions when projected across space and/or time. Because datasets from other time periods seldom exist, we assess transferability across space in the Caribbean spiny pocket mouse Heteromys anomalus using the maximum entropy algorithm, Maxent. Models were calibrated in north-central South America and the islands of Trinidad and Tobago, and then applied to the Río Magdalena valley (where an isolated set of occurrence records exists). We partitioned the 124 records from the calibration region in two ways (randomly and geographically), both times dividing records into four bins of equal sample size. Geographically partitioned records provide evaluation data that are spatially independent of the calibration data. We ran models in a delete-1 jackknife fashion, each time using a different bin as the evaluation data for a model calibrated using the other three (k-fold cross-validation). To tune for optimal performance, we varied the regularization multiplier (a parameter that protects against overfitting). We assessed performance using the area under the curve (AUC) of the receiver operating characteristic (ROC) plot.
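The two partitioning schemes and the leave-one-bin-out evaluation loop described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the text does not say how the geographic bins were drawn, so cutting records into longitude bands is an assumption, and the coordinates and the function names (`random_bins`, `geographic_bins`, `folds`) are hypothetical.

```python
import random

def random_bins(records, k=4, seed=0):
    """Shuffle records and deal them into k bins of (near-)equal size."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def geographic_bins(records, k=4):
    """Sort records west to east and cut them into k contiguous bins.

    Assumption: longitude bands stand in for whatever geographic
    partition the study actually used.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]

def folds(bins):
    """Yield (calibration, evaluation) pairs, withholding one bin each time."""
    for i, evaluation in enumerate(bins):
        calibration = [r for j, b in enumerate(bins) if j != i for r in b]
        yield calibration, evaluation

# Hypothetical (longitude, latitude) points; only the count of 124
# comes from the text, the coordinates themselves are made up.
rng = random.Random(42)
records = [(rng.uniform(-73.0, -60.0), rng.uniform(7.0, 12.0))
           for _ in range(124)]

for name, bins in [("random", random_bins(records)),
                   ("geographic", geographic_bins(records))]:
    for calibration, evaluation in folds(bins):
        # A model would be calibrated and evaluated here.
        print(name, len(calibration), len(evaluation))
```

With 124 records, each scheme yields four bins of 31, so every fold calibrates on 93 records and evaluates on 31. In the geographic scheme the withheld bin occupies its own longitude band, which is what makes the evaluation records spatially separate from the calibration records.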
The two ways of partitioning occurrence records led to apparent (but likely artifactual) differences in estimates of performance in the calibration region but similar estimates after the models were transferred to the Río Magdalena valley. Models based on randomly partitioned records showed substantially higher average AUC in the calibration region than those made with geographically partitioned records, but the two differed little in the transferred region. When examined visually, the models made via the two methods indicated similar patterns of suitable and unsuitable habitat. The higher estimates for models made with randomly partitioned records were likely due to spatial non-independence of calibration and evaluation records. Regarding the effects of parameter tuning, varying the regularization multiplier did not substantially affect estimates of average AUC in the calibration region. However, it led to marked differences in average AUC in the transferred region, with highest performance at intermediate values of regularization (including the default setting). Hence, at least for this species (with many occurrence records), tuning did not lead to higher performance than default settings. In summary, to gain realistic estimates of performance and transferability, researchers should evaluate models based on spatially independent data from regions not used in model calibration.
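For readers unfamiliar with the performance measure, AUC can be read as the probability that a randomly chosen presence site receives a higher predicted suitability than a randomly chosen background site, with ties counted as one half. The sketch below computes it directly from that definition and averages it over folds. It is illustrative only: the function names (`auc`, `mean_auc`) and the toy scores are hypothetical, and it does not reproduce how Maxent computes AUC internally.

```python
def auc(presence_scores, background_scores):
    """Share of (presence, background) pairs in which the presence site
    scores higher; ties count as half a win."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def mean_auc(per_fold):
    """Average AUC over folds.

    per_fold is a list of (presence_scores, background_scores) pairs,
    one pair per withheld bin.
    """
    values = [auc(p, b) for p, b in per_fold]
    return sum(values) / len(values)

# Toy suitability scores, invented purely for illustration.
toy_presence = [0.9, 0.8, 0.4]
toy_background = [0.7, 0.3, 0.2, 0.4]
print(auc(toy_presence, toy_background))  # 10.5 wins out of 12 pairs -> 0.875
```

Computing `mean_auc` once per regularization multiplier, separately for the calibration and transferred regions, would give the kind of comparison summarized above. The pairwise loop is quadratic in the number of sites, which is adequate for a sketch but slow for large background samples.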