Grass-dominated ecosystems cover one-third of Earth’s land surface, influence key biogeochemical processes, and serve as a major food source. Unfortunately, the study of grassland responses to environmental change in paleorecords is limited by the coarse taxonomic resolution of grass pollen grains: grass pollen is rarely identified below the family level. To address this issue, we used Superresolution Structured Illumination Microscopy (SR-SIM) to produce sectioned images of nanoscale grass pollen surface ornamentation that cannot be readily captured by conventional optical microscopy methods (e.g., brightfield transmitted light). These images of pollen grains comprise spatial patterns and non-spatial morphological properties. Convolutional neural networks (CNNs) are among the best machine learning techniques for image classification, so they can be employed to handle the former, while a shallow neural network can be used to learn properties of the latter. To assess the ability of our machine learning techniques to classify species and morphotypes, we imaged 60 grass species to train the models. The majority of these species represent the modern grass diversity on Mt Kenya in East Africa. Using our models, we will identify grass assemblages in modern lake surface sediments and in a 25,000-yr sediment core from Lake Rutundu.
Preliminary tests of our machine learning models suggest that binary classifiers, while faster to train, are less effective than a full k-way classifier: binary classifiers rank the correct species progressively lower as more species are taken into consideration. A single-slice classifier built on a 50-layer feed-forward residual neural network achieves roughly 42% accuracy. The 50-layer network was chosen over deeper networks (which likely have too many parameters and would take longer to train) and over smaller networks that would have to be trained from scratch. To improve accuracy further, we combined predictions from multiple slices of the same pollen stack by averaging their output vectors. Transforming these output vectors into a rank-based vote vector improves classification substantially. A 2-layer neural network trained on a combined 44-dimensional vector classifies a pollen grain with roughly 67.5% accuracy. We suspect that including metadata (e.g., volume, area, phylogeny) in our models may further increase their accuracy. Whether pollen grains in fossil and subfossil samples can be classified accurately remains unresolved until the models are improved and additional grass pollen from fossil samples is imaged.
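The slice-combination step described above can be illustrated with a minimal sketch. It assumes that each slice of a pollen stack yields one class-score vector and that the rank-based vote is a Borda-style count; the abstract does not specify the exact voting scheme, so that choice, the function names `average_slices` and `rank_votes`, and the toy scores are all hypothetical.

```python
import numpy as np


def average_slices(slice_scores):
    """Average the per-slice class-score vectors of one pollen stack."""
    slice_scores = np.asarray(slice_scores, dtype=float)
    return slice_scores.mean(axis=0)


def rank_votes(slice_scores):
    """Borda-style vote vector (an assumed scheme, not the authors' own).

    Within each slice, the lowest-scoring class receives 0 points and the
    highest-scoring class receives k - 1 points; points are summed over slices.
    """
    slice_scores = np.asarray(slice_scores, dtype=float)
    # argsort applied twice converts scores into per-slice ranks (0 = lowest).
    order = slice_scores.argsort(axis=1, kind="stable")
    ranks = order.argsort(axis=1, kind="stable")
    return ranks.sum(axis=0)


# Toy stack: 3 slices, 4 candidate species (illustrative values only).
stack = np.array([
    [0.10, 0.60, 0.25, 0.05],
    [0.05, 0.30, 0.55, 0.10],
    [0.20, 0.50, 0.25, 0.05],
])

mean_scores = average_slices(stack)
votes = rank_votes(stack)

print("averaged scores:", mean_scores)
print("vote vector:", votes)
print("predicted class (average):", int(np.argmax(mean_scores)))
print("predicted class (votes):", int(np.argmax(votes)))
```

In this toy case both combinations select the same species even though one slice ranks a different species first, which is the kind of disagreement that pooling slices is meant to absorb.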