OOS 9-3
Assessing student ecological understanding using text analysis and machine learning

Monday, August 10, 2015: 2:10 PM
340, Baltimore Convention Center
Luanna B. Prevost, Dept. of Integrative Biology, University of South Florida, Tampa, FL

Tracing matter and energy is an important ecological principle, yet students often find this challenging. Written assessments can provide insight into students’ understanding of challenging ecological concepts, but they are time-consuming to grade, which restricts their use, especially in large-enrollment courses. This research investigates how well text analysis and machine learning approaches perform relative to human-human coding agreement in assessing students’ understanding of matter and energy flow in an ecosystem. Responses to online homework questions were collected from 170 students and coded using a rubric containing 15 concept codes that included biological concepts (e.g., heat loss) as well as misconceptions (e.g., matter converted into energy). Based on its combination of concept codes, each student response was assigned to one of four mental models: scientific (referencing only scientific principles), narrative (describing food webs), naïve (containing only misconceptions), or mixed. Text analysis used IBM SPSS Modeler to extract words from students’ responses and place them into categories. These categories were then used to predict students’ mental models with a CART classification model. Machine learning using LightSide eliminated the intermediate step of category formation by extracting information directly from students’ writing to create a predictive model for each of the 15 concept codes.


Human coding achieved an interrater reliability of 0.7-0.9 kappa (substantial to almost perfect agreement). The classification model built on text analysis categories predicted human coding with varying agreement across mental models. The “narrative” mental model produced the strongest computer-human agreement (0.8 precision; 0.7 recall). Fewer than 5% of students held a completely naïve mental model, and prediction of this mental model was poor, suggesting that a larger sample size is needed to improve it. Machine learning also performed comparably to human coding (0.7-0.8 kappa). These results demonstrate that both text analysis and machine learning approaches can attain levels of agreement similar to human-human coding and can be used to assess student writing about energy flow in ecosystems.
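For readers unfamiliar with the agreement statistics reported above, the following is a minimal sketch of how kappa, precision, and recall can be computed from two sets of labels. It assumes the kappa in question is Cohen's kappa for two raters, which the abstract does not state; the function names and the toy labels are hypothetical illustrations, not the study's data or software.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length label sequences (assumed statistic)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's label rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def precision_recall(human, predicted, target):
    """Precision and recall for one label, treating human codes as the reference."""
    pairs = list(zip(human, predicted))
    tp = sum(h == target and p == target for h, p in pairs)
    fp = sum(h != target and p == target for h, p in pairs)
    fn = sum(h == target and p != target for h, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented toy labels for illustration only (not data from the study).
human = ["narrative", "narrative", "scientific", "mixed",
         "narrative", "scientific", "mixed", "narrative"]
predicted = ["narrative", "narrative", "scientific", "narrative",
             "mixed", "scientific", "mixed", "narrative"]

kappa = cohen_kappa(human, predicted)
precision, recall = precision_recall(human, predicted, "narrative")
print(f"kappa={kappa:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is generally lower than the simple proportion of matching labels.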