Discrimination of the contextual features of top performers in scientific literacy using a machine learning approach

Periodical: Research in Science Education
Volume: 51
Year: 2021
Page range: 129-158
Relates to study/studies: PISA 2015

Abstract

Science excellence is associated not only with a student's inherent aptitude but also with a range of contextual factors. The objective of this paper was to identify the most important contextual characteristics of top performers in scientific literacy by simultaneously considering factors at the student, family, and school levels, as measured by the PISA questionnaires. The data comprised the science scores of 380,771 PISA 2015 secondary students from 58 countries/economies, of whom 25,181 were top performers at proficiency level 5 or 6, together with the responses of students and school principals to the PISA questionnaires. Overall, 141 contextual variables (derived from the questionnaire responses) were ranked according to their relevance to top performers using a machine learning algorithm, specifically support vector machine recursive feature elimination (SVM-RFE). An optimal set of 20 features (factors/variables) was then selected from the ranked list because these features classified and predicted top performers versus non-top performers with high accuracy when used with a support vector machine (SVM) classifier. The research findings indicate that the quality of teachers' instructional practices, parents' educational/occupational status, disciplinary climate, time spent on and involvement in learning, schools' mass media facilities/equipment, the number of teachers in the school, and students' self-efficacy were the strongest predictors of the target students' superior performance in science. The features identified in this study may provide important information for future studies on students' performance in scientific literacy.
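
The ranking-and-selection procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state the software, kernel, or hyperparameters used, so the sketch assumes scikit-learn, a linear SVM (so that RFE can rank features by their weights), balanced class weights, and a purely synthetic data set standing in for the PISA questionnaire variables. Only the counts of 141 candidate variables and 20 retained features, and the rough share of top performers (25,181 of 380,771, about 7%), are taken from the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the questionnaire data (NOT PISA data).
# Assumption: 141 candidate variables, roughly 7% positive class
# ("top performers"); sample size is arbitrary and kept small for speed.
X, y = make_classification(
    n_samples=1000, n_features=141, n_informative=20, n_redundant=0,
    weights=[0.93, 0.07], shuffle=False, random_state=0,
)

# Hold out a test set, preserving the class imbalance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize features so that SVM weights are comparable across variables.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Assumption: linear SVM with balanced class weights as the base estimator.
svm = LinearSVC(dual=False, class_weight="balanced", max_iter=5000)

# SVM-RFE: refit the SVM repeatedly, dropping the lowest-weight feature
# each round (step=1) until 20 features remain.
rfe = RFE(estimator=svm, n_features_to_select=20, step=1).fit(X_train, y_train)

ranking = rfe.ranking_                   # 1 = retained; larger = dropped earlier
selected = np.flatnonzero(rfe.support_)  # column indices of the 20 retained features
accuracy = rfe.score(X_test, y_test)     # SVM accuracy using only retained features

print("retained feature indices:", selected)
print("held-out accuracy:", round(accuracy, 3))
```

With a class as rare as top performers, plain accuracy can look high even for a classifier that never predicts the positive class, so a real analysis would likely report class-sensitive metrics as well; the choice of 20 features here simply mirrors the number reported in the abstract rather than being derived from the synthetic data.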