Predicting science achievement scores with machine learning algorithms

Periodical
Neural Computing and Applications
Volume
35
Year
2023
Page range
21201–21228
Relates to study/studies
PISA 2015
PISA 2018

Predicting science achievement scores with machine learning algorithms

A case study of OECD PISA 2015-2018 data

Abstract

This study examined how well machine learning methods, trained on PISA 2015 data, predict the science achievement scores of students who took the next cycle of the exam, PISA 2018, as well as the countries' average science scores. The research sample consists of 67,329 students who took the PISA 2015 exam in 13 randomly selected countries (Brazil, Chinese Taipei, Dominican Republic, Estonia, Finland, Hungary, Italy, Japan, Lithuania, Luxembourg, Peru, Singapore, Turkiye). Four machine learning algorithms were used: multiple linear regression, support vector regression, random forest, and extreme gradient boosting (XGBoost). For each country, a randomly selected part of the PISA 2015 data was used as training data and the remaining part as testing data to evaluate model performance. The XGBoost algorithm showed the best performance in predicting both the PISA 2015 test data and the PISA 2018 science achievement scores in all of the countries studied. With this algorithm, the PISA 2018 science achievement scores of participating students were predicted most accurately in Luxembourg (r = 0.600, RMSE = 75.06, MAE = 59.97) and least accurately in Finland (r = 0.467, RMSE = 79.38, MAE = 63.24). In addition, the countries' average PISA 2018 science scores were estimated with the XGBoost algorithm, and these estimates were highly accurate for all of the countries studied.
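
The abstract reports model quality with three measures: the correlation coefficient r, the root mean squared error (RMSE), and the mean absolute error (MAE). The following is a minimal sketch of how such measures can be computed from actual and predicted scores, assuming r denotes the Pearson correlation (the abstract does not say so explicitly). The function names and the three sample scores are hypothetical illustrations, not code or data from the study.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted scores (assumed meaning of r)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily than small ones."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error: the average size of a miss, in score points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up scores for illustration only; not data from the study.
actual = [400.0, 500.0, 600.0]
predicted = [410.0, 490.0, 620.0]

print(f"r = {pearson_r(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAE = {mae(actual, predicted):.2f}")
```

RMSE and MAE are expressed in the same units as the scores, so an RMSE of about 75 means predictions for individual students typically miss by tens of score points, while r only measures how well predictions track the ordering and spread of the actual scores.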