Exploring the multiverse of analytical decisions in scaling educational large-scale assessment data

Periodical
European Journal of Investigation in Health Psychology and Education
Volume
12
Year
2022
Issue number
7
Relates to study/studies
PISA 2018

Exploring the multiverse of analytical decisions in scaling educational large-scale assessment data: A specification curve analysis for PISA 2018 mathematics data

Abstract

In educational large-scale assessment (LSA) studies such as PISA, item response theory (IRT) scaling models summarize students' performance on cognitive test items across countries. This article investigates the impact of different factors in model specifications for the PISA 2018 mathematics study. The diverse options of model specification are also known under the labels multiverse analysis or specification curve analysis in the social sciences. In this article, we investigate the following five factors of model specification in the PISA scaling model for obtaining the two country distribution parameters, country means and country standard deviations: (1) the choice of the functional form of the IRT model, (2) the treatment of differential item functioning at the country level, (3) the treatment of missing item responses, (4) the impact of item selection in the PISA test, and (5) the impact of test position effects. In our multiverse analysis, model uncertainty had almost the same impact on variability in the country means as sampling errors due to the sampling of students. For country standard deviations, model uncertainty had an even larger impact than standard errors. Overall, each of the five specification factors in the multiverse analysis had at least a moderate effect on either country means or standard deviations. In the discussion section, we critically evaluate the current practice of model specification decisions in LSA studies. We argue that we would prefer either reporting the variability due to model uncertainty or choosing a particular model specification that might provide the most valid strategy. We emphasize that model fit should not play a role in selecting a scaling strategy for LSA applications.
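The idea of a multiverse over the five specification factors can be illustrated with a minimal Python sketch. It is not the authors' implementation. The factor levels, the function names, the toy estimates, and the standard error value are all hypothetical, since the abstract names the five factors but not their levels or any numerical results. The sketch enumerates every combination of factor levels and summarizes the spread of one country's mean across specifications, which can then be set against a sampling standard error.

```python
from itertools import product
from statistics import pstdev

# Hypothetical levels for the five factors named in the abstract.
# The abstract does not list the levels; these are placeholders.
FACTORS = {
    "irt_model": ["1PL", "2PL"],
    "country_dif": ["ignored", "partial_invariance"],
    "missing_items": ["scored_as_wrong", "treated_as_ignorable"],
    "item_selection": ["all_items", "item_subset"],
    "position_effects": ["unadjusted", "adjusted"],
}


def enumerate_specifications(factors):
    """Return one dict per combination of factor levels (the 'multiverse')."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


def toy_country_mean(spec):
    """Stand-in for fitting a scaling model; NOT a real estimate.

    Adds one score point for every factor set to its second level, so the
    result varies across specifications in a known, deterministic way.
    """
    shift = sum(FACTORS[name].index(level) for name, level in spec.items())
    return 500.0 + shift


def model_error(estimates):
    """Spread of an estimate across specifications (population SD)."""
    return pstdev(estimates)


specs = enumerate_specifications(FACTORS)
estimates = [toy_country_mean(spec) for spec in specs]

# Hypothetical sampling standard error, used only to form a ratio.
assumed_standard_error = 2.5
error_ratio = model_error(estimates) / assumed_standard_error

print(len(specs), round(model_error(estimates), 3), round(error_ratio, 3))
```

In a real analysis, `toy_country_mean` would be replaced by an actual scaling run for each specification, and the standard error would come from the survey's replication-based variance estimation rather than an assumed constant.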