Comparing the score interpretation across modes in PISA

Periodical
Large-scale Assessments in Education
Volume
11
Year
2023
Relates to study/studies
PISA 2015

Comparing the score interpretation across modes in PISA

An investigation of how item facets affect difficulty

Abstract

Background: Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research around test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone to the interpretation of the results of the PISA test scores. However, an identified gap in the current literature is whether mode effects have affected test score interpretation as defined by the assessment framework, and whether the interpretations of the PBA and CBA test scores are comparable.

Methods: This study uses the 2015 PISA field trial data from thirteen countries to compare test modes through a construct representation approach. It is investigated whether item facets defined by the assessment framework (e.g., different cognitive demands) affect item difficulty comparably across modes using a unidimensional two-group generalized partial credit model (GPCM).

Results: Linking the assessment framework to item difficulty using linear regression showed that for both maths and science domains, item categorisation relates to item difficulty, however for the reading domain no such conclusion was possible. In comparing PBA to CBA in representations across the three domains, maths had one facet with a significant difference in representation, reading had all three facets significantly different, and for science, four out of six facets had significant differences.

Modelling items labelled "mode invariant" in PISA 2015, the results indicated that in every domain, two facets showed significant differences between the test modes. The graphical inspection of difficulty patterns confirmed that reading shows stronger differences while the patterns of the other domains were quite consistent between modes.

Conclusions: The present study shows that the mode effects on difficulty vary within the task facets proposed by the PISA assessment framework, in particular for reading. These findings shed light on whether the comparability of score interpretation between modes is compromised. Given the limitations of the link between the reading domain and item difficulty, any conclusions in this domain are limited. Importantly, the present study adds a new approach and empirical findings to the investigation of the cross-mode equivalence in PISA domains.