Comparing the score interpretation across modes in PISA

An investigation of how item facets affect difficulty

Abstract

Background: Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research on test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone for interpreting PISA test scores. However, the current literature has not established whether mode effects have altered test score interpretation as defined by the assessment framework, or whether the interpretations of paper-based assessment (PBA) and computer-based assessment (CBA) test scores are comparable.

Methods: This study uses the 2015 PISA field trial data from thirteen countries to compare test modes through a construct representation approach. Using a unidimensional two-group generalized partial credit model (GPCM), it investigates whether item facets defined by the assessment framework (e.g., different cognitive demands) affect item difficulty comparably across modes.
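
For orientation, the GPCM named above can be written in its standard textbook form. This is a sketch only: the notation is an assumption and is not taken from the study. Here \theta_j is the ability of test taker j, a_i the slope of item i, b_{iv}^{(g)} the v-th step difficulty of item i in group g (PBA or CBA), and m_i the maximum score on item i. Treating the slope as common to both groups while the step difficulties are group-specific is likewise an illustrative assumption, not a statement of the model actually estimated.

```latex
P\bigl(X_{ij} = k \mid \theta_j, g\bigr)
  = \frac{\exp\Bigl(\sum_{v=1}^{k} a_i \bigl(\theta_j - b_{iv}^{(g)}\bigr)\Bigr)}
         {\sum_{c=0}^{m_i} \exp\Bigl(\sum_{v=1}^{c} a_i \bigl(\theta_j - b_{iv}^{(g)}\bigr)\Bigr)},
  \qquad k = 0, 1, \dots, m_i,
```

where the empty sum (k = 0 or c = 0) is defined as 0. For a dichotomous item (m_i = 1) this reduces to the two-parameter logistic model with difficulty b_{i1}^{(g)}.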

Results: Linking the assessment framework to item difficulty using linear regression showed that, for both the maths and science domains, item categorisation relates to item difficulty; for the reading domain, no such conclusion was possible. Comparing the PBA and CBA representations across the three domains, maths had one facet with a significant difference, reading had all three facets significantly different, and science had four out of six facets with significant differences.
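
The kind of regression described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the function name `fit_facet_regression`, the single facet with levels "low" and "high", the toy difficulty values, and the choice to test mode differences through a facet-by-mode interaction are assumptions made for illustration, not the study's actual specification or data.

```python
import numpy as np


def fit_facet_regression(difficulty, facet_levels, mode):
    """OLS of item difficulty on one dummy-coded facet, mode, and their interaction.

    mode is coded 0 for PBA and 1 for CBA (an assumed coding). The first
    facet level in sorted order serves as the reference category.
    """
    y = np.asarray(difficulty, dtype=float)
    levels = sorted(set(facet_levels))
    # One dummy column per non-reference facet level.
    dummies = np.array(
        [[1.0 if f == lev else 0.0 for lev in levels[1:]] for f in facet_levels]
    )
    m = np.asarray(mode, dtype=float).reshape(-1, 1)
    # Design matrix: intercept, facet dummies, mode, facet-by-mode interaction.
    X = np.hstack([np.ones((len(y), 1)), dummies, m, dummies * m])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = (
        ["intercept"]
        + [f"facet[{lev}]" for lev in levels[1:]]
        + ["mode[CBA]"]
        + [f"facet[{lev}]:mode[CBA]" for lev in levels[1:]]
    )
    return dict(zip(names, coef))


# Hypothetical toy data: eight item difficulties, two per facet-by-mode cell.
difficulty = [-0.6, -0.4, 0.4, 0.6, -0.5, -0.3, 0.9, 1.1]
facet = ["low", "low", "high", "high", "low", "low", "high", "high"]
mode = [0, 0, 0, 0, 1, 1, 1, 1]

coef = fit_facet_regression(difficulty, facet, mode)
for name, value in coef.items():
    print(f"{name}: {value:.2f}")
```

In this sketch a non-zero interaction coefficient means the facet shifts difficulty by a different amount under CBA than under PBA, which is the pattern the cross-mode comparison looks for. A real analysis would also need standard errors and significance tests, which are omitted here.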

When the items labelled "mode invariant" in PISA 2015 were modelled, the results indicated that in every domain two facets showed significant differences between the test modes. Graphical inspection of the difficulty patterns confirmed that reading shows stronger differences, while the patterns of the other domains were quite consistent between modes.

Conclusions: The present study shows that the mode effects on difficulty vary within the task facets proposed by the PISA assessment framework, in particular for reading. These findings shed light on whether the comparability of score interpretation between modes is compromised. Because the link between item categorisation and item difficulty could not be established for reading, conclusions for this domain are limited. Importantly, the present study adds a new approach and empirical findings to the investigation of cross-mode equivalence in PISA domains.
