Assessment of reading performance: fundamental conditions for measuring reading

The thesis consists of three studies that trace implications of integrating theoretical perspectives on the development and validation of reading measures.

During the last 50 years the field of reading research has seen radical changes in the way reading is conceptualized, and new theoretical perspectives strongly influence the way reading is measured. There has, however, been a delay in the implementation of these theoretical perspectives in test development. The tendency to pit different theoretical perspectives on reading against one another has recently been replaced by a willingness to integrate them. This has resulted in a situation where several core concepts (for instance literacy, depth of meaning, motivation and engagement in reading) are unsettled or ascribed different meanings by different researchers. When it comes to assessment, this situation demands close attention from both researchers and test constructors. If hidden, implicit or unsettled assumptions form the basis for the development of assessment instruments, a veil is drawn between theory and empirical results. This can result in poor-quality inferences made on the basis of test scores. In this thesis, the challenges that the integration of perspectives poses for the measurement of reading are identified and explored.

Studies 1 and 2 focused on item format, a recurring subject in research and debate about the assessment of reading. The multiple-choice (MC) format is simple and economical in terms of testing time and scoring costs. In relation to large-scale assessments, it has often been discussed whether the use of constructed-response (CR) items can be justified despite the extra expense involved. The decisive factor has been “value added”: the extent to which the inclusion of CR items improves the assessment of reading comprehension in a valid and reliable way. The main aim of the first paper was to explore how the operationalization of depth of understanding in the Progress in International Reading Literacy Study (PIRLS) corresponds to the description of reading literacy given in the PIRLS framework. In PIRLS, the CR format carries central notions included in the theoretical foundation of the test, and this determines how CR items should be designed and scored to be compatible with the underlying theory of reading comprehension. The study focused on the scoring guides and their relationship to the texts and items. The main aim of the second paper was to explore the interaction between item format and motivation: specifically, how motivation contributes differently to reading-comprehension scores depending on item format, and whether students with different levels of reading motivation profit differently from different item formats. The aim of the third study was to explore possible uses of eye-tracking methodology in process-oriented reading-test validation. It highlighted the challenges researchers face in selecting empirical indicators of reading comprehension and discussed research questions where eye-tracking methodology may support the validation of assessment methods. Results from a small-scale eye-tracking study in which students read and answered questions about a multimodal text served as example material.

The studies drew on different data sources. The first study was a critical analysis of the relationship between the PIRLS Framework 2001 and the PIRLS Revised Scoring Guides 2001. It also draws on a representative sample of Norwegian 10-year-olds (the Norwegian sample in PIRLS 2001), consisting of 815 pupils. The second paper was an empirical study of data collected in 2008, in which a total of 217 5th-graders from 12 classes in 5 schools participated. The third paper was a theoretical and methodological discussion based on data from a small-scale eye-tracking study of 20 7th-grade pupils.

The results from the first paper revealed a tendency in the PIRLS assessment to project the depth dimension onto a superficial plane, in that a number of observations were accepted as an expression of a qualitative entity. This led to the quantification of a phenomenon originally defined as qualitative. The desire to reduce measurement error tended to undermine the validation arguments that link results to the theoretical description of the construct. In the second paper, results showed that, after controlling for word-reading ability, listening comprehension and non-verbal abilities, reading self-efficacy was a significant positive predictor of reading comprehension. For pupils with low self-efficacy in reading, self-efficacy was a significant positive predictor of multiple-choice comprehension scores but not of constructed-response comprehension scores. For students with high self-efficacy in reading, self-efficacy did not account for additional variance in either item format. These findings have implications for the development of reading-assessment instruments and for the conclusions drawn on the basis of test scores. Results from the third study indicated that reading behaviour associated with a high level of reading comprehension may be understood by comparing the pupils’ first reading with their reading while answering questions. This has important methodological consequences for research on the assessment of reading comprehension.