Document Type : Original Research Article
Authors
1 TEFL Department, Faculty of Foreign Languages, North Tehran Branch, Islamic Azad University, Tehran, Iran
2 English Department, Islamic Azad University of Mashhad, Mashhad, Iran
3 Mojgan Rashtchi, TEFL Department, Faculty of Foreign Languages, North Tehran Branch, Islamic Azad University, Tehran, Iran
Abstract
This study compares the Multifaceted Rasch Model (MFRM) and the Rasch Rating Scale Model (RSM) in evaluating English as a Foreign Language (EFL) writing performance. Rater-mediated assessments depend on rating quality to ensure fairness and validity. The MFRM models rater severity, task difficulty, and student ability simultaneously, making it a more precise tool for performance-based evaluation. In contrast, the RSM simplifies assessment by assuming that all raters and tasks function similarly and therefore cannot adjust for rater variability. The present study analyzed writing samples from 156 Iranian TEFL students, rated by five experienced IELTS instructors. Results showed that the MFRM adjusted student scores for rater severity, whereas the RSM assigned identical scores to students with the same raw scores, ignoring rater effects. Item difficulty rankings were consistent across the two models, with language the most difficult criterion and content the easiest. The MFRM also revealed interactions between rater gender and student gender, although the biases were not statistically significant. While the MFRM provides fairer, rater-invariant scoring, its computational complexity and limited accessibility remain obstacles to adoption. The RSM, though easier to implement, risks overlooking rater bias. The study concludes that the MFRM is preferable for high-stakes and criterion-referenced assessments, where fairness is crucial, whereas the RSM may suffice for norm-referenced assessments. Future research should explore ways to simplify MFRM adoption, such as user-friendly software and training programs for educators.
Keywords