National University of Singapore, Centre for English Language Communication.
Writing assessments often use analytic rating scales to describe the criteria for different performance levels. However, such rating scales require interpretation by raters, and when several raters are involved, the reliability of examinee scores can be significantly affected (Engelhard, 1992; McNamara, 1996). Variability between raters is partly managed by training raters in the use of the rating scale, which presupposes that the scale itself is well constructed and can be applied accurately to discriminate examinee performance consistently. This paper reports on the use of the many-facet Rasch model (MFRM; Linacre, 1989) to assess the validity of a proposed analytic rating scale. The MFRM is widely used to study examinee performance and rater behavior, and it is useful in rating scale validation for analyzing sources of variation in tests (Schaeffer, 2008). Bias analysis allows systematic subpatterns of interaction between raters and the rating scale to be examined. In this paper, scores from a set of essays rated by a team using revised analytic descriptors were analyzed, and indices for rater severity, rater consistency, rater bias, criterion difficulty, and scale functionality were studied. The findings indicate that raters were able to use the revised rating scale to discriminate performances consistently. The MFRM can thus contribute to improvements in rater training and rating scale development.
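To make the model concrete: in its adjacent-category (rating scale) formulation, the MFRM expresses the log-odds of an examinee receiving score category k rather than k-1 as ability minus criterion difficulty minus rater severity minus a category threshold. The sketch below is a minimal illustration of that formula; the parameter values are hypothetical and not drawn from the study reported here.

```python
import math

def mfrm_category_probs(theta, difficulty, severity, thresholds):
    """Probabilities of each score category k = 0..K under the
    many-facet Rasch rating scale model:

        log(P_k / P_{k-1}) = theta - difficulty - severity - tau_k

    theta:      examinee ability (logits)
    difficulty: criterion difficulty (logits)
    severity:   rater severity (logits)
    thresholds: category thresholds tau_1..tau_K (logits)
    """
    # Cumulative sums of the adjacent-category logits give the
    # unnormalised log-probability of each category (category 0 = 0).
    log_numerators = [0.0]
    running = 0.0
    for tau in thresholds:
        running += theta - difficulty - severity - tau
        log_numerators.append(running)
    numerators = [math.exp(v) for v in log_numerators]
    total = sum(numerators)
    return [n / total for n in numerators]

# Illustrative only: a mid-ability examinee on an average-difficulty
# criterion, rated by a slightly severe rater on a 0-3 scale.
probs = mfrm_category_probs(theta=0.5, difficulty=0.0, severity=0.3,
                            thresholds=[-1.0, 0.0, 1.0])
```

A more severe rater (larger `severity`) shifts probability mass toward lower score categories for the same examinee, which is exactly the effect the severity indices in the analysis quantify.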