Document Type: Original Article
Guangdong University of Foreign Studies, China,
Middle School of Xiamen, Fujian, China.
The Written Discourse Completion Task (WDCT) has been widely used in pragmatics tests to measure EFL learners’ interlanguage pragmatic knowledge. In a WDCT, students respond to situations designed to elicit particular pragmatic functions, and human raters are therefore required to judge the students’ performance. When decisions are made on the basis of such ratings, it is essential that the assigned ratings be accurate and fair; efforts should therefore be made to minimize the impact of rater inaccuracy or bias on the ratings. This paper reports a study of rater effects in a WDCT pragmatics test. Drawing on the Myford & Wolfe (2003, 2004) model and corresponding retrospective interviews, four types of rater effects were investigated quantitatively and qualitatively: leniency/severity, central tendency, halo effect, and differential leniency/severity. Results revealed significant differences in rating severity among the raters, with a general tendency towards severity. Although the raters were able to apply the rating scales effectively and consistently, some of them exhibited a degree of halo effect, and most were found to show bias across both traits and test takers. Possible reasons for these rater effects are analyzed, and suggestions for rater training are offered.