International Journal of Language Testing

Examining Measurement Invariance of a C-Test Across Gender Using Multiple-Group Item Response Theory

Document Type: Original Research Article

Authors
1 PhD, Associate Professor, Termiz University of Economics and Service, 190111 Termez, Uzbekistan
2 PhD, Samarkand State Medical University, Samarkand, Uzbekistan. ORCID: https://orcid.org/0009-0002-8855-9919
3 Teacher, Department of Primary Education, Termez State Pedagogical Institute, I. Karimov Street 288b, Termez, Surxondaryo, Uzbekistan. ORCID: https://orcid.org/0009-0001-8340-2501
4 DSc, Associate Professor, Department of Sciences and Training Management, Kokand State University, Kokand, Uzbekistan. ORCID: https://orcid.org/0009-0008-5031-4063
5 Urgench State University, 14 Kh. Alimdjan Street, Urganch, Khorezm, Uzbekistan. ORCID: https://orcid.org/0009-0003-5484-5471
6 Associate Professor, Department of Economy, Mamun University, Uzbekistan. ORCID: https://orcid.org/0000-0002-6783-8591
7 Head of the Department of Physics and Chemistry, Tashkent Institute of Irrigation and Agricultural Mechanization Engineers, National Research University, Tashkent, Uzbekistan; scientific researcher, University of Tashkent
8 PhD student in Management, Centre for Postgraduate Studies, Swiss Information and Management Institute (SIMI Swiss) & Asia Metropolitan University (AMU), 63000 Cyberjaya, Selangor, Malaysia. ORCID: https://orcid.org/0009-0009-4843-9868
Abstract
Examination of measurement invariance is essential for meaningful cross-group comparisons. This study investigates the measurement invariance of a C-Test across gender within a multiple-group item response theory (MG-IRT) framework. A C-Test composed of six passages totaling 120 items was administered to 256 intermediate-level English as a Second Language (ESL) learners at Termez University in Uzbekistan. To assess whether the test functioned equivalently for male and female participants of similar language proficiency, MG-IRT modeling was carried out using the Partial Credit Model. Model fit indices, item-level statistics, and differential item functioning (DIF) were evaluated through Root Mean Square Difference (RMSD) and Mean Difference (MD) values. Results indicated high reliability (α = .96, EAP reliability = .93), good model fit, and ordered item thresholds. Importantly, none of the six C-Test passages showed meaningful gender-based DIF, suggesting that the instrument exhibits strong measurement invariance. These findings support the validity and fairness of the C-Test for assessing general language proficiency across gender and highlight the utility of MG-IRT models in language test validation.
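The RMSD and MD statistics mentioned above compare an item's group-specific response curve against the model-implied curve, weighted by the group's ability distribution. The following minimal sketch (not taken from the study; the curves, weights, and the .10 flagging threshold are illustrative assumptions) shows how these two quantities are typically computed over a theta grid:

```python
import numpy as np

def rmsd_md(p_group, p_model, weights):
    """Return (RMSD, MD) between a group-specific and a model-implied
    item response curve, weighting by the group's ability distribution.
    All three arguments are arrays over the same theta grid."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                            # normalize to a distribution
    diff = np.asarray(p_group) - np.asarray(p_model)
    rmsd = np.sqrt(np.sum(w * diff**2))        # root mean square difference
    md = np.sum(w * diff)                      # signed mean difference
    return rmsd, md

# Illustrative example: a logistic model curve and a group curve shifted
# by a small, non-meaningful amount (0.1 logits) on a theta grid,
# weighted by a standard-normal-shaped ability distribution.
theta = np.linspace(-4, 4, 41)
p_model = 1 / (1 + np.exp(-theta))
p_group = 1 / (1 + np.exp(-(theta - 0.1)))
w = np.exp(-theta**2 / 2)

rmsd, md = rmsd_md(p_group, p_model, w)
print(rmsd, md)   # both small in magnitude, well under a .10 cutoff
```

In applied work these curves come from a fitted multiple-group model (e.g., a Partial Credit Model estimated in software such as the R package TAM), and items with RMSD below a conventional cutoff (often around .08 to .10) are treated as showing negligible DIF.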
Keywords