Tasks, rating rubrics, and rater judgments are three major potential sources of construct-irrelevant variability in test scores. Score variance attributable to these facets of assessment threatens the dependability, and in turn the generalizability, of the scores. The generalizability of test scores therefore needs to be investigated empirically; otherwise, there is no evidence to support the generalizability inferences made on the basis of those scores. The current study used univariate generalizability (G) theory to examine the sources of variance in, and the dependability of, scores from the Columbia University CEP speaking placement test (N = 144). The study also used multivariate G theory to examine the dependability of the scores from each of the four scales of the test's analytic rubric. Finally, it investigated whether combining the four analytic scale scores into a single composite score is justifiable. The univariate results indicated that CEP speaking test scores are dependable enough to be treated as a consistent measure of test takers' speaking ability. The multivariate results showed that the four rating scales are highly correlated and carry almost equal effective weights in the composite score, which justifies combining the four scale scores into a reported composite score. The present study may help L2 assessment researchers understand how generalizability theory can be applied in reliability and generalizability investigations.
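The univariate analysis summarized above rests on estimating variance components and a dependability (Φ) coefficient. The following is a minimal Python sketch of that computation, assuming a fully crossed persons × tasks × raters (p × t × r) random-effects design with one score per cell. The abstract does not state the study's actual design or software, so the design here is an assumption, and the function names `g_study_ptr` and `d_study` are hypothetical. The sketch uses the standard ANOVA expected-mean-square estimators, sets negative estimates to zero (a common convention), and does not cover the multivariate analysis of the four rubric scales.

```python
import numpy as np


def g_study_ptr(scores):
    """Estimate variance components for an assumed fully crossed p x t x r
    random-effects design (one score per person-task-rater cell)."""
    x = np.asarray(scores, dtype=float)
    n_p, n_t, n_r = x.shape  # each facet needs at least 2 levels
    m = x.mean()
    # Marginal means for main effects and two-way interactions
    m_p, m_t, m_r = x.mean(axis=(1, 2)), x.mean(axis=(0, 2)), x.mean(axis=(0, 1))
    m_pt, m_pr, m_tr = x.mean(axis=2), x.mean(axis=1), x.mean(axis=0)

    # Sums of squares for each effect
    ss = {
        "p": n_t * n_r * np.sum((m_p - m) ** 2),
        "t": n_p * n_r * np.sum((m_t - m) ** 2),
        "r": n_p * n_t * np.sum((m_r - m) ** 2),
        "pt": n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + m) ** 2),
        "pr": n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + m) ** 2),
        "tr": n_p * np.sum((m_tr - m_t[:, None] - m_r[None, :] + m) ** 2),
    }
    # Three-way interaction confounded with residual error (ptr,e)
    resid = (x - m_pt[:, :, None] - m_pr[:, None, :] - m_tr[None, :, :]
             + m_p[:, None, None] + m_t[None, :, None] + m_r[None, None, :] - m)
    ss["ptr"] = np.sum(resid ** 2)

    # Degrees of freedom and mean squares
    df = {"p": n_p - 1, "t": n_t - 1, "r": n_r - 1}
    df["pt"], df["pr"], df["tr"] = df["p"] * df["t"], df["p"] * df["r"], df["t"] * df["r"]
    df["ptr"] = df["p"] * df["t"] * df["r"]
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for each component
    var = {
        "ptr": ms["ptr"],
        "pt": (ms["pt"] - ms["ptr"]) / n_r,
        "pr": (ms["pr"] - ms["ptr"]) / n_t,
        "tr": (ms["tr"] - ms["ptr"]) / n_p,
        "p": (ms["p"] - ms["pt"] - ms["pr"] + ms["ptr"]) / (n_t * n_r),
        "t": (ms["t"] - ms["pt"] - ms["tr"] + ms["ptr"]) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["tr"] + ms["ptr"]) / (n_p * n_t),
    }
    # Convention: truncate negative estimates at zero
    return {k: max(float(v), 0.0) for k, v in var.items()}


def d_study(var, n_t, n_r):
    """Return (generalizability coefficient, dependability Phi) for a
    decision study with n_t tasks and n_r raters per person."""
    rel_err = var["pt"] / n_t + var["pr"] / n_r + var["ptr"] / (n_t * n_r)
    # Absolute error adds the task, rater, and task-by-rater effects
    abs_err = rel_err + var["t"] / n_t + var["r"] / n_r + var["tr"] / (n_t * n_r)
    g_coef = var["p"] / (var["p"] + rel_err)
    phi = var["p"] / (var["p"] + abs_err)
    return g_coef, phi


if __name__ == "__main__":
    # Hypothetical simulated scores (not study data): 30 persons x 2 tasks x 2 raters
    rng = np.random.default_rng(0)
    person = rng.normal(0.0, 1.0, size=(30, 1, 1))
    scores = person + rng.normal(0.0, 0.5, size=(30, 2, 2))
    comps = g_study_ptr(scores)
    print(comps, d_study(comps, n_t=2, n_r=2))
```

Φ uses absolute error variance, which includes the task and rater main effects, and is typically preferred for criterion-referenced decisions; the generalizability coefficient uses only relative error. Re-running `d_study` with different `n_t` and `n_r` values shows how adding tasks or raters would be expected to change dependability under this assumed design.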