Temporal stability of rating behaviors: effects of differences in rater and ratee samples

Virginia Polytechnic Institute and State University


Latham, Saari, and Fay (1980) asserted that their behavioral observation scales (BOS) produce stably high scale reliabilities across rating occasions. Kane and Bernardin (in press) argued that this stability resulted from methodological flaws. They faulted Latham et al., first, for scale optimization and, second, for using the same sample of raters when evaluating the stability of the scale reliabilities. Kane and Bernardin hypothesized that rater-specific illusory halo contributed to the stably high reliabilities Latham et al. reported. This study tested that hypothesis. In addition, it examined the temporal stability of five rating behaviors (halo, leniency/severity, restriction of range, differential accuracy, and internal consistency) and the intercorrelations among four of them. Of particular interest were the relationships between differential accuracy and the first three measures, and the relationship between two different definitions of halo.

Subjects were 274 undergraduate students, who rated the performance of videotaped managers on two separate occasions. Replication samples were manipulated to be independent of the original sample with respect to both raters and ratees, raters only, ratees only, or neither.

The findings were not consistent with the hypotheses: the stability of scale reliabilities did not appear to be affected by idiosyncratic rater biases such as illusory halo. Furthermore, a strong ratee effect was observed in nearly every assessment of the stability of, and the correlations among, the rating behavior measures. A weak positive correlation between halo and accuracy was found, substantiating previous findings, and the two halo measures correlated strongly with each other.