A comparison of the stability of school effectiveness indices produced by classical least squares regression and Bayesian m-group regression techniques
Numerous school effectiveness studies have used least squares regression techniques to produce school effectiveness indices, even though such indices are subject to serious sampling fluctuations when sample sizes are small. If the sample size is smaller than normally considered adequate for accurate prediction, a larger sample can be obtained by pooling students from similar programs across schools. Even though the regression weights for similar programs should be similar across schools, however, direct pooling of students may be less than satisfactory. A technique such as Bayesian m-group regression can incorporate both the similarity of the regressions across schools and the uniqueness of the individual programs.
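The compromise between pooling and uniqueness can be illustrated with a simple shrinkage sketch (a hypothetical, simplified stand-in for full Bayesian m-group regression, with simulated data and a fixed shrinkage weight chosen for illustration): each school's noisy within-school slope is pulled toward the pooled slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: m schools, each with only a few students,
# and true slopes that are similar but not identical across schools.
m, n = 5, 8
true_slopes = rng.normal(2.0, 0.2, size=m)
schools = []
for j in range(m):
    x = rng.normal(50, 10, size=n)                      # e.g. pretest scores
    y = 10 + true_slopes[j] * x + rng.normal(0, 5, size=n)
    schools.append((x, y))

def ols_slope(x, y):
    """Simple least squares slope."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Within-school slopes are unstable because n is small.
within = np.array([ols_slope(x, y) for x, y in schools])

# Pooled slope uses all students combined.
all_x = np.concatenate([x for x, _ in schools])
all_y = np.concatenate([y for _, y in schools])
pooled = ols_slope(all_x, all_y)

# Shrink each within-school slope toward the pooled value.
# w = 0.5 is a fixed illustrative weight; Bayesian m-group regression
# would instead derive it from between- and within-school variance.
w = 0.5
shrunk = w * within + (1 - w) * pooled
```

The shrunken estimates retain some of each school's individuality while borrowing strength from the full sample, which is the core idea behind the m-group approach.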
This study empirically examines the predictive efficiency of four regression techniques that utilize individual student data as input. Cross-validation analyses were performed and mean squared errors, mean absolute errors, and correlations between observed and predicted scores were compared for four methods: (1) within-school least squares regression, (2) pooled least squares regression, (3) pooled least squares regression with adjusted alphas, and (4) Bayesian m-group regression with identical regression coefficients.
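A cross-validation comparison of this kind can be sketched as follows. This is an illustrative reconstruction on simulated data, not the study's actual procedure, and it compares only two of the four methods (within-school and pooled least squares); the variable names and data-generating choices are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_ols(x, y):
    """Return (intercept, slope) from simple least squares."""
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    return y.mean() - slope * x.mean(), slope

# Simulated calibration and cross-validation samples for m schools.
m, n_train, n_test = 6, 10, 10
train, test = [], []
for j in range(m):
    slope_j = rng.normal(2.0, 0.3)
    for part, size in ((train, n_train), (test, n_test)):
        x = rng.normal(50, 10, size=size)
        y = 5 + slope_j * x + rng.normal(0, 8, size=size)
        part.append((x, y))

# Pooled equation fit once to all students combined.
pooled_a, pooled_b = fit_ols(np.concatenate([x for x, _ in train]),
                             np.concatenate([y for _, y in train]))

# Predict held-out students with each method.
obs, pred_within, pred_pooled = [], [], []
for (xtr, ytr), (xte, yte) in zip(train, test):
    a, b = fit_ols(xtr, ytr)          # within-school equation
    obs.append(yte)
    pred_within.append(a + b * xte)
    pred_pooled.append(pooled_a + pooled_b * xte)

obs = np.concatenate(obs)
for name, pred in (("within", np.concatenate(pred_within)),
                   ("pooled", np.concatenate(pred_pooled))):
    mse = np.mean((obs - pred) ** 2)              # mean squared error
    mae = np.mean(np.abs(obs - pred))             # mean absolute error
    r = np.corrcoef(obs, pred)[0, 1]              # observed-predicted r
    print(f"{name}: MSE={mse:.1f} MAE={mae:.1f} r={r:.2f}")
```

Comparing MSE, MAE, and the observed-predicted correlation on held-out students is the predictive-efficiency criterion described above.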
In addition, school effectiveness indices were obtained for the four regression techniques, as well as for least squares regression using school means and for mean difference scores. These indices were compared, and their stability across random samples of students and across consecutive classes was examined.
The within-school least squares regression method was found to be somewhat inferior to the other three models in terms of predictive efficiency. The Bayesian m-group equal slope model showed no appreciable advantage over the pooled least squares regression model or the pooled least squares regression model with adjusted alphas.
The indices produced by all six methods appear capable of representing the relative effectiveness of the schools involved in the study. In addition, those indices that moderate the influence of extreme values remained relatively stable from one subsample to another, with correlations ranging from .75 to .85. Stability from class to class was of a much lower magnitude than stability from sample to sample: correlations between school effectiveness indices of consecutive classes ranged from .28 to .47.