A Comparison of Early Childhood Assessments and a Standardized Measure for Program Evaluation
Traditionally, standardized achievement tests have been used to monitor program effectiveness. Recently, however, educators have questioned whether standardized tests are appropriate for this purpose, especially for programs designed for young children. Early childhood advocates suggest using developmentally appropriate assessments instead of standardized achievement tests, both for making classroom-level decisions about children and for program evaluation. Proponents, however, have not fully established the psychometric properties of these assessments, and certainly not for the purposes of program evaluation. Although developmentally appropriate assessments have been implemented in a number of classrooms across the country, few studies have verified their ability to discriminate among developmental levels, and even fewer have addressed their use for evaluating program effectiveness.

This study examined three assessment instruments and a standardized test, using the records of 293 students from the local site of a National Transition Project and both classical test theory (CTT) and item response theory (IRT) procedures. The analyses showed that the Concepts about Print portion of the Early Childhood Assessment Package, the Language Arts component of the kindergarten developmental progress reports, and the first-grade Early Literacy Scale tasks are, in fact, developmental assessments. In addition, IRT procedures located students on the developmental continuum underlying the assessments. Although classical ANCOVAs were unable to identify Treatment or Head Start program effects beyond the kindergarten year, IRT procedures showed that, throughout kindergarten and first grade, the expected proportion of students at the highest latent ability levels tended to be greater for students in Demonstration schools and for Head Start graduates than for their counterparts.
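The abstract does not name the IRT model used to locate students on the developmental continuum. As a minimal sketch, assuming dichotomously scored (0/1) tasks and the one-parameter logistic (Rasch) model, a student's latent ability can be estimated by maximum likelihood from already-calibrated item difficulties. The function names and difficulty values below are hypothetical illustrations, not the study's procedure; ordered ratings such as progress-report categories would call for a polytomous model (for example, a partial credit model) instead.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct (1) response under the Rasch model,
    for a student of ability theta on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, tol=1e-6, max_iter=50):
    """Maximum-likelihood estimate of latent ability for one student.

    Hypothetical helper: responses are 0/1 item scores and difficulties are
    item difficulties assumed to be already calibrated. Uses Newton-Raphson
    on the Rasch log-likelihood.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        # The ML estimate diverges to -inf or +inf for these patterns.
        raise ValueError("ML ability estimate is undefined for zero or perfect scores")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        # First derivative of the log-likelihood: observed minus expected score.
        gradient = score - sum(probs)
        # Test information = negative second derivative of the log-likelihood.
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    # Hypothetical difficulties for three tasks (not taken from the study).
    example_difficulties = [-1.0, 0.0, 1.0]
    print(round(estimate_ability([1, 1, 0], example_difficulties), 3))
```

With symmetric difficulties, the patterns [1, 1, 0] and [1, 0, 0] yield estimates equal in magnitude and opposite in sign. An all-correct or all-incorrect pattern has no finite maximum-likelihood estimate, which is why the sketch raises an error rather than returning a value.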
A standardized reading achievement measure administered to the students in second grade was unable to differentiate program effects through either classical or IRT procedures. This suggests that the concepts underlying standardized tests differ from those underlying developmentally appropriate assessments. The key issue to be resolved, therefore, is which type of measure is more valid, that is, more appropriate, for evaluating early childhood programs.