Impacts of Ignoring Nested Data Structure in Rasch/IRT Model and Comparison of Different Estimation Methods
This study involves investigating the impacts of ignoring nested data structure in Rasch/1PL item response theory (IRT) model via a two-level and three-level hierarchical generalized linear model (HGLM). Currently, Rasch/IRT models are frequently used in educational and psychometric researches for data obtained from multistage cluster samplings, which are more likely to violate the assumption of independent observations of examinees required by Rasch/IRT models. The violation of the assumption of independent observation, however, is ignored in the current standard practices which apply the standard Rasch/IRT for the large scale testing data. A simulation study (Study Two) was conducted to address this issue of the effects of ignoring nested data structure in Rasch/IRT models under various conditions, following a simulation study (Study One) to compare the performances of three methods, such as Penalized Quasi-Likelihood (PQL), Laplace approximation, and Adaptive Gaussian Quadrature (AGQ), commonly used in HGLM in terms of accuracy and efficiency in estimating parameters.
As expected, PQL tended to produce seriously biased item difficulty estimates and ability variance estimates whereas almost unbiased for Laplace or AGQ for both 2-level and 3-level analysis. As for the root mean squared errors (RMSE), three methods performed without substantive differences for item difficulty estimates and ability variance estimates in both 2-level and 3-level analysis, except for level-2 ability variance estimates in 3-level analysis. Generally, Laplace and AGQ performed similarly well in terms of bias and RMSE of parameter estimates; however, Laplace exhibited a much lower convergence rate than that of AGQ in 3-level analyses.
The results from AGQ, which produced the most accurate and stable results among three computational methods, demonstrated that the theoretical standard errors (SE), i.e., asymptotic information-based SEs, were underestimated by at most 34% when 2-level analyses were used for the data generated from 3-level model, implying that the Type I error rate would be inflated when the nested data structures are ignored in Rasch/IRT models. The underestimated theoretical standard errors were substantively more severe as the true ability variance increased or the number of students within schools increased regardless of test length or the number of schools.