Seven methods of handling missing data using samples from a national data base

Witta, Eleanor Lea
2014-03-14
1992
etd-06062008-170840
http://hdl.handle.net/10919/38437
vii, 82 leaves
BTD
application/pdf
en
In Copyright
LD5655.V856 1992.W588
Missing observations (Statistics)
Dissertation
http://scholar.lib.vt.edu/theses/available/etd-06062008-170840/

The effectiveness of seven methods of handling missing data was investigated in a factorial design using random samples selected from the National Education Longitudinal Study of 1988 (NELS-88). Methods evaluated were listwise deletion, pairwise deletion, mean substitution, Buck's procedure, mean regression, one iteration regression, and iterative regression. Factors controlled were number of variables (4 and 8), average intercorrelation (0.2 and 0.4), sample size (200 and 2000), and proportion of incomplete cases (10%, 20%, and 40%). The pattern of missing values was determined by the pattern existing in the variables selected from the NELS-88 data base. Covariance matrices resulting from the use of each missing data method were compared to the 'true' covariance matrix using multi-sample analysis in LISREL 7. Variable means were compared to the 'true' means using the MANOVA procedure in SPSS/PC+. Statistically significant differences (p≤.05) were detected in both comparisons. The most surprising result of this study was the effectiveness (p>.05) of pairwise deletion whenever the sample size was large, thus supporting the contention that the error term disappears as sample size approaches infinity (Glasser, 1964). Listwise deletion was also effective (p>.05) whenever there were four variables or the sample size was small. Almost as surprising was the relative ineffectiveness (p<.05) of the regression methods. This is explained by the difference between the proportion of incomplete cases and the proportion of missing values, and by the distribution of the missing values within the incomplete cases.
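
To make the difference between the three simplest methods named in the abstract concrete, the following is a minimal sketch in Python. It is not the procedure used in the dissertation (which relied on LISREL 7 and SPSS/PC+). The toy data set, the use of None as the missing-value marker, and all function names are assumptions introduced here purely for illustration. Each function estimates the covariance between two variables under one method.

```python
# Illustrative sketch only: toy data and function names are hypothetical,
# not taken from the dissertation.

def mean(values):
    return sum(values) / len(values)

def sample_cov(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def listwise_cov(rows, i, j):
    """Listwise deletion: keep only cases with no missing value on ANY variable."""
    complete = [r for r in rows if None not in r]
    return sample_cov([r[i] for r in complete], [r[j] for r in complete])

def pairwise_cov(rows, i, j):
    """Pairwise deletion: keep cases observed on both variables i and j."""
    usable = [r for r in rows if r[i] is not None and r[j] is not None]
    return sample_cov([r[i] for r in usable], [r[j] for r in usable])

def mean_substitution_cov(rows, i, j):
    """Mean substitution: replace each missing value with that variable's observed mean."""
    def filled(k):
        m = mean([r[k] for r in rows if r[k] is not None])
        return [m if r[k] is None else r[k] for r in rows]
    return sample_cov(filled(i), filled(j))

# Six cases, three variables; None marks a missing value (assumed encoding).
rows = [
    (1.0, 2.0, 3.0),
    (2.0, 4.0, None),
    (3.0, 6.0, 1.0),
    (4.0, 8.0, None),
    (5.0, None, 2.0),
    (6.0, 12.0, 6.0),
]

for name, method in [
    ("listwise", listwise_cov),
    ("pairwise", pairwise_cov),
    ("mean substitution", mean_substitution_cov),
]:
    print(name, round(method(rows, 0, 1), 4))
```

In this toy example half of the cases are incomplete, yet only three of the eighteen values are missing. Listwise deletion therefore discards three cases, while pairwise deletion discards only one when estimating the covariance of the first two variables. This is the same distinction between the proportion of incomplete cases and the proportion of missing values that the abstract points to.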