HATLINK: a link between least squares regression and nonparametric curve estimation

Virginia Polytechnic Institute and State University

For both least squares and nonparametric kernel regression, prediction at a given regressor location is obtained as a weighted average of the observed responses. For least squares, the weights used in this average are a direct consequence of the form of the parametric model prescribed by the user. If the prescribed model is not exactly correct, then the resulting predictions and subsequent inferences may be misleading. On the other hand, nonparametric curve estimation techniques, such as kernel regression, obtain prediction weights solely on the basis of the distance of the regressor coordinates of an observation to the point of prediction. These methods therefore ignore information that the researcher may have concerning a reasonable approximate model. In overlooking such information, the nonparametric curve fitting methods often fit anomalous patterns in the data.
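To make the "weighted average" description concrete, the following minimal sketch builds both sets of prediction weights as rows of a hat matrix. The Gaussian kernel, the Nadaraya-Watson form of the kernel estimator, the bandwidth `h`, and the simulated data are illustrative assumptions and are not taken from the report.

```python
import numpy as np

def ols_hat(X):
    """Least squares hat matrix X (X'X)^{-1} X' for a full-rank design matrix X."""
    return X @ np.linalg.pinv(X)

def kernel_hat(x, h):
    """Nadaraya-Watson hat matrix with a Gaussian kernel (assumed choice)."""
    d = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d**2)
    # Normalise each row so the weights for one prediction sum to one.
    return K / K.sum(axis=1, keepdims=True)

# Hypothetical data: a straight line plus a smooth departure from linearity.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)
y = 1.0 + 2.0 * x + 0.3 * np.sin(6 * x) + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([np.ones_like(x), x])  # user-prescribed model: intercept + slope

H_ols = ols_hat(X)            # weights fixed by the form of the parametric model
H_ker = kernel_hat(x, h=0.1)  # weights fixed only by distance to the prediction point

fit_ols = H_ols @ y  # each fitted value is a weighted average of the responses
fit_ker = H_ker @ y
```

Row `i` of each matrix holds the weights applied to the observed responses when predicting at `x[i]`; the two rows generally differ because the least squares weights depend on the model form while the kernel weights depend only on distance.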

This paper presents a method for obtaining an improved set of prediction weights by balancing the least squares and kernel weighting schemes. The method is called "HATLINK," since the balance is achieved through a mixture of the hat matrices corresponding to the least squares and kernel fits. The mixing parameter is determined adaptively through cross-validation (PRESS) or by a version of the Cp statistic. Simulation studies show that predictions obtained through the HATLINK procedure are robust to model misspecification by the researcher. It is also demonstrated that the HATLINK procedure can be used to perform many of the usual tasks of regression analysis, such as estimating the error variance, providing confidence intervals, testing for lack of fit of the user's prescribed model, and assisting in the variable selection process. In accomplishing all of these tasks, the HATLINK procedure provides a model-robust alternative to the standard model-based approach to regression.
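The mixing idea can be sketched as follows. This is one plausible reading, not the report's stated algorithm: the abstract says only that the hat matrices are mixed, so the convex combination (1 - lam) * H_ols + lam * H_ker, the grid search over `lam`, the fixed bandwidth `h`, the Gaussian Nadaraya-Watson smoother, and the simulated data are all assumptions made for illustration. PRESS is computed from the leave-one-out residuals of each component, using the identity e_i / (1 - h_ii), which holds for both least squares and the Nadaraya-Watson estimator.

```python
import numpy as np

def ols_hat(X):
    """Least squares hat matrix for a full-rank design matrix X."""
    return X @ np.linalg.pinv(X)

def kernel_hat(x, h):
    """Nadaraya-Watson hat matrix with a Gaussian kernel (assumed choice)."""
    d = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d**2)
    return K / K.sum(axis=1, keepdims=True)

def press(y, H_ols, H_ker, lam):
    """PRESS of the mixed predictor (1 - lam) * OLS + lam * kernel."""
    # Leave-one-out residuals of each component via e_i / (1 - h_ii).
    r_ols = (y - H_ols @ y) / (1.0 - np.diag(H_ols))
    r_ker = (y - H_ker @ y) / (1.0 - np.diag(H_ker))
    # The leave-one-out residual of the mixture is the same mixture of residuals.
    r_mix = (1.0 - lam) * r_ols + lam * r_ker
    return float(np.sum(r_mix**2))

def choose_lambda(y, H_ols, H_ker, grid):
    """Pick the mixing parameter on a grid by minimising PRESS."""
    scores = [press(y, H_ols, H_ker, lam) for lam in grid]
    return float(grid[int(np.argmin(scores))])

# Hypothetical data: the prescribed straight-line model is only approximately right.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 25)
y = 1.0 + 2.0 * x + 0.3 * np.sin(6 * x) + rng.normal(scale=0.1, size=x.size)
X = np.column_stack([np.ones_like(x), x])
h = 0.1  # bandwidth held fixed here for simplicity (assumption)

H_ols = ols_hat(X)
H_ker = kernel_hat(x, h)
grid = np.linspace(0.0, 1.0, 21)
lam_hat = choose_lambda(y, H_ols, H_ker, grid)

H_mix = (1.0 - lam_hat) * H_ols + lam_hat * H_ker  # mixed prediction weights
fit_mix = H_mix @ y
```

In this sketch `lam = 0` reproduces the least squares fit and `lam = 1` the pure kernel fit, so the selected value indicates how far the data pull the weights away from the prescribed model.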