Modified principal components regression

Virginia Polytechnic Institute and State University

When near linear relationships exist among the regressor variables (the columns of X), the variances of the least squares estimators of the regression coefficients become very large. The least squares estimator of the vector of regression coefficients can be written in terms of the latent roots and latent vectors of X'X as a sum over the latent vectors, each term carrying a factor equal to the reciprocal of its latent root. It therefore places heavy weight on the latent vectors corresponding to the small latent roots of X'X. Thus, the estimates of the coefficients of the regressors involved in multicollinearities tend to be dominated by those multicollinearities, and the least squares estimators can estimate the true parameters poorly and be very unreliable.
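
The decomposition described above can be checked numerically. Writing X'X = VΛV' with latent roots λ_j and latent vectors v_j (the columns of V), the least squares estimator is b = Σ_j (1/λ_j) v_j v_j'X'y, and the variance of b_k is σ² Σ_j v_jk²/λ_j, so a small λ_j inflates both. The following is a minimal NumPy sketch on assumed synthetic data (three regressors, with x2 nearly equal to x1, no centering or scaling); the function names ols_via_latent_roots and variance_factors are hypothetical.

```python
import numpy as np

def ols_via_latent_roots(X, y):
    """Least squares estimate written as b = sum_j (1/lambda_j) v_j (v_j' X'y)."""
    lam, V = np.linalg.eigh(X.T @ X)   # latent roots (ascending) and latent vectors (columns of V)
    weights = (V.T @ (X.T @ y)) / lam  # each component is scaled by 1/lambda_j
    return V @ weights, lam

def variance_factors(X):
    """Diagonal of (X'X)^{-1}, i.e. sum_j v_jk^2 / lambda_j; Var(b_k) = sigma^2 times this."""
    lam, V = np.linalg.eigh(X.T @ X)
    return (V ** 2) @ (1.0 / lam)

# Synthetic, uncentered data (an assumption for illustration): x2 is nearly equal to x1.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

b, lam = ols_via_latent_roots(X, y)
print("latent roots:", lam)
print("agrees with np.linalg.lstsq:", np.allclose(b, np.linalg.lstsq(X, y, rcond=None)[0]))
print("variance factors:", variance_factors(X))
```

With this design the smallest latent root is far below the others, and the variance factors of the two collinear coefficients are correspondingly much larger than that of the third.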

To overcome the ill effects of multicollinearity on the least squares estimator, principal components regression deletes the components corresponding to the small latent roots of X'X and then regresses y on the retained components by ordinary least squares. When principal components regression is used as an alternative to least squares in the presence of a near-singular X'X matrix, its performance depends strongly on the orientation of the deleted components relative to the vector of regression coefficients. In this paper, we present a modification of the principal components procedure in which the components associated with near singularities are dampened but not completely deleted.
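
With X'X = VΛV', the least squares coefficients on the components Z = XV are α = Λ⁻¹V'X'y, and because the components are mutually orthogonal (Z'Z = Λ), regressing y on only the retained components gives exactly the same α_j for those components. Principal components regression therefore amounts to zeroing the deleted α_j before mapping back with V. The sketch below assumes NumPy and synthetic data, and also shows one possible way to dampen rather than delete: multiplying the small-root components by a constant factor. The function component_estimator and its damping parameter are hypothetical illustrations chosen here; they are not the dampening rule developed in this paper.

```python
import numpy as np

def component_estimator(X, y, n_small, damping=0.0):
    """Estimate from the latent roots/vectors of X'X with the n_small smallest-root
    components scaled by `damping` (hypothetical parameter, not from the source):
    0.0 deletes them (principal components regression), 1.0 keeps them (least squares),
    and values in between dampen them."""
    lam, V = np.linalg.eigh(X.T @ X)   # ascending order: the smallest roots come first
    alpha = (V.T @ (X.T @ y)) / lam    # least squares coefficients on the components Z = X V
    factors = np.ones_like(lam)
    factors[:n_small] = damping        # delete (0) or dampen (0 < damping < 1) small-root components
    return V @ (factors * alpha)       # map back to coefficients on the original regressors

# Synthetic data with one near singularity (assumed for illustration).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

b_ols = component_estimator(X, y, n_small=1, damping=1.0)
b_pcr = component_estimator(X, y, n_small=1, damping=0.0)
b_mod = component_estimator(X, y, n_small=1, damping=0.5)
for name, est in [("least squares", b_ols), ("principal components", b_pcr), ("dampened", b_mod)]:
    print(f"{name:>20}: {np.round(est, 3)}")
```

A constant factor is only the simplest way to interpolate between the two extremes; any rule that maps each small latent root to a weight between 0 and 1 fits the same template.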

The resulting estimator was compared in a Monte Carlo study with the least squares estimator and the principal components estimator, using mean squared error as the criterion. The results indicate that the modified principal components estimator performs better than either of the other two estimators over a wide range of orientations and signal-to-noise ratios, and that it is a reasonable compromise when the orientation is unknown.
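
A comparison of this kind can be sketched as a small simulation: fix X, choose β with a given orientation relative to the latent vectors of X'X, generate many responses y = Xβ + ε, and average ||β̂ - β||² for each estimator. The sketch assumes NumPy and uses a hypothetical component_estimator (damping 1.0 for least squares, 0.0 for principal components, 0.5 as an arbitrary dampened value). The design, the two orientations, ||β|| = 2, σ = 1, and the replication count are illustrative assumptions, not the settings of the study, and the printed numbers say nothing about the results reported here.

```python
import numpy as np

def component_estimator(X, y, n_small, damping):
    """Hypothetical estimator: damping 1.0 = least squares, 0.0 = principal components."""
    lam, V = np.linalg.eigh(X.T @ X)   # smallest latent roots first
    alpha = (V.T @ (X.T @ y)) / lam    # coefficients on the components Z = X V
    factors = np.ones_like(lam)
    factors[:n_small] = damping        # scale the small-root components
    return V @ (factors * alpha)

def monte_carlo_mse(X, beta, sigma, n_reps=1000, damping=0.5, seed=1):
    """Average of ||beta_hat - beta||^2 over simulated responses y = X beta + noise."""
    rng = np.random.default_rng(seed)
    settings = {"ols": 1.0, "pcr": 0.0, "modified": damping}
    totals = dict.fromkeys(settings, 0.0)
    for _ in range(n_reps):
        y = X @ beta + sigma * rng.normal(size=X.shape[0])
        for name, c in settings.items():   # all estimators see the same simulated y
            err = component_estimator(X, y, n_small=1, damping=c) - beta
            totals[name] += err @ err
    return {name: total / n_reps for name, total in totals.items()}

# Fixed design with one near singularity (assumed for illustration).
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n), rng.normal(size=n)])
lam, V = np.linalg.eigh(X.T @ X)

# Two extreme orientations of beta, each of length 2, with noise sigma = 1:
beta_max = 2.0 * V[:, -1]   # along the latent vector of the largest root
beta_min = 2.0 * V[:, 0]    # along the latent vector of the smallest root
mse_max = monte_carlo_mse(X, beta_max, sigma=1.0)
mse_min = monte_carlo_mse(X, beta_min, sigma=1.0)
print("beta along largest-root vector: ", mse_max)
print("beta along smallest-root vector:", mse_min)
```

When β is orthogonal to the deleted latent vector, the three errors differ only in a term c²α̂₁² (c being the damping factor), so the estimated MSEs are always ordered principal components < dampened < least squares; when β lies along that vector, principal components regression carries a squared bias of ||β||² = 4 in every replication. Between these extremes the ranking depends on ||β||/σ and on the latent roots.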