Multivariate Applications of Bayesian Model Averaging
Abstract
The standard methodology for building statistical models has been to use one of several algorithms to search the model space systematically for a good model. If the number of variables is small, all possible models may be fitted or a best-subsets procedure used, but for data sets with many variables a stepwise procedure is usually implemented. Stepwise selection was designed for computational efficiency and is not guaranteed to find the best model with respect to any optimality criterion. Although the selected model may not be the best in the model space, it is commonly almost as good as the best model. Often several models will compete with the best model in terms of the selection criterion, but classical model building dictates that a single model be chosen to the exclusion of all others. An alternative is Bayesian model averaging (BMA), which uses the information from all models, weighting each by how well it is supported by the data.
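In the usual formulation of BMA, the posterior distribution of a quantity of interest Δ given data D is an average of its posteriors under each candidate model M_1, ..., M_K, weighted by the posterior model probabilities (the notation is introduced here for illustration):

```latex
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)}
```

Here p(D | M_k) is the marginal likelihood of model M_k and p(M_k) is its prior probability.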
Using BMA allows a variance component due to the uncertainty of the model selection process to be estimated. The variance of any statistic of interest is conditional on the model selected, so if there is model uncertainty, variance estimates should reflect it. BMA methodology can also be used for variable assessment, since the probability that a given variable is active is readily obtained from the individual model posterior probabilities.
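Both quantities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method developed in this work: it takes as given each candidate model's posterior probability together with the within-model estimate and variance of a single statistic (the `ModelSummary` type and both function names are hypothetical). By the law of total variance, the BMA variance splits into the posterior-weighted average of the within-model variances plus a between-model term, and the latter is the model-uncertainty component. A variable's posterior inclusion probability is the total posterior probability of the models that contain it.

```python
from dataclasses import dataclass


@dataclass
class ModelSummary:
    """Hypothetical per-model summary for one statistic of interest."""
    variables: frozenset  # names of the variables active in this model
    post_prob: float      # posterior model probability (need not be normalized)
    estimate: float       # estimate of the statistic under this model
    variance: float       # variance of the statistic conditional on this model


def bma_summary(models):
    """Return (BMA mean, within-model variance, between-model variance)."""
    total = sum(m.post_prob for m in models)
    weights = [m.post_prob / total for m in models]
    mean = sum(w * m.estimate for w, m in zip(weights, models))
    # Posterior-weighted average of the conditional (within-model) variances.
    within = sum(w * m.variance for w, m in zip(weights, models))
    # Spread of the model-specific estimates around the BMA mean:
    # the variance component due to model uncertainty.
    between = sum(w * (m.estimate - mean) ** 2 for w, m in zip(weights, models))
    return mean, within, between


def inclusion_probabilities(models):
    """Posterior probability that each variable is active."""
    total = sum(m.post_prob for m in models)
    names = set().union(*(m.variables for m in models))
    return {
        v: sum(m.post_prob for m in models if v in m.variables) / total
        for v in sorted(names)
    }
```

For example, with three hypothetical models of posterior probability 0.5, 0.3 and 0.2 containing {x1}, {x1, x2} and {x2}, the inclusion probabilities are 0.8 for x1 and 0.5 for x2. The total BMA variance is `within + between`; reporting only a single model's conditional variance drops the `between` term entirely.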
The multivariate methods considered in this research are principal components analysis (PCA), canonical variate analysis (CVA), and canonical correlation analysis (CCA). Each method is viewed as a particular multivariate extension of univariate multiple regression. The marginal likelihood of a univariate multiple regression model has been approximated using the Bayesian information criterion (BIC), so the marginal likelihoods for these multivariate extensions are approximated in the same way.
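For the univariate regression case only, the BIC approximation can be sketched as follows. The sketch assumes ordinary least squares with Gaussian errors, an intercept in every model, and equal prior probabilities on all models; the function names are hypothetical, and the multivariate extensions studied here are not covered. Each model's BIC, n log(RSS/n) + p log n up to an additive constant, stands in for -2 log p(D | M_k), so posterior model probabilities are taken proportional to exp(-BIC/2). Exhaustive enumeration of all 2^q subsets is feasible only when the number of variables q is small.

```python
import itertools

import numpy as np


def bic_linear(design, y):
    """BIC, up to an additive constant, of an OLS fit of y on the design matrix."""
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)


def all_subsets_posterior(X, y, names):
    """Approximate posterior probabilities of all 2**q subsets of the columns of X.

    Assumes equal prior model probabilities; every model includes an intercept.
    """
    n, q = X.shape
    intercept = np.ones((n, 1))
    subsets, bics = [], []
    for r in range(q + 1):
        for cols in itertools.combinations(range(q), r):
            idx = np.array(cols, dtype=int)  # integer index array (empty for the null model)
            design = np.hstack([intercept, X[:, idx]])
            subsets.append(frozenset(names[c] for c in cols))
            bics.append(bic_linear(design, y))
    bics = np.array(bics)
    # exp(-BIC/2), shifted by the smallest BIC so the exponentials do not underflow.
    weights = np.exp(-0.5 * (bics - bics.min()))
    post = weights / weights.sum()
    return list(zip(subsets, post))
```

Summing the returned probabilities over the subsets that contain a given variable yields that variable's posterior inclusion probability.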
One of the main criticisms of multivariate techniques in general is that they are difficult to interpret. To aid interpretation, BMA methodology is used to assess the contribution of each variable to each of the methods investigated. A second issue addressed is the graphical display of the results of an analysis. The goal is to convey effectively the germane elements of an analysis that uses BMA, so that a clearer picture emerges of what conclusions should be drawn.
Finally, the model uncertainty variance component can be estimated using BMA. This component is ignored when standard model-building tenets are followed, which gives overly optimistic variance estimates. Even if the model obtained via standard techniques is adequate, in general it would be difficult to argue that the chosen model is in fact the correct model. It seems more appropriate to use the information from all plausible models that are well supported by the data when making decisions, and to use variance estimates that account for uncertainty in model selection as well as in model estimation.