Some Advanced Model Selection Topics for Nonparametric/Semiparametric Models with High-Dimensional Data

dc.contributor.author | Fang, Zaili | en
dc.contributor.committeechair | Kim, Inyoung | en
dc.contributor.committeemember | Smith, Eric P. | en
dc.contributor.committeemember | Terrell, George R. | en
dc.contributor.committeemember | Du, Pang | en
dc.contributor.committeemember | Leman, Scotland C. | en
dc.contributor.department | Statistics | en
dc.date.accessioned | 2014-03-14T21:21:58Z | en
dc.date.adate | 2012-11-13 | en
dc.date.available | 2014-03-14T21:21:58Z | en
dc.date.issued | 2012-10-19 | en
dc.date.rdate | 2012-11-13 | en
dc.date.sdate | 2012-10-21 | en
dc.description.abstract | Model and variable selection have attracted considerable attention in application areas where datasets usually contain thousands of variables. Variable selection is a critical step in reducing the dimension of high-dimensional data by eliminating irrelevant variables. The general objective of variable selection is not only to obtain a cost-effective set of predictors but also to improve prediction accuracy and reduce prediction variance. We have made several contributions to this issue through a range of advanced topics: providing a graphical view of Bayesian Variable Selection (BVS), recovering sparsity in multivariate nonparametric models, and proposing a testing procedure for evaluating nonlinear interaction effects in a semiparametric model. To address the first topic, we propose a new Bayesian variable selection approach via the graphical model and the Ising model, which we refer to as the "Bayesian Ising Graphical Model" (BIGM). Our BIGM has several advantages: it is easy to (1) employ the single-site updating and cluster updating algorithms, both of which are suitable for problems with small sample sizes and a larger number of variables, (2) extend the approach to nonparametric regression models, and (3) incorporate graphical prior information. In the second topic, we propose a Nonnegative Garrote on a Kernel machine (NGK) to recover sparsity of input variables in smoothing functions. We model the smoothing function by a least squares kernel machine and construct a nonnegative garrote on the kernel model as a function of the similarity matrix. An efficient coordinate descent/backfitting algorithm is developed. The third topic involves a specific genetic pathway dataset in which the pathways interact with environmental variables. We propose a semiparametric method to model the pathway-environment interaction. We then employ a restricted likelihood ratio test and a score test to evaluate the main pathway effect and the pathway-environment interaction. | en
dc.description.degree | Ph. D. | en
dc.identifier.other | etd-10212012-214919 | en
dc.identifier.sourceurl | http://scholar.lib.vt.edu/theses/available/etd-10212012-214919/ | en
dc.identifier.uri | http://hdl.handle.net/10919/40090 | en
dc.publisher | Virginia Tech | en
dc.relation.haspart | Fang_ZL_D_2012.pdf | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | Variable Selection | en
dc.subject | Smoothing Splines | en
dc.subject | Sparsistency | en
dc.subject | Semiparametric Model | en
dc.subject | Pathway Analysis | en
dc.subject | Additive Model | en
dc.subject | Cluster Algorithm | en
dc.subject | Gaussian Random Process | en
dc.subject | Global-Local Shrinkage | en
dc.subject | Graphical Model | en
dc.subject | Ising Model | en
dc.subject | Kernel Machine | en
dc.subject | KM Model | en
dc.subject | LASSO | en
dc.subject | Long Tail Prior | en
dc.subject | Mixture Normals | en
dc.subject | Model Selection | en
dc.subject | Multivariate Smoothing Function | en
dc.subject | Nonnegative Garrote | en
dc.subject | Nonparametric Model | en
dc.title | Some Advanced Model Selection Topics for Nonparametric/Semiparametric Models with High-Dimensional Data | en
dc.type | Dissertation | en
thesis.degree.discipline | Statistics | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | doctoral | en
thesis.degree.name | Ph. D. | en

Files

Original bundle
Name: Fang_ZL_D_2012.pdf
Size: 7.91 MB
Format: Adobe Portable Document Format