Show simple item record

dc.contributor.authorShen, Suminen
dc.date.accessioned2020-02-08T09:00:24Z
dc.date.available2020-02-08T09:00:24Z
dc.date.issued2020-02-07
dc.identifier.othervt_gsexam:24103en
dc.identifier.urihttp://hdl.handle.net/10919/96767
dc.description.abstractThe advances in data-collecting technologies provides great opportunities to access large sample-size data sets with high dimensionality. Variable selection is an important procedure to extract useful knowledge from such complex data. While in many real-data applications, appropriate selection of variables should facilitate the model interpretation and computation efficiency. It is thus important to incorporate domain knowledge of underlying data generation mechanism to select key variables for improving the model performance. However, general variable selection techniques, such as the best subset selection and the Lasso, often do not take the underlying data generation mechanism into considerations. This thesis proposal aims to develop statistical modeling methodologies with a focus on the structured variable selection towards better model interpretation and computation efficiency. Specifically, this thesis proposal consists of three parts: an additive heredity model with coefficients incorporating the multi-level data, a regularized dynamic generalized linear model with piecewise constant functional coefficients, and a structured variable selection method within the best subset selection framework. In Chapter 2, an additive heredity model is proposed for analyzing mixture-of-mixtures (MoM) experiments. The MoM experiment is different from the classical mixture experiment in that the mixture component in MoM experiments, known as the major component, is made up of sub-components, known as the minor components. The proposed model considers an additive structure to inherently connect the major components with the minor components. To enable a meaningful interpretation for the estimated model, we apply the hierarchical and heredity principles by using the nonnegative garrote technique for model selection. The performance of the additive heredity model was compared to several conventional methods in both unconstrained and constrained MoM experiments. The additive heredity model was then successfully applied in a real problem of optimizing the Pringlestextsuperscript{textregistered} potato crisp studied previously in the literature. In Chapter 3, we consider the dynamic effects of variables in the generalized linear model such as logistic regression. This work is motivated from the engineering problem with varying effects of process variables to product quality caused by equipment degradation. To address such challenge, we propose a penalized dynamic regression model which is flexible to estimate the dynamic coefficient structure. The proposed method considers modeling the functional coefficient parameter as piecewise constant functions. Specifically, under the penalized regression framework, the fused lasso penalty is adopted for detecting the changes in the dynamic coefficients. The group lasso penalty is applied to enable a sparse selection of variables. Moreover, an efficient parameter estimation algorithm is also developed based on alternating direction method of multipliers. The performance of the dynamic coefficient model is evaluated in numerical studies and three real-data examples. In Chapter 4, we develop a structured variable selection method within the best subset selection framework. In the literature, many techniques within the LASSO framework have been developed to address structured variable selection issues. However, less attention has been spent on structured best subset selection problems. In this work, we propose a sparse Ridge regression method to address structured variable selection issues. The key idea of the proposed method is to re-construct the regression matrix in the angle of experimental designs. We employ the estimation-maximization algorithm to formulate the best subset selection problem as an iterative linear integer optimization (LIO) problem. the mixed integer optimization algorithm as the selection step. We demonstrate the power of the proposed method in various structured variable selection problems. Moverover, the proposed method can be extended to the ridge penalized best subset selection problems. The performance of the proposed method is evaluated in numerical studies.en
dc.format.mediumETDen
dc.publisherVirginia Techen
dc.rightsThis item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s).en
dc.subjectModel Selection; Nonnegative Garrote Method; Mixture Experiments; Dynamic Coefficient; Fused Lasso; Group Lasso; Alternative Direction Method of Multipliers (ADMM); Integer Optimization; Expectation-Maximization (EM); LASSO.en
dc.titleContributions to Structured Variable Selection Towards Enhancing Model Interpretation and Computation Efficiencyen
dc.typeDissertationen
dc.contributor.departmentStatisticsen
dc.description.degreeDoctor of Philosophyen
thesis.degree.nameDoctor of Philosophyen
thesis.degree.leveldoctoralen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.disciplineStatisticsen
dc.contributor.committeechairDeng, Xinweien
dc.contributor.committeememberFranck, Christopher Thomasen
dc.contributor.committeememberKim, Inyoungen
dc.contributor.committeememberJin, Ranen
dc.description.abstractgeneralThe advances in data-collecting technologies provides great opportunities to access large sample-size data sets with high dimensionality. Variable selection is an important procedure to extract useful knowledge from such complex data. While in many real-data applications, appropriate selection of variables should facilitate the model interpretation and computation efficiency. It is thus important to incorporate domain knowledge of underlying data generation mechanism to select key variables for improving the model performance. However, general variable selection techniques often do not take the underlying data generation mechanism into considerations. This thesis proposal aims to develop statistical modeling methodologies with a focus on the structured variable selection towards better model interpretation and computation efficiency. The proposed approaches have been applied to real-world problems to demonstrate their model performance.en


Files in this item

Thumbnail
Thumbnail

This item appears in the following Collection(s)

Show simple item record