
dc.contributor.author: Zhang, Angang [en]
dc.date.accessioned: 2017-06-09T06:00:15Z [en]
dc.date.available: 2017-06-09T06:00:15Z [en]
dc.date.issued: 2015-12-16 [en]
dc.identifier.other: vt_gsexam:6944 [en]
dc.identifier.uri: http://hdl.handle.net/10919/77958 [en]
dc.description.abstract: In statistical data analysis, two of the most commonly used techniques are classification and regression modeling. As scientific technology progresses rapidly, complex data arise often and require novel classification and regression methodologies suited to the data structure. In this dissertation, I focus on developing several approaches for analyzing data with complex structures. Classification problems arise in many areas, such as biomedicine, marketing, sociology, and image recognition. Among the various classification methods, linear classifiers have been widely used because of their computational advantages and their ease of implementation and interpretation compared with non-linear classifiers. Specifically, linear discriminant analysis (LDA) is one of the most important methods in the family of linear classifiers. High-dimensional data, in which the number of variables p is larger than the number of observations n, occur increasingly often and call for advanced classification techniques. In Chapter 2, I propose a novel sparse LDA method that generalizes LDA through a regularized approach for the two-class classification problem. The proposed method achieves high classification accuracy at an attractive computational cost, which makes it suitable for high-dimensional data with p > n. In Chapter 3, I deal with classification when the data complexity lies in non-random missing responses in the training data set. An appropriate classification method needs to be developed accordingly. Specifically, I consider the "reject inference problem" in the application of fraud detection for online business. To prevent fraudulent transactions, online businesses reject suspicious transactions, whose fraud status therefore remains unknown, yielding training data with selectively missing responses. A two-stage modeling approach using logistic regression is proposed to enhance the efficiency and accuracy of fraud detection.
Besides the classification problem, data from designed experiments in scientific areas often have complex structures. Many experiments are conducted with multiple variance sources. To increase the accuracy of the statistical modeling, the model needs to accommodate more than one error term. In Chapter 4, I propose a variance component mixed model for data from a nanomaterial experiment that incorporates the between-group, within-group, and within-subject variance components into a single model. To adjust for possible systematic error introduced during the experiment, adjustment terms can be added. Specifically, a group adaptive forward and backward selection (GFoBa) procedure is designed to select the significant adjustment terms. [en]
dc.format.medium: ETD [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: A/B testing [en]
dc.subject: fraud detection [en]
dc.subject: linear classifier [en]
dc.subject: misclassification error [en]
dc.subject: net profit value [en]
dc.subject: reject inference [en]
dc.subject: sparse linear discriminant analysis [en]
dc.subject: two-class classification [en]
dc.subject: variance component mixed model [en]
dc.title: Some Advances in Classifying and Modeling Complex Data [en]
dc.type: Dissertation [en]
dc.contributor.department: Statistics [en]
dc.description.degree: Ph. D. [en]
thesis.degree.name: Ph. D. [en]
thesis.degree.level: doctoral [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.discipline: Statistics [en]
dc.contributor.committeechair: Deng, Xinwei [en]
dc.contributor.committeemember: Kim, Inyoung [en]
dc.contributor.committeemember: Smith, Eric P. [en]
dc.contributor.committeemember: Hong, Yili [en]

