Show simple item record

dc.contributor.authorChen, Lien_US
dc.date.accessioned2017-04-06T15:43:34Z
dc.date.available2017-04-06T15:43:34Z
dc.date.issued2011-08-24en_US
dc.identifier.otheretd-09062011-225149en_US
dc.identifier.urihttp://hdl.handle.net/10919/77188
dc.description.abstractRecent advances in biotechnologies have enabled multiplatform and large-scale quantitative measurements of biomedical events. The need to analyze the produced vast amount of imaging and genomic data stimulates various novel applications of statistical machine learning methods in many areas of biomedical research. The main objective is to assist biomedical investigators to better interpret, analyze, and understand the biomedical questions based on the acquired data. Given the computational challenges imposed by these high-dimensional and complex data, machine learning research finds its new opportunities and roles. In this dissertation thesis, we propose to develop, test and apply novel statistical machine learning methods to analyze the data mainly acquired by dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) and single nucleotide polymorphism (SNP) microarrays. The research work focuses on: (1) tissue-specific compartmental analysis for dynamic contrast-enhanced MR imaging of complex tumors; (2) computational Analysis for detecting DNA SNP interactions in genome-wide association studies. DCE-MRI provides a noninvasive method for evaluating tumor vasculature patterns based on contrast accumulation and washout. Compartmental analysis is a widely used mathematical tool to model dynamic imaging data and can provide accurate pharmacokinetics parameter estimates. However partial volume effect (PVE) existing in imaging data would have profound effect on the accuracy of pharmacokinetics studies. We therefore propose a convex analysis of mixtures (CAM) algorithm to explicitly eliminate PVE by expressing the kinetics in each pixel as a nonnegative combination of underlying compartments and subsequently identifying pure volume pixels at the corners of the clustered pixel time series scatter plot. The algorithm is supported by a series of newly proved theorems and additional noise filtering and normalization preprocessing. We demonstrate the principle and feasibility of the CAM approach together with compartmental modeling on realistic synthetic data, and compare the accuracy of parameter estimates obtained using CAM or other relevant techniques. Experimental results show a significant improvement in the accuracy of kinetic parameter estimation. We then apply the algorithm to real DCE-MRI data of breast cancer and observe improved pharmacokinetics parameter estimation that separates tumor tissue into sub-regions with differential tracer kinetics on a pixel-by-pixel basis and reveals biologically plausible tumor tissue heterogeneity patterns. This method has combined the advantages of multivariate clustering, convex optimization and compartmental modeling approaches. Interactions among genetic loci are believed to play an important role in disease risk. Due to the huge dimension of SNP data (normally several millions in genome-wide association studies), the combinatorial search and statistical evaluation required to detect multi-locus interactions constitute a significantly challenging computational task. While many approaches have been proposed for detecting such interactions, their relative performance remains largely unclear, due to the fact that performance was evaluated on different data sources, using different performance measures, and under different experimental protocols. Given the importance of detecting gene-gene interactions, a thorough evaluation of the performance and limitations of available methods, a theoretical analysis of the interaction effect and the genetic factors it depends on, and the development of more efficient methods are warranted. Therefore, we perform a computational analysis for detect interactions among SNPs. The contributions are four-fold: (1) developed simulation tools for evaluating performance of any technique designed to detect interactions among genetic variants in case-control studies; (2) used these tools to compare performance of five popular SNP detection methods; and (3) derived analytic relationships between power and the genetic factors, which not only support the experimental results but also gives a quantitative linkage between interaction effect and these factors; (4) based on the novel insights gained by comparative and theoretical analysis, developed an efficient statistically-principled method, namely the hybrid correlation-based association (HCA) to detect interacting SNPs. The HCA algorithm is based on three correlation-based statistics, which are designed to measure the strength of multi-locus interaction with three different interaction types, covering a large portion of possible interactions. Moreover, to maximize the detection power (sensitivity) while suppressing false positive rate (or retaining moderate specificity), we also devised a strategy to hybridize these three statistics in a case-by-case way. A heuristic search strategy is also proposed to largely decrease the computational complexity, especially for high-order interaction detection. We have tested HCA in both simulation study and real disease study. HCA and the selected peer methods were compared on a large number of simulated datasets, each including multiple sets of interaction models. The assessment criteria included several power measures, family-wise type I error rate, and computational complexity. The experimental results of HCA on the simulation data indicate its promising performance in terms of a good balance between detection accuracy and computational complexity. By running on multiple real datasets, HCA also replicates plausible biomarkers reported in previous literatures.
dc.language.isoen_USen_US
dc.publisherVirginia Techen_US
dc.rightsI hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report.en_US
dc.subjectGWASen_US
dc.subjectBSSen_US
dc.subjectbioinformaticsen_US
dc.subjectmedical imagingen_US
dc.subjectsignal processingen_US
dc.subjectmachine learningen_US
dc.subjectMRIen_US
dc.subjectSNPen_US
dc.titleStatistical Machine Learning for Multi-platform Biomedical Data Analysisen_US
dc.typeDissertationen_US
dc.contributor.departmentElectrical and Computer Engineeringen_US
dc.description.degreePh. D.en_US
thesis.degree.namePh. D.en_US
thesis.degree.leveldoctoralen_US
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen_US
dc.contributor.committeechairWang, Yue J.en_US
dc.contributor.committeememberWyatt, Christopher L.en_US
dc.contributor.committeememberZaghloul, Amir I.en_US
dc.contributor.committeememberClarke, Roberten_US
dc.contributor.committeememberLu, Chang-Tienen_US
dc.type.dcmitypeTexten_US
dc.identifier.sourceurlhttp://theses.lib.vt.edu/theses/available/etd-09062011-225149/en_US
dc.contributor.committeecochairXuan, Jianhua Jasonen_US
dc.date.sdate2011-09-06en_US


Files in this item

Thumbnail
Thumbnail

This item appears in the following Collection(s)

Show simple item record