Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly

dc.contributor.author: Shi, Xu [en]
dc.contributor.committeechair: Xuan, Jianhua Jason [en]
dc.contributor.committeemember: Lu, Chang-Tien [en]
dc.contributor.committeemember: Baumann, William T. [en]
dc.contributor.committeemember: Wang, Yue J. [en]
dc.contributor.committeemember: Abbott, A. Lynn [en]
dc.contributor.department: Electrical Engineering [en]
dc.date.accessioned: 2017-10-25T08:00:14Z [en]
dc.date.available: 2017-10-25T08:00:14Z [en]
dc.date.issued: 2017-10-24 [en]
dc.description.abstract: The rapid development of biotechnology has enabled researchers to collect high-throughput data for studying various biological processes at the genomic, transcriptomic, and proteomic levels. Because the data are noisy and diseases such as cancer are highly complex, it is challenging for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. These challenges call for efficient and effective computational methods that analyze the data at different levels so as to understand different aspects of biological systems. In this dissertation research, we have developed novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. Specifically, we focus on two research topics: isoform identification and phenotype-specific transcript assembly. For isoform identification, we develop a computational approach, SparseIso, to jointly model the existence and abundance of isoforms in a Bayesian framework. A spike-and-slab prior is incorporated into the model to enforce the sparsity of expressed isoforms. A Gibbs sampler is developed to sample the existence and abundance of isoforms iteratively. For transcript assembly, we develop a Bayesian approach, IntAPT, to assemble phenotype-specific transcripts from multiple RNA sequencing profiles. A two-layer Bayesian framework is used to model the existence of phenotype-specific transcripts and the transcript abundance in individual samples. Based on the hierarchical Bayesian model, a Gibbs sampling algorithm is developed to estimate the joint posterior distribution for phenotype-specific transcript assembly. The performance of our proposed methods is evaluated on simulated data, compared with that of existing methods, and benchmarked on real cell line data.
We then apply our methods to breast cancer data to identify biologically meaningful splicing mechanisms associated with breast cancer. In future work, we will extend our methods to de novo transcript assembly to identify novel isoforms in biological systems, and we will incorporate isoform-specific networks into our methods to better understand splicing mechanisms in biological systems. [en]
dc.description.abstractgeneral: Next-generation sequencing technology has significantly improved the resolution of biomedical research at the genomic and transcriptomic levels. Because the data are noisy and diseases such as cancer are highly complex, it is challenging for researchers to extract biologically meaningful information that can help reveal the underlying molecular mechanisms. In this dissertation, we have developed two novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. We have demonstrated the advantages of our proposed approaches over existing methods on both simulated data and real cell line data. Furthermore, the application of our methods to real breast cancer data and glioblastoma tissue data has shown their efficacy in real biological applications. [en]
dc.description.degree: Ph. D. [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:12985 [en]
dc.identifier.uri: http://hdl.handle.net/10919/79772 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Transcriptome Assembly [en]
dc.subject: RNA-seq Data Analysis [en]
dc.subject: Bayesian Inference [en]
dc.subject: Gibbs Sampling [en]
dc.subject: Markov Chain Monte Carlo (MCMC) [en]
dc.title: Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Electrical Engineering [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: Shi_X_D_2017.pdf
Size: 4.69 MB
Format: Adobe Portable Document Format