Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly

Shi, Xu

dc.contributor.author	Shi, Xu	en
dc.contributor.committeechair	Xuan, Jianhua Jason	en
dc.contributor.committeemember	Lu, Chang-Tien	en
dc.contributor.committeemember	Baumann, William T.	en
dc.contributor.committeemember	Wang, Yue J.	en
dc.contributor.committeemember	Abbott, A. Lynn	en
dc.contributor.department	Electrical Engineering	en
dc.date.accessioned	2017-10-25T08:00:14Z	en
dc.date.available	2017-10-25T08:00:14Z	en
dc.date.issued	2017-10-24	en
dc.description.abstract	The rapid development of biotechnology has enabled researchers to collect high-throughput data for studying biological processes at the genomic, transcriptomic, and proteomic levels. Because of the high noise levels in these data and the high complexity of diseases such as cancer, extracting biologically meaningful information that can help reveal the underlying molecular mechanisms remains a challenging task. These challenges call for efficient and effective computational methods that analyze the data at different levels in order to understand biological systems from different aspects. In this dissertation research, we have developed novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. Specifically, we focus on two research topics: isoform identification and phenotype-specific transcript assembly. For isoform identification, we develop a computational approach, SparseIso, that jointly models the existence and abundance of isoforms in a Bayesian framework. A spike-and-slab prior is incorporated into the model to enforce sparsity of the expressed isoforms, and a Gibbs sampler is developed to sample the existence and abundance of isoforms iteratively. For transcript assembly, we develop a Bayesian approach, IntAPT, to assemble phenotype-specific transcripts from multiple RNA sequencing profiles. A two-layer Bayesian framework models the existence of phenotype-specific transcripts and the transcript abundance in individual samples. Based on this hierarchical Bayesian model, a Gibbs sampling algorithm is developed to estimate the joint posterior distribution for phenotype-specific transcript assembly. The performance of the proposed methods is evaluated on simulated data, compared with existing methods, and benchmarked on real cell line data.
We then apply our methods to breast cancer data to identify biologically meaningful splicing mechanisms associated with breast cancer. In future work, we will extend our methods to de novo transcript assembly to identify novel isoforms in biological systems, and we will incorporate isoform-specific networks into our methods to better understand splicing mechanisms in biological systems.	en
dc.description.abstractgeneral	Next-generation sequencing technology has significantly improved the resolution of biomedical research at the genomic and transcriptomic levels. Because of the high noise levels in the data and the high complexity of diseases such as cancer, extracting biologically meaningful information that can help reveal the underlying molecular mechanisms is a challenging task. In this dissertation, we have developed two novel Bayesian approaches to infer alternative splicing mechanisms in biological systems using RNA sequencing data. We have demonstrated the advantages of our proposed approaches over existing methods on both simulated data and real cell line data. Furthermore, applying our methods to real breast cancer data and glioblastoma tissue data has shown their efficacy in real biological applications.	en
dc.description.degree	Ph. D.	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:12985	en
dc.identifier.uri	http://hdl.handle.net/10919/79772	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Transcriptome Assembly	en
dc.subject	RNA-seq Data Analysis	en
dc.subject	Bayesian Inference	en
dc.subject	Gibbs Sampling	en
dc.subject	Markov Chain Monte Carlo (MCMC)	en
dc.title	Bayesian Modeling for Isoform Identification and Phenotype-specific Transcript Assembly	en
dc.type	Dissertation	en
thesis.degree.discipline	Electrical Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Ph. D.	en
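
The abstract describes SparseIso as placing a spike-and-slab prior on isoform existence and abundance and sampling both with a Gibbs sampler. The sketch below illustrates that general technique on a plain linear model y = Xb + noise; it is not SparseIso itself. Treating the columns of X as hypothetical isoform-to-read-bin indicators is an assumption, the noise variance is assumed known, and the Gaussian slab allows negative coefficients, unlike real isoform abundances. The function name gibbs_spike_slab and all parameter values are illustrative.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gibbs_spike_slab(X, y, n_iter=2000, burn_in=500, pi=0.5, tau2=10.0,
                     sigma2=1.0, seed=0):
    """Single-site Gibbs sampler for y = X @ beta + noise (illustrative sketch).

    Prior per coefficient: gamma_j ~ Bernoulli(pi); beta_j = 0 if gamma_j = 0
    (spike), beta_j ~ N(0, tau2) if gamma_j = 1 (slab). sigma2 is assumed known.
    Returns posterior inclusion probabilities and posterior means of beta.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xx = [sum(v * v for v in col) for col in cols]
    beta, gamma = [0.0] * p, [0] * p
    resid = list(y)  # running residual y - X @ beta (beta starts at 0)
    incl_count, beta_sum = [0] * p, [0.0] * p
    prior_log_odds = math.log(pi / (1.0 - pi))

    for it in range(n_iter):
        for j in range(p):
            col = cols[j]
            # Remove feature j's current contribution to get the partial residual.
            for i in range(n):
                resid[i] += col[i] * beta[j]
            xr = sum(col[i] * resid[i] for i in range(n))
            # Conditional posterior of beta_j under the slab: N(m, v).
            v = 1.0 / (xx[j] / sigma2 + 1.0 / tau2)
            m = v * xr / sigma2
            # Log posterior odds of slab vs spike, with beta_j integrated out.
            log_odds = prior_log_odds + 0.5 * math.log(v / tau2) + m * m / (2.0 * v)
            if rng.random() < _sigmoid(log_odds):
                gamma[j], beta[j] = 1, rng.gauss(m, math.sqrt(v))
            else:
                gamma[j], beta[j] = 0, 0.0
            # Add the updated contribution back into the residual.
            for i in range(n):
                resid[i] -= col[i] * beta[j]
        if it >= burn_in:
            for j in range(p):
                incl_count[j] += gamma[j]
                beta_sum[j] += beta[j]

    kept = n_iter - burn_in
    return [c / kept for c in incl_count], [s / kept for s in beta_sum]


# Synthetic example: only coefficients 0 and 2 are truly nonzero.
data_rng = random.Random(1)
true_beta = [3.0, 0.0, -2.0, 0.0]
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [sum(x * b for x, b in zip(row, true_beta)) + data_rng.gauss(0, 1) for row in X]
incl, beta_mean = gibbs_spike_slab(X, y, n_iter=1000, burn_in=200, tau2=100.0)
print("inclusion probabilities:", [round(q, 3) for q in incl])
print("posterior means:", [round(b, 3) for b in beta_mean])
```

Each indicator and its coefficient are drawn jointly: the inclusion decision uses the marginal likelihood with the coefficient integrated out, and the coefficient is then drawn from its slab conditional. This avoids the poor mixing that results from sampling an indicator while conditioning on a coefficient fixed exactly at zero.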

Files

Original bundle

Name: Shi_X_D_2017.pdf
Size: 4.69 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations