Multi-omics Data Integration for Identifying Disease Specific Biological Pathways

Lu, Yingzhou

Multi-omics Data Integration for Identifying Disease Specific Biological Pathways

dc.contributor.author	Lu, Yingzhou	en
dc.contributor.committeechair	Wang, Yue J.	en
dc.contributor.committeemember	Chantem, Thidapat	en
dc.contributor.committeemember	Yu, Guoqiang	en
dc.contributor.department	Electrical and Computer Engineering	en
dc.date.accessioned	2018-06-06T08:01:47Z	en
dc.date.available	2018-06-06T08:01:47Z	en
dc.date.issued	2018-06-05	en
dc.description.abstract	Pathway analysis is an important task for gaining novel insights into the molecular architecture of many complex diseases. With the advancement of new sequencing technologies, a large amount of quantitative gene expression data have been continuously acquired. The springing up omics data sets such as proteomics has facilitated the investigation on disease relevant pathways. Although much work has previously been done to explore the single omics data, little work has been reported using multi-omics data integration, mainly due to methodological and technological limitations. While a single omic data can provide useful information about the underlying biological processes, multi-omics data integration would be much more comprehensive about the cause-effect processes responsible for diseases and their subtypes. This project investigates the combination of miRNAseq, proteomics, and RNAseq data on seven types of muscular dystrophies and control group. These unique multi-omics data sets provide us with the opportunity to identify disease-specific and most relevant biological pathways. We first perform t-test and OVEPUG test separately to define the differential expressed genes in protein and mRNA data sets. In multi-omics data sets, miRNA also plays a significant role in muscle development by regulating their target genes in mRNA dataset. To exploit the relationship between miRNA and gene expression, we consult with the commonly used gene library - Targetscan to collect all paired miRNA-mRNA and miRNA-protein co-expression pairs. Next, by conducting statistical analysis such as Pearson's correlation coefficient or t-test, we measured the biologically expected correlation of each gene with its upstream miRNAs and identify those showing negative correlation between the aforementioned miRNA-mRNA and miRNA-protein pairs. Furthermore, we identify and assess the most relevant disease-specific pathways by inputting the differential expressed genes and negative correlated genes into the gene-set libraries respectively, and further characterize these prioritized marker subsets using IPA (Ingenuity Pathway Analysis) or KEGG. We will then use Fisher method to combine all these p-values derived from separate gene sets into a joint significance test assessing common pathway relevance. In conclusion, we will find all negative correlated paired miRNA-mRNA and miRNA-protein, and identifying several pathophysiological pathways related to muscular dystrophies by gene set enrichment analysis. This novel multi-omics data integration study and subsequent pathway identification will shed new light on pathophysiological processes in muscular dystrophies and improve our understanding on the molecular pathophysiology of muscle disorders, preventing and treating disease, and make people become healthier in the long term.	en
dc.description.abstractgeneral	Identification of biological pathways play a central role in understanding both human health and diseases. A biological pathway is a series of information processing steps via interactions among molecules in a cell that partially determines the phenotype of a cell. Specifically, identifying disease-specific pathway will guide focused studies on complex diseases, thus potentially improve the prevention and treatment of diseases. To identify disease-specific pathways, it is crucial to develop computational methods and statistical tests that can integrate multi-omics (multiple omes such as genome, proteome, etc) data. Compared to single omics data, multi-omics data will help gaining a more comprehensive understanding on the molecular architecture of disease processes. In this thesis, we propose a novel data analytics pipeline for multi-omics data integration. We test and apply our method on/to the real proteomics data sets on muscular dystrophy subtypes, and identify several biologically plausible pathways related to muscular dystrophies.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:15264	en
dc.identifier.uri	http://hdl.handle.net/10919/83467	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Biological Pathways	en
dc.subject	Multi-omics Data Integration	en
dc.subject	Muscular Dystrophy	en
dc.subject	Statistical significance test	en
dc.subject	Gene set enrichment analysis	en
dc.title	Multi-omics Data Integration for Identifying Disease Specific Biological Pathways	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Now showing 1 - 1 of 1

Name:: Lu_Y_T_2018.pdf
Size:: 1.27 MB
Format:: Adobe Portable Document Format

Download

Collections

Masters Theses