Isoform-Specific Expression During Embryo Development in Arabidopsis and Soybean

Date

2016-06-19

Publisher

Virginia Tech

Abstract

Almost every precursor mRNA (pre-mRNA) in a eukaryotic organism undergoes splicing, in some cases resulting in the formation of more than one splice variant, a process called alternative splicing. RNA-Seq provides a major opportunity to capture the state of the transcriptome, which includes the detection of alternative splicing events. Alternative splicing is a highly regulated process carried out by a complex machinery called the spliceosome. In this dissertation, I focus on the identification of different splice variants and splicing factors that are produced during Arabidopsis and soybean embryo development. I developed several data analysis pipelines for the detection and functional characterization of active splice variants and splicing factors that arise during embryo development. The main goal of this dissertation was to identify transcriptional changes associated with specific stages of embryo development and to infer possible associations between known regulatory genes and their targets. We identified several instances of exon skipping and intron retention as products of alternative splicing. The coding potential of the splice variants was evaluated using CodeWise, a weighted support vector machine classifier that I developed to assess the coding potential of novel transcripts with respect to RNA secondary structure free energy, conserved domains, and sequence properties. We also examined the effect of alternative splicing on the domain composition of the resulting protein isoforms. The majority of splice variant pairs encode proteins with identical domains or with similar but truncated domains; in fewer than 10% of cases, alternative splicing results in the gain or loss of a conserved domain. I constructed several possible regulatory networks that operate at specific stages of embryo development.
In addition, to gain a better understanding of splicing regulation, we developed the concept of co-splicing networks, defined as groups of transcripts that contain common RNA-binding motifs and are co-expressed with a specific splicing factor. For this purpose, I developed a multi-stage analysis pipeline that integrates co-expression networks with de novo RNA-binding motif discovery at inferred splice sites, resulting in the identification of specific splicing factors and the corresponding cis-regulatory sequences that cause the production of splice variants. This approach led to several novel hypotheses about the regulation of minor and major splicing in developing Arabidopsis embryos. In summary, this dissertation provides a comprehensive view of splicing regulation in Arabidopsis and soybean embryo development using computational analysis.
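
The definition of a co-splicing network given above (transcripts that share an RNA-binding motif and are co-expressed with a splicing factor) can be illustrated with a minimal Python sketch. This is not the dissertation's pipeline: the motif is assumed to be already known rather than discovered de novo, motif presence is tested by plain substring match, co-expression is approximated by Pearson correlation with an arbitrary cutoff, and all names and data (`co_splicing_group`, `FACTOR`, `TRANSCRIPTS`, `FLANKS`, the motif `GAAGAA`) are hypothetical placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def co_splicing_group(factor_expr, transcript_expr, flank_seqs, motif, min_r=0.9):
    """Return transcripts that carry `motif` near a splice site AND are
    co-expressed with the splicing factor (correlation >= min_r).

    Assumption: `flank_seqs` maps transcript id -> sequence flanking an
    inferred splice site; `min_r` is an illustrative threshold only.
    """
    members = []
    for tid, expr in transcript_expr.items():
        has_motif = motif in flank_seqs.get(tid, "")
        if has_motif and pearson(factor_expr, expr) >= min_r:
            members.append(tid)
    return sorted(members)


# Toy data (hypothetical): expression across five developmental stages.
FACTOR = [1, 2, 3, 4, 5]
TRANSCRIPTS = {
    "t1": [2, 4, 6, 8, 10],  # co-expressed, has motif
    "t2": [5, 4, 3, 2, 1],   # anti-correlated, has motif
    "t3": [1, 2, 3, 4, 6],   # co-expressed, lacks motif
    "t4": [2, 3, 5, 7, 9],   # co-expressed, has motif
}
FLANKS = {
    "t1": "TTGAAGAACC",
    "t2": "GAAGAATTTT",
    "t3": "CCCCTTTTCC",
    "t4": "ACGTGAAGAA",
}

group = co_splicing_group(FACTOR, TRANSCRIPTS, FLANKS, motif="GAAGAA")
print(group)
```

Requiring both conditions is what distinguishes this grouping from an ordinary co-expression module: `t3` tracks the factor's expression but has no candidate binding site, and `t2` has the site but an opposing expression profile, so neither is included.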

Keywords

Alternative splicing, data analysis, bioinformatics, transcriptomics, RNA-Seq, noncoding RNAs, machine learning, computational biology
