Browsing by Author "ElHefnawi, Mahmoud M."
- Deep Learning for Biological Problems
  Elmarakeby, Haitham Abdulrahman (Virginia Tech, 2017-06-14)
  The last decade has witnessed a tremendous increase in the amount of available biological data. Different technologies for measuring the genome, epigenome, transcriptome, proteome, metabolome, and microbiome in different organisms are producing large amounts of high-dimensional data every day. High-dimensional data provides unprecedented challenges and opportunities to gain a better understanding of biological systems. Unlike other data types, biological data imposes more constraints on researchers. Biologists are not only interested in accurate predictive models that capture complex input-output relationships, but they also seek a deep understanding of these models. In the last few years, deep models have achieved better performance in computational prediction tasks compared to other approaches. Deep models have been extensively used in processing natural data, such as images, text, and, recently, sound. However, the application of deep models in biology is limited. Here, I propose to use deep models for output prediction, dimension reduction, and feature selection of biological data to get better interpretation and understanding of biological systems. I demonstrate the applicability of deep models in a domain that has a high and direct impact on health care. In this research, novel deep learning models have been introduced to solve pressing biological problems. The research shows that deep models can be used to automatically extract features from raw inputs without the need to manually craft features. Deep models are used to reduce the dimensionality of the input space, which results in faster training. Deep models are shown to have better performance and less variable output than shallow models, even when an ensemble of shallow models is used. Deep models are also shown to process non-classical inputs such as sequences naturally, automatically extracting useful features from them.
- Identifying Splicing Regulatory Elements with de Bruijn Graphs
  Badr, Eman (Virginia Tech, 2015-05-12)
  Splicing regulatory elements (SREs) are short, degenerate sequences on pre-mRNA molecules that enhance or inhibit the splicing process via the binding of splicing factors, proteins that regulate the functioning of the spliceosome. Existing methods for identifying SREs in a genome are either experimental or computational. This work tackles the limitations in the current approaches for identifying SREs. It addresses two major computational problems: identifying variable-length SREs using a graph-based model with de Bruijn graphs, and discovering co-occurring sets of SREs (combinatorial SREs) using graph mining techniques. In addition, I studied and analyzed the effect of alternative splicing on tissue specificity in humans. First, I have used a formalism based on de Bruijn graphs that combines genomic structure, word count enrichment analysis, and experimental evidence to identify SREs found in exons. In my approach, SREs are not restricted to a fixed length (i.e., k-mers, for a fixed k). Consequently, the predicted SREs are of different lengths. I identified 2001 putative exonic enhancers and 3080 putative exonic silencers for human genes, with lengths varying from 6 to 15 nucleotides. Many of the predicted SREs overlap with experimentally verified binding sites. My model provides a novel method to predict variable-length putative regulatory elements computationally for further experimental investigation. Second, I developed CoSREM (Combinatorial SRE Miner), a graph mining algorithm for discovering combinatorial SREs. The goal is to identify sets of exonic splicing regulatory elements, whether they are enhancers or silencers. Experimental evidence is incorporated through my graph-based model to increase the accuracy of the results. The identified SREs do not have a predefined length, and the algorithm is not limited to identifying only SRE pairs, as current approaches are. I identified 37 SRE sets that include both enhancer and silencer elements in human genes. These results intersect with previous results, including some that are experimental. I also show that the SRE set GGGAGG and GAGGAC identified by CoSREM may play a role in exon skipping events in several tumor samples. Further, I report a genome-wide analysis of alternative splicing in multiple human tissues, including brain, heart, liver, and muscle. I developed a pipeline to identify tissue-specific exons and hence tissue-specific SREs. Using the publicly available RNA-Seq data set from the Human BodyMap project, I identified 28,100 tissue-specific exons across the four tissues. I identified 1929 exonic splicing enhancers with 99% overlap with previously published experimental and computational databases. A complicated enhancer regulatory network was revealed, where multiple enhancers were found across multiple tissues while some were found only in specific tissues. Putative combinatorial exonic enhancers and silencers were discovered as well, which may be responsible for exon inclusion or exclusion across tissues. Some of the enhancers are found to co-occur with multiple silencers and vice versa, which demonstrates a complicated relationship between tissue-specific enhancers and silencers.
- Machine Learning Approaches for Identifying microRNA Targets and Conserved Protein Complexes
  Torkey, Hanaa A. (Virginia Tech, 2017-04-27)
  Much research has been directed toward understanding the roles of essential components in the cell, such as proteins, microRNAs, and genes. This dissertation focuses on two interesting problems in bioinformatics research: microRNA-target prediction and the identification of conserved protein complexes across species. We define the two problems and develop novel approaches for solving them. MicroRNAs are short non-coding RNAs that mediate gene expression. The goal is to predict microRNA targets. Existing methods rely on sequence features to predict targets. These features are neither sufficient nor necessary to identify functional target sites, and they ignore the cellular conditions in which microRNA and mRNA interact. We developed MicroTarget to predict microRNA-mRNA interactions using heterogeneous data sources. MicroTarget uses expression data to learn a candidate target set for each microRNA. Then, sequence data is used to provide evidence of direct interactions and to rank the predicted targets. The predicted targets overlap with many of the experimentally validated ones. The results indicate that using expression data helps in predicting microRNA targets accurately. Protein complexes conserved across species specify processes that are core to cell machinery. Methods that have been devised to identify conserved complexes are severely limited by noise in protein-protein interaction (PPI) data. Behind PPIs, there are domains interacting physically to perform the necessary functions. Therefore, employing domains and domain interactions gives a better view of the protein interactions and functions. We developed a novel strategy for local network alignment, DONA. DONA maps proteins into their domains and uses domain-domain interactions (DDIs) to improve the network alignment. We developed a novel strategy for constructing an alignment graph and then use this graph to discover the conserved sub-networks. DONA shows better performance in terms of the overlap with known protein complexes, with higher precision and recall rates than existing methods. The results show better semantic similarity computed with respect to both the biological process and the molecular function of the aligned sub-networks.
- Predicting the Interactions of Viral and Human Proteins
  Eid, Fatma Elzahraa Sobhy (Virginia Tech, 2017-05-03)
  The world has proven unprepared for deadly viral outbreaks. Designing antiviral drugs and strategies requires a firm understanding of the interactions taking place between the proteins of the virus and human proteins. The current computational models for predicting these interactions consider only single viruses for which extensive prior knowledge is available. The two prediction frameworks in this dissertation, DeNovo and DeNovo-Human, make it possible for the first time to predict the interactions between any viral protein and human proteins. They further helped to answer critical questions about the Zika virus. DeNovo uses concepts from virology, bioinformatics, and machine learning to make predictions for novel viruses possible. It pools protein-protein interactions (PPIs) from different viruses sharing the same host. It further introduces taxonomic partitioning to make the reported performance reflect the situation of predicting for a novel virus. DeNovo avoids the expected low accuracy of such a prediction by introducing a negative sampling scheme that is based on sequence similarity. DeNovo achieved accuracy of up to 81% and 86% when predicting for a new viral species and a new viral family, respectively. This result is comparable to the best achieved previously in single virus-host and intra-species PPI prediction cases. DeNovo predicts PPIs of a novel virus without requiring known PPIs for it, but with a limitation on the number of human proteins it can make predictions against. The second framework, DeNovo-Human, relaxes this limitation by forcing in-network prediction and random sampling while keeping the pooling technique of DeNovo. The accuracy and AUC are both promising ($>85\%$ and $>91\%$, respectively). DeNovo-Human facilitates predicting the virus-human PPI network. To demonstrate how the two frameworks can enrich our knowledge about virus behavior, I use them to answer interesting questions about the Zika virus. The research questions examine how the Zika virus enters human cells, fights the innate immune system, and causes microcephaly. The answers obtained are well supported by recently published Zika virus studies.