Browsing by Author "Mendes, Pedro J. P."
- An Algebraic Approach to Reverse Engineering with an Application to Biochemical Networks
  Stigler, Brandilyn Suzanne (Virginia Tech, 2005-08-04)
  One goal of systems biology is to predict and modify the behavior of biological networks by accurately monitoring and modeling their responses to certain types of perturbations. The construction of mathematical models based on observation of these responses, referred to as reverse engineering, is an important step in elucidating the structure and dynamics of such networks. Continuous models, described by systems of differential equations, have been used to reverse engineer biochemical networks. Of increasing interest is the use of discrete models, which may provide a conceptual description of the network. In this dissertation we introduce a discrete modeling approach, rooted in computational algebra, to reverse-engineer networks from experimental time series data. The algebraic method uses algorithmic tools, including Groebner-basis techniques, to build the set of all discrete models that fit time series data and to select minimal models from this set. The models used in this work are discrete-time finite dynamical systems, which, when defined over a finite field, are described by systems of polynomial functions. We present novel reverse-engineering algorithms for discrete models, where each algorithm is suitable for different amounts and types of data. We demonstrate the effectiveness of the algorithms on simulated networks and conclude with a description of an ongoing project to reverse-engineer a real gene regulatory network in yeast.
- Algorithms for modeling and simulation of biological systems; applications to gene regulatory networks
  Vera-Licona, Martha Paola (Virginia Tech, 2007-06-06)
  Systems biology is an emergent field focused on developing a system-level understanding of biological systems. In the last decade advances in genomics, transcriptomics and proteomics have gathered a remarkable amount of data, enabling the possibility of a system-level analysis to be grounded at a molecular level. The reverse-engineering of biochemical networks from experimental data has become a central focus in systems biology. A variety of methods have been proposed for the study and identification of the system's structure and/or dynamics. The objective of this dissertation is to introduce and propose solutions to some of the challenges inherent in reverse-engineering of biological systems. First, previously developed reverse engineering algorithms are studied and compared using data from a simulated network. This study draws attention to the necessity for a uniform benchmark that enables an objective comparison and performance evaluation of reverse engineering methods. Since several reverse-engineering algorithms require discrete data as input (e.g. dynamic Bayesian network methods, Boolean networks), discretization methods are being used for this purpose. Through a comparison of the performance of two network inference algorithms that use discrete data (from several different discretization methods), this work shows that data discretization is an important step in applying network inference methods to experimental data. Next, a reverse-engineering algorithm is proposed within the framework of polynomial dynamical systems over finite fields. This algorithm is built for the identification of the underlying network structure and dynamics; it uses as input gene expression data and, when available, a priori knowledge of the system. An evolutionary algorithm is used as the heuristic search method for an exploration of the solution space. Computational algebra tools delimit the search space and also enable a description of model complexity. The performance and robustness of the algorithm are explored via an artificial network of the segment polarity genes in D. melanogaster. Once a mathematical model has been built, it can be used to run simulations of the biological system under study. Comparison of simulated dynamics with experimental measurements can help refine the model or provide insight into qualitative properties of the system's dynamical behavior. Within this work, we propose an efficient algorithm to describe the phase space, in particular to compute the number and length of all limit cycles of linear systems over a general finite field. This research has been partially supported by NIH Grant Nr. RO1GM068947-01.
- Causal Gene Network Inference from Genetical Genomics Experiments via Structural Equation Modeling
  Liu, Bing (Virginia Tech, 2006-09-11)
  The goal of this research is to construct causal gene networks for genetical genomics experiments using expression Quantitative Trait Loci (eQTL) mapping and Structural Equation Modeling (SEM). Unlike Bayesian Networks, this approach is able to construct cyclic networks, and cyclic relationships are expected to be common in gene networks. Reconstruction of gene networks provides important knowledge about the molecular basis of complex human diseases and generally about living systems. In genetical genomics, a segregating population is expression profiled and DNA marker genotyped. An Encompassing Directed Network (EDN) of causal regulatory relationships among genes can be constructed with eQTL mapping and selection of candidate causal regulators. Several eQTL mapping approaches and local structural models were evaluated in their ability to construct an EDN. The edges in an EDN correspond to either direct or indirect causal relationships, and the EDN is likely to contain cycles or feedback loops. We implemented SEM with genetic algorithms to produce sub-models of the EDN that contain fewer edges and are well supported by the data. The EDN construction and sparsification methods were tested on a yeast genetical genomics data set, as well as on simulated data. For the simulated networks, the SEM approach has an average detection power of around ninety percent, and an average false discovery rate of around ten percent.
- Data integration and visualization for systems biology data
  Cheng, Hui (Virginia Tech, 2010-10-27)
  Systems biology aims to understand cellular behavior in terms of the spatiotemporal interactions among cellular components, such as genes, proteins and metabolites. Comprehensive visualization tools for exploring multivariate data are needed to gain insight into the physiological processes reflected in these molecular profiles. Data fusion methods are required to integratively study combined high-throughput transcriptomics, metabolomics and proteomics data before systems biology can live up to its potential. In this work I explored mathematical and statistical methods and visualization tools to resolve the prominent issues in the nature of systems biology data fusion and to gain insight into these comprehensive data. In order to choose and apply multivariate methods, it is important to know the distribution of the experimental data. Chi-square Q-Q plots and violin plots were applied to all M. truncatula and V. vinifera data, and most distributions were found to be right-skewed (Chapter 2). The biplot display provides an effective tool for reducing the dimensionality of systems biology data and displaying the molecules and time points jointly on the same plot. A biplot of the M. truncatula data revealed the overall system behavior, including unidentified compounds of interest and the dynamics of the highly responsive molecules (Chapter 3). The phase spectrum computed from the Fast Fourier transform of the time course data has been found to play a more important role than amplitude in signal reconstruction. Phase spectrum analyses on in silico data created with two artificial biochemical networks, the Claytor model and the AB2 model, proved that the phase spectrum is indeed an effective tool in systems biology data fusion despite the data heterogeneity (Chapter 4). The differences between data integration and data fusion are further discussed. Biplot analysis of scaled data was applied to integrate transcriptome, metabolome and proteome data from the V. vinifera project. Phase spectrum combined with k-means clustering was used in integrative analyses of transcriptome and metabolome of the M. truncatula yeast elicitation data and of transcriptome, metabolome and proteome of V. vinifera salinity stress data. The phase spectrum analysis was compared with the biplot display as effective tools in data fusion (Chapter 5). The results suggest that phase spectrum may perform better than the biplot. This work was funded by the National Science Foundation Plant Genome Program, grant DBI-0109732, and by the Virginia Bioinformatics Institute.
- Functional genomics through metabolite profiling and gene expression analysis in Arabidopsis thaliana
  Cortes Bermudez, Diego Fernando (Virginia Tech, 2008-07-25)
  In the post-genomic era, one of the most important goals for the community of plant biologists is to take full advantage of the knowledge generated by the Arabidopsis thaliana genome project, and to employ state-of-the-art functional genomics techniques to assign function to each gene. This will be achieved through a complete understanding of what all cellular components do, and how they interact with one another to produce a phenotype. Among the proteins encoded by the Arabidopsis genome are 24 related carboxyl methyltransferases that belong to the SABATH family. Several of the SABATH methyltransferases convert plant hormones, like jasmonic acid, indole-3-acetic acid, salicylic acid, gibberellins, and other plant constituents into methyl esters, thereby regulating the biological activity of these molecules and, consequently, myriad important physiological processes. Our research aims to decipher the function of proteins belonging to the SABATH family by applying a combination of genomics tools, including genome-wide expression analysis and gas-chromatography coupled with mass spectrometry-based metabolite profiling. Our results, combined with available biochemical information, provide a better understanding of the physiological role of SABATH methyltransferases, further insights into secondary plant metabolism and deeper knowledge of the consequences of modulating the expression of SABATH methyltransferases, both at the genome-wide expression and metabolite levels.
- In silico cell biology and biochemistry: a systems biology approach
  Camacho, Diogo Mayo (Virginia Tech, 2007-06-01)
  In the post-"omic" era the analysis of high-throughput data is regarded as one of the major challenges faced by researchers. One focus of this data analysis is uncovering biological network topologies and dynamics. It is believed that this kind of research will allow the development of new mathematical models of biological systems as well as aid in the improvement of already existing ones. The work that is presented in this dissertation addresses the problem of the analysis of highly complex data sets with the aim of developing a methodology that will enable the reconstruction of a biological network from time series data through an iterative process. The first part of this dissertation relates to the analysis of existing methodologies that aim at inferring network structures from experimental data. This spans the use of statistical tools such as correlation analysis (presented in Chapter 2) to more complex mathematical frameworks (presented in Chapter 3). A novel methodology that focuses on the inference of biological networks from time series data by least squares fitting will then be introduced. Using a set of carefully designed inference rules one can gain important information about the system which can aid in the inference process. The application of the method to a data set from the response of the yeast Saccharomyces cerevisiae to cumene hydroperoxide is explored in Chapter 5. The results show that this method can be used to generate a coarse-level mathematical model of the biological system at hand. Possible developments of this method are discussed in Chapter 6.
- Mathematical Models of Some Signaling Pathways Regulating Cell Survival and Death
  Zhang, Tongli (Virginia Tech, 2008-10-23)
  In a multi-cellular organism, cells constantly receive signals on their internal condition and surrounding environment. In response to various signals, cells proliferate, move around or even undergo suicide. The signal-response is controlled by complex molecular machinery, understanding of which is an important goal of basic molecular biological research. Such understanding is also valuable for clinical application, since lethal diseases like cancer show maladaptive responses to growth-regulating signals. Because the multiple feedbacks in the molecular regulatory machinery obscure cause-effect relations, it is hard to understand these control systems by intuition alone. Here we translate the molecular interactions into differential equations and recapture the cellular physiological properties with the help of numerical simulations and non-linear dynamical tools. The models address the physiological features of programmed cell death, the cell fate decision by p53 and the dynamics of the NF-κB control system. These models identify key molecular interactions responsible for the observed physiological properties, and they generate experimentally testable predictions to validate the assumptions made in the models.
- Microarray data analysis methods and their applications to gene expression data analysis for Saccharomyces cerevisiae under oxidative stress
  Sha, Wei (Virginia Tech, 2006-05-12)
  Oxidative stress is a harmful condition in a cell, tissue, or organ, caused by an imbalance between reactive oxygen species or other oxidants and the capacity of antioxidant defense systems to remove them. These oxidants cause wide-ranging damage to macromolecules, including proteins, lipids, DNA and carbohydrates. Oxidative stress is an important pathophysiologic component of a number of diseases, such as Alzheimer's disease, diabetes and certain cancers. Cells contain effective defense mechanisms to respond to oxidative stress. Despite much accumulated knowledge about these responses, their kinetics, especially the kinetics of early responses, is still not clearly understood. The Yap1 transcription factor is crucial for the normal response to a variety of stress conditions including oxidative stress. Previous studies on Yap1 regulation started to measure gene expression profiles at least 20 minutes after the induction of oxidative stress. Genes and pathways regulated by Yap1 in the early oxidative stress response (within 20 minutes) were not identified in these studies. Here we study the kinetics of the early oxidative stress response induced by cumene hydroperoxide (CHP) in Saccharomyces cerevisiae wild type and yap1 mutant. Gene expression profiles after exposure to CHP were obtained in controlled conditions using Affymetrix Yeast Genome S98 arrays. The oxidative stress response was measured at 8 time points along 120 minutes after the addition of CHP, with the earliest time point at 3 minutes after the exposure. Statistical analysis methods, including ANOVA, k-means clustering analysis, and pathway analysis, were used to analyze the data. The results from this study provide a dynamic resolution of the oxidative stress responses in S. cerevisiae, and contribute to a richer understanding of the antioxidant defense systems. They also provide a global view of the roles that Yap1 plays under normal and oxidative stress conditions.
- Polynomial Models for Systems Biology: Data Discretization and Term Order Effect on Dynamics
  Dimitrova, Elena Stanimirova (Virginia Tech, 2006-08-01)
  Systems biology aims at system-level understanding of biological systems, in particular cellular networks. The milestones of this understanding are knowledge of the structure of the system, understanding of its dynamics, effective control methods, and powerful prediction capability. The complexity of biological systems makes it inevitable to consider mathematical modeling in order to achieve these goals. The enormous accumulation of experimental data representing the activities of the living cell has triggered an increasing interest in the reverse engineering of biological networks from data. In particular, construction of discrete models for reverse engineering of biological networks is receiving attention, with the goal of providing a coarse-grained description of such networks. In this dissertation we consider the modeling framework of polynomial dynamical systems over finite fields constructed from experimental data. We present and propose solutions to two problems inherent in this modeling method: the necessity of appropriate discretization of the data and the selection of a particular polynomial model from the set of all models that fit the data. Data discretization, also known as binning, is a crucial issue for the construction of discrete models of biological networks. Experimental data are, however, usually continuous, or, at least, represented by computer floating point numbers. A major challenge in discretizing biological data, such as those collected through microarray experiments, is the typically small sample size. Many methods for discretization are not applicable due to the insufficient amount of data. The method proposed in this work is a first attempt to develop a discretization tool that takes into consideration the issues and limitations that are inherent in short data time courses. Our focus is on the two characteristics that any discretization method should possess in order to be used for dynamic modeling: preservation of dynamics and information content and inhibition of noise. Given a set of data points, of particular importance in the construction of polynomial models for the reverse engineering of biological networks is the collection of all polynomials that vanish on this set of points, the so-called ideal of points. Polynomial ideals can be represented through a special finite generating set, known as a Gröbner basis, that possesses some desirable properties. For a given ideal, however, the Gröbner basis may not be unique since its computation depends on the choice of leading terms for the multivariate polynomials in the ideal. The correspondence between data points and uniqueness of Gröbner bases is studied in this dissertation. More specifically, an algorithm is developed for finding all minimal sets of points that, added to the given set, have a corresponding ideal of points with a unique Gröbner basis. This question is of interest in itself but the main motivation for studying it was its relevance to the construction of polynomial dynamical systems. This research has been partially supported by NIH Grant Nr. RO1GM068947-01.
- Proteome Profiling of Saccharomyces cerevisae stress response to Cumene Hydroperoxide (CHP)
  Tuli, Leepika (Virginia Tech, 2008-07-25)
  Oxidative stress, described as the state of disturbed intracellular redox balance, has been associated with several human conditions including ageing, apoptosis, cancer, autoimmune and neuro-degenerative diseases. Stress studies have shown that reactive oxygen species (ROS) and reactive nitrogen species (RNS), along with their intermediates, can attack essential cell targets such as DNA, proteins, lipids and carbohydrates, leaving behind dysfunctional biologic molecules. In effect, a cell's primary response is to involve several defense mechanisms that are under a complex and intricate regulatory control to repair any damage that may have occurred. Although several stress studies have been conducted in the past that have approached this biologically complex process step by step, application of a systems biology approach towards a comprehensive understanding is still emerging. The current objective of this project is to identify proteins that change in response to cumene hydroperoxide (CHP) treatment and in parallel make an attempt to uncover events and processes that are a part of the CHP-induced oxidative stress response. From a systems biology viewpoint, the Yeast Oxidative Stress project will monitor response at three different levels: transcriptomics, proteomics and metabolomics, with dynamic changes being measured from 3 to 120 min after CHP addition. Data collected from the different levels will be integrated to accomplish a holistic viewpoint of stress response in the given system and to develop mathematical tools for modeling biochemical networks. Saccharomyces cerevisiae was chosen as a model, based on the availability of a completely mapped genome sequence with a collection of null mutants that was relevant to our fundamental research of stress response mechanism. Yeast, a simple unicellular eukaryote, has been extensively used for applied studies and has proven to be indispensable for stress research. Information derived from this project can reveal response mechanisms used by higher eukaryotes, especially if these act via analogous signaling cascades that are comparable between organisms. Current research investigates an optimal workflow for generating 2D gel-based protein expression data and identifying proteins that are induced by cumene hydroperoxide treatment. A non-targeted protein profiling followed by a 2-way ANOVA analysis provided a list of proteins that differ significantly between treatments. Protein identification provided relevant information on which proteins are affected by the CHP-induced stress response, including posttranslational modifications of peroxiredoxins. The redox-active protein Ahp1 was regulated post-translationally, with sulfonic acid modification observed for its active Cys(62) residue.
- Reconstruction of metabolic pathways by the exploration of gene expression data with factor analysis
  Henderson, David Allen (Virginia Tech, 2001-12-14)
  Microarray gene expression data for thousands of genes in many organisms is quickly becoming available. The information this data can provide the experimental biologist is powerful. This data may provide information clarifying the regulatory linkages between genes within a single metabolic pathway, or alternative pathway routes under different environmental conditions, or provide information leading to the identification of genes for selection in animal and plant genetic improvement programs or targets for drug therapy. Many analysis methods to unlock this information have been both proposed and utilized, but not evaluated under known conditions (e.g. simulations). Within this dissertation, an analysis method is proposed and evaluated for identifying independent and linked metabolic pathways and compared to a popular analysis method. Also, this same analysis method is investigated for its ability to identify regulatory linkages within a single metabolic pathway. Lastly, a variant of this same method is used to analyze time series microarray data. In Chapter 2, Factor Analysis is shown to identify and group genes according to membership within independent metabolic pathways for steady state microarray gene expression data. There were cases, however, where the allocation of all genes to a pathway was not complete. A competing analysis method, Hierarchical Clustering, was shown to perform poorly when negatively correlated genes are assumed unrelated, but performance improved when the sign of the correlation coefficient was ignored. In Chapter 3, Factor Analysis is shown to identify regulatory relationships between genes within a single metabolic pathway. These relationships can be explained using metabolic control analysis, along with external knowledge of the pathway structure and activation and inhibition of transcription regulation. In this chapter, it is also shown why factor analysis can group genes by metabolic pathway using metabolic control analysis. In Chapter 4, a Bayesian exploratory factor analysis is developed and used to analyze microarray gene expression data. This Bayesian model differs from a previous implementation in that it is purely exploratory and can be used with vague or uninformative priors. Additionally, 95% highest posterior density regions can be calculated for each factor loading to aid in interpretation of factor loadings. A correlated Bayesian exploratory factor analysis model is also developed in this chapter for application to time series microarray gene expression data. While this method is appropriate for the analysis of correlated observation vectors, it fails to group genes by metabolic pathway for simulated time series data.
- Statistical Analysis of Gene Expression Profile: Transcription Network Inference and Sample Classification
  Bing, Nan (Virginia Tech, 2004-04-14)
  The copious information generated from transcriptomes gives us an opportunity to learn biological processes as integrated systems; however, due to numerous sources of variation, high dimensions of data structure, various levels of data quality, and different formats of the inputs, dissecting and interpreting such data present daunting challenges to scientists. The goal of this research is to provide improved and new statistical tools for analyzing transcriptome data to identify gene expression patterns for classifying samples, to discover regulatory gene networks using natural genetic perturbations, to develop statistical methods for model fitting and comparison of biochemical networks, and eventually to advance our capability to understand the principles of biological processes at the system level.
- Systems Biology in an Imperfect World: Modeling Biological Systems with Incomplete Information
  Pokrzywa, Revonda Maria (Virginia Tech, 2009-10-08)
  One of the primary goals of systems biology is to understand the complex underlying network of biochemical interactions which allow an organism to respond to environmental stimuli. Models of these biological interactions serve both as a tool to codify current understanding of these interactions and as a starting point for scientific discovery. Due to the massive amount of information which is required for this modeling process, systems biology studies must often attempt to construct models which reflect the whole of the system while having access to only partial information. In some cases, the missing information will not have a confounding effect on the accuracy of the model. In other cases, there is the danger that this missing information will make the model useless. The focus of this thesis is to study the effect which missing information has on systems level studies within several different contexts. Specifically, we study two contexts: when the missing information takes the role of incomplete molecular interaction network knowledge and when it takes the role of unknown kinetic rate laws. These studies yield interesting results. We show that when metabolism is isolated from gene expression, the effects are not limited to those reactions under strong control by gene expression. Thus, incomplete understanding of molecular interaction networks may have unexpected effects on the resulting analysis. We also show that, under the conditions of the current study, mass action was the superior substitute when the true rate equations for a biological system are unknown. In addition to studying the effect of missing information in the aforementioned contexts, we propose a method for limiting the parameter search space of biochemical systems. Even in ideal scenarios where both the molecular interaction network and the relevant kinetic rate equations are known, obtaining appropriate estimates for the unknown system parameters can be challenging. By employing a method which limits the parameter search space, we are able to acquire estimates for parameter values which are much closer to the true values than those which could be obtained otherwise.
- The Unintended Consequences of Implementing Information Technology: Understanding the Impact of Misalignment between Mental Models and Organizational Structure
  Sallada, Michael (Virginia Tech, 2006-12-06)
  In this research, I study the unintended consequences of implementing information technology. Understanding the causes of these unintended effects is important because information technology is ubiquitous in the modern economy. I used three research protocols to study this phenomenon. The first approach was a literature review to explore and understand what was already written on the subject of implementing information technology. The second approach was an experiment using the beer distribution game to study the implementation of information technology. The third approach I used was a case study in which I used system dynamics modeling to study the information technology in an engineering and architecture firm. I tested the implementation of information technology in the beer distribution game by modifying the play with a change that simulated implementing information technology. I compared the performance of test subjects with control groups that played the game at the same time, without the modification. I also compared the subjects' performance against the performance of trials first published in 1989. I hypothesized that implementing information technology would result in an immediate improvement of the teams' performance. The results of implementing information technology in the beer distribution game were not as expected; implementing information technology did not improve performance. When it became clear that my experimental hypotheses were incorrect, I went back to the literature to see if there was an explanation for this failure that could be derived from the literature on the beer game. I studied the information technology in the case study firm in order to extend the learning from the experimental research. The results of the experiment were not as I expected; I learned a great deal about the effect of information technology in a very controlled experimental setting. By expanding the research to include a case study I was able to explore the behavior in a more realistic environment. The beer distribution game provided me with an unexpected insight into the alignment of users' mental models and the structure of the organization. The case study was completed using system dynamics tools to model, and then simulate, the effect of misalignment in a real world organization. Considering the results of the beer distribution game and the case study, I suggest that one explanation for the unintended consequences of implementing information technology is the misalignment of users' mental models with the altered structure of the organization after information technology is implemented.