Algorithms for regulatory network inference and experiment planning in systems biology
I present novel solutions to two different classes of computational problems that arise in the study of complex cellular processes. The first problem arises in the context of planning large-scale genetic cross experiments that can be used to validate predictions of multigenic perturbations made by mathematical models. (i) I present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. CrossPlan is based on a generic experimental workflow used in performing genetic crosses in budding yeast. CrossPlan uses an integer linear program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. I apply CrossPlan to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. (ii) I formulate several natural problems related to efficient synthesis of a target mutant from source mutants. These formulations capture experimentally useful notions of verifiability (e.g., the need to confirm that a mutant contains mutations in the desired genes) and permissibility (e.g., the requirement that no intermediate mutants in the synthesis be inviable). I present several polynomial-time or fixed-parameter tractable algorithms for optimal synthesis of a target mutant for special cases of the problem that arise in practice. The second problem I address is inferring gene regulatory networks (GRNs) from single-cell transcriptomic (scRNA-seq) data. These GRNs can serve as starting points to build mathematical models. (iii) I present BEELINE, a comprehensive evaluation of state-of-the-art algorithms for inferring GRNs from single-cell gene expression data. The evaluations from BEELINE suggest that the area under the precision-recall curve and early precision of these algorithms are moderate. Techniques that do not require pseudotime-ordered cells are generally more accurate.
Based on these results, I present recommendations to end users of GRN inference methods. BEELINE will aid the development of gene regulatory network inference algorithms. (iv) Based on the insights gained from BEELINE, I propose a novel graph convolutional neural network (GCN) based supervised algorithm for GRN inference from single-cell gene expression data. This GCN-based model is considerably more accurate than existing supervised learning algorithms for GRN inference from scRNA-seq data and can infer cell-type specific regulatory networks.
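The two accuracy measures named in (iii) can be illustrated with a minimal sketch. It is not BEELINE's code: the function names and the toy gene names are hypothetical, the area under the precision-recall curve is approximated here by average precision, and early precision is assumed to mean the fraction of true edges among the top-k predictions, with k equal to the number of edges in the reference network.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); naming is illustrative only


def early_precision(scores: Dict[Edge, float], truth: Set[Edge]) -> float:
    """Fraction of true edges among the top-k predictions.

    Assumption: k is the number of edges in the reference network.
    """
    k = len(truth)
    if k == 0:
        raise ValueError("reference network has no edges")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sum(edge in truth for edge in ranked[:k]) / k


def average_precision(scores: Dict[Edge, float], truth: Set[Edge]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Precision is sampled at the rank of each recovered true edge and
    averaged over all true edges, so unrecovered edges contribute zero.
    """
    if not truth:
        raise ValueError("reference network has no edges")
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked, start=1):
        if edge in truth:
            hits += 1
            total += hits / rank
    return total / len(truth)


# Toy example with made-up confidence scores for four candidate edges.
toy_scores = {
    ("g1", "g2"): 0.9,
    ("g2", "g3"): 0.8,
    ("g1", "g3"): 0.4,
    ("g3", "g1"): 0.1,
}
toy_truth = {("g1", "g2"), ("g1", "g3")}

ep = early_precision(toy_scores, toy_truth)
ap = average_precision(toy_scores, toy_truth)
```

In this toy case only one of the two top-ranked edges is in the reference network, so early precision is 0.5, while average precision rewards the second true edge being recovered at rank three.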
General Audience Abstract
A small number of key molecules can completely change a cell's state: for example, they can cause a stem cell to differentiate into distinct types of blood cells or a healthy cell to turn cancerous. How can we uncover the important cellular events that govern complex biological behavior? One approach to answering this question has been to elucidate the mechanisms by which genes and proteins control each other in a cell. These mechanisms are typically represented in the form of a gene or protein regulatory network. The resulting networks can be modeled as a system of mathematical equations, also known as a mathematical model. The advantage of such a model is that we can computationally simulate the time courses of various molecules. Moreover, we can use the model simulations to predict the effect of perturbations such as deleting one or more genes. A biologist can perform experiments to test these predictions. Subsequently, the model can be iteratively refined by reconciling any differences between the prediction and the experiment. In this thesis I present two novel solutions aimed at dramatically reducing the time and effort required for this build-simulate-test cycle. The first solution I propose prioritizes and plans large-scale gene perturbation experiments that can be used for validating existing models. I then focus on taking advantage of recent advances in experimental techniques, known as scRNA-seq, that enable us to measure gene activity at single-cell resolution. These scRNA-seq data can be used to infer the interactions in gene regulatory networks. I perform a systematic evaluation of existing computational methods for building gene regulatory networks from scRNA-seq data. Based on the insights gained from this comprehensive evaluation, I propose novel algorithms that can take advantage of prior knowledge in building these regulatory networks. The results underscore the promise of my approach in identifying cell-type specific interactions.
These context-specific interactions play a key role in building mathematical models to study complex cellular processes, such as a developmental process that drives transitions from one cell type to another.