Discovering contextual connections between biological processes using high-throughput data

Virginia Tech

Hearkening to calls from life scientists for aid in interpreting rapidly growing repositories of data, the fields of bioinformatics and computational systems biology continue to produce increasingly sophisticated methods capable of summarizing and distilling pertinent phenomena captured by high-throughput experiments. Techniques for analyzing genome-wide gene expression (e.g., microarray) data, for example, have moved beyond simply detecting individual genes perturbed in treatment-control experiments to reporting the collective perturbation of biologically related collections of genes, or "processes". Recent expression analysis methods have focused on improving the comprehensibility of results: they leverage statistical modeling techniques such as Bayesian networks to report concise, non-redundant sets of processes.

Simultaneously, integrating gene expression measurements with gene interaction networks has led to the computation of response networks: subgraphs of interaction networks in which genes exhibit strong collective perturbation or co-expression. Methods that integrate process annotations of genes with interaction networks identify high-level connections between the biological processes themselves. Identifying context-specific changes in these inter-process connections, however, requires going beyond all three lines of work: process-based expression analysis reports only perturbed processes and not their relationships; response networks are composed of interactions between genes rather than processes; and existing techniques for detecting process connections do not incorporate a specific biological context.

We present two novel methods that take inspiration from the latest techniques in process-based gene expression analysis, computation of response networks, and computation of inter-process connections. We motivate the need for detecting inter-process connections by identifying a collection of processes that exhibit significant differences in collective expression between two liver tissue culture systems widely used in toxicological and pharmaceutical assays. Next, we identify perturbed connections between these processes via a novel method that integrates gene expression, interaction, and annotation data. Finally, we present a second novel method that computes non-redundant sets of perturbed inter-process connections, and we apply it to several additional liver-related data sets. These applications demonstrate the ability of our methods to capture and report biologically relevant high-level trends.

molecular interactions, gene expression, liver, Markov chain Monte Carlo, computational systems biology