Knowledge-fused Identification of Condition-specific Rewiring of Dependencies in Biological Networks
Gene network modeling is one of the major goals of systems biology research. Gene network modeling targets the middle layer of active biological systems that orchestrate the activities of genes and proteins. Gene network modeling can provide critical information to bridge the gap between causes and effects which is essential to explain the mechanisms underlying disease. Among the network construction tasks, the rewiring of relevant network structure plays critical roles in determining the behavior of diseases. To systematically characterize the selectively activated regulatory components and mechanisms, the modeling tools must be able to effectively distinguish significant rewiring from random background fluctuations. While differential dependency networks cannot be constructed by existing knowledge alone, effective incorporation of prior knowledge into data-driven approaches can improve the robustness and biological relevance of network inference. Existing studies on protein-protein interactions and biological pathways provide constantly accumulated rich domain knowledge. Though novel incorporation of biological prior knowledge into network learning algorithms can effectively leverage domain knowledge, biological prior knowledge is neither condition-specific nor error-free, only serving as an aggregated source of partially-validated evidence under diverse experimental conditions. Hence, direct incorporation of imperfect and non-specific prior knowledge in specific problems is prone to errors and theoretically problematic. To address this challenge, we propose a novel mathematical formulation that enables incorporation of prior knowledge into structural learning of biological networks as Gaussian graphical models, utilizing the strengths of both measurement data and prior knowledge. We propose a novel strategy to estimate and control the impact of unavoidable false positives in the prior knowledge that fully exploits the evidence from data while obtains "second opinion" by efficient consultations with prior knowledge. By proposing a significance assessment scheme to detect statistically significant rewiring of the learned differential dependency network, our method can assign edge-specific p-values and specify edge types to indicate one of six biological scenarios. The data-knowledge jointly inferred gene networks are relatively simple to interpret, yet still convey considerable biological information. Experiments on extensive simulation data and comparison with peer methods demonstrate the effectiveness of knowledge-fused differential dependency network in revealing the statistically significant rewiring in biological networks, leveraging data-driven evidence and existing biological knowledge, while remaining robust to the false positive edges in the prior knowledge. We also made significant efforts in disseminating the developed method tools to the research community. We developed an accompanying R package and Cytoscape plugin to provide both batch processing ability and user-friendly graphic interfaces. With the comprehensive software tools, we apply our method to several practically important biological problems to study how yeast response to stress, to find the origin of ovarian cancer, and to evaluate the drug treatment effectiveness and other broader biological questions. In the yeast stress response study our findings corroborated existing literatures. A network distance measurement is defined based on KDDN and provided novel hypothesis on the origin of high-grade serous ovarian cancer. KDDN is also used in a novel integrated study of network biology and imaging in evaluating drug treatment of brain tumor. Applications to many other problems also received promising biological results.
- Doctoral Dissertations