Differential Dependency Network and Data Integration for Detecting Network Rewiring and Biomarkers

Date

2020-01-30

Publisher

Virginia Tech

Abstract

Rapid advances in high-throughput molecular profiling techniques have enabled large-scale genomics-, transcriptomics-, and proteomics-based biomedical studies, generating an enormous amount of multi-omics data. Processing and summarizing multi-omics data, modeling interactions among biomolecules, and detecting condition-specific dysregulation from multi-omics data are among the most important yet challenging analytics tasks.

When detecting somatic DNA copy number aberrations from bulk tumor samples in cancer research, normal cell contamination is a significant confounding factor that weakens detection power regardless of the method used. To address this problem, we propose a computational approach, BACOM 2.0, to more accurately estimate the normal cell fraction and accordingly reconstruct DNA copy number signals in cancer cells. Specifically, by introducing allele-specific absolute normalization, BACOM 2.0 can accurately detect deletion types and aneuploidy in cancer cells directly from DNA copy number data.
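
To illustrate why normal cell contamination matters, the following minimal sketch assumes a simple two-component mixture in which the observed copy number is a weighted average of normal cells (two copies) and cancer cells. The function name `tumor_copy_number` and the mixture model itself are illustrative assumptions; this is not the BACOM 2.0 algorithm, which also has to estimate the normal cell fraction.

```python
def tumor_copy_number(observed: float, normal_fraction: float,
                      normal_copies: float = 2.0) -> float:
    """Invert the assumed mixture model
        observed = alpha * normal_copies + (1 - alpha) * tumor_copies
    to recover the copy number in the cancer cells.

    `normal_fraction` (alpha) is taken as known here; estimating it
    is the hard part and is not shown.
    """
    if not 0.0 <= normal_fraction < 1.0:
        raise ValueError("normal_fraction must be in [0, 1)")
    return (observed - normal_fraction * normal_copies) / (1.0 - normal_fraction)


# A one-copy deletion in the cancer cells, diluted by 40% normal cells,
# is observed as 0.4 * 2 + 0.6 * 1 = 1.4 copies.
corrected = tumor_copy_number(observed=1.4, normal_fraction=0.4)
```

Under this assumed model, the diluted signal of 1.4 copies is restored to roughly one copy, which shows how an underestimated normal fraction would mask a deletion.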

Genes work through complex networks to support cellular processes. Dysregulated genes can cause structural changes in biological networks, also known as network rewiring. Genes with a large number of rewired edges are more likely to be associated with functional alterations leading to phenotype transitions, and hence are potential biomarkers in diseases such as cancers. The differential dependency network (DDN) method was proposed to detect such network rewiring and biomarkers.
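
The notion of rewired edges can be sketched as follows, assuming the two condition-specific networks are already given as undirected edge lists. The gene names and helper functions are hypothetical, and the sketch only counts edges present in one condition but not the other; it does not perform the statistical learning that DDN uses to infer those edges.

```python
from collections import Counter


def rewired_edges(edges_a, edges_b):
    """Return edges present in exactly one of the two undirected networks."""
    def normalize(edges):
        # frozenset makes ("G1", "G2") and ("G2", "G1") the same edge
        return {frozenset(edge) for edge in edges}
    return normalize(edges_a) ^ normalize(edges_b)


def rewiring_degree(edges_a, edges_b):
    """Count, for each node, how many rewired edges touch it."""
    counts = Counter()
    for edge in rewired_edges(edges_a, edges_b):
        for node in edge:
            counts[node] += 1
    return counts


# Hypothetical networks under two conditions
condition_a = [("G1", "G2"), ("G1", "G3"), ("G2", "G3")]
condition_b = [("G3", "G2"), ("G1", "G4"), ("G1", "G5")]

degrees = rewiring_degree(condition_a, condition_b)
top_gene, top_count = degrees.most_common(1)[0]
```

In this toy example, the edge between G2 and G3 is shared by both conditions and is therefore not rewired, while G1 gains or loses four edges and stands out as the candidate hub.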

However, the existing DDN method and software tool have two major drawbacks. First, with imbalanced sample groups, DDN suffers from systematic bias and produces false positive differential dependencies. Second, the computational time of the block coordinate descent algorithm in DDN increases rapidly with the number of samples and molecular entities involved. To address the imbalanced sample group problem, we propose a sample-scale-wide normalized formulation to correct the systematic bias and design a simulation study to test its performance. To address the high computational complexity, we propose several strategies to accelerate DDN learning, including two reformulated algorithms for block-wise coefficient updating in the DDN optimization problem. Specifically, these include one strategy for discarding predictors and one for accelerating parallel computing. Experimental results show that, with the combined acceleration strategies, DDN learning is hundreds of times faster than the original method on medium-sized data.
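
The effect of sample-size normalization can be illustrated with a small sketch. It assumes an objective of the general form "squared-error fit for each group, plus a sparsity penalty, plus a penalty on coefficient differences between groups"; the abstract does not give the exact formulation, so the function names, the penalty weights `lam1` and `lam2`, and the objective itself are assumptions made for illustration.

```python
def group_loss(X, y, beta, normalize=True):
    """Squared-error loss for one sample group.

    With normalize=True the loss is divided by the number of samples,
    so a larger group does not dominate simply because it is larger.
    """
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(x * b for x, b in zip(row, beta))
        total += (target - prediction) ** 2
    return total / (2 * len(y)) if normalize else total / 2


def ddn_like_objective(X1, y1, X2, y2, beta1, beta2, lam1, lam2, normalize=True):
    """Assumed objective: fit + sparsity penalty + difference penalty."""
    fit = group_loss(X1, y1, beta1, normalize) + group_loss(X2, y2, beta2, normalize)
    sparsity = lam1 * sum(abs(b) for b in list(beta1) + list(beta2))
    difference = lam2 * sum(abs(a - b) for a, b in zip(beta1, beta2))
    return fit + sparsity + difference


# Toy data: group 1 is the same as group 2, but with every sample repeated 3 times
X2, y2 = [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0]
X1, y1 = X2 * 3, y2 * 3
beta = [0.5, 0.5]

balanced = ddn_like_objective(X2, y2, X2, y2, beta, beta, 0.1, 0.1)
imbalanced = ddn_like_objective(X1, y1, X2, y2, beta, beta, 0.1, 0.1)
imbalanced_raw = ddn_like_objective(X1, y1, X2, y2, beta, beta, 0.1, 0.1, normalize=False)
```

With normalization, tripling the size of one group leaves the objective unchanged, whereas the unnormalized objective grows with the larger group and shifts the balance between the fit and penalty terms.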

We applied the DDN method to several biomedical omics datasets and detected significant phenotype-specific network rewiring. With a random-graph-based detection strategy, we discovered hub-node-defined biomarkers that helped to generate or validate several novel scientific hypotheses in collaborative research projects. For example, the hub genes detected by the DDN method in proteomics data from artery samples are significantly enriched in the citric acid cycle pathway, which plays a critical role in the development of atherosclerosis.
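
One way a random-graph-based hub test could work is sketched below. It assumes an Erdős–Rényi null model in which each possible edge is rewired independently with a probability matched to the observed rewiring density, so that a node's degree follows a binomial distribution. The function `hub_p_value` and the choice of null model are assumptions for illustration; the abstract does not specify the strategy actually used.

```python
from math import comb


def hub_p_value(degree: int, n_nodes: int, n_edges: int) -> float:
    """Upper-tail probability P(D >= degree) under the assumed null model.

    D ~ Binomial(n_nodes - 1, p), where p = n_edges / C(n_nodes, 2) is the
    overall density of rewired edges in the differential network.
    """
    p = n_edges / comb(n_nodes, 2)
    trials = n_nodes - 1  # a node can connect to every other node
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(degree, trials + 1)
    )


# Hypothetical differential network: 5 nodes, 4 rewired edges,
# all of them attached to the same node.
p_hub = hub_p_value(degree=4, n_nodes=5, n_edges=4)
```

A node whose rewiring degree yields a small tail probability under such a null would be flagged as a hub; in practice the p-values would also need correction for testing many nodes.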

To detect intra-omics and inter-omics network rewiring, we propose a method called multiDDN that uses a multi-layer signaling model to integrate multi-omics data. We adapt the block coordinate descent algorithm, together with the acceleration strategies, to solve the multiDDN optimization problem. The simulation study shows that, compared with the DDN method applied to single-omics data, the multiDDN method achieves considerably higher accuracy in detecting network rewiring. We applied the multiDDN method to real multi-omics data from the CPTAC ovarian cancer dataset and detected multiple hub genes that are associated with histone protein deacetylation and were previously reported in independent analyses of ovarian cancer data.

Keywords

molecular data integration, differential network analysis, biomarker
