Browsing by Author "Chen, Lulu"
- Data-driven detection of subtype-specific differentially expressed genes
  Chen, Lulu; Lu, Yingzhou; Wu, Chiung-Ting; Clarke, Robert; Yu, Guoqiang; Van Eyk, Jennifer E.; Herrington, David M.; Wang, Yue (2021-01-11)
  Among multiple subtypes of tissue or cell, subtype-specific differentially expressed genes (SDEGs) are defined as being most upregulated in one subtype and not in any other. Detecting SDEGs plays a critical role in the molecular characterization and deconvolution of multicellular complex tissues. Classic differential analysis assumes a null hypothesis whose test statistic is not subtype-specific, and can thus produce a high false positive rate and/or lower detection power. Here we first introduce a One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. We then propose a scaled test statistic (OVE-sFC) for assessing the statistical significance of SDEGs that applies a mixture null distribution model and a tailored permutation test. The OVE-FC/sFC test was validated on both type 1 error rate and detection power using extensive simulation data sets generated from real gene expression profiles of purified subtype samples. The OVE-FC/sFC test was then applied to two benchmark gene expression data sets of purified subtype samples and detected many known or previously unknown SDEGs. Subsequent supervised deconvolution results on synthesized bulk expression data, obtained using the SDEGs detected from the independent purified expression data by the OVE-FC/sFC test, showed superior deconvolution accuracy when compared with popular peer methods.
- Mathematical Modeling and Deconvolution for Molecular Characterization of Tissue Heterogeneity
  Chen, Lulu (Virginia Tech, 2020-01-22)
  Tissue heterogeneity, arising from intermingled cellular or tissue subtypes, significantly obscures the analyses of molecular expression data derived from complex tissues. Existing computational methods performing data deconvolution from mixed subtype signals almost exclusively rely on supervising information, requiring subtype-specific markers, the number of subtypes, or subtype compositions in individual samples. We develop a fully unsupervised deconvolution method to dissect complex tissues into molecularly distinctive tissue or cell subtypes directly from mixture expression profiles. We implement an R package, deconvolution by Convex Analysis of Mixtures (debCAM), that can automatically detect tissue or cell-specific markers, determine the number of constituent subtypes, calculate subtype proportions in individual samples, and estimate tissue/cell-specific expression profiles. We demonstrate the performance and biomedical utility of debCAM on gene expression, methylation, and proteomics data. With enhanced data preprocessing and prior knowledge incorporation, the debCAM software tool will allow biologists to perform a deep and unbiased characterization of tissue remodeling in many biomedical contexts. Purified expression profiles from physical experiments provide both ground truth and a priori information that can be used to validate unsupervised deconvolution results or improve supervision for various deconvolution methods. Detecting tissue or cell-specific expressed markers from purified expression profiles plays a critical role in molecularly characterizing and determining tissue or cell subtypes.
  Unfortunately, classic differential analysis assumes a convenient test statistic and associated null distribution that are inconsistent with the definition of markers, and thus results in a high false positive rate or lower detection power. We describe a statistically principled marker detection method, the One Versus Everyone Subtype Exclusively-expressed Genes (OVESEG) test, that estimates a mixture null distribution model by applying novel permutation schemes. Validated with realistic synthetic data sets on both type 1 error and detection power, the OVESEG-test applied to benchmark gene expression data sets detects many known and de novo subtype-specific expressed markers. Subsequent supervised deconvolution results, obtained using markers detected by the OVESEG-test, showed superior performance when compared with popular peer methods. While the current debCAM approach can dissect mixed signals from multiple samples into the 'averaged' expression profiles of subtypes, many subsequent molecular analyses of complex tissues require sample-specific deconvolution where each sample is a mixture of 'individualized' subtype expression profiles. The between-sample variation embedded in sample-specific subtype signals provides critical information for detecting subtype-specific molecular networks and uncovering hidden crosstalk. However, sample-specific deconvolution is an underdetermined and challenging problem because there are more variables than observations. We propose and develop debCAM2.0 to estimate sample-specific subtype signals by nuclear norm regularization, where the hyperparameter value is determined by a cross-validation scheme based on random entry exclusion. We also derive an efficient optimization approach based on ADMM to enable the application of debCAM2.0 to large-scale biological data analyses.
  Experimental results on realistic simulation data sets show that debCAM2.0 can successfully recover subtype-specific correlation networks that are otherwise unobtainable using existing deconvolution methods.
- Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues
  Wang, Niya; Hoffman, Eric P.; Chen, Lulu; Chen, Li; Zhang, Zhen; Liu, Chunyu; Yu, Guoqiang; Herrington, David M.; Clarke, Robert; Wang, Yue (Springer Nature, 2016-01-07)
  Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space, which can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM applied to gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful in distinguishing between phenotypically similar subpopulations.
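To illustrate the one-versus-everyone idea named in the first entry above, the following Python sketch computes a fold change for a single gene by comparing the subtype with the highest mean expression against every other subtype and keeping the smallest ratio. This is one plausible reading of the abstract's definition, not the authors' implementation: the function name `ove_fold_change`, the input layout, and the use of linear-scale means are all assumptions, and the scaled statistic (OVE-sFC), the mixture null model, and the permutation test are not covered.

```python
from statistics import mean


def ove_fold_change(expr_by_subtype):
    """Hypothetical one-versus-everyone fold change for one gene.

    expr_by_subtype maps a subtype label to a list of positive,
    linear-scale expression values from purified samples of that subtype.
    Returns (top_subtype, fold_change), where fold_change is the smallest
    ratio of the top subtype's mean to any other subtype's mean.
    """
    if len(expr_by_subtype) < 2:
        raise ValueError("need at least two subtypes")
    means = {s: mean(v) for s, v in expr_by_subtype.items()}
    if any(m <= 0 for m in means.values()):
        raise ValueError("mean expression must be positive")
    # Subtype in which the gene is most highly expressed on average.
    top = max(means, key=means.get)
    # Compare the top subtype with every other subtype; the weakest
    # comparison decides how subtype-specific the gene is.
    fold_change = min(means[top] / m for s, m in means.items() if s != top)
    return top, fold_change


# Made-up values for a single gene across three subtypes.
gene = {"A": [8.0, 12.0], "B": [2.0, 3.0], "C": [4.0, 6.0]}
top, fc = ove_fold_change(gene)
print(top, fc)
```

A gene scores highly under this sketch only if it is well separated from the runner-up subtype. A one-versus-rest average, by contrast, can be inflated by a single low-expressing subtype.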