Browsing by Author "Van Eyk, Jennifer E."
Now showing 1 - 6 of 6
- Comparative assessment and novel strategy on methods for imputing proteomics data
  Shen, Minjie; Chang, Yi-Tan; Wu, Chiung-Ting; Parker, Sarah J.; Saylor, Georgia; Wang, Yizhi; Yu, Guoqiang; Van Eyk, Jennifer E.; Clarke, Robert; Herrington, David M.; Wang, Yue (2022-01-20)
  Missing values are a major issue in quantitative proteomics analysis. While many methods have been developed for imputing missing values in high-throughput proteomics data, a comparative assessment of imputation accuracy remains inconclusive, mainly because the mechanisms that give rise to true missing values are complex and existing evaluation methodologies are imperfect. Moreover, few studies have offered an outlook on future methodological development. We first re-evaluate the performance of eight representative methods targeting three typical missing-value mechanisms. These methods are compared on both simulated and masked missing values embedded within real proteomics datasets, and performance is evaluated using three quantitative measures. We then introduce fused regularization matrix factorization, a low-rank global matrix factorization framework capable of integrating local similarity derived from additional data types. We also explore a biologically inspired latent variable modeling strategy, convex analysis of mixtures, for missing value imputation and present preliminary experimental results. While some winners emerged from our comparative assessment, the evaluation is intrinsically imperfect because performance is evaluated indirectly on artificially missing or masked values rather than on authentic missing values. Nevertheless, we show that our fused regularization matrix factorization provides a novel incorporation of external and local information, and that the exploratory implementation of convex analysis of mixtures presents a biologically plausible new approach.
- Cosbin: cosine score-based iterative normalization of biologically diverse samples
  Wu, Chiung-Ting; Shen, Minjie; Du, Dongping; Cheng, Zuolin; Parker, Sarah J.; Lu, Yingzhou; Van Eyk, Jennifer E.; Yu, Guoqiang; Clarke, Robert; Herrington, David M.; Wang, Yue (Oxford University Press, 2022)
  Motivation: Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, when normalizing biologically diverse samples, the most commonly used reference genes exhibit striking expression variability, and size-factor or distribution-based normalization methods can be problematic when differential expression is substantially asymmetric. Results: We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), for normalizing biologically diverse samples. Based on the cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulated and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces existing methods and will allow biologists to detect true molecular signals among diverse phenotypic groups more accurately. Availability and implementation: The R scripts of the Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
- COT: an efficient and accurate method for detecting marker genes among many subtypes
  Lu, Yingzhou; Wu, Chiung-Ting; Parker, Sarah J.; Cheng, Zuolin; Saylor, Georgia; Van Eyk, Jennifer E.; Yu, Guoqiang; Clarke, Robert; Herrington, David M.; Wang, Yue (Oxford University Press, 2022)
  Motivation: Ideally, a molecularly distinct subtype would be composed of molecular features that are expressed uniquely in the subtype of interest and in no others, the so-called marker genes (MGs). MGs play a critical role in the characterization, classification or deconvolution of tissue or cell subtypes. We and others have recognized that the test statistics used by most methods do not exactly satisfy the MG definition and often identify inaccurate MGs. Results: We report an efficient and accurate data-driven method, formulated as a Cosine-based One-sample Test (COT) in scatter space, to detect MGs among many subtypes using subtype expression profiles. Fundamentally different from existing approaches, the test statistic in COT precisely matches the mathematical definition of an ideal MG. We demonstrate the performance and utility of COT on both simulated and real gene expression and proteomics data. The open-source Python/R tool will allow biologists to efficiently detect MGs and perform a more comprehensive and unbiased molecular characterization of tissue or cell subtypes in many biomedical contexts. Nevertheless, COT complements rather than replaces existing methods. Availability and implementation: The Python COT software, with a detailed user's manual and a vignette, is freely available at https://github.com/MintaYLu/COT. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
- Data-driven detection of subtype-specific differentially expressed genes
  Chen, Lulu; Lu, Yingzhou; Wu, Chiung-Ting; Clarke, Robert; Yu, Guoqiang; Van Eyk, Jennifer E.; Herrington, David M.; Wang, Yue (2021-01-11)
  Among multiple tissue or cell subtypes, subtype-specific differentially expressed genes (SDEGs) are defined as genes that are most upregulated in only one subtype and in no other. Detecting SDEGs plays a critical role in the molecular characterization and deconvolution of multicellular complex tissues. Classic differential analysis assumes a null hypothesis whose test statistic is not subtype-specific, and thus can produce a high false positive rate and/or lower detection power. Here we first introduce a One-Versus-Everyone Fold Change (OVE-FC) test for detecting SDEGs. We then propose a scaled test statistic (OVE-sFC) for assessing the statistical significance of SDEGs that applies a mixture null distribution model and a tailored permutation test. The OVE-FC/sFC test was validated for both type 1 error rate and detection power using extensive simulated data sets generated from real gene expression profiles of purified subtype samples. The OVE-FC/sFC test was then applied to two benchmark gene expression data sets of purified subtype samples and detected many known or previously unknown SDEGs. Subsequent supervised deconvolution of synthesized bulk expression data, using the SDEGs that the OVE-FC/sFC test detected in independent purified expression data, showed superior deconvolution accuracy compared with popular peer methods.
- Guidelines for experimental models of myocardial ischemia and infarction
  Lindsey, Merry L.; Bolli, Roberto; Canty, John M., Jr.; Du, Xiao-Jun; Frangogiannis, Nikolaos G.; Frantz, Stefan; Gourdie, Robert G.; Holmes, Jeffrey W.; Jones, Steven P.; Kloner, Robert A.; Lefer, David J.; Liao, Ronglih; Murphy, Elizabeth; Ping, Peipei; Przyklenk, Karin; Recchia, Fabio A.; Longacre, Lisa Schwartz; Ripplinger, Crystal M.; Van Eyk, Jennifer E.; Heusch, Gerd (2018-04)
  Myocardial infarction is a prevalent major cardiovascular event that arises from myocardial ischemia with or without reperfusion, and basic and translational research is needed to better understand its underlying mechanisms and its consequences for cardiac structure and function. Ischemia underlies a broad range of clinical scenarios, from angina to hibernation to permanent occlusion, and while reperfusion is mandatory for salvage from ischemic injury, reperfusion also inflicts injury on its own. In this consensus statement, we present recommendations for animal models of myocardial ischemia and infarction. With increasing awareness of the need for rigor and reproducibility in designing and performing scientific research to ensure validation of results, the goal of this review is to provide best-practice information regarding myocardial ischemia-reperfusion and infarction models. Listen to this article's corresponding podcast at ajpheart.podbean.com/e/guidelines-for-experimental-models-of-myocardial-ischemia-and-infarction/.
- Whole Exome Sequencing to Identify Genetic Variants Associated with Raised Atherosclerotic Lesions in Young Persons
  Hixson, James E.; Jun, Goo; Shimmin, Lawrence C.; Wang, Yizhi; Yu, Guoqiang; Mao, Chunhong; Warren, Andrew S.; Howard, Timothy D.; Vander Heide, Richard S.; Van Eyk, Jennifer E.; Wang, Yue; Herrington, David M. (Springer Nature, 2017-06-22)
  We investigated the influence of genetic variants on atherosclerosis using whole exome sequencing in cases and controls from the autopsy study "Pathobiological Determinants of Atherosclerosis in Youth (PDAY)". We identified a PDAY case group with the highest total amounts of raised lesions (n = 359) for comparison with a control group with no detectable raised lesions (n = 626). In addition to the standard exome capture, we included genome-wide proximal promoter regions that contain sequences regulating gene expression. Our statistical analyses included single variant analysis for common variants (minor allele frequency, MAF > 0.01) and rare variant analysis for low-frequency and rare variants (MAF < 0.05). In addition, we investigated known coronary artery disease (CAD) genes previously identified by meta-analysis of GWAS. We did not identify individual common variants that reached exome-wide significance using single variant analysis. In analysis limited to 60 CAD genes, we detected strong associations with COL4A2/COL4A1, which have previously shown associations with myocardial infarction, arterial stiffness, and coronary artery calcification. Likewise, rare variant analysis did not identify genes that reached exome-wide significance. Among the 60 CAD genes, the strongest association was with NBEAL1, which was also identified in a gene-based analysis of whole exome sequencing for early-onset myocardial infarction.
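The imputation entry ("Comparative assessment and novel strategy on methods for imputing proteomics data") evaluates methods on masked values and builds on low-rank matrix factorization. The sketch below illustrates only that generic idea: a plain rank-k factorization fitted by stochastic gradient descent on observed entries, plus an RMSE computed on deliberately masked entries. It is an assumption-laden illustration, not the authors' fused regularization matrix factorization; the function names, the L2 penalty, and the hyperparameters are hypothetical choices, and no local-similarity (fused) term is included.

```python
import random


def mf_impute(X, rank=1, lam=0.01, lr=0.01, epochs=2000, seed=0):
    """Fill None entries of X (a list of rows) with a low-rank fit U V^T.

    Hypothetical sketch: only observed entries drive the fit, with a plain
    L2 penalty on the factors. There is no fused/local-similarity term here.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    observed = [(i, j) for i in range(n) for j in range(m) if X[i][j] is not None]

    def predict(i, j):
        return sum(U[i][k] * V[j][k] for k in range(rank))

    for _ in range(epochs):
        for i, j in observed:
            err = X[i][j] - predict(i, j)
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] += lr * (err * v - lam * u)
                V[j][k] += lr * (err * u - lam * v)

    # Keep observed values as-is; fill only the missing ones.
    return [[X[i][j] if X[i][j] is not None else predict(i, j) for j in range(m)]
            for i in range(n)]


def masked_rmse(truth, imputed, masked):
    """RMSE over entries that were known in `truth` but hidden before imputation."""
    sq = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return (sum(sq) / len(sq)) ** 0.5


# Toy check: an exactly rank-1 matrix with one entry masked.
a = [1.0, 1.5, 2.0, 2.5, 3.0]
b = [1.0, 2.0, 0.5, 1.5]
truth = [[ai * bj for bj in b] for ai in a]
masked = [(2, 1)]
X = [row[:] for row in truth]
for i, j in masked:
    X[i][j] = None
X_hat = mf_impute(X, rank=1)
print(round(X_hat[2][1], 3), masked_rmse(truth, X_hat, masked))
```

Masking known values and scoring them is exactly why, as the abstract notes, such evaluation is indirect: the masked entries are missing by construction, not by the mechanisms that produce authentic missing values.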
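The Cosbin entry describes an iterative loop: compute sample-wise factors, score each gene's cross-condition pattern with a cosine, drop asymmetric differentially expressed genes, and repeat. The following is a simplified sketch of that loop under stated assumptions, not the published R scripts: factors are taken from total counts over the current reference genes, the score is the cosine between a gene's condition-mean vector and the all-ones direction, one worst gene is removed per iteration, and the `threshold` value is hypothetical.

```python
import math


def cosine_to_ones(vec):
    """Cosine between `vec` and the all-ones direction (1.0 means perfectly flat)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return 0.0
    return sum(vec) / (math.sqrt(len(vec)) * norm)


def cosbin_sketch(counts, groups, threshold=0.99, max_iter=1000):
    """counts: one list of per-sample values per gene; groups: condition per sample.

    Returns (sample-wise factors, indices of the retained reference genes).
    Assumes every sample has a positive total over the reference genes.
    """
    n_samples = len(counts[0])
    labels = sorted(set(groups))
    ref = set(range(len(counts)))  # current candidate consistently expressed genes
    factors = [1.0] * n_samples
    for _ in range(max_iter):
        # 1. Sample-wise factors from the current reference set (total-count style).
        totals = [sum(counts[g][s] for g in ref) for s in range(n_samples)]
        mean_total = sum(totals) / n_samples
        factors = [t / mean_total for t in totals]
        # 2. Score each reference gene by how flat its condition means are.
        scores = {}
        for g in ref:
            normed = [counts[g][s] / factors[s] for s in range(n_samples)]
            cond_means = []
            for lab in labels:
                vals = [normed[s] for s in range(n_samples) if groups[s] == lab]
                cond_means.append(sum(vals) / len(vals))
            scores[g] = cosine_to_ones(cond_means)
        # 3. Remove the most asymmetric gene, or stop once all remaining genes are flat.
        worst = min(scores, key=scores.get)
        if scores[worst] >= threshold or len(ref) == 1:
            break
        ref.remove(worst)
    return factors, sorted(ref)


# Toy data: samples 2-3 have twice the depth of samples 0-1, and gene 4 is
# strongly up in condition B, which biases a single-pass total-count factor.
counts = [[10, 10, 20, 20] for _ in range(4)] + [[1, 1, 200, 200]]
groups = ["A", "A", "B", "B"]
factors, ref = cosbin_sketch(counts, groups)
print(factors, ref)
```

On this toy input the first pass is distorted by gene 4; once it is removed, the factors recover the 2:1 depth ratio, which is the kind of asymmetry-driven bias the abstract says Cosbin targets.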
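The COT entry states that its statistic matches the definition of an ideal marker gene in scatter space. One way to see the geometry, assuming each gene is represented by its vector of non-negative subtype mean expressions: the cosine between that vector x and the axis e_k equals x_k / ||x||, which reaches 1 only when the gene is expressed in subtype k and nowhere else. The sketch below computes that cosine and ranks candidates; it omits the one-sample significance test, and the `min_cos` cutoff and function names are hypothetical, not the COT package API.

```python
import math


def cot_scores(profiles):
    """profiles: gene -> non-negative mean expression per subtype.

    Returns gene -> (closest subtype index, cosine to that subtype's axis).
    """
    scores = {}
    for gene, x in profiles.items():
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0:
            scores[gene] = (None, 0.0)  # not expressed anywhere: no axis to match
            continue
        k = max(range(len(x)), key=lambda i: x[i])  # nearest axis = largest component
        scores[gene] = (k, x[k] / norm)
    return scores


def candidate_markers(profiles, min_cos=0.95):
    """Genes whose cosine to their closest axis is at least `min_cos`, best first."""
    scores = cot_scores(profiles)
    hits = [(g, k, c) for g, (k, c) in scores.items() if k is not None and c >= min_cos]
    return sorted(hits, key=lambda t: t[2], reverse=True)


profiles = {
    "g1": [5.0, 0.0, 0.0],  # ideal marker of subtype 0
    "g2": [3.0, 3.0, 3.0],  # flat: cosine 1/sqrt(3)
    "g3": [4.0, 1.0, 0.0],  # near-marker of subtype 0
}
scores = cot_scores(profiles)
markers = candidate_markers(profiles)
print(scores, markers)
```

Because the cosine ignores overall magnitude, a weakly but exclusively expressed gene scores as highly as a strongly expressed one; any abundance filtering would be a separate, additional choice.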
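For the SDEG entry, a gene is subtype-specific when its top subtype exceeds every other subtype, which is equivalent to exceeding the highest of the remaining subtypes. The sketch below assumes that reading: it computes a log2 one-versus-everyone fold change between the top subtype mean and the runner-up mean, then a plain label-permutation p-value. It is not the paper's scaled OVE-sFC statistic, mixture null model, or tailored permutation test; `eps`, `n_perm`, and the function names are hypothetical.

```python
import math
import random
from statistics import mean


def ove_fc(values, labels, eps=1e-9):
    """Return (top subtype, log2 fold change of its mean over the runner-up mean)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    means = {lab: mean(vs) for lab, vs in groups.items()}
    top = max(means, key=means.get)
    runner_up = max(m for lab, m in means.items() if lab != top)
    return top, math.log2((means[top] + eps) / (runner_up + eps))


def permutation_pvalue(values, labels, n_perm=999, seed=0):
    """Plain label-permutation p-value for the OVE-FC statistic (not the tailored test)."""
    rng = random.Random(seed)
    _, observed = ove_fc(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, stat = ove_fc(values, shuffled)
        if stat >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


values = [8.0, 8.5, 9.0, 1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
top, stat = ove_fc(values, labels)
p = permutation_pvalue(values, labels)
print(top, round(stat, 2), p)
```

Comparing against the runner-up rather than the pooled mean of all other subtypes is what makes the statistic subtype-specific: a gene high in two subtypes gets a fold change near zero instead of a large one.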
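The exome sequencing entry uses two overlapping MAF cutoffs, so a variant with 0.01 < MAF < 0.05 enters both the single variant and the rare variant analyses. The small sketch below makes that overlap concrete, assuming biallelic genotype counts across the 985 individuals (359 cases plus 626 controls); the genotype counts and function names are hypothetical illustrations, not data from the study.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF from biallelic genotype counts (two alleles per individual)."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt = n_het + 2 * n_hom_alt
    freq = alt / n_alleles
    return min(freq, 1 - freq)  # the minor allele may be either one


def analysis_sets(maf, common_min=0.01, rare_max=0.05):
    """Which analyses a variant enters under the thresholds stated in the abstract."""
    sets = []
    if maf > common_min:
        sets.append("single-variant")
    if maf < rare_max:
        sets.append("rare-variant")
    return sets


# Hypothetical genotype counts summing to 985 individuals.
for counts in [(935, 50, 0), (975, 10, 0), (645, 300, 40)]:
    maf = minor_allele_frequency(*counts)
    print(counts, round(maf, 4), analysis_sets(maf))
```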