Browsing by Author "Oh, Min"
- Deep Learning for Enhancing Precision Medicine
  Oh, Min (Virginia Tech, 2021-06-07)
  Most medical treatments have been developed aiming at the best-on-average efficacy for large populations, resulting in treatments successful for some patients but not for others. This creates a need for precision medicine, which tailors medical treatment to individual patients. Omics data holds comprehensive genetic information on individual variability at the molecular level and hence the potential to be translated into personalized therapy. However, attempts to transform omics data-driven insights into clinically actionable models for individual patients have been limited. Meanwhile, advances in deep learning, one of the most promising branches of artificial intelligence, have produced unprecedented performance in various fields. Although several deep learning-based methods have been proposed to predict individual phenotypes, they have not established the state of the practice, due to the instability of selected or learned features derived from extremely high-dimensional data with low sample sizes, which often results in overfitted models with high variance. To overcome the limitations of omics data, recent advances in deep learning models, including representation learning models, generative models, and interpretable models, can be considered. The goal of the proposed work is to develop deep learning models that can overcome the limitations of omics data to enhance the prediction of personalized medical decisions. To achieve this, three key challenges should be addressed: 1) effectively reducing dimensions of omics data, 2) systematically augmenting omics data, and 3) improving the interpretability of omics data.
- DeepMicro: deep representation learning for disease prediction based on microbiome data
  Oh, Min; Zhang, Liqing (Nature Research, 2020)
  Human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of dimensions, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning a better prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms on the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates the model training and hyperparameter optimization procedure with 8X–30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.
- Drug voyager: a computational platform for exploring unintended drug action
  Oh, Min; Ahn, Jaegyoon; Lee, Taekeon; Jang, Giup; Park, Chihyun; Yoon, Youngmi (2017-02-28)
  Background: The dominant paradigm in understanding drug action focuses on the intended therapeutic effects and frequent adverse reactions. However, this approach may limit opportunities to grasp unintended drug actions, which can open up channels to repurpose existing drugs and identify rare adverse drug reactions. Advances in systems biology can be exploited to comprehensively understand pharmacodynamic actions, although proper frameworks to represent drug actions are still lacking.
  Results: We suggest a novel platform to construct a drug-specific pathway in which a molecular-level mechanism of action is formulated based on pharmacologic, pharmacogenomic, transcriptomic, and phenotypic data related to drug response (http://databio.gachon.ac.kr/tools/). In this platform, the adoption of three conceptual levels imitating drug perturbation allows these pathways to be rendered more realistically than those of other models. Furthermore, we propose a new method that exploits functional features of the drug-specific pathways to predict new indications as well as adverse reactions. For therapeutic uses, our predictions significantly overlapped with clinical trials and an up-to-date drug-disease association database. Also, our method outperforms existing methods with regard to classification of active compounds for cancers. For adverse reactions, our predictions were significantly enriched in an independent database derived from the Food and Drug Administration (FDA) Adverse Event Reporting System and meaningfully cover an Adverse Reaction Database provided by Health Canada. Lastly, we discuss several predictions for both therapeutic indications and side effects through the published literature.
  Conclusions: Our study addresses how we can computationally represent drug-signaling pathways to understand unintended drug actions and to facilitate drug discovery and screening.
- Effect of antibiotic use and composting on antibiotic resistance gene abundance and resistome risks of soils receiving manure-derived amendments
  Chen, Chaoqi; Pankow, Christine A.; Oh, Min; Heath, Lenwood S.; Zhang, Liqing; Du, Pang; Xia, Kang; Pruden, Amy (Elsevier, 2019-05-03)
  Manure-derived amendments are commonly applied to soil, raising questions about whether antibiotic use in livestock could influence the soil resistome (collective antibiotic resistance genes (ARGs)) and ultimately contribute to the spread of antibiotic resistance to humans during food production. Here, we examined the metagenomes of soils amended with raw or composted manure generated from dairy cows administered pirlimycin and cephapirin (antibiotic) or no antibiotics (control) relative to unamended soils. Initial amendment (Day 1) with manure or compost significantly increased the diversity (richness) of ARGs in soils (p < 0.01) and resulted in distinct abundances of individual ARG types. Notably, initial amendment with antibiotic-manure significantly increased the total ARG relative abundances (per 16S rRNA gene) in the soils (2.21 × unamended soils, p < 0.001). After incubating 120 days, to simulate a wait period before crop harvest, 282 ARGs decreased 4.33-fold (median) up to 307-fold while 210 ARGs increased 2.89-fold (median) up to 76-fold in the antibiotic-manure-amended soils, resulting in reduced total ARG relative abundances equivalent to those of the unamended soils. We further assembled the metagenomic data and calculated resistome risk scores, which were recently defined as a relative index comparing co-occurrence of sequences corresponding to ARGs, mobile genetic elements, and putative pathogens on the same scaffold. Initial amendment of manure significantly increased the soil resistome risk scores, especially when generated by cows administered antibiotics, while composting reduced the effects and resulted in soil resistomes more similar to the background. The risk scores of manure-amended soils decreased to levels comparable to the unamended soils after 120 days. Overall, this study provides an integrated, high-resolution examination of the effects of prior antibiotic use, composting, and a 120-day wait period on soil resistomes following manure-derived amendment, demonstrating that all three management practices have measurable effects and should be taken into consideration in the development of policy and practice for mitigating the spread of antibiotic resistance.
- Exploring the Consistency of the Quality Scores with Machine Learning for Next-Generation Sequencing Experiments
  Cosgun, Erdal; Oh, Min (2020-02-26)
  Background. Next-generation sequencing (NGS) enables massively parallel processing, allowing lower cost than other sequencing technologies. In the subsequent analysis of NGS data, one of the major concerns is the reliability of variant calls. Although researchers can utilize raw quality scores of variant calling, they are forced to start further analysis without any pre-evaluation of the quality scores.
  Method. We presented a machine learning approach for estimating quality scores of variant calls derived from BWA+GATK. We analyzed correlations between the quality score and these annotations, specifying informative annotations which were used as features to predict variant quality scores. To test the predictive models, we simulated 24 paired-end Illumina sequencing reads with 30x coverage base. Also, twenty-four human genome sequencing reads resulting from Illumina paired-end sequencing with at least 30x coverage were secured from the Sequence Read Archive.
  Results. Using BWA+GATK, VCFs were derived from simulated and real sequencing reads. We observed that the prediction models learned by RFR outperformed other algorithms in both simulated and real data. The quality scores of variant calls were highly predictable from informative features of GATK Annotation Modules in the simulated human genome VCF data (R²: 96.7%, 94.4%, and 89.8% for RFR, MLR, and NNR, respectively). The robustness of the proposed data-driven models was consistently maintained in the real human genome VCF data (R²: 97.8% and 96.5% for RFR and MLR, respectively).
- Generalizing predictions to unseen sequencing profiles via deep generative models
  Oh, Min; Zhang, Liqing (Nature Portfolio, 2022-05-03)
  Predictive models trained on sequencing profiles often fail to achieve expected performance when externally validated on unseen profiles. While many factors such as batch effects, small data sets, and technical errors contribute to the gap between source and unseen data distributions, it is a challenging problem to generalize the predictive models across studies without any prior knowledge of the unseen data distribution. Here, this study proposes DeepBioGen, a sequencing profile augmentation procedure that characterizes visual patterns of sequencing profiles, generates realistic profiles based on a deep generative model capturing the patterns, and generalizes the subsequent classifiers. DeepBioGen outperforms other methods in terms of enhancing the generalizability of the prediction models on unseen data. The generalized classifiers surpass the state-of-the-art method, evaluated on RNA sequencing tumor expression profiles for anti-PD1 therapy response prediction and WGS human gut microbiome profiles for type 2 diabetes diagnosis.
- MetaCompare: a computational pipeline for prioritizing environmental resistome risk
  Oh, Min; Pruden, Amy; Chen, Chaoqi; Heath, Lenwood S.; Xia, Kang; Zhang, Liqing (2018-07)
  The spread of antibiotic resistance is a growing public health concern. While numerous studies have highlighted the importance of environmental sources and pathways of the spread of antibiotic resistance, a systematic means of comparing and prioritizing risks represented by various environmental compartments is lacking. Here, we introduce MetaCompare, a publicly available tool for ranking 'resistome risk', which we define as the potential for antibiotic resistance genes (ARGs) to be associated with mobile genetic elements (MGEs) and mobilize to pathogens based on metagenomic data. A computational pipeline was developed in which each ARG is evaluated based on relative abundance, mobility, and presence within a pathogen. This is determined through the assembly of shotgun sequencing data and analysis of contigs containing ARGs to determine if they contain sequence similarity to MGEs or human pathogens. Based on the assembled metagenomes, samples are projected into a 3-dimensional hazard space and assigned resistome risk scores. To validate, we tested previously published metagenomic data derived from distinct aquatic environments. Based on unsupervised machine learning, the test samples clustered in the hazard space in a manner consistent with their origin. The derived scores produced a well-resolved ascending resistome risk ranking of: wastewater treatment plant effluent, dairy lagoon, and hospital sewage.
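
The idea in the MetaCompare abstract of projecting samples into a 3-dimensional hazard space and ranking them can be sketched roughly as follows. This is a minimal illustration, not the published MetaCompare implementation: the `ContigCounts` fields, the choice of contig fractions as axes, the distance-from-origin score, and the sample counts are all assumptions made up for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ContigCounts:
    """Toy per-sample contig counts (hypothetical values, not from the paper)."""
    total: int             # all assembled contigs
    arg: int               # contigs carrying an ARG
    arg_mge: int           # contigs carrying an ARG and an MGE
    arg_mge_pathogen: int  # ARG + MGE contigs with similarity to a pathogen


def hazard_point(c: ContigCounts) -> Tuple[float, float, float]:
    """Project a sample into a 3-D space of contig fractions (assumed axes)."""
    if c.total <= 0:
        raise ValueError("total contig count must be positive")
    return (c.arg / c.total, c.arg_mge / c.total, c.arg_mge_pathogen / c.total)


def risk_score(c: ContigCounts) -> float:
    """Illustrative score only: Euclidean distance from the zero-hazard origin."""
    return math.dist((0.0, 0.0, 0.0), hazard_point(c))


def rank_samples(samples: Dict[str, ContigCounts]) -> List[str]:
    """Return sample names in ascending order of the illustrative score."""
    return sorted(samples, key=lambda name: risk_score(samples[name]))


# Hypothetical samples with invented counts, used only to exercise the functions.
samples = {
    "sample_a": ContigCounts(total=10_000, arg=50, arg_mge=5, arg_mge_pathogen=1),
    "sample_b": ContigCounts(total=10_000, arg=200, arg_mge=40, arg_mge_pathogen=10),
    "sample_c": ContigCounts(total=10_000, arg=120, arg_mge=20, arg_mge_pathogen=4),
}
ranking = rank_samples(samples)
```

Normalizing by the total contig count keeps samples with different assembly sizes comparable. A real pipeline would derive the counts from annotated assemblies and may weight or transform the axes differently.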