Deep Learning for Enhancing Human and Environmental Health

Date

2026-04-21

Publisher

Virginia Tech

Abstract

Ensuring human and environmental health is a growing global priority and a fundamental challenge at the intersection of computer science, biology, and medicine. Advances in high-throughput sequencing technologies have enabled comprehensive characterization of biological systems across multiple omics layers, offering unprecedented opportunities to support precision medicine and environmental risk prevention. These data have been widely used for disease understanding, patient stratification, and monitoring of microbial communities in both clinical and environmental settings. In recent years, deep learning has emerged as a powerful approach for modeling nonlinear relationships in high-dimensional, noisy omics data, demonstrating improved performance over traditional machine learning methods across various tasks. However, its practical application remains constrained by challenges arising from the scarcity and heterogeneity of omics data, including (1) limited availability of labeled samples, (2) batch effects across datasets, (3) the prevalence of missing values, and (4) the need for efficient and robust learning under limited-data conditions. This work proposes a series of deep learning frameworks to address these challenges and enhance the practical applicability of omics-based analysis. To mitigate the scarcity of labeled data and batch effects, BCtypeFinder and CancerSubminer are presented as cancer subtyping methods that leverage both labeled and unlabeled datasets while correcting batch effects, resulting in improved robustness and generalizability. To address missing data in longitudinal studies, DeepMicroGen is developed as a generative adversarial network (GAN)-based imputation framework that captures temporal dependencies and accurately reconstructs incomplete observations, thereby improving downstream predictive performance. Furthermore, to enable efficient and robust learning under limited-data conditions, ARGfore is proposed as a forecasting framework for predicting antibiotic resistance gene abundances from time-series omics data, achieving improved predictive performance with reduced computational cost. Collectively, the proposed methods advance the applicability of deep learning in omics research by addressing fundamental challenges related to omics data. This work contributes to more robust disease characterization and improved predictive modeling and forecasting, thereby supporting the broader goals of precision medicine and environmental risk prevention.

Keywords

cancer subtyping, data imputation, time-series forecasting, deep learning
