Towards Network-Guided Large-Scale Foundation Models on Single-Cell Transcriptomics
dc.contributor.author | Kommu, Sindhura | en |
dc.contributor.committeechair | Wang, Xuan | en |
dc.contributor.committeemember | Wang, Yue J. | en |
dc.contributor.committeemember | Zhou, Dawei | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-05-29T08:01:27Z | en |
dc.date.available | 2025-05-29T08:01:27Z | en |
dc.date.issued | 2025-05-28 | en |
dc.description.abstract | Large-scale pretrained models, known as foundation models, have made breakthrough progress in fields like NLP and computer vision. Recently, transformer-based foundation models tailored for single-cell RNA sequencing (scRNA-seq) data have shown significant potential in interpreting the 'languages' of cells through self-supervised learning on huge amounts of unlabeled scRNA-seq datasets. These models could significantly enhance our understanding of cellular functions and disease mechanisms. However, unlike text data, scRNA-seq data is high-dimensional, inherently noisy, and sparse, posing unique challenges. We hypothesize that a major limitation of current single-cell foundation models (scFMs) lies in their inability to effectively leverage prior biological knowledge that could provide valuable complementary insights into relationships between genes. One of the most critical applications of scRNA-seq is the inference of gene regulatory networks (GRNs), which represent the intricate interactions between transcription factors (TFs) and their target genes. In the first part of this thesis, we propose SCREGNET, an innovative framework that combines scFMs with graph-based learning by incorporating experimentally validated transcription factor-DNA binding data in the form of networks with known regulatory interactions for the GRN inference task. SCREGNET achieved state-of-the-art results in the gene regulatory link prediction task when compared to nine baseline methods across seven scRNA-seq benchmark datasets and demonstrated greater robustness. In the second part of the thesis, we systematically explored incorporating prior GRNs into the pretraining of scFMs. This exploration provided valuable insights into the benefits and limitations of network guidance, revealing varied effects on predictive accuracy across different downstream tasks related to chromatin and network dynamics. | en |
dc.description.abstractgeneral | Every cell in our body contains thousands of genes working together in complex networks to control how cells grow, respond to stress, or become specialized. Understanding these gene regulatory networks is crucial for studying diseases, development, and treatment responses. With recent advances in single-cell RNA sequencing, scientists can examine gene activity in individual cells, but making sense of this data requires powerful tools. This thesis explores how large-scale "foundation models" trained on millions of single cells can help uncover hidden gene interactions. In the first part of the study, we introduce a new method called SCREGNET, which combines these foundation models with graph-based learning to accurately predict missing or unknown gene connections. This method showed superior performance across a variety of cell types, especially when dealing with noisy data. In the second part, we investigate whether incorporating prior biological knowledge, such as known gene regulatory networks, into the training of foundation models can improve their performance on related tasks. By guiding the learning process with real-world biological graphs, we show that these models become better at identifying important gene regulators. Together, these contributions provide new ways to blend data-driven learning with expert knowledge, helping advance biomedical research and precision medicine. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:43933 | en |
dc.identifier.uri | https://hdl.handle.net/10919/134278 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Gene Regulatory Networks | en |
dc.subject | Single-Cell Foundation Models | en |
dc.subject | Graph Neural Networks | en |
dc.title | Towards Network-Guided Large-Scale Foundation Models on Single-Cell Transcriptomics | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |