Graph-Based Computational Approaches for Modeling Viral Evolution

dc.contributor.author: Das, Badhan [en]
dc.contributor.committeechair: Heath, Lenwood S. [en]
dc.contributor.committeemember: Cao, Young [en]
dc.contributor.committeemember: Pritchard, Leighton [en]
dc.contributor.committeemember: Ji, Bo [en]
dc.contributor.committeemember: Vinatzer, Boris A. [en]
dc.contributor.department: Computer Science & Applications [en]
dc.date.accessioned: 2026-01-08T09:00:14Z [en]
dc.date.available: 2026-01-08T09:00:14Z [en]
dc.date.issued: 2026-01-07 [en]
dc.description.abstract: Modeling viral evolution is essential for understanding how pathogens adapt, spread, and generate new variants of concern. Yet, it remains challenging due to high mutation rates, minimal sequence divergence, and the scale of modern genomic data. Most phylogenetic trees enforce a strictly bifurcating structure that struggles to represent recurrent mutations, recombination, convergent evolution, and intra-host diversity. In contrast, quasispecies theory describes viral populations as clouds of closely related mutants evolving within a high-dimensional sequence space, where evolutionary relationships are more naturally captured by graphs than trees. In this dissertation, I develop a sequence of graph-centered frameworks that integrate viral fitness, mutational distance, and mutational dynamics to model viral evolution from algorithmic and data-driven perspectives. First, ViraFit introduces a proof-of-concept model that couples epidemiological spread on contact networks with evolutionary dynamics on fitness landscapes, demonstrating how mutation, selection, and network structure jointly shape adaptive trajectories. Second, the Variant Evolution Graph (VEG) provides a scalable graph-based representation of SARS-CoV-2 evolution derived from mutational distances, allowing multiple ancestral relationships and capturing virus-specific evolutionary patterns that are difficult to represent with phylogenetic trees. A derived Disease Transmission Network further supports inference of likely transmission pathways and superspreaders. Finally, the Ancestor-Joining algorithm extends this representation into a predictive framework, Mutation Learning Graph (MLG), by inferring intermediate ancestral variants and enabling graph neural network–based lineage classification and mutational link prediction across geographically diverse SARS-CoV-2 cohorts.
Together, ViraFit, VEG, and MLG form a unified methodological progression that links mechanistic modeling, evolutionary reconstruction, and predictive graph learning, providing a scalable, mutation-centric view of viral evolution that complements traditional phylogenetic approaches and supports future variant forecasting. [en]
dc.description.abstractgeneral: Viruses such as SARS-CoV-2 evolve rapidly, generating many closely related variants as they spread through a population. These small genetic changes can influence how easily a virus spreads, how severe the disease becomes, and whether existing vaccines or treatments remain effective. Because of this, understanding how viruses change over time is essential for public health and pandemic preparedness. This dissertation develops new computational approaches to study viral evolution by using graphs, where each viral genome is represented as a node and edges show how one strain may have changed into another. Graphs offer a flexible way to capture the many possible paths a virus may take as it mutates, including patterns that traditional phylogenetic methods often miss. The first part of this work introduces ViraFit, a simulation framework that models how viruses mutate and how disease is transmitted simultaneously. It shows how the structure of human contact networks and the fitness of different viral strains work together to shape which variants become dominant. The second part presents the Variant Evolution Graph (VEG), a new method for organizing real viral genome data. VEG makes it easier to detect important virus-specific evolutionary events, such as repeated mutations, recombination, and diversity within infected individuals, that may not be clearly evident in standard evolutionary trees. The final part of the dissertation expands these ideas using machine learning. The Mutation Learning Graph (MLG) combines graph representations with advanced neural networks to learn patterns in viral evolution, predict relationships that have not yet been observed, and help anticipate how future variants might emerge. Together, these methods provide a more detailed and flexible picture of viral evolution, offering tools to support genomic surveillance, early detection of new variants, and improved understanding of viral evolution.
Although developed using SARS-CoV-2 data, the approaches are general and can be applied to many other rapidly evolving viruses and biological systems. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:45479 [en]
dc.identifier.uri: https://hdl.handle.net/10919/140658 [en]
dc.language.iso: en [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Viral evolution [en]
dc.subject: Mutation [en]
dc.subject: Fitness landscape [en]
dc.subject: Edit distance [en]
dc.subject: Mutation similarity [en]
dc.subject: Variant Evolution Graph [en]
dc.subject: Mutation Learning Graph [en]
dc.title: Graph-Based Computational Approaches for Modeling Viral Evolution [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science & Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Das_B_D_2026.pdf
Size: 14.91 MB
Format: Adobe Portable Document Format