Computational Analysis and Network-based Modeling of Cross-Species Transmissions

Date

2026-01-06

Publisher

Virginia Tech

Abstract

Zoonotic spillover of pathogens is the dominant cause of emerging infectious diseases. Cross-species transmission (CST) risks are accelerated by climate change, which alters animal habitats and brings new combinations of host species together at high population densities and elevations. Modeling CST dynamics is essential in ecology and computational epidemiology for improving preparedness and resilience against future outbreaks. However, accurate prediction remains challenging because of biased pathogen sampling in existing CST databases and complex interactions among viral host ranges. This dissertation addresses three main challenges: (1) optimizing CST testing set selection through graph entropy frameworks; (2) modeling infectious pathways and biodiversity shifts under climate change scenarios; and (3) developing an accessible knowledge-based question answering (QA) framework using Retrieval-Augmented Generation (RAG). Current viral databases consist mostly of pathogens found in humans and domesticated animals, while the remaining vertebrate genera account for a mere 9%. Testing resources are limited for assessing an indeterminate 800,000 to 1.5 million mammalian viruses with zoonotic potential. Furthermore, climate change will expose host species to novel ecological interactions and complicate efforts to identify infectious pathways. I leveraged information theory and graph entropy, where high entropy implies a more informative, diverse, and unpredictable network structure, to guide testing set selection, with the aim of maximizing entropy and improving the diversity of the CST database. A graph representation constructed from animal habitats, climate classifications, and future climate scenarios identifies biodiversity patterns in climate classifications as well as vulnerable hosts and viruses.
Lastly, this dissertation introduces a knowledge graph-based CST information system for question answering (QA) using RAG, comparing multiple external database architectures, including knowledge graphs, node embeddings, and vector databases. The evaluation framework integrates reasoning, summarization, and hallucination detection using curated unanswerable queries. Through computational modeling and graph-based analysis of CSTs, this work identifies potential missing links and delivers an accessible and accurate CST information framework, facilitating early detection of CST risks and improving preparedness for future emerging infectious diseases.
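
The entropy-guided selection of testing sets described in the abstract can be illustrated with a minimal sketch. The abstract does not state which graph entropy definition the dissertation uses, so this example assumes Shannon entropy over the normalized node-degree distribution of a bipartite host-virus graph and a one-step greedy choice. The function names (`degree_entropy`, `pick_next_test`) and the toy host and virus labels are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def degree_entropy(edges):
    """Shannon entropy (in bits) of the normalized node-degree distribution.

    Assumption: this is one simple notion of graph entropy, not necessarily
    the definition used in the dissertation.
    """
    degree = Counter()
    for host, virus in edges:
        # Tag nodes by type so a host and a virus with the same label never collide.
        degree[("host", host)] += 1
        degree[("virus", virus)] += 1
    total = sum(degree.values())
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log2(d / total) for d in degree.values())


def pick_next_test(known_edges, candidate_edges):
    """Return the candidate host-virus pair whose addition gives the highest entropy."""
    return max(candidate_edges, key=lambda e: degree_entropy(known_edges + [e]))


# Toy data (hypothetical): the known associations are skewed toward humans.
known = [("human", "virusA"), ("human", "virusB"), ("pig", "virusA")]
candidates = [("human", "virusC"), ("bat", "virusC")]

best = pick_next_test(known, candidates)
print(best)  # ("bat", "virusC") under these toy inputs
```

In this toy case the greedy step prefers the pair involving the unsampled host, which mirrors the stated goal of improving database diversity. A realistic pipeline would also need to account for testing cost and the prior plausibility of each host-virus pair.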

Keywords

Graph analysis, computational biology, cross-species transmissions, computational epidemiology
