VTechWorks
VTechWorks provides global access to Virginia Tech scholarship, including journal articles, books, theses, dissertations, conference papers, slide presentations, technical reports, working papers, administrative documents, videos, images, and more by faculty, students, and staff. Faculty can deposit items to VTechWorks from Elements, including journal articles covered by the University open access policy. Email vtechworks@vt.edu for help.
Recent Submissions
An Investigation of the Interplay Among CX3CR1, Immune Cells, and Gut Microbiota in Lupus-Associated Arthritis and Renal Disease using the MRL/lpr Mouse
Estaleen, Rana (Virginia Tech, 2026-01-09)
Discovering Viral Hosts, Mutations, and Diseases using Machine Learning
Antony, Blessy (Virginia Tech, 2026-01-09)
The discovery of a novel virus raises three important questions, namely, which host(s) can the virus infect, what mutations in the virus could affect its interaction with its hosts and enable a host-shift, and which diseases can the virus cause in humans. We propose novel machine learning (ML)-based solutions to these three different problems in computational virology.
(i) We develop a viral protein language model for predicting the host infected by a virus, given only the sequence of one of its proteins. Our approach, 'Hierarchical Attention for Viral protEin-based host iNference (HAVEN)', includes a novel architecture comprising segmentation and hierarchical self-attention to tackle the challenges posed by long sequences. Pretrained on 1.2 million viral protein sequences, the model accepts any protein sequence of any virus and predicts its host. We integrate HAVEN with a prototype-based few-shot learning (FSL) classifier to generalize it to predict rare and unseen hosts, and hosts of unseen viruses.
(ii) Structured datasets of known viral mutations and their effects are required to develop computational models that can predict potential detrimental changes in novel animal viruses. We leverage large language models (LLMs) to create these datasets from the results of experimental studies available as unstructured text in scientific literature. We design an open-ended task for 'scientific information extraction (SIE)' from publications and propose a novel two-step retrieval-augmented generation (RAG) framework for this task. We curate a novel dataset of mutations in influenza A viral proteins. We use this dataset to benchmark our proposed approach alongside a wide range of LLMs and RAG- and agent-based tools for SIE.
(iii) Finally, we look at the effects of viral infections in humans. Specifically, we focus on the long-term effects of SARS-CoV-2 (or long COVID), wherein patients experience persistent COVID-19 symptoms for an extended period after their initial SARS-CoV-2 infection. We propose an ML-based classification pipeline to predict the diagnosis of long COVID in COVID-19 patients using their electronic health records (EHRs) in the National COVID Cohort Collaborative, the largest collection of clinical data across the US. Using techniques that explain our models' prediction for each patient, we uncover many features that are correlated with long COVID. We also evaluate the impact of different data sources on our long COVID prediction models using a novel cross-site analysis.
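Part (i) pairs HAVEN with a prototype-based few-shot classifier. The sketch below illustrates only the general prototype idea, under clearly stated assumptions: the embeddings are hypothetical 2-D stand-ins for whatever vectors HAVEN actually produces, and `compute_prototypes` and `classify` are illustrative helpers, not part of HAVEN. Each host's prototype is the mean of its support embeddings, and a query is assigned to the host whose prototype is nearest by Euclidean distance.

```python
import math
from collections import defaultdict


def compute_prototypes(support):
    """Average each class's support embeddings into one prototype vector.

    `support` is a list of (label, embedding) pairs; the embeddings here are
    hypothetical stand-ins for real protein-language-model outputs.
    """
    sums = {}
    counts = defaultdict(int)
    for label, emb in support:
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(query, prototypes):
    """Return the label whose prototype is closest to `query` (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


if __name__ == "__main__":
    # Hypothetical support set: two labeled embeddings per host.
    support = [("human", [1.0, 0.0]), ("human", [0.8, 0.2]),
               ("avian", [0.0, 1.0]), ("avian", [0.1, 0.9])]
    prototypes = compute_prototypes(support)
    print(classify([0.7, 0.3], prototypes))  # prints "human"
```

Because a prototype is just an average, a rare or previously unseen host can be added from a handful of labeled examples by recomputing prototypes, without retraining the encoder. That property is what makes prototype-based classifiers attractive for few-shot prediction.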
Perceptions of Leader Development Programming by College Students with Introverted Personalities
Martin, Perry Douglas (Virginia Tech, 2026-01-09)
This is a qualitative study of the perceptions of leadership development programming among students who identified as more introverted than their peers. The study examined these students' leadership self-efficacy and the factors that contributed to their sense of efficacy as leaders. Conducted at a Research I, land-grant institution, the study consisted of interviews with these students, which allowed the researcher to examine their experiences of and attitudes towards their own leadership development. The purpose of the study was to better understand leadership efficacy in the context of introverted students' experiences. Findings highlighted the importance of close relationships for introverted students developing as leaders: as a source of vicarious learning and verbal encouragement, and as a steadying influence on emotional well-being. Students valued teaching as an optimal model for leadership. As they navigated the rigors of serving in leadership roles in college, students looked to close relationships and regular self-care practices to mitigate the impact of stress on their energy. This study contributes to the body of knowledge on personality and leadership development, specifically how self-efficacy manifests in people with introverted personalities.
Understanding Tradeoffs of Replicated Data Library Integration Strategies in Multilingual Environments
Mondal, Provakar; Tilevich, Eli (ACM, 2025-12-15)
Modern distributed systems replicate data across multiple execution sites by means of special-purpose replicated data libraries (RDLs), which provide read-write data access and synchronization. Programming languages often need to be mixed across replica sites to meet business requirements and resource constraints. Because RDLs are typically written in a single language, integrating them in multilingual environments requires special-purpose code, whose characteristics are poorly understood. We aim to bridge this knowledge gap by reviewing two key strategies for integrating RDLs in multilingual environments: (1) foreign-function interface (FFI) and (2) common data format (CDF). Our preliminary results indicate performance and implementation tradeoffs: CDF offers latency and memory consumption advantages, while incurring an additional implementation burden. With modern distributed systems utilizing multiple languages, our findings can inform the design of RDLs for multilingual replicated data systems.
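To make the CDF strategy concrete, here is a minimal sketch that assumes a grow-only counter (G-Counter) CRDT and JSON as the common data format. Neither choice comes from the paper, and the `to_cdf`, `merge_from_cdf`, and `value` helpers are hypothetical. Each replica, whatever its language, encodes its state into the shared format and merges a peer's state by taking the per-replica maximum. Under FFI, by contrast, each language would call into the RDL's native implementation directly.

```python
import json


def to_cdf(counts):
    """Encode a G-Counter's per-replica counts in a language-neutral JSON form.

    The {"type": ..., "counts": ...} schema is an illustrative assumption.
    """
    return json.dumps({"type": "g-counter", "counts": counts}, sort_keys=True)


def merge_from_cdf(local_counts, payload):
    """Decode a peer's JSON state and merge it by per-replica maximum."""
    remote = json.loads(payload)
    if remote.get("type") != "g-counter":
        raise ValueError("unexpected state type")
    merged = dict(local_counts)
    for replica, n in remote["counts"].items():
        merged[replica] = max(merged.get(replica, 0), n)
    return merged


def value(counts):
    """The counter's value is the sum of all replicas' contributions."""
    return sum(counts.values())


if __name__ == "__main__":
    local = {"py": 3}
    # Hypothetical payload, e.g. produced by a replica written in Java.
    payload = '{"type": "g-counter", "counts": {"java": 5, "py": 2}}'
    merged = merge_from_cdf(local, payload)
    print(value(merged))  # prints 8
```

The encoder and decoder must be reimplemented, and kept in agreement, in every participating language. This is one plausible source of the extra implementation burden the abstract attributes to CDF.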
Toward Thorough and Practical Integration Testing of Replicated Data Systems
Mondal, Provakar (ACM, 2025-12-15)
Highly available applications rely on replicated data, but complex event interleavings between application logic and replicated data libraries (RDLs) often cause subtle integration bugs. Detecting such bugs is challenging due to the inherent nondeterminism of distributed execution, as certain bugs manifest only under specific interleavings. Correctness testing therefore requires replaying all possible interleavings, a challenging task given the combinatorial explosion of the interleaving space. My doctoral dissertation addresses this challenge with ER-𝜋, a middleware framework that exercises all possible interleavings between the application code and the RDL; it also eliminates redundant and impossible interleavings via novel pruning techniques. Initial results show that ER-𝜋 successfully reproduces 12 real-world bugs across multiple open-source RDLs while significantly reducing the interleaving search space. Our ongoing work extends this foundation with interleaving prioritization, which ranks interleaving executions by their likelihood of exposing faults, particularly faults introduced by recent code changes, thereby accelerating bug discovery. This research supports developers responsible for ensuring the correctness and reliability of replicated data systems.
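The size of the interleaving space, and the value of discarding impossible orders, can be seen in a small sketch. It enumerates every order-preserving merge of an application event sequence and an RDL event sequence, then keeps only the orders that satisfy an assumed happens-before constraint. The event names and the constraint are hypothetical, and this naive filter is not ER-𝜋's pruning technique or API.

```python
from math import comb


def interleavings(a, b):
    """Yield every merge of sequences a and b that preserves each one's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def count_interleavings(m, n):
    """Number of order-preserving merges of an m-event and an n-event sequence."""
    return comb(m + n, m)


def respects(order, before):
    """Return True if, for every (x, y) pair in `before`, x occurs ahead of y."""
    pos = {event: i for i, event in enumerate(order)}
    return all(pos[x] < pos[y] for x, y in before)


if __name__ == "__main__":
    app = ["write", "read"]                # hypothetical application events
    rdl = ["send_update", "apply_remote"]  # hypothetical RDL events
    orders = list(interleavings(app, rdl))
    # Assumed constraint: the RDL cannot send an update before the app writes.
    feasible = [o for o in orders if respects(o, [("write", "send_update")])]
    print(len(orders), len(feasible))  # prints "6 3"
```

Two 10-event sequences already admit `count_interleavings(10, 10)` = 184,756 orders, which is why pruning and prioritization matter before any interleaving is actually replayed.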


