Scholarly Works, Computer Science

Research articles, presentations, and other scholarship

Recent Submissions

Now showing 1 - 20 of 718
  • Evidence of horizontal gene transfer and environmental selection impacting antibiotic resistance evolution in soil-dwelling Listeria
    Goh, Ying-Xian; Anupoju, Sai Manohar Balu; Nguyen, Anthony; Zhang, Hailong; Ponder, Monica A.; Krometis, Leigh-Anne H.; Pruden, Amy; Liao, Jingqiu (Nature Research, 2024-11-19)
    Soil is an important reservoir of antibiotic resistance genes (ARGs) and understanding how corresponding environmental changes influence their emergence, evolution, and spread is crucial. The soil-dwelling bacterial genus Listeria, including L. monocytogenes, the causative agent of listeriosis, serves as a key model for establishing this understanding. Here, we characterize ARGs in 594 genomes representing 19 Listeria species that we previously isolated from soils in natural environments across the United States. Among the five putatively functional ARGs identified, lin, which confers resistance to lincomycin, is the most prevalent, followed by mprF, sul, fosX, and norB. ARGs are predominantly found in Listeria sensu stricto species, with those more closely related to L. monocytogenes tending to harbor more ARGs. Notably, phylogenetic and recombination analyses provide evidence of recent horizontal gene transfer (HGT) in all five ARGs within and/or across species, likely mediated by transformation rather than conjugation and transduction. In addition, the richness and genetic divergence of ARGs are associated with environmental conditions, particularly soil properties (e.g., aluminum and magnesium) and surrounding land use patterns (e.g., forest coverage). Collectively, our data suggest that recent HGT and environmental selection play a vital role in the acquisition and diversification of bacterial ARGs in natural environments.
  • Red is Sus: Automated Identification of Low-Quality Service Availability Claims in the US National Broadband Map
    Nabi, Syed Tauhidun; Wen, Zhuowei; Ritter, Brooke; Hasan, Shaddi (ACM, 2024-11-04)
    The FCC’s National Broadband Map aspires to provide an unprecedented view into broadband availability in the US. However, this map, which also determines eligibility for public grant funding, relies on self-reported data from service providers that in turn have incentives to strategically misrepresent their coverage. In this paper, we develop an approach for automatically identifying these low-quality service claims in the National Broadband Map. To do this, we develop a novel dataset of broadband availability consisting of 750k observations from more than 900 US ISPs, derived from a combination of regulatory data and crowdsourced speed tests. Using this dataset, we develop a model to classify the accuracy of service provider regulatory filings and achieve AUCs over 0.98 for unseen examples. Our approach provides an effective technique to enable policymakers, civil society, and the public to identify portions of the National Broadband Map that are likely to have integrity challenges.
  • Technology Use in the Black Church: Perspectives of Black Church Leaders Preliminary Findings
    Thompson, Gabriella; Otoo, Nissi; Fisher, Jaden; Sibi, Irene; Smith, Angela; Ogbonnaya-Ogburu, Ihudiya (ACM, 2024-11-11)
    Historically, the Black church has played a pivotal role in civic engagement and social justice, and continues to do so today. Yet, few researchers have explored how decisions around technology use are made in the church. To address this gap, we conducted semi-structured interviews with five Black church leaders to understand how church leaders interact with digital technologies, both in general and specifically with the communities that they serve. We found that while Black Church leaders are eager to engage with technology, most of the engagement with outside communities is through in-person contact; opportunities to give online have a financial penalty in comparison to traditional methods of tithing and donating; lastly, technology use within outreach and ministries is highly dependent on ministry leaders, many of whom volunteer their time. We contribute to research that focuses on technology use in religious organizations and community engagement of community-based organizations.
  • Designing Technology to Support the Hospital Classroom: Preliminary Findings
    Rasberry, Nadra; Essandoh, Joshua; Do, Ethan; Ogbonnaya-Ogburu, Ihudiya (ACM, 2024-11-11)
    Hospital teachers are state-employed educators who provide K-12 instruction to children in the hospital. We conducted research to understand how technology is used in hospital classrooms, an area which has been relatively underexplored. We conducted semi-structured interviews with five hospital teachers to understand their experience of using technology in and outside the classroom. Our findings revealed that hospital teachers often rely on older curricula given the changing education atmosphere; learning is often assessed through in-classroom observations of mastery; and technology and internet use by students is often restricted, which may inhibit opportunities to use AI and other technical resources in the classroom. We contribute a deeper understanding of technology use in the hospital classroom.
  • Evaluation of Interactive Demonstration in Voice-assisted Counting for Young Children
    Karunaratna, Sulakna; Vargas-Diaz, Daniel; Kim, Jisun; Wang, Jenny; Choi, Koeun; Lee, Sang Won (ACM, 2024-11-11)
    In recent years, the number of AI voice agent applications designed to help young children learn math has increased. However, the impact of interactivity within these applications on children’s learning and engagement remains unexplored. While current apps may employ various levels of interactions, such as visual, haptic, sound, and animation, the efficacy of these interactions in facilitating children’s learning remains uncertain. This research investigates how varying levels of interactivity in touch-based interfaces, combined with an AI voice agent, affect the learning of counting skills in children aged 2 to 4 years. We examine three conditions: baseline (no demonstration), animated demonstration, and interactive demonstration. By examining how these different levels of interactivity influence children’s engagement with math apps, this study seeks to enhance our understanding of effective design strategies for educational technology targeting early childhood education. The findings of this research hold the potential to inform the development of interfaces for math games that leverage both touch-based interactions and AI voice assistants to support young children’s learning of foundational mathematical concepts.
  • Investigating Characteristics of Media Recommendation Solicitation in r/ifyoulikeblank
    Bhuiyan, Md Momen; Hu, Donghan; Jelson, Andrew; Mitra, Tanushree; Lee, Sang Won (ACM, 2024-11-08)
    Despite the existence of search-based recommender systems like Google, Netflix, and Spotify, online users sometimes turn to crowdsourced recommendations in places like the r/ifyoulikeblank subreddit. In this exploratory study, we probe why users go to r/ifyoulikeblank, how they look for recommendations, and how the subreddit users respond to recommendation requests. To answer these questions, we collected sample posts from r/ifyoulikeblank and analyzed them using a qualitative approach. Our analysis reveals that users come to this subreddit for various reasons, such as exhausting popular search systems, not knowing what or how to search for an item, and thinking the crowd has better knowledge than search systems. Examining users' queries and their descriptions, we found novel information users provide during recommendation seeking using r/ifyoulikeblank. For example, sometimes they ask for artifact recommendations based on the tools used to create them. Or, sometimes indicating a recommendation seeker's time constraints can help better suit recommendations to their needs. Finally, recommendation responses and interactions revealed patterns of how requesters and responders refine queries and recommendations. Our work informs future intelligent recommender systems design.
  • Simplify, Consolidate, Intervene: Facilitating Institutional Support with Mental Models of Learning Management System Use
    Hassan, Taha; Edmison, Bob; Williams, Daron; Cox II, Larry; Louvet, Matthew; Knijnenburg, Bart; McCrickard, D. (ACM, 2024-11-08)
    Measuring instructors' adoption of learning management system (LMS) tools is a critical first step in evaluating the efficacy of online teaching and learning at scale. Existing models for LMS adoption are often qualitative, learner-centered, and difficult to leverage towards institutional support. We propose depth-of-use (DOU): an intuitive measurement model for faculty's utilization of a university-wide LMS and their needs for institutional support. We hypothesis-test the relationship between DOU and course attributes like modality, participation, logistics, and outcomes. In a large-scale analysis of metadata from 30000+ courses offered at Virginia Tech over two years, we find that a pervasive need for scale, interoperability and ubiquitous access drives LMS adoption by university instructors. We then demonstrate how DOU can help faculty members identify the opportunity-cost of transition from legacy apps to LMS tools. We also describe how DOU can help instructional designers and IT organizational leadership evaluate the impact of their support allocation, faculty development and LMS evangelism initiatives.
  • ThreatKG: An AI-Powered System for Automated Open-Source Cyber Threat Intelligence Gathering and Management
    Gao, Peng; Liu, Xiaoyuan; Choi, Edward; Ma, Sibo; Yang, Xinyu; Song, Dawn (ACM, 2023-11-19)
    Open-source cyber threat intelligence (OSCTI) has become essential for keeping up with the rapidly changing threat landscape. However, current OSCTI gathering and management solutions mainly focus on structured Indicators of Compromise (IOC) feeds, which are low-level and isolated, providing only a narrow view of potential threats. Meanwhile, the extensive and interconnected knowledge found in the unstructured text of numerous OSCTI reports (e.g., security articles, threat reports) available publicly is still largely underexplored. To bridge the gap, we propose THREATKG, an automated system for OSCTI gathering and management. THREATKG efficiently collects a large number of OSCTI reports from multiple sources, leverages specialized AI-based techniques to extract high-quality knowledge about various threat entities and their relationships, and constructs and continuously updates a threat knowledge graph by integrating new OSCTI data. THREATKG features a modular and extensible design, allowing for the addition of components to accommodate diverse OSCTI report structures and knowledge types. Our extensive evaluations demonstrate THREATKG’s practical effectiveness in enhancing threat knowledge gathering and management.
  • Editorial: ACM Transactions on Computer Systems
    van Renesse, Robbert; Noh, Sam H. (ACM, 2024-11-22)
  • FedCaSe: Enhancing Federated Learning with Heterogeneity-aware Caching and Scheduling
    Khan, Redwan Ibne Seraj; Paul, Arnab K.; Jian, Xun (Steve); Cheng, Yue; Butt, Ali R. (ACM, 2024-11-20)
    Federated learning (FL) has emerged as a new paradigm of machine learning (ML) with the goal of collaborative learning on the vast pool of private data available across distributed edge devices. The focus of most existing works in FL systems has been on addressing the challenges of computation and communication heterogeneity inherent in training with edge devices. However, the crucial impact of I/O and the role of limited on-device storage have not been explored fully in the FL context. Without policies to exploit the on-device storage for placement of client data samples, and schedule clients based on I/O benefits, FL training can lead to inefficiencies, such as increased training time and impacted accuracy convergence. In this paper, we propose FedCaSe, a framework for efficiently caching client samples in-situ on limited on-device storage and scheduling client participation. FedCaSe boosts the I/O performance by exploiting a unique characteristic: the experience, i.e., relative impact on overall performance, of data samples and clients. FedCaSe utilizes this information in adaptive caching policies for sample placement inside the limited memory of edge clients. The framework also exploits the experience information to orchestrate the future selection of clients. Our experiments with representative workloads and policies show that compared to the state of the art, FedCaSe improves the training time by 2.06× for accuracy convergence at the scale of thousands of clients.
  • A Survey of Prototyping Platforms for Intermittent Computing Research
    Williams, Harrison; Hicks, Matthew (ACM, 2024-11-04)
    Batteryless energy harvesting platforms are gaining popularity as a way to bring next-generation sensing and edge computing devices to deployments previously limited by their need for batteries. Energy harvesting enables perpetual, maintenance-free operation, but also introduces new challenges associated with unreliable environmental power as systems face common-case, yet unpredictable power failures. Software execution on these devices is an active area of research: intermittently executed software must correctly and efficiently handle arbitrary interruption, frequent state saving/restoration, and re-execution of certain code segments as part of a normal operation. The wide application range for batteryless systems combined with strict limitations on size and performance means there is little overlap in batteryless system prototypes; platforms are chosen for familiarity or specific features in a given application. Unfortunately, the effectiveness of different intermittent computing approaches varies widely across devices. As a result, intermittent computing research is at best hard to generalize across platforms and at worst contradictory across studies. This work explores several of the device-level differences that substantially affect intermittent system performance across eight low-power prototyping platforms. We examine system-level assumptions made by the major approaches to intermittent computing today and determine how compatible each approach is with each platform. The goal of this paper is to serve as a guide for researchers and practitioners developing intermittent systems to both understand the landscape of devices suitable for batteryless operation and to highlight how interactions between devices and the intermittent software running on them can profoundly affect both performance and high-level conclusions in intermittent systems research. We open source our device bring-up code and instructions to facilitate multi-board experiments for future approaches.
  • Optimizing Effectiveness and Defense of Drone Surveillance Missions via Honey Drones
    Wan, Zelin; Cho, Jin-Hee; Zhu, Mu; Anwar, Ahmed; Kamhoua, Charles; Singh, Munindar (ACM, 2024)
    This work aims to develop a surveillance mission system using unmanned aerial vehicles (UAVs) or drones when Denial-of-Service (DoS) attacks are present to disrupt normal operations for mission systems. In particular, we introduce the concept of cyber deception using honey drones (HDs) to protect the mission system from DoS attacks. HDs exhibit fake vulnerabilities and employ stronger signal strengths to lure DoS attacks, unlike the legitimate drones called mission drones (MDs) deployed for mission execution. This research formulates an optimization problem to identify an optimal set of signal strengths of HDs and MDs to best prevent the system from DoS attacks while maximizing mission performance under the resource constraints of UAVs. To solve this optimization problem, we leverage deep reinforcement learning (DRL) to achieve these multiple objectives of the mission system concerning system security and performance. Particularly, for efficient and effective parallel processing in DRL, we utilize a DRL algorithm called the Asynchronous Advantage Actor-Critic (A3C) algorithm to model attack-defense interactions. We employ a physical engine-based simulation testbed to consider realistic scenarios and demonstrate valid findings from the realistic testbed. The extensive experiments proved that our HD-based approach could achieve up to a 32% increase in mission completion, a 20% reduction in energy consumption, and a 62% decrease in attack success rates compared to existing defense strategies.
  • XplainScreen: Unveiling the Black Box of Graph Neural Network Drug Screening Models with a Unified XAI Framework
    Ahn, Geonhee; Haque, Md Mahim Anjum; Hazarika, Subhashis; Kim, Soo Kyung (ACM, 2024-10-21)
    Despite the powerful capabilities of GNN-based drug screening models in predicting target drug properties, the black-box nature of these models poses a challenge for practical application, particularly in a field as critical as drug development where understanding and trust in AI-driven decisions are important. To address the interpretability issues associated with GNN-based virtual drug screening, we introduce XplainScreen: a unified explanation framework designed to evaluate various explanation methods for GNN-based models. XplainScreen offers a user-friendly, web-based interactive platform that allows for the selection of specific GNN-based drug screening models and multiple cutting-edge explainable AI methods. It supports both qualitative assessments (through visualization and generative text descriptions) and quantitative evaluations of these methods, utilizing drug molecules in SMILES format. This demonstration showcases the utility of XplainScreen through a user study with pharmacological researchers focused on virtual screening tasks based on toxicity, highlighting the framework’s potential to enhance the integrity and trustworthiness of AI-driven virtual drug screening. A video demo of XplainScreen is available at https://youtu.be/Q4yobrTLKec, and the source code can be accessed at https://github.com/GeonHeeAhn/XplainScreen.
  • Hermes: Boosting the Performance of Machine-Learning-Based Intrusion Detection System through Geometric Feature Learning
    Zhang, Chaoyu; Shi, Shanghao; Wang, Ning; Xu, Xiangxiang; Li, Shaoyu; Zheng, Lizhong; Marchany, Randy; Gardner, Mark; Hou, Y. Thomas; Lou, Wenjing (ACM, 2024-10-14)
    Anomaly-Based Intrusion Detection Systems (IDSs) have been extensively researched for their ability to detect zero-day attacks. These systems establish a baseline of normal behavior using benign traffic data and flag deviations from this norm as potential threats. They generally experience higher false alarm rates than signature-based IDSs. Unlike image data, where the observed features provide immediate utility, raw network traffic necessitates additional processing for effective detection. It is challenging to learn useful patterns directly from raw traffic data or simple traffic statistics (e.g., connection duration, package inter-arrival time) as the complex relationships are difficult to distinguish. Therefore, some feature engineering becomes imperative to extract and transform raw data into new feature representations that can directly improve the detection capability and reduce the false positive rate. We propose a geometric feature learning method to optimize the feature extraction process. We employ contrastive feature learning to learn a feature space where normal traffic instances reside in a compact cluster. We further utilize H-Score feature learning to maximize the compactness of the cluster representing the normal behavior, enhancing the subsequent anomaly detection performance. Our evaluations using the NSL-KDD and N-BaIoT datasets demonstrate that the proposed IDS powered by feature learning can consistently outperform state-of-the-art anomaly-based IDS methods by significantly lowering the false positive rate. Furthermore, we deploy the proposed IDS on a Raspberry Pi 4 and demonstrate its applicability on resource-constrained Internet of Things (IoT) devices, highlighting its versatility for diverse application scenarios.
  • VizGroup: An AI-assisted Event-driven System for Collaborative Programming Learning Analytics
    Tang, Xiaohang; Wong, Sam; Pu, Kevin; Chen, Xi; Yang, Yalong; Chen, Yan (ACM, 2024-10-13)
    Programming instructors often conduct collaborative learning activities, like Peer Instruction, to foster a deeper understanding in students and enhance their engagement with learning. These activities, however, may not always yield productive outcomes due to the diversity of student mental models and their ineffective collaboration. In this work, we introduce VizGroup, an AI-assisted system that enables programming instructors to easily oversee students’ real-time collaborative learning behaviors during large programming courses. VizGroup leverages Large Language Models (LLMs) to recommend event specifications for instructors so that they can simultaneously track and receive alerts about key correlation patterns between various collaboration metrics and ongoing coding tasks. We evaluated VizGroup with 12 instructors in a comparison study using a dataset collected from a Peer Instruction activity that was conducted in a large programming lecture. The results showed that VizGroup helped instructors effectively overview, narrow down, and track nuances throughout students’ behaviors.
  • Evaluating Layout Dimensionalities in PC+VR Asymmetric Collaborative Decision Making
    Enriquez, Daniel; Tong, Wai; North, Christopher L.; Qu, Huamin; Yang, Yalong (ACM, 2024-10-20)
    With the commercialization of virtual/augmented reality (VR/AR) devices, there is an increasing interest in combining immersive and non-immersive devices (e.g., desktop computers) for asymmetric collaborations. While such asymmetric settings have been examined in social platforms, significant questions around layout dimensionality in data-driven decision-making remain underexplored. A crucial inquiry arises: although presenting a consistent 3D virtual world on both immersive and non-immersive platforms has been a common practice in social applications, does the same guideline apply to lay out data? Or should data placement be optimized locally according to each device's display capacity? This study aims to provide empirical insights into the user experience of asymmetric collaboration in data-driven decision-making. We tested practical dimensionality combinations between PC and VR, resulting in three conditions: PC2D+VR2D, PC2D+VR3D, and PC3D+VR3D. The results revealed a preference for PC2D+VR3D, and PC2D+VR2D led to the quickest task completion. Our investigation facilitates an in-depth discussion of the trade-offs associated with different layout dimensionalities in asymmetric collaborations.
  • Causality-Aware Spatiotemporal Graph Neural Networks for Spatiotemporal Time Series Imputation
    Jing, Baoyu; Zhou, Dawei; Ren, Kan; Yang, Carl (ACM, 2024-10-21)
    Spatiotemporal time series are usually collected via monitoring sensors placed at different locations, which usually contain missing values due to various failures, such as mechanical damages and Internet outages. Imputing the missing values is crucial for analyzing time series. When recovering a specific data point, most existing methods consider all the information relevant to that point regardless of the cause-and-effect relationship. During data collection, it is inevitable that some unknown confounders are included, e.g., background noise in time series and non-causal shortcut edges in the constructed sensor network. These confounders could open backdoor paths and establish non-causal correlations between the input and output. Over-exploiting these non-causal correlations could cause overfitting. In this paper, we first revisit spatiotemporal time series imputation from a causal perspective and show how to block the confounders via the frontdoor adjustment. Based on the results of frontdoor adjustment, we introduce a novel Causality-Aware Spatiotemporal Graph Neural Network (Casper), which contains a novel Prompt Based Decoder (PBD) and a Spatiotemporal Causal Attention (SCA). PBD could reduce the impact of confounders and SCA could discover the sparse causal relationships among embeddings. Theoretical analysis reveals that SCA discovers causal relationships based on the values of gradients. We evaluate Casper on three real-world datasets, and the experimental results show that Casper could outperform the baselines and could effectively discover the causal relationships.
  • An Exploratory Mixed-methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software
    Franke, Lucas; Liang, Huayu; Farzanehpour, Sahar; Brantly, Aaron F.; Davis, James C.; Brown, Chris (ACM, 2024-10-24)
    Background: Governments worldwide are considering data privacy regulations. These laws, such as the European Union’s General Data Protection Regulation (GDPR), require software developers to meet privacy-related requirements when interacting with users’ data. Prior research describes the impact of such laws on software development, but only for commercial software. Although open-source software is commonly integrated into regulated software, and thus must be engineered or adapted for compliance, we do not know how such laws impact open-source software development. Aims: To understand how data privacy laws affect open-source software (OSS) development, we focus on the European Union’s GDPR, as it is the most prominent such law. We investigated how GDPR compliance activities influence OSS developer activity (RQ1), how OSS developers perceive fulfilling GDPR requirements (RQ2), the most challenging GDPR requirements to implement (RQ3), and how OSS developers assess GDPR compliance (RQ4). Method: We distributed an online survey to explore perceptions of GDPR implementations from open-source developers (N=56). To augment this analysis, we further conducted a repository mining study to analyze development metrics on pull requests (N=31,462) submitted to open-source GitHub repositories. Results: Our results suggest GDPR policies complicate OSS development and introduce challenges, primarily regarding the management of users’ data, implementation costs and time, and assessments of compliance. Moreover, we observed negative perceptions of the GDPR from OSS developers and significant increases in development activity, in particular metrics related to coding and reviewing, on GitHub pull requests related to GDPR compliance. Conclusions: Our findings provide future research directions and implications for improving data privacy policies, motivating the need for relevant resources and automated tools to support data privacy regulation implementation and compliance efforts in OSS.
  • Goldilocks Zoning: Evaluating a Gaze-Aware Approach to Task-Agnostic VR Notification Placement
    Ilo, Cory; DiVerdi, Stephen; Bowman, Douglas A. (ACM, 2024-10-07)
    While virtual reality (VR) offers immersive experiences, users need to remain aware of notifications from outside VR. However, inserting notifications into a VR experience can result in distraction or breaks in presence, since existing notification systems in VR use static placement and lack situational awareness. We address this challenge by introducing a novel notification placement technique, Goldilocks Zoning (GZ), which leverages a 360-degree heatmap generated using gaze data to place notifications near salient areas of the environment without obstructing the primary task. To investigate the effectiveness of this technique, we conducted a dual-task experiment comparing GZ to common notification placement techniques. We found that GZ had similar performance to state-of-the-art techniques in a variety of primary task scenarios. Our study reveals that no single technique is universally optimal in dynamic settings, underscoring the potential for adaptive approaches to notification management. As a step in this direction, we explored the potential to use machine learning to predict the task based on the gaze heatmap.
  • Breaking Privacy in Model-Heterogeneous Federated Learning
    Haldankar, Atharva; Riasi, Arman; Nguyen, Hoang-Dung; Phuong, Tran; Hoang, Thang (ACM, 2024-09-30)
    Federated learning (FL) allows multiple distrustful clients to collaboratively train a machine learning model. In FL, data never leaves client devices; instead, clients only share locally computed gradients with a central server. As individual gradients may leak information about a given client’s dataset, secure aggregation was proposed. With secure aggregation, the server only receives the aggregate gradient update from the set of all sampled clients without being able to access any individual gradient. One challenge in FL is the systems-level heterogeneity that is quite often present among client devices. Specifically, clients in the FL protocol may have varying levels of compute power, on-device memory, and communication bandwidth. These limitations are addressed by model-heterogeneous FL schemes, where clients are able to train on subsets of the global model. Despite the benefits of model-heterogeneous schemes in addressing systems-level challenges, the implications of these schemes on client privacy have not been thoroughly investigated. In this paper, we investigate whether the nature of model distribution and the computational heterogeneity among client devices in model-heterogeneous FL schemes may result in the server being able to recover sensitive data from target clients. To this end, we propose two attacks in the model-heterogeneous FL setting, even with secure aggregation in place. We call these attacks the Convergence Rate Attack and the Rolling Model Attack. The Convergence Rate Attack targets schemes where clients train on the same subset of the global model, while the Rolling Model Attack targets schemes where model parameters are dynamically updated each round. We show that a malicious adversary can compromise the model and data confidentiality of a target group of clients. We evaluate our attacks on the MNIST and CIFAR-10 datasets and show that using our techniques, an adversary can reconstruct data samples with near perfect accuracy for batch sizes of up to 20 samples.