Journal Articles, Association for Computing Machinery (ACM)

Recent Submissions

Now showing 1 - 20 of 509
  • Top-Down Stochastic Block Partitioning: Turning Graph Clustering Upside Down
    Wanye, Frank; Gleyzer, Vitaliy; Kao, Edward; Feng, Wu-chun (ACM, 2025-07-20)
Stochastic block partitioning (SBP) is a statistical inference-based algorithm for clustering vertices within a graph. It has been shown to be statistically robust and highly accurate even on graphs with a complex structure, but its poor scalability limits its usability to smaller-sized graphs. In this manuscript we argue that one reason for its poor scalability is the agglomerative, or bottom-up, nature of SBP’s algorithmic design; the agglomerative computations cause high memory usage and create a large search space that slows down statistical inference, particularly in the algorithm’s initial iterations. To address this bottleneck, we propose Top-Down SBP, a novel algorithm that replaces the agglomerative (bottom-up) block merges in SBP with a block-splitting operation. This enables the algorithm to start with all vertices in one cluster and subdivide them over time into smaller clusters. We show that Top-Down SBP is up to 7.7× faster than Bottom-Up SBP without sacrificing accuracy and can process larger graphs than Bottom-Up SBP on the same hardware due to an up to 4.1× decrease in memory usage. Additionally, we adapt existing methods for accelerating Bottom-Up SBP to the Top-Down approach, leading to up to 13.2× speedup over accelerated Bottom-Up SBP and up to 403× speedup over sequential Bottom-Up SBP on 64 compute nodes. Thus, Top-Down SBP represents substantial improvements to the scalability of SBP, enabling the analysis of larger datasets on the same hardware.
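
To make the top-down versus bottom-up contrast concrete, here is a minimal, hypothetical sketch of a divisive partitioning loop in Python: start with all vertices in one block and keep splits that improve an objective. The `score` and `propose_split` functions are toy stand-ins; the paper scores splits with the stochastic block model itself, not this objective.

```python
# Hypothetical divisive (top-down) partitioning loop: one block at the
# start, split instead of merge. Toy objective and proposal, not SBP.
import random

def score(blocks, edges):
    # Toy objective: reward intra-block edges, penalize missing ones.
    edge_set = {frozenset(e) for e in edges}
    s = 0.0
    for b in blocks:
        bl = list(b)
        for i in range(len(bl)):
            for j in range(i + 1, len(bl)):
                s += 1.0 if frozenset((bl[i], bl[j])) in edge_set else -0.5
    return s

def propose_split(block):
    # Toy proposal: random bisection; SBP refines vertex assignments
    # with model-guided reassignment instead.
    members = list(block)
    random.shuffle(members)
    half = len(members) // 2
    return set(members[:half]), set(members[half:])

def top_down_partition(vertices, edges, rounds=200):
    blocks = [set(vertices)]                  # all vertices in one block
    best = score(blocks, edges)
    for _ in range(rounds):
        i = random.randrange(len(blocks))
        if len(blocks[i]) < 2:
            continue
        left, right = propose_split(blocks[i])
        candidate = blocks[:i] + [left, right] + blocks[i + 1:]
        s = score(candidate, edges)
        if s > best:                          # keep only improving splits
            blocks, best = candidate, s
    return blocks

# Two triangles joined by one edge; the loop should recover them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(top_down_partition(range(6), edges))
```
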
  • Can Large Language Models Predict Parallel Code Performance?
    Bolet, Gregory; Georgakoudis, Giorgis; Menon, Harshitha; Parasyris, Konstantinos; Hasabnis, Niranjan; Estes, Hayden; Cameron, Kirk; Oren, Gal (ACM, 2025-07-20)
Accurate determination of the performance of parallel GPU code typically requires execution-time profiling on target hardware – an increasingly prohibitive step due to limited access to high-end GPUs. This paper explores whether Large Language Models (LLMs) can offer an alternative approach for GPU performance prediction without relying on hardware. We frame the problem as a roofline classification task: given the source code of a GPU kernel and the hardware specifications of a target GPU, can an LLM predict whether the GPU kernel is compute-bound or bandwidth-bound? For this study, we build a balanced dataset of 340 GPU kernels, obtained from the HeCBench benchmark and written in CUDA and OpenMP, along with their ground-truth labels obtained via empirical GPU profiling. We evaluate LLMs across four scenarios: (1) with access to profiling data of the kernel source, (2) zero-shot with source code only, (3) few-shot with code and label pairs, and (4) fine-tuned on a small custom dataset. Our results show that state-of-the-art LLMs have a strong understanding of the Roofline model, achieving 100% classification accuracy when provided with explicit profiling data. We also find that reasoning-capable LLMs significantly outperform standard LLMs in zero- and few-shot settings, achieving up to 64% classification accuracy on GPU source code, without any profiling information. Lastly, we find that model accuracy does not benefit meaningfully from few-shot prompting compared to zero-shot, and that LLM fine-tuning will require much more data than we currently have available. This work is among the first to use LLMs for source-level roofline performance prediction via classification, and illustrates their potential to guide optimization efforts when runtime profiling is infeasible. Our findings suggest that with better datasets and prompt strategies, LLMs could become practical tools for HPC performance analysis and performance portability. Code and datasets are publicly available at https://github.com/Scientific-Computing-Lab/ParallelCodeEstimation.
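
The classification task rests on the standard roofline rule, which a few lines of Python make explicit: a kernel is compute-bound when its arithmetic intensity (FLOPs per byte moved) exceeds the machine balance (peak FLOP/s divided by peak bandwidth). The hardware figures below are illustrative A100-like numbers, not values taken from the paper.

```python
# The roofline rule underlying the classification task: compare a
# kernel's arithmetic intensity to the target GPU's machine balance.
def roofline_class(flops: float, bytes_moved: float,
                   peak_flops: float, peak_bw: float) -> str:
    intensity = flops / bytes_moved      # FLOP per byte
    balance = peak_flops / peak_bw       # ridge point of the roofline
    return "compute-bound" if intensity > balance else "bandwidth-bound"

# Example: 2 FLOPs per 8 bytes on a GPU with 19.5 TFLOP/s peak and
# 1555 GB/s bandwidth (A100-like figures, for illustration only).
print(roofline_class(flops=2e9, bytes_moved=8e9,
                     peak_flops=19.5e12, peak_bw=1555e9))
# -> "bandwidth-bound" (intensity 0.25 vs. balance of roughly 12.5)
```
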
  • A Generalized Web3D API for Metaverse Bookmarks
    Narra, Nikhil; Marisetty, Anuj; Polys, Nicholas; Sandbrook, Ben (ACM, 2025-09-09)
Sharing identical 3D scene states across different platforms and user sessions faces practical limitations. Existing mechanisms such as X3D anchors, glTF camera tags, and proprietary URL-based encodings typically support only static, author-defined viewpoints or are constrained to a specific viewer implementation. This paper introduces a schema and API for a Web3D Bookmarking system that enables interoperable sharing of precise, user-generated 3D scene contexts. The proposed grammar encodes the complete scene state, including camera parameters, navigation settings, scene composition, and contextual metadata, into a compact descriptor suitable for embedding in hyperlinks or other distribution methods. When accessed, the scene is re-instantiated with the same view and context as intended by the original sharer. We define the grammar using a formal JSON Schema and justify each parameter through a design rationale. Use cases include collaborative VR/AR environments, remote design reviews, and educational applications where reproducible scene context is critical. An initial implementation using X3D and X3DOM demonstrates feasibility. We also compare our approach with existing viewpoint sharing techniques and outline a structured evaluation plan to assess the utility and performance of the proposed system.
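
As a rough illustration of what such a descriptor might look like, the sketch below validates a hypothetical bookmark against a toy JSON Schema in Python. All field names are invented for illustration; they are not the published grammar.

```python
# Hypothetical bookmark descriptor and toy schema (invented fields, not
# the paper's grammar). Requires `pip install jsonschema`.
import json
from jsonschema import validate

BOOKMARK_SCHEMA = {
    "type": "object",
    "required": ["scene", "camera"],
    "properties": {
        "scene": {"type": "string"},                      # scene URL
        "camera": {
            "type": "object",
            "required": ["position", "orientation"],
            "properties": {
                "position": {"type": "array", "items": {"type": "number"},
                             "minItems": 3, "maxItems": 3},
                # X3D-style axis-angle orientation: (x, y, z, angle)
                "orientation": {"type": "array", "items": {"type": "number"},
                                "minItems": 4, "maxItems": 4},
            },
        },
        "navigation": {"type": "string"},                 # e.g. "EXAMINE"
        "meta": {"type": "object"},                       # free-form context
    },
}

bookmark = {
    "scene": "https://example.org/city.x3d",              # hypothetical URL
    "camera": {"position": [0.0, 1.6, 10.0],
               "orientation": [0.0, 1.0, 0.0, 0.0]},
    "navigation": "EXAMINE",
    "meta": {"author": "reviewer", "note": "facade detail"},
}
validate(bookmark, BOOKMARK_SCHEMA)   # raises on a malformed descriptor
print(json.dumps(bookmark))           # compact form for a hyperlink
```
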
  • X3Test: A Headless Browser-Based Framework for Automated Performance Benchmarking of X3D/X3DOM Scenes
    Narra, Nikhil; Marisetty, Anuj; Polys, Nicholas; Sandbrook, Ben (ACM, 2025-09-09)
    We present X3Test, an automated testing and performance benchmarking framework designed to address the absence of dedicated tools for X3D scenes on the web. X3Test employs Node.js and Puppeteer to load X3D/X3DOM scenes in a headless browser, simulate user interactions, and collect runtime performance metrics like frame rate (FPS), render timings, and scene graph size. This is achieved by leveraging X3DOM’s runtime API, which allows data extraction without requiring modifications to the existing content. The framework offers both a command-line interface (CLI) and a programmatic API, enabling developers to specify scene URLs, trigger events such as animations or camera changes, and customize test parameters like duration. We evaluated X3Test across diverse scenarios, from simple static scenes and animated content to the complex "3D Roanoke" city model, demonstrating its capability to effectively capture performance variations linked to scene complexity and dynamic elements. X3Test provides a robust solution for systematic Web3D performance analysis, with future plans including RenderDoc support and the collection of more advanced metrics.
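
X3Test itself is built on Node.js and Puppeteer; the sketch below reproduces only the core FPS-sampling idea in Python using Playwright as an assumed stand-in, not the paper's code. The scene URL is hypothetical.

```python
# Headless FPS sampling in the spirit of X3Test, using Python and
# Playwright instead of Node.js and Puppeteer (a swap for illustration).
# Requires `pip install playwright` and `playwright install chromium`.
from playwright.sync_api import sync_playwright

# Count requestAnimationFrame callbacks for ~2 seconds in the page.
COUNT_FPS = """
() => new Promise(resolve => {
  let frames = 0;
  const start = performance.now();
  function tick(now) {
    frames += 1;
    if (now - start < 2000) requestAnimationFrame(tick);
    else resolve(frames / ((now - start) / 1000));
  }
  requestAnimationFrame(tick);
})
"""

def measure_fps(url: str) -> float:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # let the scene load
        fps = page.evaluate(COUNT_FPS)            # awaits the promise
        browser.close()
        return fps

print(measure_fps("https://example.org/scene.html"))  # hypothetical URL
```
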
  • Pairwise BPF Programs Should Be Optimized Together
    Craun, Milo; Williams, Dan (ACM, 2025-09-08)
BPF programs are extensively used for tracing and observability in production systems where performance overheads matter. Many individual BPF programs do not incur serious performance-degrading overhead on their own, but increasingly more than one BPF program is used to understand production system performance. BPF deployments have begun to look more like distributed applications; however, this is a mismatch with the underlying Linux kernel, potentially leading to high overhead. In particular, we identify that many BPF programs follow a pattern based on pairwise program deployment, where entry and exit probes are attached to measure a single quantity. We find that the pairwise BPF program pattern results in unnecessary overheads. We identify three optimizations—BPF program inlining, context-aware optimization, and intermediate state internalization—that apply to pairwise BPF programs. We show that applying these optimizations to an example pairwise BPF program can reduce overhead on random read throughput from 28.13% to 8.98% and on random write throughput from 26.97% to 8.60%. We then examine some key design questions that arise when seeking to integrate these optimizations with the existing BPF system.
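
The pairwise pattern the paper targets commonly looks like the BCC (Python front-end) sketch below: an entry kprobe stores a start timestamp keyed by thread id, and the paired kretprobe reads it back to histogram latency. The attach point and names are generic examples, not the paper's workload; running it requires bcc and root privileges.

```python
# Generic pairwise entry/exit BPF probes via BCC (example workload only).
from time import sleep
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);     // intermediate state shared by the pair
BPF_HISTOGRAM(lat_us);

int on_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int on_exit(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;              // exit seen without a matching entry
    lat_us.increment(bpf_log2l((bpf_ktime_get_ns() - *tsp) / 1000));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="on_entry")    # example probe
b.attach_kretprobe(event="vfs_read", fn_name="on_exit")
sleep(5)                       # sample for five seconds
b["lat_us"].print_log2_hist("usecs")
```
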
  • SchedBPF - Scheduling BPF programs
    Shekar, Kavya; Williams, Dan (ACM, 2025-09-08)
    The Linux BPF framework enables the execution of verified custom bytecode in the critical path of various Linux kernel routines, allowing for efficient in-kernel extensions. The safety properties and low execution overhead of BPF programs have led to advancements in kernel extension use-cases that can be broadly categorized into tracing, custom kernel policies, and application acceleration. However, BPF is fundamentally event-driven and lacks native support for periodic or continuous tasks such as background tracing, metric aggregation, or kernel housekeeping. Existing approaches such as kernel modules with kthreads, userspace daemons, or BPF timers fail to satisfy all the essential requirements for periodic kernel extensions such as fine-grained CPU control, kernel safety, and minimal overhead. To address this gap, we propose SchedBPF — a conceptual framework that enables periodic execution of BPF programs on kernel threads. SchedBPF program executions are sandboxed and preemptible, as governed by the existing BPF verifier and JIT engine. They also adopt time-slice semantics, cgroup-style CPU quotas, and nice-level priority control, similar to kernel threads. SchedBPF aims to enable low-overhead, periodic execution of safe BPF code with fine-grained CPU resource management.
  • BPFflow - Preventing information leaks from eBPF
    Dimobi, Chinecherem; Tiwari, Rahul; Ji, Zhengjie; Williams, Dan (ACM, 2025-09-08)
eBPF has seen major industry adoption by enterprises to enhance observability, tracing, and monitoring by hooking at different points in the kernel. However, since the kernel is a critical resource, eBPF can also pose a threat if misused, potentially leading to privilege escalation, information leaks, and more. While effective to some extent, existing mitigation strategies like interface filtering are coarse-grained and often over-restrictive. We propose BPFflow, a flexible framework that lets the system administrator define policies specifying sensitive data sources, trusted sinks, and permitted flows between them. These policies are enforced by an Information Flow Control (IFC) system within the eBPF verifier, which tracks the propagation of sensitive data to prevent unauthorized leakage to userspace or other untrusted sinks without any runtime overhead.
  • Modeling and Simulation of Trapped Ion Quantum Repeaters and Networks
    Jain, Charu; Chan, Chuen Hei; Kissel, Ezra; Wu, Wenji; Monga, Inder (ACM, 2025-09-08)
    This paper explores the design and implementation of trapped-ion quantum repeaters and networks using modeling and simulation. We aim to quantitatively understand the practical architecture design and resource requirements of trapped-ion entanglement-based quantum repeater paradigms. Our simulation results explore entanglement rate and fidelity as key performance metrics, and we discuss the major challenges for practical deployment of quantum networks and future directions for research and development in order to meet these challenges.
  • Modeling and Simulation of All-photonic Quantum Repeaters and Networks
    Chan, Chuen Hei; Jain, Charu; Kissel, Ezra; Wu, Wenji; Barnes, Edwin; Economou, Sophia E.; Monga, Inder (ACM, 2025-09-08)
    This paper explores the design and implementation of all-photonic quantum repeaters and networks using modeling and simulation. We aim to quantitatively understand the practical architecture design and resource requirements of all-photonic entanglement-based quantum repeater paradigms.
  • Adaptive and Efficient Dynamic Memory Management for Hardware Enclaves
    Dhanraj, Vijay; Chawla, Harpreet; Manila, Daniel; Schneider, Eric; Fu, Erica; Porter, Donald; Tsai, Chia-Che; Vij, Mona (ACM, 2025-09-08)
    The second version of Intel® Software Guard Extensions (Intel SGX), or SGX2, adds dynamic management of enclave memory and threads. The first version required the address space and thread counts to be fixed before execution. The Enclave Dynamic Memory Management (EDMM) feature of SGX2 has the potential to lower launch times and overall execution time. Despite reducing the enclave loading time by 28–93%, straightforward EDMM adoption strategies actually slow execution time down by as much as 58%. Using the Gramine library OS as a representative enclave runtime environment, this paper shows how to recover EDMM performance. The paper explains how implementing mutual distrust between the OS and enclave increases the cost of modifying page mappings. The paper then describes a series of optimizations and, using application benchmarks, shows that these optimizations effectively eliminate the overheads of EDMM while retaining EDMM’s performance and flexibility gains.
  • Teaching K-8 Students Computer Science Using Digital Storybooks
    Deverin, Thomas; Hamouda, Sally (ACM, 2025-08-03)
    This study introduces CodeKids, a novel web-based platform to teach computer science to elementary and middle school-aged students using interactive digital storybooks. Several books on the CodeKids website were created to target common misconceptions young students form when learning foundational computer science concepts, including variables, conditionals, and loops. The primary objectives of this research are to evaluate student engagement and perceptions of the storybook format and determine if the books help prevent common computer science misconceptions for K-8 students.
  • Designing Answer-Aware LLM Hints to Scaffold Deeper Learning in K-12 Programming Education
    Bhaskar, Sahana; Hamouda, Sally (ACM, 2025-08-03)
Many K–12 students struggle with programming concepts. While LLMs offer scalable, timely support, overly direct answers can reduce reasoning and engagement [8], prompting the question: How can LLMs support learning without encouraging overreliance? In our study with 105 students, 31.4% showed misconceptions about variable assignment and data types, and in another survey, only 20% correctly solved conditional problems. This highlights the need for scaffolding to address conceptual gaps in K–12 programming. To address these gaps, we designed an answer-aware hint generation system using LLMs to support learning without reducing cognitive demand. We developed the system for CodeKids—an open-source, curriculum-aligned platform built with Virginia Tech and local public schools. It helps students practice grade-level programming through interactive activities, using LLM-generated hints to guide thinking without revealing answers [1, 11]. Based on Vygotsky’s Zone of Proximal Development [12], our approach balances support and autonomy through structured prompting that preserves productive struggle.
  • Teaching and Learning AI in K-12 Informatics Education
    Barendsen, Erik; Lonati, Violetta; Quille, Keith; Altin, Rukiye; Divitini, Monica; Hooshangi, Sara; Karnalim, Oscar; Kiesler, Natalie; Melton, Madison; Suero Montero, Calkin; Morpurgo, Anna (ACM, 2024-12)
With the diffuse and very rapidly evolving presence of Artificial Intelligence (AI) in so many aspects of our everyday life, the question of at what age and how to start preparing citizens for the AI revolution has become urgent. Moreover, AI in general, and the recent and easy availability for everyone of generative AI in particular, are raising pressing issues regarding education in terms of both opportunities and challenges. This working group research is an international and collective effort to reach deep but scalable insights into current trends, developments, and perspectives regarding AI in K-12 education. The focus is on AI as content matter as opposed to AI as enabler of digital technology for teaching and learning. This work complements earlier reviews of scientific literature on AI in education by sketching the bigger picture, placing progress in scientific research in the context of developments in educational practice. Drawing on a scoping literature review, interviews with educators, AI experts, and policymakers, and an analysis of institutional documents, we provide an overview of the current trends and gaps. The analyses revealed several gaps and opportunities. First, maintaining a conceptual distinction between teaching with AI, with AI as enabler of digital tools for education, and teaching about AI, with AI as content matter, is crucial. Second, in spite of the abundance of interventions and tools in literature and documents, there is a lack of fundamental pedagogical knowledge about the underlying mechanisms informing teaching and learning of AI. Third, research and design efforts are required to find a proper balance between technical and socio-ethical aspects of AI in curricula. Fourth, there is a need for research-informed initiatives to support K-12 teachers’ preparation and continuous professional development on teaching and learning of AI. These findings give rise to the formulation of four grand challenges as inspiration for future research and policy developments.
  • From Transients to Flips: Hardware-level Bit Manipulation of In-Vehicle Serial Communication
    Mohammed, Abdullah Zubair; Gerdes, Ryan M. (ACM, 2025-08)
In a modern automobile, the in-vehicle communication network interconnects multiple subsystems, including those that perform safety-critical functions such as engine control, anti-lock braking, and airbag deployment, among many others. Therefore, the loss of data integrity in the network can have serious consequences for the safety of the vehicle. To that end, the CAN protocol, the most common in-vehicle communication standard, employs error-handling mechanisms such as bit monitoring and cyclic redundancy checks to detect intentional or unintentional data manipulation. In this work, we exploit the transmission-line nature of the CAN physical layer (a twisted-pair cable) to induce voltage transients that result in bit manipulations. Specifically, we demonstrate bidirectional bit-flip attacks, recessive to dominant (R→D) and dominant to recessive (D→R), with the aid of multiple compromised nodes (electronic control units) in the network. Both attacks, the simpler R→D and the more complex D→R, are designed to be undetectable by the aforementioned error-handling mechanisms. The attacks become effective for distances ≥ 4m for D→R and ≥ 1m for R→D between the transmitter and receiver nodes. By demonstrating these bit flips, we challenge two fundamental physical-layer assumptions of CAN: the impossibility of turning a dominant bit recessive without an external current source, and the impossibility of having non-identical signals on two nodes at the same time. The theory behind the attacks is presented, backed by circuit simulations, in-lab validations, and real-world demonstrations in a vehicle. These bit-level attacks, designed at the physical layer, circumvent software-based CAN defenses and lay the groundwork for a broader spectrum of potential attacks, including the data-frame manipulation that we demonstrate.
  • Evaluating Capabilities and Perspectives of Generative AI Tools in Smart Contract Development
    Khalid, Shawal; Brown, Chris (ACM, 2025-08)
Smart contracts, self-executing agreements on the blockchain, have emerged as a transformative force in blockchain technology, automating agreements and enabling decentralized applications. However, the stakes are extraordinarily high—manual coding errors in smart contracts have repeatedly led to financial losses. Notable incidents, such as the DAO hack that resulted in a loss of approximately $50 million [36] and the Parity wallet vulnerability that froze approximately $280 million in assets [45], underscore the immense economic risks involved. To support manual development tasks, recent advancements in artificial intelligence (AI) powered by large language models (LLMs) have transformed how software is developed and maintained by automating various software engineering tasks. This research explores the capabilities of generative AI tools for efficient and secure smart contract development. The methodology involves two phases: 1) we distribute a mixed methods survey for blockchain and smart contract developers (n = 114) to investigate their perspectives towards utilizing LLMs; and 2) we evaluate the effectiveness of generative AI tools, such as ChatGPT, Google Gemini, and ChainGPT, for smart contract development. This evaluation is based on comparing the LLM-generated smart contract code with human-written code, using a diverse dataset of smart contracts gathered from GitHub. Static analysis tools and unit testing are employed to validate the accuracy, correctness, efficiency, and security of the generated code. Our findings highlight the potential of these tools to accelerate smart contract development processes, while also emphasizing the need for human oversight, contributing to the advancement of blockchain technology and its applications.
  • EVINET: Evidential Reasoning Network for Resilient Graph Learning in the Open and Noisy Environments
    Guan, Weijie; Wang, Haohui; Kang, Jian; Liu, Lihui; Zhou, Dawei (ACM, 2025-08-03)
Graph learning has been crucial to many real-world tasks, but it is often studied under a closed-world assumption, with all possible data labels known a priori. To enable effective graph learning in an open and noisy environment, it is critical to inform model users when the model makes a wrong prediction on in-distribution data of a known class, i.e., misclassification detection, or when the model encounters out-of-distribution data from novel classes, i.e., out-of-distribution detection. This paper introduces the Evidential Reasoning Network (EviNet), a framework that addresses these two challenges by integrating Beta embedding within a subjective logic framework. EviNet includes two key modules: Dissonance Reasoning for misclassification detection and Vacuity Reasoning for out-of-distribution detection. Extensive experiments demonstrate that EviNet outperforms state-of-the-art methods across multiple metrics in the tasks of in-distribution classification, misclassification detection, and out-of-distribution detection. EviNet demonstrates the necessity of uncertainty estimation and logical reasoning for misclassification detection and out-of-distribution detection and paves the way for open-world graph learning. Our code and data are available at https://github.com/SSSKJ/EviNET.
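
For orientation, the two uncertainty notions named in the abstract have standard subjective-logic formulas, sketched below from a vector of per-class evidence. High vacuity flags likely out-of-distribution inputs; high dissonance flags likely misclassifications among known classes. This is background on the standard quantities; EviNet's actual modules operate on learned Beta embeddings.

```python
# Standard subjective-logic vacuity and dissonance from class evidence
# (background formulas, not EviNet's Beta-embedding modules).
import numpy as np

def vacuity_and_dissonance(evidence: np.ndarray):
    K = len(evidence)
    alpha = evidence + 1.0               # Dirichlet parameters
    S = alpha.sum()
    belief = evidence / S                # per-class belief mass
    vacuity = K / S                      # mass left unassigned

    def balance(bj, bk):                 # agreement between two beliefs
        return 1 - abs(bj - bk) / (bj + bk) if bj + bk > 0 else 0.0

    diss = 0.0
    for k in range(K):
        others = [j for j in range(K) if j != k]
        denom = sum(belief[j] for j in others)
        if denom > 0:
            diss += belief[k] * sum(
                belief[j] * balance(belief[j], belief[k]) for j in others
            ) / denom
    return vacuity, diss

print(vacuity_and_dissonance(np.array([0.1, 0.1, 0.1])))  # scant evidence: high vacuity
print(vacuity_and_dissonance(np.array([9.0, 9.0, 0.0])))  # conflicting: high dissonance
```
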
  • Network Interdiction Goes Neural
    Zhang, Lei; Chen, Zhiqian; Lu, Chang-Tien; Zhao, Liang (ACM, 2025-08-03)
Network interdiction problems, arising in critical applications from military strategy to disease control, involve a complex attacker-defender dynamic: one player optimizes a network-based objective, while the other strategically modifies the network to impede that objective. The inherent bi-level optimization and combinatorial nature of these problems pose a significant computational challenge, often rendering traditional exact solvers impractical and hindering the development of effective heuristics. While Graph Neural Networks (GNNs) have demonstrated promise in solving single-level combinatorial optimization problems on graphs, their direct application to bi-level interdiction problems remains limited. In this paper, we bridge this gap by introducing a novel approach that leverages the power of GNNs to learn Mixed-Integer Linear Programming (MILP) formulations of network interdiction problems. By representing the problem in this structured mathematical form, we empower a multipartite GNN with the representational capacity to effectively capture the complex interplay between the two players. This approach aligns the neural network with the underlying mathematical structure of interdiction problems, leading to improved performance. Through extensive experiments on two network interdiction tasks, we demonstrate the superiority of our proposed method over both baseline GNN models and traditional exact solvers, showcasing its potential for real-world applications.
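
A classic concrete instance of this bi-level structure is shortest-path interdiction, where dualizing the defender's inner shortest-path LP yields a single-level MILP. The sketch below is that textbook dual reformulation, shown only to make the MILP setting concrete; it is not the paper's code, and the small graph is invented.

```python
# Shortest-path interdiction as a single-level MILP (textbook dual
# reformulation of the inner shortest-path problem). Requires
# `pip install pulp`; the instance is a toy example.
import pulp

nodes = ["s", "a", "b", "t"]
# (i, j) -> (base length, extra delay if the edge is interdicted)
edges = {("s", "a"): (1, 4), ("s", "b"): (2, 4), ("a", "b"): (1, 4),
         ("a", "t"): (1, 4), ("b", "t"): (1, 4)}
budget = 1                                   # interdict at most one edge

m = pulp.LpProblem("sp_interdiction", pulp.LpMaximize)
x = {(i, j): pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for (i, j) in edges}
pi = {v: pulp.LpVariable(f"pi_{v}") for v in nodes}  # node potentials

m += pi["t"] - pi["s"]                       # maximize shortest s-t distance
for (i, j), (c, d) in edges.items():
    m += pi[j] - pi[i] <= c + d * x[(i, j)]  # dual feasibility per edge
m += pulp.lpSum(x.values()) <= budget        # attacker's budget

m.solve(pulp.PULP_CBC_CMD(msg=0))
print("post-interdiction shortest path:", pulp.value(m.objective))
print("interdicted edges:", [e for e in edges if x[e].value() > 0.5])
```
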
  • Non-exchangeable Conformal Prediction for Temporal Graph Neural Networks
    Wang, Tuo; Kang, Jian; Yan, Yujun; Kulkarni, Adithya; Zhou, Dawei (ACM, 2025-08-03)
Conformal prediction for graph neural networks (GNNs) offers a promising framework for quantifying uncertainty, enhancing GNN reliability in high-stakes applications. However, existing methods predominantly focus on static graphs, neglecting the evolving nature of real-world graphs. Temporal dependencies in graph structure, node attributes, and ground truth labels violate the fundamental exchangeability assumption of standard conformal prediction methods, limiting their applicability. To address these challenges, in this paper, we introduce NCPNet, a novel end-to-end conformal prediction framework tailored for temporal graphs. Our approach extends conformal prediction to dynamic settings, mitigating statistical coverage violations induced by temporal dependencies. To achieve this, we propose a diffusion-based non-conformity score that captures both topological and temporal uncertainties within evolving networks. Additionally, we develop an efficiency-aware optimization algorithm that improves the conformal prediction process, enhancing computational efficiency and reducing coverage violations. Extensive experiments on diverse real-world temporal graphs, including WIKI, REDDIT, DBLP, and the IBM Anti-Money Laundering dataset, demonstrate NCPNet’s capability to ensure guaranteed coverage in temporal graphs, achieving up to a 31% reduction in prediction set size on the WIKI dataset, significantly improving efficiency compared to state-of-the-art methods. Our data and code are available at https://github.com/ODYSSEYWT/NCPNET.
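
For context, the exchangeable baseline that temporal dependence breaks is standard split conformal prediction, sketched below; NCPNet replaces the simple score here with its diffusion-based non-conformity score and a non-exchangeable calibration, which this sketch does not attempt.

```python
# Standard split conformal prediction (the exchangeable baseline).
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Prediction sets with ~(1 - alpha) marginal coverage, assuming
    calibration and test points are exchangeable."""
    n = len(cal_labels)
    # Non-conformity: one minus the probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n,
                    method="higher")
    # Keep every class whose non-conformity falls below the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=200)       # toy probabilities
cal_labels = np.array([p.argmax() for p in cal_probs])  # toy labels
test_probs = rng.dirichlet(np.ones(3), size=4)
print(conformal_sets(cal_probs, cal_labels, test_probs))
```
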
  • Chasing the Timber Trail: Machine Learning to Reveal Harvest Location Misrepresentation
    Sarkar, Shailik; Yousuf, Raquib Bin; Wang, Linhan; Mayer, Brian; Mortier, Thomas; Deklerck, Victor; Truszkowski, Jakub; Simeone, John; Norman, Marigold; Saunders, Jade; Lu, Chang-Tien; Ramakrishnan, Naren (ACM, 2025-08-03)
Illegal logging poses a significant threat to global biodiversity and climate stability, and depresses international prices for legal wood harvesting and responsible forest products trade, affecting livelihoods and communities across the globe. Stable isotope ratio analysis (SIRA) is rapidly becoming an important tool for determining the harvest location of traded organic products. The spatial pattern in stable isotope ratio values depends on factors such as atmospheric and environmental conditions and can thus be used for geographic origin identification. We present here the results of a deployed machine learning pipeline where we leverage both isotope values and atmospheric variables to determine timber harvest location. Additionally, the pipeline incorporates uncertainty estimation to facilitate the interpretation of harvest location determination for analysts. We present our experiments on a collection of oak (Quercus spp.) tree samples from its global range. Our pipeline outperforms comparable state-of-the-art models determining geographic harvest origin of commercially traded wood products, and has been used by European enforcement agencies to identify harvest location misrepresentation. We also identify opportunities for further advancement of our framework and how it can be generalized to help identify the origin of falsely labeled organic products throughout the supply chain.
  • Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding
    Zeng, Tong; Wu, Longfeng; Shi, Liang; Zhou, Dawei; Guo, Feng (ACM, 2025-08-03)
Vision Large Language Models (VLLMs) have demonstrated impressive capabilities in general visual tasks such as image captioning and visual question answering. However, their effectiveness in specialized, safety-critical domains like autonomous driving remains largely unexplored. Autonomous driving systems require sophisticated scene understanding in complex environments, yet existing multimodal benchmarks primarily focus on normal driving conditions, failing to adequately assess VLLMs’ performance in safety-critical scenarios. To address this, we introduce DVBench—a pioneering benchmark designed to evaluate the performance of VLLMs in understanding safety-critical driving videos. Built around a hierarchical ability taxonomy that aligns with widely adopted frameworks for describing driving scenarios used in assessing highly automated driving systems, DVBench features 10,000 multiple-choice questions with human-annotated ground-truth answers, enabling a comprehensive evaluation of VLLMs’ capabilities in perception and reasoning. Experiments on 14 state-of-the-art VLLMs, ranging from 0.5B to 72B parameters, reveal significant performance gaps, with no model achieving over 40% accuracy, highlighting critical limitations in understanding complex driving scenarios. To probe adaptability, we fine-tuned selected models using domain-specific data from DVBench, achieving accuracy gains ranging from 5.24 to 10.94 percentage points, with relative improvements of up to 43.59%. This improvement underscores the necessity of targeted adaptation to bridge the gap between general-purpose vision-language models and mission-critical driving applications. DVBench establishes an essential evaluation framework and research roadmap for developing VLLMs that meet the safety and robustness requirements for real-world autonomous systems. We released the benchmark toolbox and the fine-tuned model at: https://github.com/tong-zeng/DVBench.git.