Journal Articles, Association for Computing Machinery (ACM)

Recent Submissions

Now showing 1 - 20 of 473
  • Hypergraph-based Zero-shot Multi-modal Product Attribute Value Extraction
    Hu, Jiazhen; Gong, Jiaying; Shen, Hongda; Eldardiry, Hoda (ACM, 2025-04-28)
    It is essential for e-commerce platforms to provide accurate, complete, and timely product attribute values, in order to improve the search and recommendation experience for both customers and sellers. In real-world scenarios, it is difficult for these platforms to identify attribute values for newly introduced products given no similar product history records for training or retrieval. Besides, how to jointly learn the product representation given various product information in multiple modalities, such as textual modality (e.g., product titles and descriptions) and visual modality (e.g., product images), is also a challenging task. To address these limitations, we propose a novel method for extracting multi-label product attribute-value pairs from multiple modalities in the zero-shot scenario, where labeled data is absent during training. Specifically, our method constructs heterogeneous hypergraphs, where product information from different modalities is represented by different types of nodes, and the text and image nodes are embedded and learned through CLIP encoders to effectively capture and integrate multi-modal product information. Then, the complex interrelations among these nodes are modeled through the hyperedges. By learning informative node representations, our method can accurately predict links between unseen product nodes and attribute-value nodes, enabling zero-shot attribute value extraction. We conduct extensive experiments and ablation studies on several categories of the public MAVE dataset and the results demonstrate that our proposed method significantly outperforms several state-of-the-art generative model baselines in multi-label, multi-modal product attribute value extraction in the zero-shot setting.
  • MENTORPDM: Learning Data-Driven Curriculum for Multi-Modal Predictive Maintenance
    Zhang, Shuaicheng; Wang, Tuo; Adams, Stephen; Bhattacharya, Sanmitra; Tiyyagura, Sunil; Bowen, Edward; Veeramani, Balaji; Zhou, Dawei (ACM, 2025-07-20)
    Predictive Maintenance (PDM) systems are essential for preemptive monitoring of sensor signals to detect potential machine component failures in industrial assets such as bearings in rotating machinery. Existing PDM systems face two primary challenges: 1) Irregular Signal Acquisition, where data collection from the sensors is intermittent, and 2) Signal Heterogeneity, where the full spectrum of sensor modalities is not effectively integrated. To address these challenges, we propose a Curriculum Learning Framework for Multi-Modal Predictive Maintenance – MentorPDM. MentorPDM consists of 1) a graph-augmented pretraining module that captures intrinsic and structured temporal correlations across time segments via a temporal contrastive learning objective and 2) a bi-level curriculum learning module that captures task complexities for weighing the importance of signal modalities and samples via modality and sample curricula. Empirical results from MentorPDM show promising performance with better generalizability in PDM tasks compared to existing benchmarks. The efficacy of the MentorPDM model will be further demonstrated in real industry testbeds and platforms.
  • Chainlet Orbits: Topological Address Embedding for Blockchain
    Azad, Poupak; Coskunuzer, Baris; Kantarcioglu, Murat; Akcora, Cuneyt (ACM, 2025-07-20)
    The rise of cryptocurrencies like Bitcoin has not only increased trade volumes but also broadened the use of graph machine learning techniques, such as address embeddings, to analyze transactions and decipher user patterns. Traditional analysis methods rely on simple heuristics and extensive data gathering, while more advanced Graph Neural Networks encounter challenges such as scalability, poor interpretability, and label scarcity in massive blockchain transaction networks. To overcome existing techniques’ computational and interpretability limitations, we introduce a topological approach, Chainlet Orbits, which embeds blockchain addresses by leveraging their topological characteristics in temporal transactions. We employ our innovative address embeddings to investigate financial behavior and e-crime in the Bitcoin and Ethereum networks, focusing on distinctive substructures that arise from user behavior. Our model demonstrates exceptional performance in node classification experiments compared to GNN-based approaches. Furthermore, our approach embeds all daily nodes of the largest blockchain transaction network, Bitcoin, and creates explainable machine learning models in less than 17 minutes, a task that takes days for GNN-based approaches.
  • Probabilistic Hypergraph Recurrent Neural Networks for Time-series Forecasting
    Chen, Hongjie; Rossi, Ryan; Kim, Sungchul; Mahadik, Kanak; Eldardiry, Hoda (ACM, 2025-07-20)
    Leveraging graph structures for time-series forecasting has garnered significant attention due to their effective relationship modeling between nodes and their associated time-series. However, in scenarios where entities communicate in a broadcasting manner, graph models fall short, as they capture only pairwise relations. Hypergraph models address this by capturing beyond-pairwise interactions among node time-series. Nevertheless, most hypergraph models overlook the dynamics between nodes and their incident hyperedges, assuming constant node-hyperedge connections. In this paper, we introduce a novel model, Probabilistic Hypergraph Recurrent Neural Networks (PHRNN), which leverages node-hyperedge dynamics for accurate time-series forecasting. PHRNN associates each time-series with a node and models node interactions on a hypergraph, capturing beyond-pairwise interactions. Moreover, PHRNN learns a probabilistic hypergraph in which node-hyperedge relations are modeled as probabilistic distributions instead of fixed values, capturing dynamic node-hyperedge relations. PHRNN further integrates a prior knowledge KNN hypergraph as regularization when learning the probabilistic hypergraph structure. To the best of our knowledge, PHRNN is the first time-series forecasting model that incorporates hypergraph modeling and probabilistic relationship modeling. Forecasting results from extensive experiments show that PHRNN outperforms state-of-the-art graph and hypergraph baselines on real-world datasets.
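    The prior-knowledge KNN hypergraph used for regularization can be pictured with a minimal sketch: each node spawns one hyperedge containing itself and its k most similar time-series. This is an illustrative construction only, not PHRNN's code, and it uses fixed (non-probabilistic) incidence where the paper learns distributions:

    ```python
    import math

    def knn_hypergraph(series: list[list[float]], k: int = 2) -> list[set[int]]:
        """Build a KNN hypergraph: one hyperedge per node, holding the
        node and its k nearest neighbors under Euclidean distance."""
        def dist(x, y):
            return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        hyperedges = []
        for i, s in enumerate(series):
            ranked = sorted((j for j in range(len(series)) if j != i),
                            key=lambda j: dist(s, series[j]))
            hyperedges.append({i, *ranked[:k]})  # node plus k nearest
        return hyperedges
    ```

    A structure like this can serve as a prior that the learned probabilistic incidence is regularized toward.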
  • Systematic Use of Random Self-Reducibility in Cryptographic Code against Physical Attacks
    Erata, Ferhat; Chiu, TingHung; Etim, Anthony; Nampally, Srilalith; Raju, Tejas; Ramu, Rajashree; Piskac, Ruzica; Antonopoulos, Timos; Xiong, Wenjie; Szefer, Jakub (ACM, 2024-10-27)
    This work presents a novel, black-box software-based countermeasure against physical attacks including power side-channel and fault-injection attacks. The approach uses the concept of random self-reducibility and self-correctness to add randomness and redundancy in the execution for protection. Our approach is at the operation level, is not algorithm-specific, and thus can be applied to protect a wide range of algorithms. The countermeasure is empirically evaluated against attacks over operations like modular exponentiation, modular multiplication, polynomial multiplication, and number theoretic transforms. An end-to-end implementation of this countermeasure is demonstrated for the RSA-CRT signature algorithm and Kyber key generation public-key cryptosystems. The countermeasure reduced the power side-channel leakage by two orders of magnitude, to an acceptably secure level in TVLA analysis. For fault injection, the countermeasure reduces the number of faults by 95.4% on average.
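    The random self-reduction at the heart of this idea can be sketched for modular multiplication. The identity (a+r1)(b+r2) = ab + a·r2 + r1·b + r1·r2 recovers a·b mod n from multiplications on freshly masked operands, so no single multiply's power trace correlates with the raw inputs; repeating with independent randomness and majority-voting adds fault tolerance. This is a minimal illustration of the general technique, not the paper's implementation:

    ```python
    import secrets

    def randomized_modmul(a: int, b: int, n: int) -> int:
        """a*b mod n via a random self-reduction: every multiply
        sees at least one randomly masked operand."""
        r1 = secrets.randbelow(n)
        r2 = secrets.randbelow(n)
        masked = (a + r1) * (b + r2) % n
        # (a+r1)(b+r2) - a*r2 - r1*b - r1*r2 == a*b
        return (masked - a * r2 - r1 * b - r1 * r2) % n

    def checked_modmul(a: int, b: int, n: int, votes: int = 3) -> int:
        """Self-correctness: repeat with fresh randomness and take a
        majority vote, so a single injected fault is outvoted."""
        results = [randomized_modmul(a, b, n) for _ in range(votes)]
        return max(set(results), key=results.count)
    ```

    Because each call draws fresh randomness, repeated executions on the same inputs produce different intermediate values while always yielding the same result.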
  • CRAZNS: A Case for Conventional Namespace Support for RAID with ZNS SSDs
    Kim, Hangyul; Song, Inho; Noh, Sam H. (ACM, 2025-03-31)
    Zoned Namespace (ZNS) SSDs are flash-based SSDs that maintain several zones in storage. Each zone has its own write pointer to prevent any write requests from occurring in front of or behind it. While ZNS SSDs achieve improved write performance with the write pointer, they also face limitations as in-place updates are not allowed. This limitation poses a challenge in building a Redundant Array of ZNS SSDs, as metadata writes and partial parity logging can be done more efficiently with overwrites. In this paper, we advocate for the use of a conventional namespace in ZNS RAID, that is, RAID that uses ZNS SSDs, and to support this, we design and implement CRAZNS, a ZNS RAID-5 that makes use of a conventional namespace. Compared to RAIZN, the state-of-the-art ZNS RAID, CRAZNS uses 4 GB of extra storage space for the conventional namespace, but is able to use the maximum number of zoned namespaces possible and saves almost 26 GiB of storage space by eliminating the need for metadata zones. Performance evaluations show that for individual applications, performance between RAIZN and CRAZNS was similar, but in terms of small write throughput, CRAZNS is 1.2× higher than RAIZN. Also, CRAZNS enhances overall throughput by 1.1× over RAIZN as more zones can be kept open.
  • A Software Caching Runtime for Embedded NVRAM Systems
    Williams, Harrison; Hicks, Matthew (ACM, 2024-04-27)
    Increasingly sophisticated low-power microcontrollers are at the heart of millions of IoT and edge computing deployments, with developers pushing large-scale data collection, processing, and inference to end nodes. Advanced workloads on resource-constrained systems depend on emerging technologies to meet performance and lifetime demands. High-performance Non-Volatile RAMs (NVRAMs) are one such technology enabling a new class of systems previously made impossible by memory limitations, including ultra-low-power designs using program state non-volatility and sensing systems storing and processing large blocks of data. Unfortunately, existing NVRAM significantly underperforms SRAM’s access latency/energy cost and flash’s read performance—condemning systems dependent on NVRAM to pay a steep energy and time penalty for software execution. We observe that this performance penalty stems predominantly from instruction fetches into NVRAM, which represent >75% of memory accesses in typical embedded software. To eliminate this performance bottleneck, we propose SwapRAM, a new operating model for NVRAM-based platforms which repurposes underutilized SRAM as an instruction cache, maximizing the proportion of accesses directed towards higher-performance SRAM. SwapRAM consists of a set of compile-time code transformations and a runtime management system that transparently and dynamically copies code into SRAM throughout execution, with extensible logic to delay eviction of hot code. Across nine embedded benchmarks running on a real FRAM platform, SwapRAM’s software-based design increases execution speed by up to 46% (average 26%) and reduces energy consumption by up to 36% (average 24%) compared to a baseline system using the existing hardware cache.
  • The Evolution of Information Seeking in Software Development: Understanding the Role and Impact of AI Assistants
    Al Haque, Ebtesam; Brown, Chris; LaToza, Thomas D.; Johnson, Brittany (ACM, 2025-06-23)
    About 32% of a software practitioner’s day involves seeking and using information to support task completion. Although the information needs of software practitioners have been studied extensively, the impact of AI-assisted tools on their needs and information-seeking behaviors remains largely unexplored. To address this gap, we conducted a mixed-method study to understand the AI-assisted information-seeking behavior of practitioners and its impact on their perceived productivity and skill development. We found that developers are increasingly using AI tools to support their information seeking, citing increased efficiency as a key benefit. Our findings also amplify caveats that come with effectively using AI tools for information seeking, especially for learning and skill development, such as the importance of foundational developer knowledge that can guide and inform the information provided by AI tools. Our efforts have implications for the effective integration of AI tools into developer workflows as information retrieval systems and learning aids.
  • The Impact of Generative AI on Test & Evaluation: Challenges and Opportunities
    Freeman, Laura; Robert, John; Wojton, Heather (ACM, 2025-06-23)
    Generative Artificial Intelligence (GenAI) is transforming software development processes, including test and evaluation (T&E). From automating test case design to enabling continuous testing in DevOps pipelines, AI-driven tools enhance the efficiency, accuracy, and speed of software testing. At the same time, the integration of AI components into software-reliant systems introduces new challenges for verification and validation (V&V). Traditional T&E methodologies must evolve to address issues such as AI bias, hallucinated outputs, and the complexity of validating non-deterministic behaviors. This position paper examines how existing T&E methods must evolve to account for AI’s stochastic nature, and conversely how GenAI is transforming T&E practices across the software development lifecycle (SDLC).
  • Smart Building Operations and Virtual Assistants Using LLM
    Ly, Reachsak; Shojaei, Alireza; Gao, Xinghua (ACM, 2025-06-23)
    Conventional AI-powered smart home assistants primarily function as voice-activated control systems with limited adaptability and contextual understanding. Similarly, while traditional artificial intelligence has advanced autonomous building research, it often relies on predefined rules and struggles with real-time decision-making in dynamic building environments. This paper introduces a novel Generative AI-driven framework that integrates Large Language Models (LLMs) to create a smart generative AI-based virtual assistant and an operation automation system for building infrastructure. The AI systems autonomously manage building operations by analyzing real-time occupancy patterns and adjusting environmental conditions based on predefined comfort thresholds. The proposed system also facilitates seamless human-building interaction through an LLM-powered virtual assistant. The framework is validated through a prototype implementation in a real-world building equipped with smart appliances, with evaluations focusing on the AI systems’ accuracy, reliability, and scalability. The findings demonstrate that the prototype system can autonomously adjust building conditions, optimize energy usage, and provide intelligent assistance for building operation tasks.
  • From Prompts to Properties: Rethinking LLM Code Generation with Property-Based Testing
    Bose, Dibyendu Brinto (ACM, 2025-06-23)
    Large Language Models (LLMs) have shown promise in automated code generation, but ensuring correctness remains a significant challenge. Traditional unit testing evaluates functional correctness but often fails to capture deeper logical constraints. We apply Property-Based Testing (PBT) as an alternative evaluation strategy to StarCoder and CodeLlama on MBPP and HumanEval. Our results reveal that while pass@k evaluation shows moderate success, PBT exposes additional correctness gaps. A significant portion of generated solutions only partially adhere to correctness properties (30–32%), while 18–23% fail outright. Property extraction is also imperfect, with 9–13% of constraints missing. These findings highlight that unit test-based evaluations may overestimate solution correctness by not capturing fundamental logical errors. Our study demonstrates that combining unit testing with PBT can offer a more comprehensive assessment of generated code correctness, revealing limitations that traditional verification approaches miss.
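    The gap between unit-test and property-based evaluation can be illustrated with a tiny hand-rolled checker (the study would use real PBT tooling such as Hypothesis; the function and names below are hypothetical): a generated solution passes its obvious unit test yet violates an order-preservation property that random inputs quickly expose.

    ```python
    import random

    def dedupe(xs):                      # a plausible LLM-generated solution
        return list(set(xs))             # drops duplicates but not in order

    assert dedupe([1, 1, 2]) == [1, 2]   # the unit test passes, so pass@k
                                         # would count this solution correct

    def holds_for(prop, gen, trials=200):
        """Tiny property-based checker: try `prop` on random inputs,
        returning a counterexample if one is found."""
        for _ in range(trials):
            xs = gen()
            if not prop(xs):
                return xs
        return None

    gen = lambda: [random.randrange(5) for _ in range(random.randrange(8))]
    order_preserved = lambda xs: dedupe(xs) == [x for i, x in enumerate(xs)
                                                if x not in xs[:i]]
    counterexample = holds_for(order_preserved, gen)  # e.g. [1, 0]
    ```

    The property check almost surely finds an input where `dedupe` scrambles first-occurrence order, a logical error the passing unit test never surfaces.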
  • Towards LLM-Based Automatic Playtest
    Zhao, Yan; Tang, Chiawei (ACM, 2025-06-23)
    Playtest is the process in which people play a video game for testing. It is critical for the quality assurance of gaming software. Manual playtest is time-consuming and expensive. However, automating this process is challenging, as playtest typically requires the domain knowledge and problem-solving skills that most conventional testing tools lack. Recent advancements in artificial intelligence (AI) have opened up new possibilities of applying Large Language Models (LLMs) to playtest. However, significant challenges remain: current LLMs cannot visually perceive game environments, and most existing research focuses on text-based games or games with robust APIs, while many non-text games lack APIs that provide textual descriptions of game states, making it almost impossible to naïvely apply LLMs to playtest. This paper introduces Lap, our novel approach to LLM-based Automatic Playtest, which uses ChatGPT to test match-3 games—a category of games where players match three or more identical tiles in a row or column to earn points. Lap encompasses three key phases: processing of game environments, prompting-based action generation, and action execution. Given a match-3 game, Lap takes a snapshot of the game board and converts it to a numeric matrix; it then prompts the ChatGPT-O1-mini API to suggest moves based on that matrix; finally, Lap tentatively applies the suggested moves to earn points and trigger changes in the game board. It repeats these three steps iteratively until timeout. For evaluation, we conducted a case study by applying Lap to an open-source match-3 game—CasseBonbons—and empirically compared Lap with three existing tools. Our results are promising: Lap outperformed existing tools by achieving higher code coverage and triggering more program crashes. Our research sheds light on future research in automated testing and LLM applications.
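    Once a board snapshot is a numeric matrix, validating a suggested swap reduces to a small matrix check. The sketch below (an illustration of the matrix representation, not Lap's actual code) enumerates adjacent swaps that create a three-in-a-row, which could also sanity-check moves an LLM proposes:

    ```python
    def has_match(board):
        """True if any row or column holds three equal tiles in a row."""
        h, w = len(board), len(board[0])
        for r in range(h):
            for c in range(w):
                if c + 2 < w and board[r][c] == board[r][c+1] == board[r][c+2]:
                    return True
                if r + 2 < h and board[r][c] == board[r+1][c] == board[r+2][c]:
                    return True
        return False

    def legal_moves(board):
        """Enumerate adjacent swaps that produce a match; each swap is
        applied tentatively and undone, leaving the board unchanged."""
        h, w = len(board), len(board[0])
        moves = []
        for r in range(h):
            for c in range(w):
                for dr, dc in ((0, 1), (1, 0)):      # swap right / down
                    r2, c2 = r + dr, c + dc
                    if r2 < h and c2 < w:
                        board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
                        if has_match(board):
                            moves.append(((r, c), (r2, c2)))
                        board[r][c], board[r2][c2] = board[r2][c2], board[r][c]
        return moves
    ```

    In a Lap-style loop, the suggested move would be applied for real only if it appears in this list, and the resulting board would feed the next prompt.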
  • How Well Do ChatGPT Models Maintain Software?
    Kabir, Md Mahir Asef; Hassan, Sk Adnan (ACM, 2025-06-23)
    Since the launch of ChatGPT in 2022, people have conducted various studies to investigate its capabilities in code generation, bug-fixing, test generation, and program comprehension. While ChatGPT has demonstrated strong capabilities in several aspects of software engineering, its effectiveness in maintaining software remains under-explored. Motivated by such a lack of study, we conducted an empirical study to systematically evaluate the performance of ChatGPT in software maintenance. Specifically, we distilled 58 software maintenance tasks from 58 GitHub projects. For each task, we prompted two ChatGPT models—ChatGPT-3.5 and ChatGPT-4o—to separately revise a given Java file, in response to a prescribed maintenance request. Once the models returned results, we assessed each model’s capability by comparing those revisions with developers’ modifications recorded in the version history. We found that ChatGPT-3.5 correctly revised code for 30 of the 58 tasks, while ChatGPT-4o correctly fulfilled 31 tasks. Neither model fulfilled all tasks successfully, mainly because they either truncated Java files unnecessarily, missed project-specific logic, or failed to cover all corner cases. This phenomenon implies that ChatGPT can help developers in software maintenance, but is unlikely to replace developers completely. Our study characterizes ChatGPT’s capabilities in software maintenance and its progression across model versions. It also sheds light on ChatGPT’s potential roles in future software-maintenance practices.
  • AutoPyDep: A Recommendation System for Python Dependency Management Utilizing Graph-Based Analytics
    Bose, Dibyendu Brinto; Chan, Travis; Trimble, Matthew; Brown, Chris (ACM, 2025-06-23)
    Managing software dependencies is increasingly challenging due to the complexity of modern development, often resulting in “dependency hell” with version conflicts, build failures, and runtime errors. To address these issues, we present AutoPyDep, a recommendation system for Python library dependency management. AutoPyDep features dependency analysis, relationship mapping, and predictive modeling for release categories and dates. By transforming release notes from 23 Python libraries into a graph network, we leverage NLP techniques and a community-based DeepWalk algorithm to generate embeddings for tasks such as release category prediction and release date forecasting. Key contributions include a voting classifier achieving a robust F1 score of 0.8 and an ARIMA model with a Mean Absolute Error (MAE) of 1.8 months. AutoPyDep enhances dependency management accuracy, offering actionable insights for developers and supporting improved decision-making in software development. A demonstration of our tool is shared at this link: https://drive.google.com/file/d/1C0NJPPSYEdMot5Lbc2nsuuFPvPn9iTMH/view?usp=drive_link.
  • DevCoach: Supporting Students Learning the Software Development Life Cycle with a Generative AI powered Multi-Agent System
    Wang, Tianjia; Trimble, Matthew; Brown, Chris (ACM, 2025-06-23)
    The software development life cycle (SDLC) is vital for ensuring the quality of software systems. However, learning SDLC concepts presents unique challenges, such as the need for effective collaboration, real-time interaction, and access to diverse skill sets represented in software development teams. To address these problems, we present DevCoach, a generative AI powered multi-agent system designed to support students learning the SDLC. DevCoach allows students to interact with generative AI agents simulating the different roles in the software development team, engaging in tasks across different phases of SDLC. Through a user study (𝑛 = 20), we evaluate the system’s effectiveness in enhancing learning, impact on SDLC deliverables, and support for Community of Inquiry (CoI) elements necessary for effective and supportive learning environments. Our results reveal that students using DevCoach achieved significantly higher learning gains and improved task completion rates across all SDLC phases. The system also supports CoI elements, particularly perceived social presence. Participants also lauded the immediate context-aware feedback, interactive learning environment, and diverse expertise provided by the roles within the multi-agent team. These findings demonstrate the potential of generative AI to enhance software engineering education by making it more effective, engaging, and interactive, providing students with collaborative and practical learning experiences.
  • How do Software Engineering Candidates Prepare for Technical Interviews?
    Bell, Brian; Thomas, Teresa; Lee, Sang Won; Brown, Chris (ACM, 2025-06-23)
    To obtain employment, aspiring software engineers must complete technical interviews—a hiring process which involves candidates writing code while communicating to an audience. However, the complexities of tech interviews are difficult to prepare for and seldom faced in computing curricula. To this end, we seek to understand how candidates prepare for technical interviews, investigating the effects of preparation methods and the role of education. We distributed a survey to candidates (𝑛 = 131) actively preparing for technical interviews. Our results suggest candidates rarely train in authentic settings and courses fail to support preparation efforts—leading to stress and unpreparedness. Based on our findings, we provide implications for stakeholders to enhance tech interview preparation for candidates pursuing software engineering roles.
  • Bridging Fairness and Uncertainty: Theoretical Insights and Practical Strategies for Equalized Coverage in GNNs
    Wu, Longfeng; Zhou, Yao; Kang, Jian; Zhou, Dawei (ACM, 2025-04-28)
    Graph Neural Networks (GNNs) have become indispensable tools in many domains, such as social network analysis, financial fraud detection, and drug discovery. Prior research primarily concentrated on improving prediction accuracy while overlooking how reliable the model predictions are. Conformal prediction on graphs emerges as a promising solution, offering statistically sound uncertainty estimates with a pre-defined coverage level. Despite this promising progress, existing works only focus on achieving model coverage guarantees without considering fairness in the coverage within different demographic groups. To bridge the gap between conformal prediction and fair coverage across different groups, we pose the fundamental question: Can fair GNNs enable the uncertainty estimates to be fairly applied across demographic groups? To answer this question, we provide a comprehensive analysis of the uncertainty estimation in fair GNNs employing various strategies. We prove theoretically that fair GNNs can enforce consistent uncertainty bounds across different demographic groups, thereby minimizing bias in uncertainty estimates. Furthermore, we conduct extensive experiments on five commonly used datasets across seven state-of-the-art fair GNN models to validate our theoretical findings. Additionally, based on the theoretical and empirical insights, we identify and analyze the key strategies from various fair GNN models that contribute to ensuring equalized uncertainty estimates. Our work establishes a solid foundation for future exploration of the practical implications and potential adjustments needed to enhance fairness in GNN applications across various domains. For reproducibility, we publish our data and code at https://github.com/wulongfeng/EqualizedCoverage_CP.
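    The notion of equalized coverage can be made concrete with a minimal split-conformal sketch (a generic illustration, not the paper's method): calibrate one threshold on held-out nonconformity scores, then measure empirical coverage separately per demographic group to see whether the guarantee is shared fairly.

    ```python
    import math

    def conformal_threshold(scores, alpha=0.1):
        """Split conformal prediction: the (1 - alpha) empirical quantile
        of calibration nonconformity scores, with the standard
        finite-sample (n + 1) correction."""
        s = sorted(scores)
        n = len(s)
        k = math.ceil((n + 1) * (1 - alpha))
        return s[min(k, n) - 1]

    def group_coverage(test_scores, groups, threshold):
        """Empirical coverage per group: the fraction of test points
        whose nonconformity score falls within the threshold."""
        cov = {}
        for s, g in zip(test_scores, groups):
            hit, tot = cov.get(g, (0, 0))
            cov[g] = (hit + (s <= threshold), tot + 1)
        return {g: hit / tot for g, (hit, tot) in cov.items()}
    ```

    A large gap between groups in the returned dictionary is exactly the coverage disparity the paper analyzes; marginal coverage can hold overall while one group is systematically under-covered.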
  • Third International Workshop on Multimodal Content Analysis for Social Good
    Naseem, Usman; Thapa, Surendrabikram; Lee, Roy; Nasim, Mehwish (ACM, 2025-05-08)
    The third edition of the International Workshop on Multimedia Content Analysis for Social Good (MM4SG 2025) was held alongside the prestigious Web Conference 2025. This workshop aimed to tackle the critical challenge of analyzing and moderating multimodal content across digital platforms. In today’s era, where diverse forms of multimodal data—including memes, text-embedded images, and fabricated content—can rapidly shape public opinion and influence societal narratives, the demand for sophisticated and ethical content moderation strategies has become increasingly urgent. MM4SG 2025 provided a unique forum for interdisciplinary collaboration, bringing together researchers and practitioners from natural language processing, machine learning, computational social science, and ethics to address these pressing concerns. This paper highlights the key themes, discussions, and contributions of the third edition of the MM4SG workshop, with a particular focus on the intersection of computational linguistics and multimodal content analysis. It also explores future directions for the workshop, including expanding its scope and impact in subsequent editions.
  • Addressing the Challenges of Mental Health Conversations with Large Language Models
    Shiwakoti, Shuvam; Shah, Siddhant Bikram; Razzak, Imran; Thapa, Surendrabikram; Naseem, Usman (ACM, 2025-05-08)
    Virtual Mental Health Assistants offer a promising solution to address the growing demand for accessible and scalable mental healthcare. However, existing dialogue generation models struggle with the complexities inherent in mental health conversations. In this paper, we explore the limitations of current Medical Dialogue Generation models by conducting experiments on the large language model ChatMGL. We propose modifications to ChatMGL, including fine-tuning the model on a mental health dataset without proximal policy optimization and incorporating dialogue act labels, to enhance its ability to handle the complex nature of mental health dialogues. Our results demonstrate that these modifications outperform baseline models in terms of ROUGE and BERT scores. Our work suggests that specialized fine-tuning and incorporating domain-specific knowledge can improve the efficacy of virtual assistants for mental health support.
  • Towards Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation
    Huang, Lifu; Koutra, Danai; Kulkarni, Adithya; Prioleau, Temiloluwa; Wu, Qingyun; Yan, Yujun; Yang, Yaoqing; Zou, James; Zhou, Dawei (ACM, 2025-05-08)