Scholarly Works, Computer Science

Research articles, presentations, and other scholarship

Recent Submissions

  • A Dynamic Characteristic Aware Index Structure Optimized for Real-world Datasets
    Yang, Jin; Yoon, Heejin; Yun, Gyeongchan; Noh, Sam; Choi, Young-ri (ACM, 2024-12)
    Many datasets in real life are complex and dynamic, that is, their key densities are varied over the whole key space and their key distributions change over time. It is challenging for an index structure to efficiently support all key operations for data management, in particular, search, insert, and scan, for such dynamic datasets. In this paper, we present DyTIS (Dynamic dataset Targeted Index Structure), an index that targets dynamic datasets. DyTIS, though based on the structure of Extendible hashing, leverages the CDF of the key distribution of a dataset, and learns and adjusts its structure as the dataset grows. The key novelty behind DyTIS is to group keys by the natural key order and maintain keys in sorted order in each bucket to support scan operations within a hash index. We also define what we refer to as a dynamic dataset and propose a means to quantify its dynamic characteristics. Our experimental results show that DyTIS provides higher performance than the state-of-the-art learned index for the dynamic datasets considered. We also analyze the effects of the dynamic characteristics of datasets, including sequential datasets, as well as the effect of multiple threads on the performance of the indexes.
  • An Interactive Visual Presentation of Core Database Design Concepts
    Abdelaziz, Noha; Farghally, Mohammed; Mohammed, Mostafa; Soliman, Taysir (ACM, 2024-12-05)
    Database design is a core topic in Computer Science (CS) curricula at the university level. Students often encounter difficulties and misconceptions while learning these concepts. Previous research attempted to address these learning difficulties through interactive visual demonstrations. However, most of these resources are not well integrated into the curriculum and lack a proper educational evaluation. In this paper, we present a set of online interactive visualizations, which we name DataBase Visualizations (DBVs), that address common database design learning difficulties in an introductory undergraduate database course. Core database design concepts are visualized step-by-step, facilitating a deep understanding of relationship establishment and mapping onto a relational schema. DBVs can be easily embedded in an online eTextbook, facilitating integration with the existing curriculum. We present our findings from an evaluation study of the effectiveness of DBVs when applied to a semester-long undergraduate database course in a large public institution in the Middle East. Results indicate that intervention group students had significantly higher scores on a post-test offered as part of the final compared to control group students using primarily traditional textual content. Furthermore, intervention group students were surveyed at the end of the semester about the value of DBVs to their learning process and suggestions for improvement. Survey results indicate that DBVs were clear, engaging, and easy to use. We believe that DBVs will be helpful to undergraduate database instructors in their teaching of basic database design concepts.
  • Mutating Matters: Analyzing the Influence of Mutation Testing in Programming Courses
    Mansur, Rifat Sabbir; Shaffer, Clifford; Edwards, Stephen (ACM, 2024-12-05)
    Mutation testing is used to gauge the quality of software test suites by introducing small faults, called “mutations”, into code to assess if a test suite can detect them. Although it has been applied extensively in the software industry, mutation testing’s use in programming courses faces both computational and pedagogical barriers. This study examines the impact of mutation testing on student performance in a post-CS2 Data Structures and Algorithms course with 3-4 week life-cycle programming projects. We collected a semester of data with projects using only code coverage (control group) and another semester that used mutation testing (experimental group). We investigated three aspects of mutation testing impact: the quality of student-written test suites, the correctness and complexity of students’ solution code, and the degree of incremental test writing. Our findings suggest that students using mutation testing, as a group, demonstrated higher quality test suites and wrote better solution code compared to students using traditional code coverage methods. Students using mutation testing were more likely to exhibit incremental testing practices.
  • AI in and for K-12 Informatics Education. Life after Generative AI.
    Barendsen, Erik; Lonati, Violetta; Quille, Keith; Altin, Rukiye; Divitini, Monica; Hooshangi, Sara; Karnalim, Oscar; Kiesler, Natalie; Melton, Madison; Suero Montero, Calkin; Morpurgo, Anna (ACM, 2024-12-05)
    The use and adoption of Generative AI (GenAI) has revolutionised various sectors, including computing education. However, this narrow focus comes at a cost to the wider AI in and for educational research. This working group aims to explore current trends and draw on multiple sources of information to identify areas of AI research in K-12 informatics education that are underserved but needed in the post-GenAI AI era. Our research focuses on three areas: curriculum, teacher-professional learning and policy. The intended outcome is to identify trends and shortfalls for AI in and for K-12 informatics education. We will systematically review the current literature to identify themes and emerging trends in AI education at K-12. This will be done under two facets, curricula and teacher-professional learning. In addition, we will conduct interviews and surveys with educators and AI experts. Next, we will examine the current policy (such as the European AI Act, and European Commission guidelines on the use of AI and data in education and training as well as international counterparts). Policies are often developed by both educators and experts in the domain, thus providing a source of topics or areas that may be added to our findings. Finally, by synthesising insights from educators, AI experts, and policymakers, as well as the literature and policy, our working group seeks to highlight possible future trends and shortfalls.
  • Blocking Tracking JavaScript at the Function Granularity
    Amjad, Abdul Haddi; Munir, Shaoor; Shafiq, Zubair; Gulzar, Muhammad Ali (ACM, 2024-12-02)
    Modern websites extensively rely on JavaScript to implement both functionality and tracking. Existing privacy-enhancing content blocking tools struggle against mixed scripts, which simultaneously implement both functionality and tracking. Blocking such scripts would break functionality, and not blocking them would allow tracking. We propose NoT.js, a fine-grained JavaScript blocking tool that operates at the function-level granularity. NoT.js’s strengths lie in analyzing the dynamic execution context, including the call stack and calling context of each JavaScript function, and then encoding this context to build a rich graph representation. NoT.js trains a supervised machine learning classifier on a webpage’s graph representation to first detect tracking at the function level and then automatically generates surrogate scripts that preserve functionality while removing tracking. Our evaluation of NoT.js on the top-10K websites demonstrates that it achieves high precision (94%) and recall (98%) in detecting tracking functions, outperforming the state-of-the-art while being robust against off-the-shelf JavaScript obfuscation. Fine-grained detection of tracking functions allows NoT.js to automatically generate surrogate scripts, which our evaluation shows successfully remove tracking functions without causing major breakage. Our deployment of NoT.js shows that mixed scripts are present on 62.3% of the top-10K websites, with 70.6% of the mixed scripts being third-party that engage in tracking activities such as cookie ghostwriting.
  • A First Look at Security and Privacy Risks in the RapidAPI Ecosystem
    Liao, Song; Cheng, Long; Luo, Xiapu; Song, Zheng; Cai, Haipeng; Yao, Danfeng (Daphne); Hu, Hongxin (ACM, 2024-12-02)
    With the emergence of the open API ecosystem, third-party developers can publish their APIs on the API marketplace, significantly facilitating the development of cutting-edge features and services. The RapidAPI platform is currently the largest API marketplace and it provides over 40,000 APIs, which have been used by more than 4 million developers. However, such an open API ecosystem also raises security and privacy concerns associated with APIs hosted on the platform. In this work, we perform the first large-scale analysis of 32,089 APIs on the RapidAPI platform. By searching GitHub code and Android apps, we find that 3,533 RapidAPI keys, which are important and used in API request authorization, have been leaked in the wild. These keys can be exploited to launch various attacks, such as Resource Exhaustion Running, Theft of Service, Data Manipulation, and User Data Breach attacks. We also explore risks in API metadata that can be abused by adversaries. Due to the lack of a strict certification system, adversaries can manipulate the API metadata to perform typosquatting attacks on API URLs, impersonate other developers or renowned companies, and publish spamming APIs on the platform. Lastly, we analyze the privacy non-compliance of APIs and applications, e.g., Android apps, that call these APIs with data collection. We find that 1,709 APIs collect sensitive data and 94% of them don’t provide a complete privacy policy. For the Android apps that call these APIs, 50% of them in our study have privacy non-compliance issues.
  • RESONANT: Reinforcement Learning-based Moving Target Defense for Credit Card Fraud Detection
    Abdel Messih, George; Cody, Tyler; Beling, Peter; Cho, Jin-Hee (ACM, 2024-11-11)
    According to security.org, as of 2023, 65% of credit card (CC) users in the US have been subjected to fraud at some point in their lives, which equates to about 151 million Americans. The proliferation of advanced machine learning (ML) algorithms has contributed to detecting credit card fraud (CCF). However, using a single or static ML-based defense model against a constantly evolving adversary takes its structural advantage, which enables the adversary to reverse engineer the defense’s strategy over the rounds of an iterated game. This paper proposes an adaptive moving target defense (MTD) approach based on deep reinforcement learning (DRL), termed RESONANT, to identify the optimal switching points to another ML classifier for credit card fraud detection. It identifies optimal moments to strategically switch between different ML-based defense models (i.e., classifiers) to invalidate any adversarial progress and always stay a step ahead of the adversary. We take this approach in an iterated game theoretic manner where the adversary and defender take actions in turns in the CCF detection context. Via extensive simulation experiments, we investigate the performance of our proposed RESONANT against that of the existing state-of-the-art counterparts in terms of the mean and variance of detection accuracy and attack success ratio to measure the defensive performance. Our results demonstrate the superiority of RESONANT over other counterparts, including static and naïve ML and MTD selecting a defense model at random (i.e., Random-MTD). Our results also show that RESONANT can outperform the existing counterparts by up to a factor of two in detection accuracy, measured using AUC (i.e., Area Under the Curve of the Receiver Operating Characteristic (ROC) curve), and in system security against attacks, measured using attack success ratio (ASR).
  • DeePSP-GIN: identification and classification of phage structural proteins using predicted protein structure, pretrained protein language model, and graph isomorphism network
    Emon, Muhit Islam; Das, Badhan; Thukkaraju, Ashrith; Zhang, Liqing (ACM, 2024-11-22)
    Phages are vital components of the microbial ecosystem, and their functions and roles are largely determined by their structural proteins. Accurately annotating phage structural proteins (PSPs) is essential for understanding phage biology and their interactions with bacterial hosts, which can pave the way for innovative strategies to combat bacterial infections and develop phage-based therapies. However, the sequence diversity of PSPs makes their identification and annotation challenging. While various computational methods are available for predicting PSPs, they currently lack the integration of protein structural information, an important aspect for understanding protein function. With the advent of deep learning models, protein structures can be predicted accurately and quickly from protein sequences, creating new opportunities for PSP prediction and analysis. We developed DeePSP-GIN, a graph isomorphism network (GIN)-based deep learning model leveraging predicted protein structures and a protein language model for PSP identification and classification. To the best of our knowledge, DeePSP-GIN is the first method utilizing predicted protein structural information for PSP prediction tasks. It offers dual functionality of identifying PSP and non-PSP sequences and classifying PSPs into seven major classes. DeePSP-GIN converts predicted protein structures from PDB 3D coordinates into graphs and extracts node features from protein language model-generated embeddings. The GIN is then applied to the constructed graphs to learn the discriminating features. The experimental results show that DeePSP-GIN outperforms the state-of-the-art methods in both PSP identification and classification tasks in terms of F1-score. DeePSP-GIN achieves a 1.04% higher F1-score than the nearest competing method in the PSP identification task. Additionally, its overall F1-score in the PSP classification task is approximately 34.38% higher than that of the second-best method. The source code of DeePSP-GIN is available at https://github.com/muhit-emon/DeePSP-GIN under the MIT license.
  • FHIRViz: Multi-Agent Platform for FHIR Visualization to Advance Healthcare Analytics
    ALMutairi, Mariam; AlKulaib, Lulwah; Wang, Shengkun; Chen, Zhiqian; Almutairi, Youssif; Alenazi, Thamer; Luther, Kurt; Lu, Chang-Tien (ACM, 2024-11-22)
    The shift to electronic health records (EHRs) has enhanced patient care and research, but data sharing and complex clinical terminology remain challenges. The Fast Healthcare Interoperability Resources (FHIR) standard addresses interoperability issues, though extracting insights from FHIR data is still difficult. Traditional analytics often miss critical clinical context, and managing FHIR data requires advanced skills that are in short supply. This study presents FHIRViz, a novel analytics tool that integrates FHIR data with a semantic layer via a knowledge graph. It employs a large language model (LLM) system to extract insights and visualize them effectively. A retrieval vector store improves performance by saving successful generations for fine-tuning. FHIRViz translates clinical queries into actionable insights with high accuracy. Results show FHIRViz with GPT-4 achieving 92.62% accuracy, while Gemini 1.5 Pro reaches 89.34%, demonstrating the tool’s potential in overcoming healthcare data analytics challenges.
  • An Empirical Evaluation of Method Signature Similarity in Java Codebases
    Khan, Mohammad; Elhussiny, Mohamed; Tobin, William; Gulzar, Muhammad (ACM, 2024-09-11)
    Modern programming languages have transformed software development by providing capabilities that enhance productivity and reduce code redundancy. One such feature is allowing developers to choose meaningful method names for implementation and functionality. As programs evolve into APIs and libraries, developers often design methods with similar signatures to streamline code management and improve comprehensibility. In this paper, we conduct a comprehensive study to evaluate the prevalence, usage, and perception of methods with similar signatures, including both conventionally overloaded and textually similar methods. Through analyzing 6.4 million lines of code across 167 well-established Java repositories on GitHub, we statistically assess the occurrence of these methods and their impact on usability and software quality. Additionally, we explore the evolution of these methods through a longitudinal analysis of historical commit snapshots. Our research reveals that both overloaded and textually similar methods are common in leading Java repositories and are primarily driven by specific software design requirements, program logic, and developers’ programming habits. As software matures, development shifts towards maintenance tasks that rarely necessitate design changes. Our longitudinal analysis corroborates this by indicating minimal changes in methods with similar signatures in the later stages of a repository’s life.
  • Understanding User Behavior for Enhancing Cybersecurity Training with Immersive Gamified Platforms
    Donekal Chandrashekar, Nikitha; Lee, Anthony; Azab, Mohamed; Gracanin, Denis (MDPI, 2024-12-18)
    In modern digital infrastructure, cyber systems are foundational, making resilience against sophisticated attacks essential. Traditional cybersecurity defenses primarily address technical vulnerabilities; however, the human element, particularly decision-making during cyber attacks, adds complexities that current behavioral studies fail to capture adequately. Existing approaches, including theoretical models, game theory, and simulators, rely on retrospective data and static scenarios. These methods often miss the real-time, context-specific nature of user responses during cyber threats. To address these limitations, this work introduces a framework that combines Extended Reality (XR) and Generative Artificial Intelligence (Gen-AI) within a gamified platform. This framework enables continuous, high-fidelity data collection on user behavior in dynamic attack scenarios. It includes three core modules: the Player Behavior Module (PBM), Gamification Module (GM), and Simulation Module (SM). Together, these modules create an immersive, responsive environment for studying user interactions. A case study in a simulated critical infrastructure environment demonstrates the framework’s effectiveness in capturing realistic user behaviors under cyber attack, with potential applications for improving response strategies and resilience across critical sectors. This work lays the foundation for adaptive cybersecurity training and user-centered development across critical infrastructure.
  • A perturbation approach for refining Boolean models of cell cycle regulation
    Banerjee, Anand; Rahaman, Asif Iqbal; Mehandale, Alok; Kraikivski, Pavel (PLOS, 2024-09-06)
    Considerable effort is required to build mathematical models of large protein regulatory networks. Utilizing computational algorithms that guide model development can significantly streamline the process and enhance the reliability of the resulting models. In this article, we present a perturbation approach for developing data-centric Boolean models of cell cycle regulation. To evaluate networks, we assign a score based on their steady states and the dynamical trajectories corresponding to the initial conditions. Then, perturbation analysis is used to find new networks with lower scores, in which dynamical trajectories traverse through the correct cell cycle path with high frequency. We apply this method to refine Boolean models of cell cycle regulation in budding yeast and mammalian cells.
  • Evidence of horizontal gene transfer and environmental selection impacting antibiotic resistance evolution in soil-dwelling Listeria
    Goh, Ying-Xian; Anupoju, Sai Manohar Balu; Nguyen, Anthony; Zhang, Hailong; Ponder, Monica A.; Krometis, Leigh-Anne H.; Pruden, Amy; Liao, Jingqiu (Nature Research, 2024-11-19)
    Soil is an important reservoir of antibiotic resistance genes (ARGs) and understanding how corresponding environmental changes influence their emergence, evolution, and spread is crucial. The soil-dwelling bacterial genus Listeria, including L. monocytogenes, the causative agent of listeriosis, serves as a key model for establishing this understanding. Here, we characterize ARGs in 594 genomes representing 19 Listeria species that we previously isolated from soils in natural environments across the United States. Among the five putatively functional ARGs identified, lin, which confers resistance to lincomycin, is the most prevalent, followed by mprF, sul, fosX, and norB. ARGs are predominantly found in Listeria sensu stricto species, with those more closely related to L. monocytogenes tending to harbor more ARGs. Notably, phylogenetic and recombination analyses provide evidence of recent horizontal gene transfer (HGT) in all five ARGs within and/or across species, likely mediated by transformation rather than conjugation and transduction. In addition, the richness and genetic divergence of ARGs are associated with environmental conditions, particularly soil properties (e.g., aluminum and magnesium) and surrounding land use patterns (e.g., forest coverage). Collectively, our data suggest that recent HGT and environmental selection play a vital role in the acquisition and diversification of bacterial ARGs in natural environments.
  • Red is Sus: Automated Identification of Low-Quality Service Availability Claims in the US National Broadband Map
    Nabi, Syed Tauhidun; Wen, Zhuowei; Ritter, Brooke; Hasan, Shaddi (ACM, 2024-11-04)
    The FCC’s National Broadband Map aspires to provide an unprecedented view into broadband availability in the US. However, this map, which also determines eligibility for public grant funding, relies on self-reported data from service providers that in turn have incentives to strategically misrepresent their coverage. In this paper, we develop an approach for automatically identifying these low-quality service claims in the National Broadband Map. To do this, we develop a novel dataset of broadband availability consisting of 750k observations from more than 900 US ISPs, derived from a combination of regulatory data and crowdsourced speed tests. Using this dataset, we develop a model to classify the accuracy of service provider regulatory filings and achieve AUCs over 0.98 for unseen examples. Our approach provides an effective technique to enable policymakers, civil society, and the public to identify portions of the National Broadband Map that are likely to have integrity challenges.
  • Technology Use in the Black Church: Perspectives of Black Church Leaders Preliminary Findings
    Thompson, Gabriella; Otoo, Nissi; Fisher, Jaden; Sibi, Irene; Smith, Angela; Ogbonnaya-Ogburu, Ihudiya (ACM, 2024-11-11)
    Historically, the Black church has played a pivotal role in civic engagement and social justice, and continues to do so today. Yet, few researchers have explored how decisions around technology use are made in the church. To address this gap, we conducted semi-structured interviews with five Black church leaders to understand how church leaders interact with digital technologies, both in general and specifically with the communities that they serve. We found that while Black Church leaders are eager to engage with technology, most of the engagement with outside communities is through in-person contact; opportunities to give online have a financial penalty in comparison to traditional methods of tithing and donating; lastly, technology use within outreach and ministries is highly dependent on ministry leaders, many of whom volunteer their time. We contribute to research that focuses on technology use in religious organizations and community engagement of community-based organizations.
  • Designing Technology to Support the Hospital Classroom: Preliminary Findings
    Rasberry, Nadra; Essandoh, Joshua; Do, Ethan; Ogbonnaya-Ogburu, Ihudiya (ACM, 2024-11-11)
    Hospital teachers are state-employed educators who provide K-12 instruction to children in the hospital. We conducted research to understand how technology is used in hospital classrooms, an area which has been relatively underexplored. We conducted semistructured interviews with five hospital teachers to understand their experience of using technology in and outside the classroom. Our findings revealed that hospital teachers often rely on older curricula given the changing education atmosphere; learning is often assessed through in-classroom observations of mastery; and technology and internet use by students is often restricted, which may inhibit opportunities to use AI and other technical resources in the classroom. We contribute a deeper understanding of technology use in the hospital classroom.
  • Evaluation of Interactive Demonstration in Voice-assisted Counting for Young Children
    Karunaratna, Sulakna; Vargas-Diaz, Daniel; Kim, Jisun; Wang, Jenny; Choi, Koeun; Lee, Sang Won (ACM, 2024-11-11)
    In recent years, the number of AI voice agent applications designed to help young children learn math has increased. However, the impact of interactivity within these applications on children’s learning and engagement remains unexplored. While current apps may employ various levels of interactions, such as visual, haptic, sound, and animation, the efficacy of these interactions in facilitating children’s learning remains uncertain. This research investigates how varying levels of interactivity in touch-based interfaces, combined with an AI voice agent, affect the learning of counting skills in children aged 2 to 4 years. We examine three conditions: baseline (no demonstration), animated demonstration, and interactive demonstration. By examining how these different levels of interactivity influence children’s engagement with math apps, this study seeks to enhance our understanding of effective design strategies for educational technology targeting early childhood education. The findings of this research hold the potential to inform the development of interfaces for math games that leverage both touch-based interactions and AI voice assistants to support young children’s learning of foundational mathematical concepts.
  • Investigating Characteristics of Media Recommendation Solicitation in r/ifyoulikeblank
    Bhuiyan, Md Momen; Hu, Donghan; Jelson, Andrew; Mitra, Tanushree; Lee, Sang Won (ACM, 2024-11-08)
    Despite the existence of search-based recommender systems like Google, Netflix, and Spotify, online users sometimes turn to crowdsourced recommendations in places like the r/ifyoulikeblank subreddit. In this exploratory study, we probe why users go to r/ifyoulikeblank, how they look for recommendations, and how the subreddit users respond to recommendation requests. To answer these questions, we collected sample posts from r/ifyoulikeblank and analyzed them using a qualitative approach. Our analysis reveals that users come to this subreddit for various reasons, such as exhausting popular search systems, not knowing what or how to search for an item, and believing that the crowd has better knowledge than search systems. Examining users' queries and their descriptions, we found novel information that users provide when seeking recommendations on r/ifyoulikeblank. For example, they sometimes ask for artifact recommendations based on the tools used to create them. In other cases, indicating a recommendation seeker's time constraints can help better suit recommendations to their needs. Finally, recommendation responses and interactions revealed patterns of how requesters and responders refine queries and recommendations. Our work informs future intelligent recommender systems design.
  • Simplify, Consolidate, Intervene: Facilitating Institutional Support with Mental Models of Learning Management System Use
    Hassan, Taha; Edmison, Bob; Williams, Daron; Cox II, Larry; Louvet, Matthew; Knijnenburg, Bart; McCrickard, D. (ACM, 2024-11-08)
    Measuring instructors' adoption of learning management system (LMS) tools is a critical first step in evaluating the efficacy of online teaching and learning at scale. Existing models for LMS adoption are often qualitative, learner-centered, and difficult to leverage towards institutional support. We propose depth-of-use (DOU): an intuitive measurement model for faculty's utilization of a university-wide LMS and their needs for institutional support. We hypothesis-test the relationship between DOU and course attributes like modality, participation, logistics, and outcomes. In a large-scale analysis of metadata from 30000+ courses offered at Virginia Tech over two years, we find that a pervasive need for scale, interoperability and ubiquitous access drives LMS adoption by university instructors. We then demonstrate how DOU can help faculty members identify the opportunity-cost of transition from legacy apps to LMS tools. We also describe how DOU can help instructional designers and IT organizational leadership evaluate the impact of their support allocation, faculty development and LMS evangelism initiatives.
  • ThreatKG: An AI-Powered System for Automated Open-Source Cyber Threat Intelligence Gathering and Management
    Gao, Peng; Liu, Xiaoyuan; Choi, Edward; Ma, Sibo; Yang, Xinyu; Song, Dawn (ACM, 2023-11-19)
    Open-source cyber threat intelligence (OSCTI) has become essential for keeping up with the rapidly changing threat landscape. However, current OSCTI gathering and management solutions mainly focus on structured Indicators of Compromise (IOC) feeds, which are low-level and isolated, providing only a narrow view of potential threats. Meanwhile, the extensive and interconnected knowledge found in the unstructured text of numerous OSCTI reports (e.g., security articles, threat reports) available publicly is still largely underexplored. To bridge the gap, we propose THREATKG, an automated system for OSCTI gathering and management. THREATKG efficiently collects a large number of OSCTI reports from multiple sources, leverages specialized AI-based techniques to extract high-quality knowledge about various threat entities and their relationships, and constructs and continuously updates a threat knowledge graph by integrating new OSCTI data. THREATKG features a modular and extensible design, allowing for the addition of components to accommodate diverse OSCTI report structures and knowledge types. Our extensive evaluations demonstrate THREATKG’s practical effectiveness in enhancing threat knowledge gathering and management.