Scholarly Works, Computer Science

Research articles, presentations, and other scholarship

Recent Submissions

Now showing 1 - 20 of 963
  • Evaluating CS1-LLM: Integrating LLMs and Examining Student Outcomes in an Introductory Computer Science Course
    Vadaparty, Annapurna; Smith, David H. IV; Srinath, Samvrit; Padala, Mounika; Alvarado, Christine; Gorson Benario, Jamie; Porter, Leo; Zingaro, Daniel (ACM, 2026-02-09)
    Large language models (LLMs) have broad implications for education in general, impacting the foundations of what we teach and how we assess. This is especially true in computing, where LLMs tuned for coding have demonstrated shockingly good performance on the types of assignments historically used in introductory CS (CS1) courses. As a result, CS1 courses will need to change in terms of the skills that are taught and how they are assessed. Computing education researchers have begun to study student use of LLMs, but there remains much to be understood about the ways that these tools affect student outcomes. In this paper, we present the design and evaluation of a new CS1 course at a large research-intensive university that integrates the use of LLMs for student learning. We describe the design principles used to create our course, our new course objectives, and evaluation of student outcomes and perceptions throughout the course as measured by assessment scores and surveys. Our findings suggest that 1) student exam performance outcomes, including differences among demographic groups, are largely similar to historical outcomes for courses without integration of LLM tools, 2) large, open-ended projects may be particularly valuable in an LLM context, and 3) students predominantly found the LLM tools helpful, although some had concerns regarding overreliance on the tools.
  • Comparison of cultural preferences and cultural practices in website design in Pakistan
    Nizamani, Sehrish Basir; Nizamani, Saad; Basir, Nazish; Khoumbati, Khalil; Nizamani, Sarwat; Memon, Shahzad (Springer, 2025-11)
    Purpose: Websites are typically influenced by the cultural context in which they are created and used. A website that is designed and used based on the preferences of its users and their culture is considered usable. Individuals’ cultural preferences refer to their level of cultural comfort, whereas cultural practices are shared perceptions of how people in a culture regularly behave as a whole. This article compares the web design preferences of users with the actual design practices of three categories of websites in Pakistan. Methods: The disparity between preferences and practices is examined using Hofstede’s six cultural dimensions. Website design practices are collected through content analysis, and thematic coding is used to systematically categorize and analyze the data. Results: The results show that web design practices in Pakistan correspond to preferences in information density, information presentation, navigation, data restriction, error messages, content terminology, and gender roles. Mixed practices are observed in search results, cultural signs, colours, the purpose of images, menu choices, people’s images, user paths, the frequency of important links, input and feedback options, and content density.
  • Whole-Genome Sequencing Reveals Breed-Specific SNPs, Indels, and Signatures of Selection in Royal White and White Dorper Sheep
    Liao, Mingsi; Kravitz, Amanda; Haak, David C.; Sriranganathan, Nammalwar; Cockrum, Rebecca R. (MDPI, 2026-03-05)
    Whole-genome sequencing (WGS) is a powerful tool for uncovering genome-wide variation, identifying selection signatures, and guiding genetic improvement in livestock. Royal White (RW) and White Dorper (WD) sheep are economically important meat-type hair breeds in the U.S., yet their genomic architecture remains poorly characterized. In this study, WGS was performed on 20 ewes (n = 11 RW, n = 9 WD) to identify and annotate SNPs and small insertions and deletions (indels). Functional annotation, gene enrichment, population structure, and selective sweep analysis were also performed. Selective sweep analysis was conducted by integrating the fixation index (FST), nucleotide diversity (π), and Tajima’s D to identify candidate regions under putative recent positive selection. A total of 21,957,139 SNPs and 2,866,600 indels were identified in RW sheep, whereas 18,641,789 SNPs and 2,397,368 indels were identified in WD sheep. In RW sheep, candidate genes under selection were associated with health and parasite resistance (NRXN1, HERC6, TGFB2) and growth traits (JADE2). In WD sheep, selective sweep regions included genes linked to immune response and parasite resistance (TRIM14), body weight (PLXDC2), and reproduction (STPG3). These findings were supported by sheep-specific quantitative trait loci (QTL) annotations and previously reported SNP–trait associations. This study provides the first WGS-based genomic comparison between RW and WD sheep, establishing a foundation for future genetic improvement, including targeted selection for enhanced immune function, disease resistance, and other economically important traits in these breeds.
  • Applying the Midas Touch of Reproducibility to High-Performance Computing
    Minor, A. C.; Feng, Wu-chun (IEEE, 2022-09-19)
    With the exponentially improving serial performance of CPUs from the 1980s and 1990s slowing to a standstill by the 2010s, the high-performance computing (HPC) community has seen parallel computing become ubiquitous, which, in turn, has led to a proliferation of parallel programming models, including CUDA, OpenACC, OpenCL, OpenMP, and SYCL. This diversity in hardware platforms and programming models has forced application users to port their codes from one hardware platform to another (e.g., from CUDA on an NVIDIA GPU to HIP or OpenCL on an AMD GPU) and to demonstrate reproducibility via ad hoc testing. To more rigorously ensure reproducibility between codes, we propose Midas, a system that ensures the results of the original code match the results of the ported code by leveraging snapshots to capture the state of the system before and after the execution of a kernel.
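    A minimal sketch of snapshot-based reproducibility checking of this kind (illustrative only, not the Midas implementation; the kernels, tolerance, and array size are placeholders):

      import numpy as np

      def snapshot(state):
          # Capture an immutable copy of the system state before a kernel runs.
          return {name: arr.copy() for name, arr in state.items()}

      def reference_kernel(x):   # placeholder for the original (e.g., CUDA) kernel
          return np.sqrt(x) * 2.0

      def ported_kernel(x):      # placeholder for the ported (e.g., HIP/OpenCL) kernel
          return 2.0 * x ** 0.5

      state = {"x": np.random.rand(1_000_000)}
      before = snapshot(state)                    # state prior to kernel execution
      out_ref = reference_kernel(before["x"])     # "after" state of the original code
      out_port = ported_kernel(before["x"])       # "after" state of the ported code

      # Reproducibility check: the ported code must match the original within tolerance.
      assert np.allclose(out_ref, out_port, rtol=1e-12), "ported kernel diverges"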
  • Characterization and Optimization of the Fitting of Quantum Correlation Functions
    Chuang, Pi-Yueh; Shah, Niteya; Barry, Patrick; Cloet, Ian; Constantinescu, Emil M.; Sato, Nobuo; Qiu, Jian-Wei; Feng, Wu-chun (IEEE, 2024-09)
    This case study presents a characterization and optimization of an application code for extracting parton distribution functions from high energy electron-proton scattering data. Profiling this application code reveals that the phase-space density computation accounts for 93% of the overall execution time for a single iteration on a single core. When executing multiple iterations in parallel on a multicore system, the application spends 78% of its overall execution time idling due to load imbalance. We address these issues by first transforming the application code from Python to C++ and then tackling the application load imbalance via a hybrid scheduling strategy that combines dynamic and static scheduling. These techniques result in a 62% reduction in CPU idle time and a 2.46× speedup in overall execution time per node. In addition, the typically enabled power-management mechanisms in supercomputers (e.g., AMD Turbo Core, Intel Turbo Boost, and RAPL) can significantly impact intra-node scalability when more than 50% of the CPU cores are used. This finding underscores the importance of understanding system interactions with power management, as they can adversely impact application performance, and highlights the necessity of intra-node scaling tests to identify performance degradation that inter-node scaling tests might otherwise overlook.
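    A minimal sketch of a hybrid static/dynamic scheduling strategy of the kind described above (the iteration workload, worker count, and static fraction are illustrative assumptions, not values from the paper):

      from concurrent.futures import ProcessPoolExecutor

      def run_iteration(i):
          # Placeholder for one phase-space density computation.
          return sum(k * k for k in range(1000 + (i % 7) * 500))

      def run_chunk(chunk):
          # A worker processes its statically pre-assigned iterations.
          return [run_iteration(i) for i in chunk]

      def hybrid_schedule(n_iters, n_workers, static_fraction=0.5):
          # Statically pre-assign a fixed share of iterations to each worker, then
          # hand out the remainder dynamically to whichever worker finishes first.
          n_static = int(n_iters * static_fraction)
          static_chunks = [range(w, n_static, n_workers) for w in range(n_workers)]
          with ProcessPoolExecutor(max_workers=n_workers) as pool:
              static_futs = [pool.submit(run_chunk, c) for c in static_chunks]
              dynamic = list(pool.map(run_iteration, range(n_static, n_iters), chunksize=1))
          return [r for f in static_futs for r in f.result()] + dynamic

      if __name__ == "__main__":
          hybrid_schedule(n_iters=200, n_workers=4)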
  • Experiences with VITIS AI for Deep Reinforcement Learning
    Chaudhury, Nabayan; Gondhalekar, Atharva; Feng, Wu-chun (IEEE, 2024-09)
    Deep reinforcement learning has found use cases in many applications, such as natural language processing, self-driving cars, and spacecraft control. Many use cases of deep reinforcement learning seek to achieve inference with low latency and high accuracy. As such, this work articulates our experiences with the AMD Vitis AI toolchain to improve the latency and accuracy of inference in deep reinforcement learning. In particular, we evaluate the soft actor-critic (SAC) model that is trained to solve the MuJoCo humanoid environment, where the objective of the humanoid agent is to learn a policy that allows it to stay in motion for as long as possible without falling over. During the training phase, we prune the model using the weight sparsity pruner from the Vitis AI optimizer at different timesteps. Our experimental results show that pruning leads to an improvement in the evaluation of the reinforcement learning policy, where the trained agent can remain balanced in the environment and accumulate higher rewards, compared to a trained agent without pruning. Specifically, we observe that pruning the network during training can deliver up to 20% better mean episode length and 23% higher reward (better accuracy), compared to a network without any pruning. Additionally, decision-making latency, i.e., the time between observing the agent's state and issuing a control decision, improves by up to 20%.
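    A generic magnitude-based weight-sparsity pruning sketch in PyTorch (illustrative only; this is not the Vitis AI optimizer API, and the network shape and sparsity level are placeholder assumptions):

      import torch.nn as nn
      import torch.nn.utils.prune as prune

      # Placeholder policy network standing in for the SAC actor.
      policy = nn.Sequential(nn.Linear(376, 256), nn.ReLU(),
                             nn.Linear(256, 256), nn.ReLU(),
                             nn.Linear(256, 17))

      # Zero out the 50% smallest-magnitude weights in each linear layer
      # (weight-sparsity pruning applied at a chosen training timestep).
      for module in policy.modules():
          if isinstance(module, nn.Linear):
              prune.l1_unstructured(module, name="weight", amount=0.5)
              prune.remove(module, "weight")   # bake the pruning mask into the weights

      total = sum(p.numel() for p in policy.parameters())
      zeros = sum((p == 0).sum().item() for p in policy.parameters())
      print(f"overall sparsity: {zeros / total:.2%}")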
  • On the Scalability of Computing Genomic Diversity Using SparkLeBLAST: A Feasibility Study
    Prabhu, Ritvik; Moussad, Bernard; Youssef, Karim; Vatai, Emil; Feng, Wu-chun (IEEE, 2024-09)
    Studying the genomic diversity of viruses can help us understand how viruses evolve and how that evolution can impact human health. Rather than use a laborious and tedious wet-lab approach to conduct a genomic diversity study, we take a computational approach, using the ubiquitous NCBI BLAST and our parallel and distributed SparkLeBLAST, across 53 patients (40,000,000 query sequences) on Fugaku, the world's fastest homogeneous supercomputer with 158,976 nodes, where each node contains a 48-core A64FX processor and 32 GB RAM. To project how long BLAST and SparkLeBLAST would take to complete a genomic diversity study of COVID-19, we first perform a feasibility study on a subset of 50 query sequences from a single COVID-19 patient to identify bottlenecks in sequence alignment processing. We then create a model using Amdahl's law to project the run times of NCBI BLAST and SparkLeBLAST on supercomputing systems like Fugaku. Based on the data from this 50-sequence feasibility study, our model predicts that NCBI BLAST, when running on all the cores of the Fugaku supercomputer, would take approximately 26.7 years to complete the full-scale study. In contrast, SparkLeBLAST, using both our query and database segmentation, would reduce the execution time to 0.0026 years (i.e., 22.9 hours) - resulting in more than a 10,000× speedup over using the ubiquitous NCBI BLAST.
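    A minimal sketch of an Amdahl's-law-style runtime projection like the one described (the serial fractions, core count, and single-core time below are placeholder assumptions, not measurements from the study):

      def amdahl_speedup(serial_fraction, n_cores):
          # Amdahl's law: achievable speedup is bounded by the code's serial fraction.
          return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

      def projected_runtime_hours(single_core_hours, serial_fraction, n_cores):
          return single_core_hours / amdahl_speedup(serial_fraction, n_cores)

      single_core_hours = 5.0e9          # hypothetical single-core time for a full study
      n_cores = 158_976 * 48             # Fugaku-scale core count
      for frac in (0.10, 0.01, 0.001):   # hypothetical serial fractions
          hours = projected_runtime_hours(single_core_hours, frac, n_cores)
          print(f"serial fraction {frac:.3f}: ~{hours / 8766:.2f} years")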
  • Optimizing and Scaling the 3D Reconstruction of Single-Particle Imaging
    Shah, Niteya; Sweeney, Christine; Ramakrishnaiah, Vinay; Donatelli, Jeffrey; Feng, Wu-chun (IEEE, 2024-05)
    An X-ray free electron laser (XFEL) facility can produce on the order of 1,000,000 extremely bright X-ray light pulses per second. Using an XFEL to image the atomic structure of a molecule requires fast analysis of an enormous amount of data, estimated to exceed one terabyte per second and requiring petabytes of storage. The SpiniFEL application provides such analysis by determining the 3D structure of proteins from single-particle imaging (SPI) experiments performed using XFELs, but it needs significantly better performance and efficiency to scale and keep up with the terabyte-per-second data production. Thus, this paper addresses the high-performance computing optimizations and scaling needed to improve this 3D reconstruction of SPI data. First, we optimize data movement, memory efficiency, and algorithms to improve the per-node computational efficiency and deliver a 5.28× speedup over the baseline GPU implementation. In addition, we achieved a 485× speedup for the post-analysis reconstruction resolution, which previously took as long as the 3D reconstruction of SPI data. Second, we present a novel distributed shared-memory computational algorithm to hide data latency and load-balance network traffic, thus enabling the processing of 128× more orientations than previously possible. Third, we conduct an exploratory study over the hyperparameter space for the SpiniFEL application to identify the optimal parameters for our underlying target hardware, which ultimately led to a speedup of up to 1.25× from tuning the number of streams. Overall, we achieve a 6.6× speedup (i.e., 5.28 × 1.25) over the previous fastest GPU/MPI-based SpiniFEL realization.
  • Improved 2-D Chest CT Image Enhancement With Multi-Level VGG Loss
    Chaturvedi, Ayush; Prabhu, Ritvik; Yadav, Mukund; Feng, Wu-chun; Cao, Guohua (IEEE, 2025-03)
    Chest CT scans play an important role in diagnosing abnormalities associated with the lungs, such as tuberculosis, sarcoidosis, pneumonia, and, more recently, COVID-19. However, because conventional normal-dose chest CT scans require a much larger amount of radiation than X-rays, practitioners seek to replace conventional CT with low-dose CT (LDCT). LDCT often generates a low-quality CT image that contains noise and, in turn, negatively affects diagnostic accuracy. Therefore, in the context of COVID-19, where the affected population is large, efficient image-denoising techniques are needed for LDCT images. Here, we present a deep learning (DL) model that combines two neural networks to enhance the quality of low-dose chest CT images. The DL model leverages a previously developed DenseNet- and deconvolution-based network (DDNet) for feature extraction and extends it with a pretrained VGG network inside the loss function to suppress noise. Outputs from multiple selected levels in the VGG network (ML-VGG) are used for the loss calculation. We tested our DDNet with ML-VGG loss using several sources of CT images and compared its performance to DDNet without VGG loss as well as DDNet with an empirically selected single-level VGG loss (DDNet-SL-VGG) and other state-of-the-art DL models. Our results show that DDNet combined with ML-VGG (DDNet-ML-VGG) achieves state-of-the-art denoising capabilities and improves the perceptual and quantitative image quality of chest CT images. Thus, DDNet with multi-level VGG loss could potentially be used as a post-acquisition image enhancement tool for medical professionals to diagnose and monitor chest diseases with higher accuracy.
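    A minimal sketch of a multi-level VGG perceptual loss (illustrative only; the VGG variant, tapped layer indices, and channel handling are assumptions, not the configuration used in the paper):

      import torch.nn.functional as F
      from torchvision.models import vgg19, VGG19_Weights

      # Frozen VGG-19 feature extractor used only inside the loss.
      _vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
      for p in _vgg.parameters():
          p.requires_grad_(False)
      LAYERS = (3, 8, 17, 26)   # placeholder taps after several conv blocks

      def ml_vgg_loss(denoised, reference):
          # Compare feature maps of the denoised and reference CT images at
          # several VGG depths and sum the per-level mean-squared errors.
          x = denoised.repeat(1, 3, 1, 1)    # replicate single-channel CT to 3 channels
          y = reference.repeat(1, 3, 1, 1)
          loss = 0.0
          for idx, layer in enumerate(_vgg):
              x, y = layer(x), layer(y)
              if idx in LAYERS:
                  loss = loss + F.mse_loss(x, y)
              if idx >= max(LAYERS):
                  break
          return loss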
  • Looking Back to Look Forward: 15 Years of the Green500
    Adhinarayanan, Vignesh; Feng, Wu-chun (IEEE, 2025-01)
    We revisit a Computer article from 15 years ago that introduced the Green500 -- a list ranking the most energy-efficient supercomputers. Our exploration centers on the advancements achieved during this time, highlighting a notable trend: the energy efficiency of supercomputers has approximately doubled every two years.
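    A quick back-of-the-envelope check of what a two-year doubling trend implies over the 15-year span (illustrative arithmetic only):

      YEARS = 15
      DOUBLING_PERIOD = 2.0   # years per doubling of energy efficiency
      improvement = 2 ** (YEARS / DOUBLING_PERIOD)
      # 2**(15/2) is roughly 181, i.e., efficiency gains compound quickly.
      print(f"~{improvement:.0f}x improvement in energy efficiency over {YEARS} years")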
  • On the Landscape of Graph Clustering at Scale
    Dey, Saikat; Jha, Sonal; Wanye, Frank; Feng, Wu-chun (IEEE, 2025-06)
    Graph clustering, also known as community detection, is used to partition and analyze data across a gamut of disciplines, leading to new insights in fields like bioinformatics, networking, and cybersecurity. To keep pace with the exponential growth in collected data, much of the graph clustering research has increasingly pivoted towards developing parallel and distributed clustering algorithms. However, little work has been done to rigorously characterize such algorithms with respect to each other when using the same software stack, hardware stack, and graph dataset inputs. In this manuscript, we identify three open-source, state-of-the-art graph clustering algorithms and characterize the trade-offs between their accuracy and performance on real-world graphs. We show that the ideal choice of graph clustering algorithm depends on the (1) use case, (2) runtime requirements, and (3) accuracy requirements of the user. We provide guidelines for selecting the appropriate state-of-the-practice graph clustering algorithm and conduct a performance characterization of these algorithms through which we identify opportunities for future research in scalable and accurate graph clustering algorithms.
  • Scalable and Maintainable Distributed Sequence Alignment Using Spark
    Youssef, Karim; Elnady, Yusuf; Tilevich, Eli; Feng, Wu-chun (IEEE, 2025-07)
    The exponential growth of genomic data presents a challenge to bioinformatics research. NCBI BLAST, a popular pairwise sequence alignment tool, does not scale with the hundreds of gigabytes (GB) of sequenced data. Therefore, mpiBLAST was widely adopted and scaled up to 65,536 processors. However, mpiBLAST is tightly coupled with an obsolete NCBI BLAST version, creating a challenge to upgrading mpiBLAST with the ever-changing NCBI BLAST code. Recent parallel BLAST implementations, like SparkBLAST, use parallelism wrappers separate from NCBI BLAST to overcome this issue. However, query partitioning, a parallel method that duplicates the genome database on each compute node, makes SparkBLAST scale poorly with databases larger than a single node's memory. Thus, no parallel BLAST utility simultaneously addresses performance, scalability, and software maintainability. To fill this gap, we introduce SparkLeBLAST, a parallel BLAST tool that uses the Spark framework and efficient data partitioning to combine mpiBLAST's performance and scalability with SparkBLAST's simplicity and maintainability. SparkLeBLAST democratizes scalable genomic analysis for domain scientists without extensive distributed computing experience. SparkLeBLAST runs up to 6.68× faster than SparkBLAST. SparkLeBLAST also accelerates taxonomic assignment of COVID-19 genomic diversity analysis by 20.9× as it speeds up the BLAST search component by 88.6× using 128 compute nodes.
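    A minimal sketch of combining query partitioning with database segmentation (plain Python illustrating the decomposition only; it is not the SparkLeBLAST implementation, and the chunk counts and identifiers are placeholders):

      from itertools import product

      def split(items, n_parts):
          # Split a list into n_parts roughly equal chunks.
          k, rem = divmod(len(items), n_parts)
          chunks, start = [], 0
          for i in range(n_parts):
              end = start + k + (1 if i < rem else 0)
              chunks.append(items[start:end])
              start = end
          return chunks

      queries = [f"query_{i}" for i in range(1000)]     # placeholder query sequence IDs
      db_shards = [f"db_shard_{j}" for j in range(8)]   # placeholder database segments

      # Each (query chunk, database shard) pair is an independent alignment task, so
      # the search parallelizes across both dimensions instead of duplicating the
      # whole database on every compute node.
      tasks = list(product(split(queries, 32), db_shards))
      print(len(tasks), "independent BLAST tasks")      # 32 * 8 = 256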
  • Optimizing Management of Persistent Data Structures in High-Performance Analytics
    Youssef, Karim; Iwabuchi, Keita; Gokhale, Maya; Feng, Wu-chun; Pearce, Roger (IEEE, 2026-01-01)
    Large-scale data analytics workflows ingest massive input data into various data structures, including graphs and key-value datastores. These data structures undergo multiple transformations and computations and are typically reused in incremental and iterative analytics workflows. Persisting in-memory views of these data structures enables reusing them beyond the scope of a single program run while avoiding repetitive raw data ingestion overheads. Memory-mapped I/O enables persisting in-memory data structures without data serialization and deserialization overheads. However, memory-mapped I/O lacks the key feature of persisting consistent snapshots of these data structures for incremental ingestion and processing. The obstacles to efficient virtual memory snapshots using memory-mapped I/O include background writebacks outside the application’s control and the high storage footprint of such snapshots. To address these limitations, we present Privateer, a memory and storage management tool that enables storage-efficient virtual memory snapshotting while also optimizing snapshot I/O performance. We integrated Privateer into Metall, a state-of-the-art persistent memory allocator for C++, and the Lightning Memory-Mapped Database (LMDB), a widely used key-value datastore in data analytics and machine learning. Privateer improved application performance by 1.22× when storing data structure snapshots to node-local storage, and by up to 16.7× when storing snapshots to a parallel file system. Privateer also improves the storage efficiency of incremental data structure snapshots by up to 11× using data deduplication and compression.
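    An illustrative sketch of block-level deduplication for incremental snapshots (conceptual only, not the Privateer design; the block size, hashing scheme, and paths are placeholder assumptions):

      import hashlib
      import os

      BLOCK_SIZE = 4096   # placeholder block granularity

      def snapshot(data: bytes, store_dir: str, manifest_path: str):
          # Persist a snapshot as a list of block hashes; identical blocks are
          # written to the block store only once, so successive snapshots that
          # share content reuse existing blocks (deduplication).
          os.makedirs(store_dir, exist_ok=True)
          hashes = []
          for off in range(0, len(data), BLOCK_SIZE):
              block = data[off:off + BLOCK_SIZE]
              digest = hashlib.sha256(block).hexdigest()
              path = os.path.join(store_dir, digest)
              if not os.path.exists(path):       # only store new unique blocks
                  with open(path, "wb") as f:
                      f.write(block)
              hashes.append(digest)
          with open(manifest_path, "w") as f:
              f.write("\n".join(hashes))

      snapshot(b"x" * 100_000, "blocks", "snap_001.manifest")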
  • Bot Automation Using Large Language Models (LLMs) and Plugins
    Ramakrishnan, Naren; Butler, Patrick; Mayer, Brian B.; Neeser, Andrew (2024-07)
    The aim of this research study was to create tools that automate information extraction pipelines to support business processes in contract and procurement management. The research team was specifically asked to explore opportunities to use Large Language Models (LLMs) to accomplish this task. After reviewing the problem space and the potential solutions, the team designed and created a tool to generate reports on the status of entries from the Contractor Performance Assessment Reporting System (CPARS), broken down by contracting division. This tool automates the extraction of the Contracting Officer’s Representative (COR) status information. The team also explored methods for using LLM pipelines to automate other potential contractual management tasks and presented some demonstrations of possible uses. The research indicated that LLMs have significant potential to enhance contract and procurement management processes, e.g., automating field extraction from existing contracts, assisting contract generation and customization, enabling rapid contract analysis, and streamlining routine document processing tasks. Based on these demonstrations, the sponsor agreed on their potential. Yet, while the potential benefits are substantial, there are concerns about data privacy and security, accuracy and reliability, legal and compliance issues, and integration with existing systems. To mitigate these concerns and maximize benefits, the research team suggests focusing on local, open-source LLM solutions like LLaMA or Phi. These models can be deployed on-premises, ensuring data privacy and security while providing powerful LLM capabilities, including customization and specialization.
  • AI-Based DPCAP FAR/DFARS Change Support Tool
    Ramirez-Marquez, Jose; Gorman, Joshua; Akram, Amer; Buettner, Douglas J.; Mayer, Brian B.; Butler, Patrick; Ramakrishnan, Naren; Freedman, Bradley (2025-04-02)
    The Department of Defense’s Defense Pricing, Contracting, and Acquisition Policy Contract Policy Directorate in the Office of the Assistant Secretary of Defense is responsible for periodic updates to the Federal Acquisition Regulation (FAR) and Defense FAR Supplement (DFARS) based on changes in the National Defense Authorization Act (NDAA), Small Business Administration rule changes, U.S. Department of Labor rule changes, or executive orders. Reading through and assessing these documents for changes that require corresponding changes to acquisition regulations is labor-intensive. Further, when rule changes are proposed to the public for comments, reading and summarizing these public comments can range from straightforward to very labor-intensive. In this paper, we report our initial research results on using artificial intelligence, including large language models and advanced natural language processing techniques, to greatly improve the efficiency of analyzing NDAA language for required updates to the FAR and DFARS and of issuing memoranda and guidance, thereby improving staff efficiency for these laborious tasks.
  • Test and Evaluation of Large Language Models to Support Informed Government Acquisition
    Chandrasekaran, Jaganmohan; Mayer, Brian B.; Frase, Heather; Lanus, Erin; Butler, Patrick; Adams, Stephen C.; Gregersen, Jared; Ramakrishnan, Naren; Freeman, Laura J. (2025-04-02)
    As large language models (LLMs) continue to advance and find applications in critical decision-making systems, robust and thorough test and evaluation (T&E) of these models will be necessary to ensure we reap their promised benefits without the risks that often come with LLMs. Most existing applications of LLMs are in specific areas like healthcare, marketing, and customer support and thus these domains have influenced their T&E processes. When investigating LLMs for government acquisition, we encounter unique challenges and opportunities. Key challenges include managing the complexity and novelty of Artificial Intelligence (AI) systems and implementing robust risk management practices that can pass muster with the stringency of government regulatory requirements. Data management and transparency are critical concerns, as is the need for ensuring accuracy (performance). Unlike traditional software systems developed for specific functionalities, LLMs are capable of performing a wide variety of functionalities (e.g., translation, generation). Furthermore, the primary mode of interaction with an LLM is through natural language. These unique characteristics necessitate a comprehensive evaluation across diverse functionalities and accounting for the variability in the natural language inputs/outputs. Thus, the T&E for LLMs must support evaluating the model’s linguistic capabilities (understanding, reasoning, etc.), generation capabilities (e.g., correctness, coherence, and contextually relevant responses), and other quality attributes (fairness, security, lack of toxicity, robustness). T&E must be thorough, robust, and systematic to fully realize the capabilities and limitations (e.g., hallucinations and toxicity) of LLMs and to ensure confidence in their performance. This work aims to provide an overview of the current state of T&E methods for ascertaining the quality of LLMs and structured recommendations for testing LLMs, thus resulting in a process for assuring warfighting capability.
  • Evaluating Assessment Practices in Team-Based Computing Capstone Projects
    Hooshangi, Sara; Shakil, Asma; Riddle, Steve; Aydin, Ilknur; Nasir, Nayla; Parupudi, Tejasvi; Rehman, Attiqa; Scott, Michael James; Vahrenhold, Jan; Weerasinghe, Amali; Wu, Xi (ACM, 2025-06-27)
    Team-based capstone projects are vital in preparing computer science students for real-world work by developing teamwork, communication, and industry-relevant technical skills. Their assessment, however, is challenging, requiring alignment between academic criteria and external stakeholder expectations, fair evaluation of individual contributions, recognition of diverse skills, and clarity on external partners' involvement in the evaluation process. The high stakes of these projects further demand transparent and equitable assessment methods that are perceived as fair by all involved. Our working group (WG) addresses the challenges of capstone project assessment by examining the perspectives of instructors, students, and external stakeholders to support fair and effective evaluation. Building on insights from our previous WG and a comprehensive review of the literature, we used a mixed-methods approach combining online surveys (quantitative) and in-depth interviews (qualitative) with instructors, students, and external stakeholders. In total, we collected 66 survey responses and conducted 30 interviews across multiple countries and institutions, capturing a diverse range of global perspectives on capstone course assessments. Insights from instructors and students revealed several commonalities, for example, in the types of assessed components and the challenges of identifying and addressing non-contributing group members. Our findings also revealed clear variation between instructor and student perspectives on how contributions are measured and weighted. Instructors were reluctant to rely heavily on peer or self-evaluation due to concerns about reliability, preferring scaffolded assessments and early-warning systems to gather contribution data and moderate team dynamics. They viewed contribution-based grading as positive but resource-intensive. Students, in contrast, emphasized the need for more transparency, formative feedback, and accurate recognition of individual contributions. They also expressed concerns about the lack of recognition for hidden labor (e.g., project management, team coordination), assessor inconsistency, and a reluctance to critique peers. Instructors treated peer input as supplementary evidence, whereas students perceived it as high-stakes and socially risky. Stakeholder involvement in assessment was generally limited to providing formative feedback and participating in final showcase events. We also identified generative AI as a rapidly evolving challenge, with both students and instructors seeking guidance on acceptable use and exploring opportunities to automate aspects of assessment. Our results offer actionable evidence-based guidance for designing transparent and equitable assessment practices in team-based computing capstones.
  • Enabling Open Educational Resource Adoption through Integrated Sharing in PrairieLearn
    Poulsen, Seth; Herman, Geoffrey; Silva, Mariana; Fowler, Max; Smith, David H. IV; Porter, Leo; Ritschel, Nico; Zilles, Craig; West, Matthew (ACM, 2026-02-18)
    This paper introduces the PrairieLearn Question Sharing System (PQSS), which enables instructors to share question generators with other instructors, either as open educational resources or privately. PQSS is integrated into PrairieLearn, an open-source, problem-driven online learning platform. PQSS addresses a critical need for more open-source assessments by making it easier for instructors to share assessments and for other instructors to use them. Instructors often do not share questions due to the time it takes to publish them and the lack of recognition for their work. Because it is directly integrated into PrairieLearn, PQSS reduces the aforementioned friction of sharing and using shared questions, and we can report usage statistics to help question authors receive recognition for their work. In this paper, we share design and implementation details of the system, as well as experiences using it to share course content across courses and between universities.
  • A Call for Critical Technology to Enable Innovative and Alternative Grading Practices
    Decker, Adrienne; Edwards, Stephen H.; Edmison, Bob; Pérez-Quiñones, Manuel; Rorrer, Audrey (ACM, 2026-02-18)
    The call for alternative grading practices has been made both inside and outside the computing education community. Various practices exist to provide assessment and feedback to students that do not rely strictly on points out of one hundred percent, weighted averages, high-stakes assignments, and grading for behaviors instead of learning. However, modern classrooms, especially computer science classrooms, rely on a myriad of digital tools to organize and maintain the course structure. Tools like learning management systems, automatic grading systems, submission systems, and practice systems all exist for computing students and faculty to use to help support the learning of programming concepts. By and large, these systems all rely on an underlying mechanism of points and aggregating points for scoring. In the face of such technological choices, adopting alternative grading practices can prove challenging for instructors and confusing for students. In this position paper, we advocate addressing key research problems to make these systems easier to use with alternative grading practices. These include comprehensive support for categorical grading, comprehensive support for rework and resubmission, and improved protocols for communication of scores and feedback. We propose an extension to LTI to support the needs of alternative grading practices, and we provide an initial design for this LTI extension. We discuss current problems and potential solutions and challenge the community to work on these problems and consider the design of future systems to embrace grading approaches that go beyond just points-based scoring.
  • A Multi-Institutional Study on Peer Instruction: Evaluating Text-Chat with Assigned Group Members vs Verbal Discussion
    Gu, Xingjian; Ericson, Barbara; Wu, Zihan; Ellis, Margaret O'Neil; Pearce, Janice; Rodger, Susan; Velasco, Yesenia (ACM, 2026-02-18)
    In Peer Instruction (PI), an instructor displays a challenging multiple-choice question during lecture that students answer individually, discuss verbally with nearby peers, and answer individually again; finally, the instructor leads a discussion of the question. Peer Instruction typically increases student learning and motivation over traditional lecture. We added a text-chat mode to improve PI for remote synchronous learning. This feature assigns students to discussion groups to maximize the number of groups that have members with different answers. The tool was pilot tested in Winter 2022 and revised. In Fall 2022 and Winter 2023, it was tested at one institution. In Fall 2024, it was tested at four institutions. We conducted a log file analysis of student data from 1394 students and analyzed surveys with 848 student responses. We found that questions answered using the text-chat had a significantly higher improvement than those using traditional verbal discussion, although the two modes were tested with different questions. Interestingly, most of the students preferred to discuss the question verbally, although some preferred the text-chat discussion. These results inform efforts to improve the effectiveness of Peer Instruction and increase its adoption.
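    One simple way to form discussion groups that mix answers (an illustrative greedy sketch, not necessarily the tool's actual assignment algorithm; the group size and sample answers are placeholders):

      from collections import defaultdict

      def assign_groups(answers, group_size=3):
          # answers maps student -> chosen option, e.g. {"s1": "A", "s2": "B"}.
          # Bucket students by answer, then deal them out round-robin so each
          # group is as likely as possible to contain differing answers.
          buckets = defaultdict(list)
          for student, choice in answers.items():
              buckets[choice].append(student)
          ordered = []
          while any(buckets.values()):
              for choice in list(buckets):
                  if buckets[choice]:
                      ordered.append(buckets[choice].pop())
          return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]

      groups = assign_groups({"s1": "A", "s2": "B", "s3": "A", "s4": "C",
                              "s5": "B", "s6": "A", "s7": "C", "s8": "B"})
      print(groups)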