Scholarly Works, Computer Science
Research articles, presentations, and other scholarship
Recent Submissions
- Motivational climate predicts effort and achievement in a large computer science course: examining differences across sexes, races/ethnicities, and academic majors
  Jones, Brett D.; Ellis, Margaret; Gu, Fei; Fenerci, Hande (2023-11-13)
  Background: The motivational climate within a course has been shown to be an important predictor of students’ engagement and course ratings. Because little is known about how students’ perceptions of the motivational climate in a computer science (CS) course vary by sex, race/ethnicity, and academic major, we investigated these questions: (1) To what extent do students’ achievement and perceptions of motivational climate, cost, ease, and effort vary by sex, race/ethnicity, or major? and (2) To what extent do the relationships between students’ achievement and perceptions of motivational climate, cost, and effort vary by sex, race/ethnicity, and major? Participants were enrolled in a large CS course at a large public university in the southeastern U.S. A survey was administered to 981 students in the course over three years. Path analyses and one-way MANOVAs and ANOVAs were conducted to examine differences between groups.
  Results: Students’ perceptions of empowerment, usefulness, interest, and caring were similar across sexes and races/ethnicities. However, women and Asian students reported lower success expectancies. Students in the same academic major as the course topic (i.e., CS) generally reported higher perceptions of the motivational climate than students who did not major or minor in the course topic. Final grades in the course did not vary by sex or race/ethnicity, except that the White and Asian students obtained higher grades than the Black students. Across sex, race/ethnicity, and major, students’ perceptions of the motivational climate were positively related to effort, which was positively related to achievement.
  Conclusions: One implication is that females, Asian students, and non-CS students may need more support, or different types of support, to help them believe that they can succeed in computer science courses. On average, these students were less confident in their abilities to succeed in the course and were more likely to report that they did not have the time needed to do well in the course. A second implication for instructors is that it may be possible to increase students’ effort and achievement by increasing students’ perceptions of the five key constructs in the MUSIC Model of Motivation: eMpowerment, Usefulness, Success, Interest, and Caring.
- RoVista: Measuring and Analyzing the Route Origin Validation (ROV) in RPKI
  Li, Weitong; Lin, Zhexiao; Ashiq, Md. Ishtiaq; Aben, Emile; Fontugne, Romain; Phokeer, Amreesh; Chung, Taejoong (ACM, 2023-10-24)
  The Resource Public Key Infrastructure (RPKI) is a system that adds security to Internet routing. In recent years, the publication of Route Origin Authorization (ROA) objects, which bind IP prefixes to their legitimate origin ASN, has been rapidly increasing. However, ROAs are effective only if routers use them to verify and filter invalid BGP announcements, a process called Route Origin Validation (ROV). There are many proposed approaches to measure the status of ROV in the wild, but they are limited in scalability or accuracy. In this paper, we present RoVista, an ROV measurement framework that leverages the IP-ID side channel and in-the-wild RPKI-invalid prefixes. With over 20 months of longitudinal measurement, RoVista successfully covers more than 28K ASes, of which 63.8% have derived benefits from ROV, although the percentage of fully protected ASes remains relatively low at 12.3%. In order to validate our findings, we have also sought input from network operators. We then evaluate the security impact of current ROV deployment and reveal misconfigurations that will weaken the protection of ROV. Lastly, we compare RoVista with other approaches and conclude with a discussion of our findings and limitations.
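The ROV step that the RoVista abstract refers to can be illustrated with a minimal sketch. It follows the usual valid / invalid / not-found classification of a BGP announcement against a set of ROAs; it is not RoVista's measurement method (which relies on the IP-ID side channel), and the `ROA` class, the `validate_origin` function, and the example prefixes and AS numbers are all hypothetical names and documentation-range values chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Hypothetical, simplified ROA: a prefix, its maximum length, and the authorized origin ASN."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix, origin_asn, roas):
    """Classify an announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA covers the route only if the route lies inside the ROA prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches when the length and the origin ASN both agree.
        if route.prefixlen <= roa.max_length and roa.asn == origin_asn:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA: not-found.
    return "invalid" if covered else "not-found"


# Example ROA set using documentation prefixes and private-use AS numbers.
EXAMPLE_ROAS = [ROA("192.0.2.0/24", 24, 64500)]
```

A router that performs ROV would drop announcements classified as invalid; an announcement that is more specific than the ROA's maximum length is invalid even when the origin ASN is the authorized one.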
- Physics-Guided Deep Generative Model For New Ligand Discovery
  Sagar, Dikshant; Risheh, Ali; Sheikh, Nida; Forouzesh, Negin (ACM, 2023-09-03)
  Structure-based drug discovery aims to identify small molecules that can attach to a specific target protein and change its functionality. Recently, deep learning has shown great promise in generating drug-like molecules with specific biochemical features and conditioned with structural features. However, these models usually fail to incorporate an essential factor: the underlying physics which guides molecular formation and binding in real-world scenarios. In this work, we describe a physics-guided deep generative model for new ligand discovery, conditioned not only on the binding site but also on physics-based features that describe the binding mechanism between a receptor and a ligand. The proposed hybrid model has been tested on large protein-ligand complexes and small host-guest systems. Using the top-N methodology, on average more than 75% of the structures generated by our hybrid model were stronger binders than the original reference ligand. All of them had higher ΔG_bind (affinity) values than the ones generated by the previous state-of-the-art method by an average margin of 1.88 kcal/mol. The visualization of the top-5 ligands generated by the proposed physics-guided model and the reference deep learning model demonstrates more feasible conformations and orientations by the former. The future directions include training and testing the hybrid model on larger datasets, adding more relevant physics-based features, and interpreting the deep learning outcomes from biophysical perspectives.
- Text-to-ESQ: A Two-Stage Controllable Approach for Efficient Retrieval of Vaccine Adverse Events from NoSQL Database
  Zhang, Wenlong; Zeng, Kangping; Yang, Xinming; Shi, Tian; Wang, Ping (ACM, 2023-09-03)
  The Vaccine Adverse Event Reporting System (VAERS) contains detailed reports of adverse events following vaccine administration. However, efficiently and accurately searching for specific information from VAERS poses significant challenges, especially for medical experts. Natural language querying (NLQ) methods tackle the challenge by translating the input questions into executable queries, allowing for the exploration of complex databases with large amounts of information. Most existing studies focus on the relational database and solve the Text-to-SQL task. However, the capability of full-text for Text-to-SQL is greatly limited by the data structures and functionality of the SQL databases. In addition, the potential of natural language querying has not been comprehensively explored in the healthcare domain. To overcome these limitations, we investigate the potential of NoSQL databases, specifically Elasticsearch, and forge a new research direction for NLQ, which we refer to as Text-to-ESQ generation. This exploration requires us to re-design various aspects of NLQ, such as the target application and the advantages of NoSQL databases. In our approach, we develop a two-stage controllable (TSC) framework consisting of a question-to-question (Q2Q) translation module and an ESQ condition extraction (ECE) module. These modules are carefully designed to efficiently retrieve information from the VAERS data stored in a NoSQL database. Additionally, we construct a dedicated question-ESQ pair dataset called VAERSESQ, to support the task in the healthcare domain. Extensive experiments were conducted on the VAERSESQ dataset to evaluate the proposed methods. The results, both quantitative and qualitative, demonstrate the accuracy and efficiency of our approach in generating queries for NoSQL databases, thus enabling efficient retrieval of VAERS data.
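To make the target of Text-to-ESQ generation concrete, the sketch below assembles an Elasticsearch Query DSL body from conditions that have already been extracted from a question. It is only an illustration of what such a query can look like, not the paper's TSC framework: the `build_esq` helper and the field names (`symptom_text`, `vaccine_type`, `age_years`) are assumptions, and only the standard `bool`, `match`, `term`, and `range` query clauses are used.

```python
import json


def build_esq(full_text, exact, ranges):
    """Assemble an Elasticsearch bool query from pre-extracted conditions.

    full_text: field -> free-text phrase (scored full-text `match` clauses)
    exact:     field -> exact value      (unscored `term` filters)
    ranges:    field -> bounds dict      (unscored `range` filters)
    """
    must = [{"match": {field: text}} for field, text in full_text.items()]
    filters = [{"term": {field: value}} for field, value in exact.items()]
    filters += [{"range": {field: bounds}} for field, bounds in ranges.items()]
    return {"query": {"bool": {"must": must, "filter": filters}}}


# Hypothetical question: "Which reports mention fever after a COVID19 vaccine
# in patients aged 65 or older?" The field names are illustrative only.
example_query = build_esq(
    full_text={"symptom_text": "fever"},
    exact={"vaccine_type": "COVID19"},
    ranges={"age_years": {"gte": 65}},
)
example_json = json.dumps(example_query, indent=2)
```

The `match` clause is where full-text search comes in, which is the capability the abstract says is limited in SQL databases; the `filter` clauses play the role of a SQL `WHERE` condition.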
- GRAPPEL: A Graph-based Approach for Early Risk Assessment of Acute Hypertension in Critical Care
  Jha, Sonal; Feng, Wu-chun (ACM, 2023-09-03)
  An acute hypertensive episode (AHE) refers to a period of extremely high blood pressure (BP) that can arise suddenly in critical care, and, if not identified early, can subject patients to the risk of severe organ damage and even potential mortality. The early assessment of AHE risk saves lives by enabling proactive medical intervention. We propose GRAPPEL, a novel graph-based approach that assesses a patient’s risk of experiencing an AHE before it occurs based on the analysis of their BP recorded over time in critical care. Our algorithm consists of two major components: (1) the construction of a time-evolving graph representation of a patient’s time-series BP data to encode the temporal BP variations into a graph and (2) the generation of real-time AHE risk scores based on quantifying the graph changes at each time step, triggered by the arrival of a new BP record. Notably, GRAPPEL provides real-time and early AHE risk assessment based solely on BP records that can be irregularly spaced in time, making it suitable for critical care environments. Via our extensive experiments on 3,476 critical-care visit records, we demonstrate the superiority of our approach over existing methods by achieving an AUC-ROC score of 91% in identifying patients at risk of experiencing an AHE up to 170 minutes in advance (and an AUC-ROC score of 94% up to 20 minutes in advance).
- MArBLE: Hierarchical Multi-Armed Bandits for Human-in-the-Loop Set Expansion
  Wahed, Muntasir; Gruhl, Daniel; Lourentzou, Ismini (ACM, 2023-10-21)
  The modern-day research community has an embarrassment of riches regarding pre-trained AI models. Even for a simple task such as lexicon set expansion, where an AI model suggests new entities to add to a predefined seed set of entities, thousands of models are available. However, deciding which model to use for a given set expansion task is non-trivial. In hindsight, some models can be ‘off topic’ for specific set expansion tasks, while others might work well initially but quickly exhaust what they have to offer. Additionally, certain models may require more careful priming in the form of samples or feedback before being fine-tuned to the task at hand. In this work, we frame this model selection as a sequential non-stationary problem, where there exist a large number of diverse pre-trained models that may or may not fit a task at hand, and an expert is shown one suggestion at a time to include in the set or not, i.e., accept or reject the suggestion. The goal is to expand the list with the most entities as quickly as possible. We introduce MArBLE, a hierarchical multi-armed bandit method for this task, and two strategies designed to address cold-start problems. Experimental results on three set expansion tasks demonstrate MArBLE’s effectiveness compared to baselines.
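The accept/reject framing in the MArBLE abstract maps naturally onto a multi-armed bandit, where each arm is a candidate model and the reward is 1 when the expert accepts its suggestion. The sketch below uses plain UCB1, a standard flat, stationary bandit algorithm, purely as an illustration; it is not MArBLE's hierarchical method and does not handle non-stationarity or cold start. The `UCB1` class, the `run_session` helper, and the simulated expert are hypothetical.

```python
import math


class UCB1:
    """Flat UCB1 bandit: one arm per candidate model (illustrative only)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms      # suggestions shown per model
        self.rewards = [0.0] * n_arms   # accepted suggestions per model

    def select(self):
        # Try every model once before trusting any estimate.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            self.rewards[a] / self.counts[a]
            + math.sqrt(2 * math.log(total) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward


def run_session(bandit, expert_accepts, rounds):
    """Show one suggestion per round; expert_accepts(arm) returns True or False."""
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, 1.0 if expert_accepts(arm) else 0.0)
    return bandit.counts
```

In a simulated session where the expert always rejects model 0 and always accepts model 1, the bandit quickly concentrates its suggestions on model 1 while still occasionally re-checking model 0.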
- Knowledge-Enhanced Multi-Label Few-Shot Product Attribute-Value Extraction
  Gong, Jiaying; Chen, Wei-Te; Eldardiry, Hoda (ACM, 2023-10-21)
  Existing attribute-value extraction (AVE) models require large quantities of labeled data for training. However, new products with new attribute-value pairs enter the market every day in real-world e-Commerce. Thus, we formulate AVE as a multi-label few-shot learning (FSL) problem, aiming to extract unseen attribute-value pairs based on a small number of training examples. We propose a Knowledge-Enhanced Attentive Framework (KEAF) based on prototypical networks, leveraging the generated label description and category information to learn more discriminative prototypes. Besides, KEAF integrates with hybrid attention to reduce noise and capture more informative semantics for each class by calculating the label-relevant and query-related weights. To achieve multi-label inference, KEAF further learns a dynamic threshold by integrating the semantic information from both the support set and the query set. Extensive experiments with ablation studies conducted on two datasets demonstrate that our proposed model significantly outperforms other SOTA models for information extraction in few-shot learning.
- Chatterbox Opener: A Game to Support Healthy Communication and Relationships
  Wang, Wei-Lu; Haqq, Derek; Saaty, Morva; Cao, Yusheng; Fan, Jixiang; Patel, Jaitun V.; McCrickard, D. Scott (ACM, 2023-10-06)
  Computer-Mediated Communication (CMC) applications are utilized to foster closer relationships between individuals. Various shared experience strategy designs have been widely applied to technologies in order to enhance communications and interactions in family relationships. However, there needs to be more research on how shared experience approaches work in different family communication patterns. This paper presents insights into the effectiveness of three types of shared experience approaches for different family communication patterns and design considerations for game design from a diary study of Chatterbox Opener, the game we developed for families and couples to enhance communication orientation.
- Fit to Draw: An Elevation of Location-Based Exergames
  Saxena, Roshni; Gaydos, Zachary; Saaty, Morva; Haqq, Derek; Nair, Priyanka; Grutzik, Gary; Wang, Wei Lu; Patel, Jaitun (ACM, 2023-10-06)
  Many location-based games have a multiplayer aspect; however, this is typically inconsequential to the actual gameplay, which is usually geared toward a single-player experience. Thus, we present Fit to Draw, a multiplayer location-based exergame that combines simple picture-guessing gameplay with physical movement. While other location-based games have the gameplay elements tangentially related to physical movement, Fit to Draw requires players to walk outdoors to draw a picture based on a given word. Companion players then guess what other players drew to earn points, providing a multiplayer and social experience that many other location-based games do not have. The goals of Fit to Draw are to motivate users to exercise, enjoy the outdoors, socialize, and have an opportunity to be creative.
- In-the-Wild Experiences with an Interactive Glanceable AR System for Everyday Use
  Lu, Feiyu; Pavanatto, Leonardo; Bowman, Douglas A. (ACM, 2023-10-13)
  Augmented reality head-worn displays (AR HWDs) of the near future will be worn all day every day, delivering information to users anywhere and anytime. Recent research has explored how information can be presented on AR HWDs to facilitate easy acquisition without intruding on the user’s physical tasks. However, it remains unclear what users would like to do beyond passive viewing of information, and what the best ways are to interact with everyday content displayed in AR HWDs. To address this gap, our research focuses on the implementation of a functional prototype that leverages the concept of Glanceable AR while incorporating various interaction capabilities for users to take quick actions on their personal information. Rather than overwhelming users and demanding continuous attention to virtual information, our system centers around the idea that virtual information should stay invisible and unobtrusive when not needed but be quickly accessible and interactable. Through an in-the-wild study involving three AR experts, our findings shed light on how to design interactions in AR HWDs to support everyday tasks, as well as how people perceive using feature-rich Glanceable AR interfaces during social encounters.
- PrivMon: A Stream-Based System for Real-Time Privacy Attack Detection for Machine Learning Models
  Ko, Myeongseob; Yang, Xinyu; Ji, Zhengjie; Just, Hoang Anh; Gao, Peng; Kumar, Anoop; Jia, Ruoxi (ACM, 2023-10-16)
  Machine learning (ML) models can expose the private information of training data when confronted with privacy attacks. Specifically, a malicious user with black-box access to an ML-as-a-service platform can reconstruct the training data (i.e., model inversion attacks) or infer the membership information (i.e., membership inference attacks) simply by querying the ML model. Despite the pressing need for effective defenses against privacy attacks with black-box access, existing approaches have mostly focused on enhancing the robustness of the ML model via modifying the model training process or the model prediction process. These defenses can compromise model utility and require the cooperation of the underlying AI platform (i.e., platform-dependent). These constraints largely limit the real-world applicability of existing defenses. Despite the prevalent focus on improving the model’s robustness, none of the existing works have focused on the continuous protection of already deployed ML models from privacy attacks by detecting privacy leakage in real-time. This defensive task becomes increasingly important given the vast deployment of ML-as-a-service platforms these days. To bridge the gap, we propose PrivMon, a new stream-based system for real-time privacy attack detection for ML models. To facilitate wide applicability and practicality, PrivMon defends black-box ML models against a wide range of privacy attacks in a platform-agnostic fashion: PrivMon only passively monitors model queries without requiring the cooperation of the model owner or the AI platform. Specifically, PrivMon takes as input a stream of ML model queries and provides an efficient attack detection engine that continuously monitors the stream to detect the privacy attack in real-time, by identifying self-similar malicious queries. We show empirically and theoretically that PrivMon can detect a wide range of realistic privacy attacks within a practical time frame and successfully mitigate the attack success rate. Code is available at https://github.com/ruoxi-jia-group/privmon.
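The idea of spotting self-similar queries in a stream can be sketched with a sliding window and a distance threshold. This is a deliberately simple illustration and not PrivMon's detection engine (the linked repository is the authoritative source): the `SimilarQueryMonitor` class, its parameters, the use of Euclidean distance on raw query vectors, and the default thresholds are all assumptions.

```python
import math
from collections import deque


class SimilarQueryMonitor:
    """Flag a query when it sits close to many recent queries (illustrative only)."""

    def __init__(self, window=100, radius=0.1, min_neighbors=3):
        self.recent = deque(maxlen=window)  # sliding window over the query stream
        self.radius = radius                # distance below which two queries count as similar
        self.min_neighbors = min_neighbors  # similar past queries needed to raise a flag

    def observe(self, query):
        """Record one query vector and return True if it looks suspicious."""
        query = tuple(query)
        neighbors = sum(
            1 for past in self.recent if math.dist(query, past) <= self.radius
        )
        self.recent.append(query)
        return neighbors >= self.min_neighbors
```

A monitor like this only reads the incoming queries, which matches the passive, platform-agnostic setting the abstract describes. A linear scan over the window is the simplest choice here; a production system would need a faster similarity search.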
- A Diary Study in Social Virtual Reality: Impact of Avatars with Disability Signifiers on the Social Experiences of People with Disabilities
  Zhang, Kexin; Deldari, Elmira; Yao, Yaxing; Zhao, Yuhang (ACM, 2023-10-22)
  People with disabilities (PWD) have shown a growing presence in the emerging social virtual reality (VR). To support disability representation, some social VR platforms have started to include disability features in avatar design. However, it is unclear how disability disclosure via avatars (and the way to present it) would affect PWD’s social experiences and interaction dynamics with others. To fill this gap, we conducted a diary study with 10 PWD who freely explored VRChat, a popular commercial social VR platform, for two weeks, comparing their experiences between using regular avatars and avatars with disability signifiers (i.e., avatar features that indicate the user’s disability in real life). We found that PWD preferred using avatars with disability signifiers and wanted to further enhance their aesthetics and interactivity. However, such avatars also caused embodied, explicit harassment targeting PWD. We revealed the unique factors that led to such harassment and derived design implications and protection mechanisms to inspire safer and more inclusive social VR.
- DiLogics: Creating Web Automation Programs with Diverse Logics
  Pu, Kevin; Yang, Jim; Yuan, Angel; Ma, Minyi; Dong, Rui; Wang, Xinyu; Chen, Yan; Grossman, Tovi (ACM, 2023-10-29)
  Knowledge workers frequently encounter repetitive web data entry tasks, like updating records or placing orders. Web automation increases productivity, but translating tasks to web actions accurately and extending to new specifications is challenging. Existing tools can automate tasks that perform the same logical trace of UI actions (e.g., input text in each field in order), but do not support tasks requiring different executions based on varied input conditions. We present DiLogics, a programming-by-demonstration system that utilizes NLP to assist users in creating web automation programs that handle diverse specifications. DiLogics first semantically segments input data into structured task steps. By recording user demonstrations for each step, DiLogics generalizes the web macros to novel but semantically similar task requirements. Our evaluation showed that non-experts can effectively use DiLogics to create automation programs that fulfill diverse input instructions. DiLogics provides an efficient, intuitive, and expressive method for developing web automation programs satisfying diverse specifications.
- TaleMate: Collaborating with Voice Agents for Parent-Child Joint Reading Experiences
  Vargas-Diaz, Daniel; Karunaratna, Sulakna; Kim, Jisun; Lee, Sang Won; Choi, Koeun (ACM, 2023-10-29)
  Joint reading is a key activity for early learners, with caregiver-child interactions such as questioning and feedback playing an essential role in children’s cognitive and linguistic development. However, for some parents, actively engaging children in storytelling can be challenging. To address this, we introduce TaleMate, a platform designed to enhance shared reading by leveraging conversational agents that have been shown to support children’s engagement and learning. TaleMate enables a dynamic, participatory reading experience where parents and children can choose which characters they wish to embody. Moreover, the system navigates the challenges posed by digital reading tools, such as decreased parent-child interaction, and builds upon the benefits of traditional and digital reading techniques. TaleMate offers an innovative approach to fostering early reading habits, bridging the gap between traditional joint reading practices and the digital reading landscape.
- Context-Aware Sit-Stand Desk for Promoting Healthy and Productive Behaviors
  Hu, Donghan; Bae, Joseph; Lim, Sol; Lee, Sang Won (ACM, 2023-10-29)
  To mitigate the risk of chronic diseases caused by prolonged sitting, sit-stand desks are promoted as an effective intervention to foster healthy behaviors among knowledge workers by allowing periodic posture switching between sitting and standing. However, conventional systems let users switch the mode manually, and some research has explored automated notification systems with pre-set time intervals. While such regular notifications can promote healthy behaviors, they can also act as external interruptions that hinder individuals’ working productivity. Notably, knowledge workers are known to be reluctant to change their physical postures when concentrating. To address these issues, we propose considering work context based on users’ screen activities to encourage computer users to alternate their postures when it can minimize disruption, promoting healthy and productive behaviors. To that end, we are in the process of building a context-aware sit-stand desk. So far, we have completed two modules: an application that monitors the ongoing activities on users’ computers and a sensor module that can measure the height of sit-stand desks for data collection. The collected data includes computer activities, measured desk height, and users’ willingness to switch to standing modes, and will be used to build an LSTM prediction model to suggest optimal time points for posture changes, accompanied by appropriate desk height. In this work, we acknowledge previous relevant research, outline ongoing deployment efforts, and present our plan to validate the effectiveness of our approach via user studies.
- VizPI: A Real-Time Visualization Tool for Enhancing Peer Instruction in Large-Scale Programming Lectures
  Tang, Xiaohang; Chen, Xi; Wong, Sam; Chen, Yan (ACM, 2023-10-29)
  Peer instruction (PI) has shown significant potential in facilitating student engagement and collaborative learning. However, the implementation of PI for large-scale programming lectures has proven challenging due to difficulties in monitoring student engagement, discussion topics, and code changes. This paper introduces VizPI, an interactive web tool that enables instructors to conduct, monitor, and assess PI for programming exercises in real-time. With features that visualize the progress of student discussions and code submissions, VizPI allows for more effective oversight of PI activities and the provision of personalized feedback at scale. Our work aims to transform the pedagogical approach to PI in programming education, making it more engaging and adaptable to student needs.
- Deception in Drone Surveillance Missions: Strategic vs. Learning Approaches
  Wan, Zelin; Cho, Jin-Hee; Zhu, Mu; Anwar, Ahmed H.; Kamhoua, Charles; Singh, Munindar (ACM, 2023-10-23)
  Unmanned Aerial Vehicles (UAVs) have been used for surveillance operations, search and rescue missions, and delivery services. Given their importance and versatility, they naturally become targets for cyberattacks. Denial-of-Service (DoS) attacks are commonly considered to exhaust their resources or crash UAVs (or drones). This work proposes a unique proactive defense using honey drones (HD) for UAVs during surveillance operations. These HDs use lightweight virtual machines to lure and redirect potential DoS attacks. Both the choice of target by the attacker and the HD’s deceptive tactics are influenced by the strength of the radio signal. However, a critical trade-off exists in that stronger signals can deplete battery life, while weaker signals can negatively affect the connectivity of a drone fleet network. To address this, we formulate an optimization problem to select the best strategies for an attacker or defender in selecting their signal strength level. We propose a novel HD-based defense to identify the optimal setting using deep reinforcement learning (DRL) or game theory and compare their performance with that of non-HD-based methods, such as Intrusion Detection Systems and ContainerDrone. Our experiments demonstrate the unique benefits and superior efficacy of each HD-based defense across various attack scenarios.
- Bridging the Gap between Spatial and Spectral Domains: A Unified Framework for Graph Neural Networks
  Chen, Zhiqian; Chen, Fanglan; Zhang, Lei; Ji, Taoran; Fu, Kaiqun; Zhao, Liang; Chen, Feng; Wu, Lingfei; Aggarwal, Charu; Lu, Chang-Tien (ACM, 2023-10)
  Deep learning's performance has been extensively recognized recently. Graph neural networks (GNNs) are designed to deal with graph-structural data that classical deep learning does not easily manage. Since most GNNs were created using distinct theories, direct comparisons are impossible. Prior research has primarily concentrated on categorizing existing models, with little attention paid to their intrinsic connections. The purpose of this study is to establish a unified framework that integrates GNNs based on spectral graph and approximation theory. The framework incorporates a strong integration between spatial- and spectral-based GNNs while tightly associating approaches that exist within each respective domain.
- Rare Category Analysis for Complex Data: A Review
  Zhou, Dawei; He, Jingrui (ACM, 2023-10)
  Despite the sheer volume of data being collected, it is often the rare categories that are the most important in many high-impact domains, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, from spam image detection in social media to rare disease diagnosis in the medical decision support system. This survey aims to provide a concise review of the state-of-the-art techniques on complex rare category analysis, where the majority classes have a smooth distribution while the minority classes exhibit the compactness property in the feature space or subspace. More specifically, we start with the introduction, problem definition, and unique challenges of complex rare category analysis, then present a comprehensive review of recent advances that are designed for this problem setting, from rare category exploration without any label information to the exposition step that characterizes rare examples with a compact representation, from representing rare patterns in a salient embedding space to interpreting the prediction results and providing relevant clues for the end-users' interpretation; finally, we discuss the potential problems and shed light on the future directions of complex rare category analysis.
- Measurement of Embedding Choices on Cryptographic API Completion Tasks
  Xiao, Ya; Song, Wenjia; Ahmed, Salman; Ge, Xinyang; Viswanath, Bimal; Meng, Na; Yao, Danfeng (ACM, 2023-10)
  In this paper, we conduct a measurement study to comprehensively compare the accuracy impacts of multiple embedding options in cryptographic API completion tasks. Embedding is the process of automatically learning vector representations of program elements. Our measurement focuses on design choices in three important aspects: program analysis preprocessing, token-level embedding, and sequence-level embedding. Our findings show that program analysis is necessary even under advanced embedding. The results show a 36.20% accuracy improvement on average when program analysis preprocessing is applied to transfer byte code sequences into API dependence paths. With program analysis and the token-level embedding training, the embedding dep2vec improves the task accuracy from 55.80% to 92.04%. Moreover, only a slight accuracy advantage (0.55% on average) is observed by training the expensive sequence-level embedding compared with the token-level embedding. Our experiments also suggest the differences made by the data. In the cross-app learning setup and a data scarcity scenario, sequence-level embedding is more necessary and results in a more obvious accuracy improvement (5.10%).