Journal Articles, Association for Computing Machinery (ACM)
- Survey of Approaches and Techniques for Security Verification of Computer Systems. Erata, Ferhat; Deng, Shuwen; Zaghloul, Faisal; Xiong, Wenjie; Demir, Onur; Szefer, Jakub (ACM). This paper surveys the landscape of security verification approaches and techniques for computer systems at different levels: from a software-application level all the way to the physical hardware level. Different existing projects are compared, based on the tools used and security aspects being examined. Since many systems require both hardware and software components to work together to provide the system's promised security protections, it is not sufficient to verify just the software levels or just the hardware levels in a mutually exclusive fashion. This survey especially highlights system levels that are verified by the different existing projects and presents to the readers the state of the art in hardware and software system security verification. Few approaches come close to providing full-system verification, and there is still much room for improvement.
- Memetic algorithms for Spatial Partitioning problems. Biswas, Subhodip; Chen, Fanglan; Chen, Zhiqian; Lu, Chang-Tien; Ramakrishnan, Naren (ACM). Spatial optimization problems (SOPs) are characterized by spatial relationships governing the decision variables, objectives, and/or constraint functions. In this article, we focus on a specific type of SOP called spatial partitioning, which is a combinatorial problem due to the presence of discrete spatial units. Exact optimization methods do not scale with the size of the problem, especially within practicable time limits. This motivated us to develop population-based metaheuristics for solving such SOPs. However, the search operators employed by these population-based methods are mostly designed for real-parameter continuous optimization problems. To adapt these methods to SOPs, we apply domain knowledge in designing spatially aware search operators for efficiently searching through the discrete search space while preserving the spatial constraints. To this end, we put forward a simple yet effective algorithm called SPATIAL and test it on the school (re)districting problem. Detailed experimental investigations are performed on real-world datasets to evaluate the performance of SPATIAL. In addition, ablation studies are performed to understand the role of the individual components of SPATIAL. Finally, we discuss how SPATIAL is helpful in the real-life planning process and its applicability to different scenarios, and motivate future research directions.
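The spatially aware search operators described above can be pictured with a small sketch: a mutation that only reassigns a unit sitting on a zone boundary to the neighboring zone, so every move stays meaningful in the discrete spatial search space. This is an illustrative operator over an assumed grid encoding, not SPATIAL's actual implementation (`boundary_move` and the example grid are hypothetical):

```python
import numpy as np

def boundary_move(labels: np.ndarray, rng) -> np.ndarray:
    """Spatially aware mutation: reassign exactly one unit that lies on a zone
    boundary to an adjacent zone. Restricting moves to boundaries keeps the
    mutation local in space, unlike a blind random relabeling."""
    out = labels.copy()
    n, m = labels.shape
    candidates = []
    for i in range(n):
        for j in range(m):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < m and labels[a, b] != labels[i, j]:
                    candidates.append((i, j, labels[a, b]))  # boundary unit
    if candidates:
        i, j, z = candidates[rng.integers(len(candidates))]
        out[i, j] = z  # flip the unit to its neighbor's zone
    return out

grid = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 1]])  # two zones on a 3x3 grid
mutated = boundary_move(grid, np.random.default_rng(0))
print((mutated != grid).sum())  # → 1  (exactly one unit changed zone)
```

A real operator would additionally check that the donor zone stays contiguous; that check is omitted here for brevity.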
- Toward a Design Theory of Game-Mediated Social Experiences - A Study of Among Us. Haqq, Derek; Saaty, Morva; Rukaj, Jonathan; Marulkar, Saylee; Israel, Justin; Newton, Emily; Patel, Rudra; Tan, Stephen; McCrickard, D. Scott (ACM, 2021-10-15). The COVID-19 pandemic has greatly affected face-to-face social interaction with and among relational partners - relatives, friends, and others. Prior to the pandemic, many people relied on face-to-face social play experiences to help them maintain relationships and satisfy relatedness needs. However, growing coronavirus-related concerns have made such activities unwelcome or inaccessible, leading many to turn to technology-mediated experiences as a safer alternative means of supporting recreational play with non-proximal relational partners. But how does one design technology-mediated recreational play experiences to satisfy a diverse range of user needs, interests, and preferences? To explore this area of interest we study the social experience afforded by the multiplayer game Among Us. We conduct a diary study with students enrolled in an undergraduate HCI course and report on the findings of a post-study reflective activity. Our findings highlight that casual interdependent games that explicitly and implicitly foster social interaction among players do provide opportunities for satisfying remote play social experiences when augmented by rich communication technologies.
- Traces of Time through Space: Advantages of Creating Complex Canvases in Collaborative Meetings. Kirshenbaum, Nurit; Davidson, Kylie; Harden, Jesse; North, Christopher L.; Kobayashi, Dylan; Theriot, Ryan; Tabalba, Roderick; Rogers, Michael; Belcaid, Mahdi; Burks, Andrew; Bharadwaj, Krishna; Renambot, Luc; Johnson, Andrew; Long, Lance; Leigh, Jason (ACM, 2021-11-05). Technology has long been a partner in workplace meeting facilitation. The recent outbreak of COVID-19 and the cautionary measures to reduce its spread have made it more prevalent than ever before in the form of online meetings. In this paper, we recount our experiences during weekly meetings in three modalities: using SAGE2, a collaborative sharing software designed for large displays, for co-located meetings; using a conventional projector for co-located meetings; and using the Zoom video-conferencing tool for distributed meetings. We view these meetings through the lens of effective meeting attributes and share ethnographic observations and an attitudinal survey conducted in our research lab. We discuss patterns of content sharing (sequential, parallel, or semi-parallel) and the potential advantages of creating complex canvases of content. We see how the SAGE2 tool affords parallel content sharing to create complex canvases, which represent queues of ideas and contributions (past, present, and future), using the space on a large display to suggest the progression of time through the meeting.
- X3D Field Trips for Remote Learning. Polys, Nicholas F.; Meaney, Kathleen; Munsell, John F.; Addlestone, Benjamin J. (ACM, 2021-11-08). Combinations of immersive media and graphic portrayals can enable the human subjective sense of Presence. This paper collects our experiences and evaluations from six projects that use Extensible 3D (X3D) interactive graphics to deliver spatial experiences across the WWW. X3D enables the combination of spherical panoramas with 3D models and maps to visually transport users to a specific real location at a specific time. Remote users have access to these worlds through a Web browser or other immersive device; local users in a CAVE can collaborate with natural physical gestures. We reflect on the graphical and interactive requirements of these projects and provide guidance for future applications. In the face of physical lock-downs and distancing due to the COVID pandemic, such platforms illustrate the opportunities and challenges in the design and delivery of spatial visualizations, especially for remote learning.
- Active Learning for Microarray-based Leukemia Classification. Zhu, Kecheng (ACM, 2021-11-12). In machine learning, data labeling is often assumed to be easy and cheap. However, in real-world settings, especially the clinical field, labeled data sets are rare and expensive to obtain. Active learning is an approach that queries the most informative data for training, offering an alternative that addresses this concern. The sampling method is one of the key parts of active learning because it minimizes the training cost of the classifier: different query methods can produce considerably different models, which in turn can differ significantly in training cost and final accuracy. The approaches used in this experiment are uncertainty sampling, diversity sampling, and query by committee. In the experiment, active learning is applied to the microarray data with improved results. The classification of two types of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia) shows a boost in accuracy with the same number of samples compared to passive machine learning. The experiments lead to the conclusion that, with a small number of randomly drawn samples, active learning produces a more accurate model for leukemia classification. Additionally, active learning with query by committee finds the most informative sample with the fewest trials.
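Uncertainty sampling, one of the query strategies named above, can be sketched as selecting the unlabeled samples whose predicted class distribution has the highest entropy. The function below is a generic illustration of that strategy, not the paper's code; `entropy_query` is a hypothetical name:

```python
import numpy as np

def entropy_query(probs: np.ndarray, n_queries: int = 1) -> np.ndarray:
    """Return the indices of the most uncertain (highest-entropy) samples.

    probs: (n_samples, n_classes) predicted class probabilities for the
    unlabeled pool. The selected samples are the ones worth labeling next.
    """
    eps = 1e-12  # avoid log(0) for confident predictions
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    return np.argsort(entropy)[-n_queries:][::-1]  # most uncertain first

# A confident prediction (low entropy) vs. an uncertain one (high entropy):
probs = np.array([[0.95, 0.05],   # confident -> low entropy
                  [0.50, 0.50]])  # uncertain -> high entropy
print(entropy_query(probs))  # → [1]
```

In an active-learning loop, the returned indices would be sent to an oracle (e.g., a clinician) for labeling, and the classifier retrained on the grown labeled set.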
- Clustering Appliance Energy Consumption Data for Occupant Energy-Behavior Modeling. Dongre, Poorvesh; Aldrees, Asma; Gracanin, Denis (ACM, 2021-11-17). Energy consumption varies significantly across buildings with similar functions and locations. Occupant behavior is one of the most significant sources of uncertainty related to energy consumption in buildings. A deeper understanding of occupant energy behavior can help in designing personalized behavior intervention strategies to save energy and predict energy consumption. This paper uses the Pecan Street dataset to cluster building occupants based on the energy they consume for each appliance in the household, and then develops load profiles for each of the clusters.
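The clustering step described above could, for instance, be a plain k-means over per-appliance consumption vectors (one row per household, one column per appliance). The abstract does not state which algorithm the authors use, so the following is only a hedged sketch with made-up data:

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means on the rows of X; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # assign each household to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # move each center to the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two obvious consumption profiles: heavy HVAC users vs. heavy EV chargers.
X = np.array([[5.0, 0.1], [5.2, 0.2], [0.2, 4.8], [0.1, 5.1]])
labels = kmeans(X, k=2)
print(labels)  # households 0,1 share one label; households 2,3 share the other
```

The per-cluster load profiles mentioned in the abstract would then be, e.g., the mean consumption vector of each cluster.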
- Energy Efficient Timely Transportation: A Comparative Study of Internal Combustion Trucks and Electric Trucks. Su, Junyan; Chen, Minghua; Zeng, Haibo (ACM, 2021-11-17). We carry out a comparative study of the energy consumption of the conventional internal combustion truck and the modern electric truck, traveling from origin to destination over the national highway subject to a hard deadline. We focus on understanding the energy saving of the latter over the former and its key contributing factors. Our study is unique in that (i) it is based on extensive simulations using real-world data over the U.S. highway system, and (ii) we factor in the power system's energy-conversion efficiency when calculating the energy consumption of electric trucks for a fair comparison. The results show that on average the electric truck saves 10% energy as compared to the internal combustion truck, and this saving will improve as power systems incorporate more renewable generation. Furthermore, the energy saving mainly comes from the energy efficiency of electric motors; other electric-truck features, e.g., regenerative braking, make only minor contributions.
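The point about factoring in power-system energy-conversion efficiency can be illustrated with back-of-the-envelope arithmetic: charging the electric truck not only for battery-to-wheel losses but also for generation and transmission losses upstream. The efficiency figures below are illustrative assumptions, not the paper's numbers:

```python
def upstream_energy_kwh(wheel_energy_kwh: float, drivetrain_eff: float,
                        grid_eff: float = 1.0) -> float:
    """Energy drawn upstream to deliver a given amount of energy at the wheels.

    For the electric truck, grid_eff accounts for generation/transmission
    losses in the power system, as the fair-comparison methodology requires.
    """
    return wheel_energy_kwh / (drivetrain_eff * grid_eff)

wheel = 300.0  # kWh needed at the wheels for the trip (illustrative)
diesel = upstream_energy_kwh(wheel, drivetrain_eff=0.34)    # assumed ICE efficiency
electric = upstream_energy_kwh(wheel, drivetrain_eff=0.85,  # assumed motor+battery
                               grid_eff=0.45)               # assumed grid efficiency
saving = 1 - electric / diesel
print(f"electric saves {saving:.0%} upstream energy")
```

With these assumed numbers, the electric truck still comes out ahead despite grid losses, because the motor's efficiency advantage outweighs them; improving `grid_eff` (more renewables) grows the saving, consistent with the abstract's conclusion.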
- Xar-Trek: Run-time Execution Migration among FPGAs and Heterogeneous-ISA CPUs. Horta, Edson; Chuang, Ho-Ren; VSathish, Naarayanan Rao; Philippidis, Cesar; Barbalace, Antonio; Olivier, Pierre; Ravindran, Binoy (ACM, 2021-12-06). Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits dynamic and transparent migration. We present Xar-Trek, a new compiler and run-time software framework that overcomes this limitation. Xar-Trek compiles an application for several CPU ISAs and select application functions for acceleration on an FPGA, allowing execution migration between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek's run-time monitors server workloads and migrates application functions to an FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a heuristic policy that uses application workload profiles to make scheduling decisions. Our evaluations, conducted on a system with x86-64 server CPUs, ARM64 server CPUs, and an Alveo accelerator card, reveal performance gains of 1%-88% over no-migration baselines.
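A profile-driven scheduling policy of the kind the abstract mentions might, in spirit, look like the toy heuristic below. The function name, thresholds, and profile format are all hypothetical, intended only to show the shape of the decision, not Xar-Trek's actual policy:

```python
def pick_target(fn: str, load: dict, profile: dict) -> str:
    """Toy migration heuristic: send a function to the FPGA when its profile
    says the FPGA version is faster and the FPGA is free; otherwise run it
    on the least-loaded CPU ISA. All thresholds here are made up."""
    if profile.get(fn, {}).get("fpga_speedup", 0) > 1.0 and load["fpga"] < 1:
        return "fpga"
    return min(("x86-64", "arm64"), key=lambda isa: load[isa])

load = {"fpga": 0, "x86-64": 3, "arm64": 1}        # current queue depths
profile = {"matmul": {"fpga_speedup": 4.2}}        # offline workload profile
print(pick_target("matmul", load, profile))  # → fpga
print(pick_target("parse", load, profile))   # → arm64
```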
- Understanding the Practices of Global Censorship through Accurate, End-to-End Measurements. Jin, Lin; Hao, Shuai; Wang, Haining; Cotton, Chase (ACM, 2021-12-15). It is challenging to conduct a large-scale Internet censorship measurement, as it involves triggering censors through artificial requests and identifying abnormalities from the corresponding responses. Due to the lack of ground truth on the expected responses from legitimate services, previous studies typically require heavy, unscalable manual inspection to identify false positives while still leaving false negatives undetected. In this paper, we propose Disguiser, a novel framework that enables end-to-end measurement to accurately detect censorship activities and reveal censor deployment without manual effort. The core of Disguiser is a control server that replies with a static payload, providing the ground truth for server responses. We send requests from various types of vantage points across the world to our control server, and censorship activities can be recognized if a vantage point receives a different response. In particular, we design and conduct a cache test to pre-exclude vantage points that could be interfered with by cache proxies along the network path. We then perform application traceroute towards our control server to explore censors' behaviors and their deployment. With Disguiser, we conduct 58 million measurements from vantage points in 177 countries. We observe 292 thousand censorship activities that block DNS, HTTP, or HTTPS requests inside 122 countries, achieving a 10^-6 false positive rate and zero false negative rate. Furthermore, Disguiser reveals the censor deployment in 13 countries.
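Disguiser's core idea, a control server whose reply is a fixed known payload, reduces censorship detection at a vantage point to a simple comparison: any deviation must come from on-path interference, not server-side variation. The sketch below is a simplified illustration; the payload and function are hypothetical:

```python
from typing import Optional

# Hypothetical ground-truth body that the control server always returns.
EXPECTED_PAYLOAD = b"disguiser-static-response"

def classify_response(body: Optional[bytes]) -> str:
    """Classify what a vantage point observed when fetching the control server."""
    if body is None:
        return "blocked"        # request never completed (e.g., reset or timeout)
    if body == EXPECTED_PAYLOAD:
        return "uncensored"     # exact ground truth observed end to end
    return "tampered"           # body rewritten: block page, injected content, ...

print(classify_response(EXPECTED_PAYLOAD))          # → uncensored
print(classify_response(b"<html>Blocked</html>"))   # → tampered
print(classify_response(None))                      # → blocked
```

The cache test described in the abstract exists precisely to keep this comparison sound: a caching proxy could also return a non-ground-truth body without being a censor.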
- Graph Deep Factors for Probabilistic Time-series Forecasting. Chen, Hongjie; Rossi, Ryan; Kim, Sungchul; Mahadik, Kanak; Eldardiry, Hoda (ACM, 2022). Deep probabilistic forecasting techniques can model large collections of time-series. However, recent techniques explicitly assume either complete independence (local model) or complete dependence (global model) between time-series in the collection. This corresponds to the two extreme cases where every time-series is disconnected from every other time-series or likewise, that every time-series is related to every other time-series resulting in a completely connected graph. In this work, we propose a deep hybrid probabilistic graph-based forecasting framework called Graph Deep Factors (GraphDF) that goes beyond these two extremes by allowing nodes and their time-series to be connected to others in an arbitrary fashion. GraphDF is a hybrid forecasting framework that consists of a relational global and relational local model. In particular, a relational global model learns complex non-linear time-series patterns globally using the structure of the graph to improve both forecasting accuracy and computational efficiency. Similarly, instead of modeling every time-series independently, a relational local model not only considers its individual time-series but also the time-series of nodes that are connected in the graph. The experiments demonstrate the effectiveness of the proposed deep hybrid graph-based forecasting model compared to the state-of-the-art methods in terms of its forecasting accuracy, runtime, and scalability. Our case study reveals that GraphDF can successfully generate cloud usage forecasts and opportunistically schedule workloads to increase cloud cluster utilization by 47.5% on average.
Furthermore, we address a common characteristic of many time-series forecasting applications: time-series arrive as a stream, yet most methods fail to leverage newly incoming values and their performance worsens over time. In this paper, we propose an online incremental learning framework for probabilistic forecasting. The framework is theoretically proven to have lower time and space complexity, and it can be universally applied to many other machine-learning-based methods.
- Supervised Contrastive Learning for Interpretable Long-Form Document Matching. Jha, Akshita; Rakesh, Vineeth; Chandrashekar, Jaideep; Samavedhi, Adithya; Reddy, Chandan (ACM, 2022). Recent advancements in deep learning techniques have transformed the area of semantic text matching. However, most state-of-the-art models are designed to operate with short documents such as tweets, user reviews, comments, etc. These models have fundamental limitations when applied to long-form documents such as scientific papers, legal documents, and patents. When handling such long documents, there are three primary challenges: (i) the presence of different contexts for the same word throughout the document, (ii) small sections of contextually similar text between two documents, but dissimilar text in the remaining parts (this defies the basic understanding of "similarity"), and (iii) the coarse nature of a single global similarity measure which fails to capture the heterogeneity of the document content. In this paper, we describe CoLDE: Contrastive Long Document Encoder, a transformer-based framework that addresses these challenges and allows for interpretable comparisons of long documents. CoLDE uses unique positional embeddings and a multi-headed chunkwise attention layer in conjunction with a supervised contrastive learning framework to capture similarity at three different levels: (i) high-level similarity scores between a pair of documents, (ii) similarity scores between different sections within and across documents, and (iii) similarity scores between different chunks in the same document and across other documents. These fine-grained similarity scores aid in better interpretability. We evaluate CoLDE on three long document datasets namely, ACL Anthology publications, Wikipedia articles, and USPTO patents.
Besides outperforming the state-of-the-art methods on the document matching task, CoLDE is also robust to changes in document length and text perturbations and provides interpretable results. The code for the proposed model is publicly available at https://github.com/InterDigitalInc/CoLDE.
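The chunk-level similarity CoLDE reports can be pictured with a minimal sketch: split each document into chunks, embed them, and compare all chunk pairs, so a small similar section remains visible even when the global score is low. The embeddings below are random stand-ins, not CoLDE's encoder:

```python
import numpy as np

def chunk(text: str, size: int) -> list:
    """Split a long document into fixed-size chunks of words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def chunk_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every chunk of document A and every chunk of
    document B. The max over this matrix exposes a small matching section
    that a single global similarity score would wash out."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return a @ b.T

emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])  # stand-in embeddings for A's chunks
emb_b = np.array([[1.0, 0.0]])              # stand-in embedding for B's one chunk
sims = chunk_similarity(emb_a, emb_b)
print(sims.max())  # → 1.0  (one chunk of A matches B's chunk exactly)
```

In CoLDE itself, the chunk embeddings come from a transformer with chunkwise attention; the pairwise score matrix is what makes the comparison interpretable.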
- CARES: Context-Aware Trust Estimation for Realtime Crowdsensing Services in Vehicular Edge Networks. Jang, Si Young; Park, Sung Kyu; Cho, Jin-Hee; Lee, Dongman (ACM, 2022). A growing number of smart vehicles makes it possible to envision a crowdsensing service where vehicles share video data of their surroundings to reveal traffic conditions and car accidents ahead. However, the service may need to deal with malicious vehicles that propagate false information to divert other vehicles so as to arrive at destinations earlier, or to lead them to dangerous locations. This paper proposes a context-aware trust estimation scheme that allows roadside units in a vehicular edge network to provide real-time crowdsensing services reliably by selectively using information from trustworthy sources. Our scheme is novel in that its trust estimation requires no prior knowledge about vehicles on the road: it quickly obtains an accurate trust value for each vehicle by leveraging transfer learning, and its Q-learning-based dynamic adjustment scheme autonomously estimates the trust levels of incoming vehicles to detect malicious vehicles and filter out untrustworthy input from them. Based on an extensive simulation study, we show that the proposed scheme outperforms existing ones in terms of malicious vehicle detection accuracy.
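The Q-learning-based adjustment mentioned above presumably builds on the standard tabular update rule; the sketch below shows that rule with hypothetical reward semantics (a corroborated report earns positive reward, a contradicted one negative), not CARES's actual formulation:

```python
def q_update(q: float, reward: float, q_next: float,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular Q-learning step: Q <- Q + alpha * (r + gamma * Q' - Q).
    A roadside unit could use such a rule to adjust a vehicle's trust level
    after checking its report against corroborating evidence."""
    return q + alpha * (reward + gamma * q_next - q)

trusted = q_update(0.5, reward=1.0, q_next=0.5)    # corroborated report raises trust
malicious = q_update(0.5, reward=-1.0, q_next=0.5) # contradicted report lowers it
print(trusted > 0.5, malicious < 0.5)  # → True True
```

The learning rate `alpha` controls how quickly trust reacts to new evidence; the values here are illustrative defaults.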
- A Characterization Study of Merge Conflicts in Java Projects. Shen, Bowen; Gulzar, Muhammad Ali; He, Fei; Meng, Na (ACM, 2022). In collaborative software development, programmers create branches to add features and fix bugs, and merge branches to integrate edits. When edits from different branches textually overlap (i.e., textual conflicts) or lead to compilation and runtime errors (i.e., build and test conflicts), it is challenging for developers to remove conflicts. Prior work proposed tools to detect and solve conflicts. However, many questions are not fully investigated, such as what types of conflicts exist in practice and how developers or tools handle them. For this paper, we used automated textual merge, compilation, and testing to reveal 3 types of conflicts in 208 open-source repositories: textual conflicts, build conflicts (i.e., conflicts causing build errors), and test conflicts (i.e., conflicts triggering test failures). We manually inspected 538 conflicts and their resolutions to characterize merge conflicts. Our analysis revealed three phenomena. First, higher-order conflicts (i.e., build and test conflicts) are harder to handle, while existing tools mainly focus on textual conflicts. Second, developers resolved most higher-order conflicts by applying similar edits to multiple program locations. Third, developers resolved 64% of true textual conflicts by keeping complete edits from either the left or right branch. Our work sheds light on future research on software merging.
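The automated textual-merge step can be approximated by scanning merged files for Git's conflict markers, as in this simplified sketch. Build and test conflicts, by contrast, only surface when the merged code is compiled and run, which is why such marker scanning cannot catch the higher-order conflicts the study highlights:

```python
def has_textual_conflict(merged: str) -> bool:
    """A file produced by `git merge` contains a textual conflict iff the
    three conflict markers survive in its contents (simplified check)."""
    lines = merged.splitlines()
    return (any(l.startswith("<<<<<<<") for l in lines)
            and any(l.startswith("=======") for l in lines)
            and any(l.startswith(">>>>>>>") for l in lines))

clean = "int x = 1;\n"
conflicted = "<<<<<<< HEAD\nint x = 1;\n=======\nint x = 2;\n>>>>>>> feature\n"
print(has_textual_conflict(clean), has_textual_conflict(conflicted))  # → False True
```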
- Automated Urban Planning for Reimagining City Configuration via Adversarial Learning: Quantification, Generation, and Evaluation. Wang, Dongjie; Fu, Yanjie; Liu, Kunpeng; Chen, Fanglan; Wang, Pengyang; Lu, Chang-Tien (ACM, 2022). Urban planning refers to the effort of designing land-use configurations for a region. However, designing effective configurations is a time-consuming and labor-intensive process, which motivates us to ask: can AI accelerate the urban planning process, so that human planners only adjust generated configurations for specific needs? The recent advance of deep generative models inspires us to automate urban planning from an adversarial learning perspective. However, three major challenges arise: 1) how to define a quantitative land-use configuration? 2) how to automate configuration planning? 3) how to evaluate the quality of a generated configuration? In this paper, we systematically address these three challenges. Specifically, 1) we define a land-use configuration as a longitude-latitude-channel tensor. 2) We formulate the automated urban planning problem as a task of deep generative learning. The objective is to generate a configuration tensor given the surrounding contexts of a target region. In particular, we first construct spatial graphs using geographic and human mobility data to learn graph representations. We then combine each target area and its surrounding context representations as a tuple, and categorize all tuples into positive samples (well-planned areas) and negative samples (poorly planned areas). Next, we develop an adversarial learning framework, in which a generator takes the surrounding context representations as input to generate a land-use configuration, and a discriminator learns to distinguish between positive and negative samples. 3) We provide quantitative evaluation metrics and conduct extensive experiments to demonstrate the effectiveness of our framework.
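The longitude-latitude-channel tensor of challenge 1) is straightforward to materialize; the sketch below uses hypothetical grid dimensions and point-of-interest categories only to make the encoding concrete:

```python
import numpy as np

# A land-use configuration as a longitude-latitude-channel tensor: cell
# (i, j, c) counts points of interest of category c inside grid square (i, j).
# The grid size and the four categories here are hypothetical.
N_LON, N_LAT, N_CHANNELS = 10, 10, 4   # channels: e.g. housing, retail,
config = np.zeros((N_LON, N_LAT, N_CHANNELS))  # park, school

# register one retail POI (channel 1) at grid square (3, 7)
config[3, 7, 1] += 1

print(config.shape, config.sum())  # → (10, 10, 4) 1.0
```

A generator in the adversarial framework would output tensors of exactly this shape, and the discriminator would score them against tensors built from well-planned real areas.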
- The TL;DR Charter: Speculatively Demystifying Privacy Policy Documents and Terms Agreements. Kotut, Lindah; McCrickard, D. Scott (ACM, 2022-01-14). Privacy policy and terms-of-agreement documents are considered the gateway to software adoption and use. The documents provide a means for the provider to outline expectations of the software's use, and also provide an often-separate document outlining how user data is collected, stored, and used, including whether it is shared with other parties. A user agreeing to the terms is assumed to fully understand the terms of the agreement and to have provided consent. Often, however, users do not read the documents because they are long and full of legalistic and inconsistent language, are regularly amended, and may not disclose all the details of what is done with the user data. Enforcing compliance and ensuring user consent have been persistent challenges for policy makers and privacy researchers. This design fiction puts forward an alternate reality and presents a policy-based approach to fording the consent gap with the TL;DR Charter: an agreement governing the parties involved by harnessing the power of formal governments, industry, and other stakeholders, and taking users' expectations of privacy into account. The Charter allows us as researchers to examine the implications for trust, decision-making, consent, and accountability, and the impact of future technologies.
- ANTHEM: Attentive Hyperbolic Entity Model for Product Search. Choudhary, Nurendra; Rao, Nikhil; Katariya, Sumeet; Subbian, Karthik; Reddy, Chandan K. (ACM, 2022-02-11). Product search is a fundamentally challenging problem due to the large size of product catalogues and the complexity of extracting semantic information from products. In addition, the black-box nature of most search systems hampers a smooth customer experience. Current approaches in this area utilize lexical and semantic product information to match user queries against products. However, these models lack (i) a hierarchical query representation, (ii) a mechanism to detect and capture inter-entity relationships within a query, and (iii) a query composition method specific to the e-commerce domain. To address these challenges, in this paper we propose the AtteNTive Hyperbolic Entity Model (ANTHEM), a novel attention-based product search framework that models query entities as two-vector hyperboloids, learns inter-entity intersections, and utilizes attention to unionize individual entities and inter-entity intersections to predict product matches from the search space. ANTHEM utilizes the first and second vectors of the hyperboloids to determine the query's semantic position and to tune its surrounding search volume, respectively. The attention networks capture the significance of intra-entity and inter-entity intersections to the final query space. Additionally, we provide a mechanism to comprehend ANTHEM and understand the significance of query entities towards the final resultant products. We evaluate the performance of our model on real data collected from popular e-commerce sites. Our experimental study on the offline data demonstrates compelling evidence of ANTHEM's superior performance over state-of-the-art product search methods, with an improvement of more than 10% on various metrics. We also demonstrate the quality of ANTHEM's query encoder using a query matching task.
- Factors Influencing Student Performance and Persistence in CS2. Hooshangi, Sara; Ellis, Margaret; Edwards, Stephen (ACM, 2022-02-22). Performance in CS1 and introductory CS courses has been an area of active research in the CS education research community for more than four decades, but studies related to student performance in CS2 are not as widely available. Past studies have examined the impact of CS1 grade, prior math preparation, and other factors such as homework, test, and project grades on overall performance in CS2. In this work, we build upon the existing research related to CS2 performance with an emphasis on a few factors that have not been previously considered for this course. In addition to typical factors studied by others (i.e., gender, race, CS1 performance), our work also takes into account the impact of the various CS1 pathways into CS2 and the number of previous college CS courses (including transfer credits) on student performance in CS2. We also look into persistence, by distinguishing students who stay in the course from those who drop before the mid-semester drop deadline, as well as performance. Gender and race were not significant factors in determining performance in CS2, but undeclared engineering majors stood out as high performers, and students' CS pathway leading to CS2 was also significant. Notably, students with CS1 transfer credit had significantly lower pass rates. Students with only one previous CS course credit were less likely to drop or fail the course.
- Helping Student Programmers Through Industrial-Strength Static Analysis: A Replication Study. Senger, Allyson; Edwards, Stephen; Ellis, Margaret (ACM, 2022-02-22). Static analysis tools evaluate source code to identify problems beyond typical compiler errors. Prior work has shown a statistically significant relationship between correctness and static analysis results. This paper replicates and extends a prior study on FindBugs, a static analysis tool aimed at professional Java programmers. The prior study showed a strong link between certain FindBugs issues and problems with program correctness. It also showed they were significantly associated with struggling, as indicated by taking more time, making more submissions, and receiving lower scores. However, that study used small programming exercises involving only a handful of lines of code from one semester of a CS1 course. This replication study uses the same experimental approach but applies it to full-scale programming assignments from hundreds to thousands of lines in length, across all sections of CS1, CS2, and CS3 at a large public university over a period of 4 academic semesters, involving 4,244 students in 109 laboratory sections completing 255,222 submission attempts. The goal of this replication study is to confirm how the prior results hold up and to explore how they apply to full-sized programming assignments. We find a set of FindBugs warnings that are inversely correlated with correctness and confirm that their presence is still significantly associated with struggling on larger programming assignments. However, a larger number of FindBugs issues were identified as valuable. We also discuss how student-friendly messages have been added to provide student-readable feedback for a tool whose native messages were written for professional programmers.
- Adelie: Continuous Address Space Layout Re-randomization for Linux Drivers. Nikolaev, Ruslan; Nadeem, Hassan; Stone, Cathlyn; Ravindran, Binoy (ACM, 2022-02-28). While address space layout randomization (ASLR) has been extensively studied for user-space programs, the corresponding OS kernel KASLR support remains very limited, making the kernel vulnerable to just-in-time (JIT) return-oriented programming (ROP) attacks. Furthermore, commodity OSs such as Linux restrict their KASLR range to 32 bits due to architectural constraints (e.g., x86-64 only supports 32-bit immediate operands for most instructions), which makes them vulnerable to even unsophisticated brute-force ROP attacks due to low entropy. Most in-kernel pointers remain static, exacerbating the problem when pointers are leaked. Adelie, our kernel defense mechanism, overcomes KASLR limitations, increases KASLR entropy, and makes successful ROP attacks on the Linux kernel much harder to achieve. First, Adelie enables the position-independent code (PIC) model so that the kernel and its modules can be placed anywhere in the 64-bit virtual address space, at any distance apart from each other. Second, Adelie implements stack re-randomization and address encryption on modules. Finally, Adelie enables efficient continuous KASLR for modules by using the PIC model, making it (almost) impossible to inject ROP gadgets through these modules regardless of the gadgets' origin. Since device drivers (typically compiled as modules) are often developed by third parties and are typically less tested than core OS parts, they are also often more vulnerable. By fully re-randomizing device drivers, the last two contributions together prevent most JIT ROP attacks, since vulnerable modules are very likely to be a starting point of an attack. Furthermore, some OS instances in virtualized environments are specifically designated to run device drivers, where drivers are the primary target of JIT ROP attacks.
Using a GCC plugin that we developed, we automatically modify different kinds of kernel modules. Since the prior art tackles only user-space programs, we solve many challenges unique to the kernel code. Our evaluation shows high efficiency of Adelie’s approach: the overhead of the PIC model is completely negligible and re-randomization cost remains reasonable for typical use cases.