Browsing by Author "Min, Changwoo"
Now showing 1 - 11 of 11
- Cross-ISA Execution Migration of Unikernels: Build Toolchain, Memory Alignment, and VM State Transfer Techniques
  Mehrab, A K M Fazla (Virginia Tech, 2018-12-12)
  Data centers are composed of resource-rich, expensive server machines. A server overloaded with work must offload some jobs to other servers; otherwise, its throughput drops. Low-end embedded computers, by contrast, are cheap, low-power, OS-capable devices. We propose a system that uses these embedded devices alongside the servers and migrates jobs from an overloaded server to the boards to increase throughput. Data centers usually run workloads inside virtual machines (VMs), but these embedded boards cannot run full-fledged VMs. In this thesis, we propose to use lightweight VMs, called unikernels, which can run on such low-end embedded devices. A further complication is that the most efficient of these boards have a different instruction set architecture (ISA) than the servers. The ISA difference between the servers and the embedded boards, together with the need to migrate an entire unikernel between them, makes migration a non-trivial problem. This thesis proposes a way to give unikernels migration capability so that workloads can be offloaded from the server to the embedded boards. It describes a toolchain for building migratable unikernels for native applications, explains how memory components are aligned across unikernels built for different ISAs so that memory references remain valid and consistent after migration, and presents an efficient VM state transfer method that keeps the execution-time overhead and downtime caused by migration low.
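  The abstract only names the memory-alignment requirement; as an illustration of the kind of layout discipline cross-ISA migration implies (not the thesis's actual toolchain output), the C sketch below pins a structure's layout with fixed-width types and compile-time checks, assuming both ISAs are little-endian LP64 targets such as x86-64 and AArch64. The struct and field names are hypothetical.

  ```c
  /* Hypothetical example: a heap object whose in-memory layout must be identical
   * on the source and destination ISAs so that references into it remain valid
   * after a unikernel is migrated. Fixed-width types give the same offsets on
   * little-endian LP64 targets such as x86-64 and AArch64. */
  #include <stddef.h>
  #include <stdint.h>

  struct migratable_buffer {
      uint64_t id;        /* offset 0                                      */
      uint32_t length;    /* offset 8                                      */
      uint32_t flags;     /* offset 12                                     */
      uint64_t data_off;  /* offset 16: store an offset rather than a raw  */
                          /* pointer so the reference survives a base move */
  };

  /* Compile-time checks that the layout matches the agreed cross-ISA ABI. */
  _Static_assert(sizeof(struct migratable_buffer) == 24, "layout drifted");
  _Static_assert(offsetof(struct migratable_buffer, data_off) == 16, "layout drifted");
  ```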
- Enforcing C/C++ Type and Scope at Runtime for Control-Flow and Data-Flow Integrity
  Ismail, Mohannad; Jelesnianski, Christopher; Jang, Yeongjin; Min, Changwoo; Xiong, Wenjie (ACM, 2024-04-27)
  Control-flow hijacking and data-oriented attacks are becoming more sophisticated. These attacks, especially data-oriented attacks, can result in critical security threats such as leaking an SSL key. Data-oriented attacks are hard to defend against with acceptable performance because of the sheer number of data pointers present. The root cause of such attacks is the use of pointers in unintended ways; fundamentally, these attacks rely on abusing pointers to violate either the scope in which they were originally used or the types with which they were originally declared. This paper proposes Scope Type Integrity (STI), a new defense policy that requires all pointers (both code and data pointers) to conform to the original programmer's intent, as well as Runtime Scope Type Integrity (RSTI), a set of mechanisms that enforce STI at runtime by leveraging ARM Pointer Authentication. STI gathers information about the scope, type, and permissions of pointers; RSTI then uses this information to ensure that pointers are utilized legitimately at runtime. We implemented three RSTI defense mechanisms with varying security and performance tradeoffs to showcase the versatility of RSTI, and we evaluated all three variants on a variety of benchmarks and real-world applications. Our results show that they incur overheads of 5.29%, 2.97%, and 11.12%, respectively.
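  The abstract does not give the enforcement sequence; purely as an illustration of the general ARM Pointer Authentication approach (not RSTI's actual modifier construction), a pointer could be signed with a modifier derived from its type and scope identifiers and authenticated before use, as in the sketch below. The modifier encoding and helper names are assumptions, and the code requires an ARMv8.3-A CPU with FEAT_PAuth.

  ```c
  /* Illustrative only: sign a pointer with PAC key A using a modifier that
   * encodes a (hypothetical) type ID and scope ID, and authenticate it before
   * dereference. A mismatched type or scope makes authentication fail. */
  #include <stdint.h>

  static inline uint64_t make_modifier(uint32_t type_id, uint32_t scope_id) {
      return ((uint64_t)type_id << 32) | scope_id;   /* hypothetical encoding */
  }

  static inline void *sign_ptr(void *p, uint64_t mod) {
      asm volatile("pacia %0, %1" : "+r"(p) : "r"(mod));
      return p;   /* the PAC lives in the unused upper bits of the pointer */
  }

  static inline void *auth_ptr(void *p, uint64_t mod) {
      asm volatile("autia %0, %1" : "+r"(p) : "r"(mod));
      return p;   /* a forged or mismatched PAC yields a faulting address */
  }
  ```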
- Fireworks: A Fast, Efficient, and Safe Serverless Framework using VM-level post-JIT Snapshot
  Shin, Wonseok; Kim, Wook-Hee; Min, Changwoo (ACM, 2022-03-28)
  Serverless computing is a new paradigm that is rapidly gaining popularity in cloud computing. One unique property of serverless computing is that the unit of deployment and execution is a serverless function, which is much smaller than a typical server program. Serverless computing introduces a new pay-as-you-go billing model and offers high economic benefit through highly elastic resource provisioning. However, it also brings new challenges: (1) long start-up times compared to relatively short function execution times, (2) security risks from a highly consolidated environment, and (3) poor memory efficiency caused by unpredictable function invocations. These problems not only degrade performance but also lower the economic benefit for cloud providers. To address these challenges without any compromises, we propose a novel VM-level post-JIT snapshot approach and develop a new serverless framework, Fireworks. Our key idea is to synergistically leverage virtual machine (VM)-level snapshots and language-runtime-level just-in-time (JIT) compilation in tandem. Fireworks reuses JITted serverless function code to reduce both the start-up time and the execution time of functions, and it improves memory efficiency by sharing the JITted code. Fireworks also provides a high level of isolation by using a VM as the sandbox in which a serverless function executes. Our evaluation results show that Fireworks outperforms state-of-the-art serverless platforms by 20.6× and provides higher memory efficiency of up to 7.3×.
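  To make the lifecycle concrete, the sketch below shows the general shape of a post-JIT snapshot flow: warm a VM until the runtime has JIT-compiled the function, snapshot it, and serve each request from a clone of that snapshot. All of the vm_* names are hypothetical stand-ins (stubbed here so the sketch is self-contained), not Fireworks' actual interface.

  ```c
  #include <stdio.h>

  /* Hypothetical hypervisor/runtime interface; the stubs only print. */
  typedef struct { const char *image; int jit_warm; } vm_t;

  static vm_t vm_boot(const char *image)             { return (vm_t){ image, 0 }; }
  static void vm_invoke(vm_t *vm, const char *req)   { vm->jit_warm = 1; printf("run %s on %s\n", req, vm->image); }
  static vm_t vm_clone_from_snapshot(const vm_t *vm) { return *vm; /* copy of post-JIT state */ }

  int main(void) {
      /* 1. Boot once and warm up so the language runtime JIT-compiles the function. */
      vm_t warm = vm_boot("hello-function");
      for (int i = 0; i < 3; i++)
          vm_invoke(&warm, "warmup");

      /* 2. Snapshot after JIT; each request then restores a fresh, isolated clone
       *    that already contains the JITted (and shareable) function code. */
      for (int i = 0; i < 2; i++) {
          vm_t clone = vm_clone_from_snapshot(&warm);
          vm_invoke(&clone, "user-request");
      }
      return 0;
  }
  ```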
- Generalized Consensus for Practical Fault-Tolerance
  Garg, Mohit (Virginia Tech, 2018-09-07)
  Despite extensive research on Byzantine Fault Tolerant (BFT) systems, the overheads associated with such solutions have precluded widespread adoption. Past efforts such as the Cross Fault Tolerance (XFT) model address this problem by making the weaker assumption that a majority of processes are correct and communicate synchronously. Although XPaxos of Liu et al. (which uses the XFT model) achieves performance similar to Paxos, it does not scale with the number of faults, and its reliance on a single leader introduces considerable downtime in case of failures. This thesis presents Elpis, the first multi-leader XFT consensus protocol. By adopting the Generalized Consensus specification from the Crash Fault Tolerance model, we devise a multi-leader protocol that exploits the commutativity inherent in the commands ordered by the system. Elpis maps accessed objects to non-faulty processes during periods of synchrony; these processes then order all commands that access those objects. Experimental evaluation confirms the effectiveness of this approach: Elpis achieves up to 2x speedup over XPaxos and up to 3.5x speedup over state-of-the-art Byzantine fault-tolerant consensus protocols.
- MEMTIS: Efficient Memory Tiering with Dynamic Page Classification and Page Size Determination
  Lee, Taehyung; Monga, Sumit Kumar; Min, Changwoo; Eom, Young Ik (ACM, 2023-10-23)
  The ever-growing memory demand fueled by datacenter workloads is the driving force behind new memory technology innovations (e.g., NVM, CXL). Tiered memory is a promising solution that harnesses multiple memory types with varying capacity, latency, and cost characteristics in an effort to reduce server hardware costs while fulfilling memory demand. Prior memory-tiering work makes suboptimal (often pathological) page placement decisions because it relies on various heuristics and static thresholds without considering the overall memory access distribution. Deciding the appropriate page size for an application is also difficult, since huge pages are not always beneficial when accesses within them are skewed. We present Memtis, a tiered memory system that makes informed decisions for page placement and page size determination. Memtis leverages the access distribution of allocated pages to optimally approximate the hot data set to the fast-tier capacity. Moreover, Memtis dynamically determines page size, allowing applications to use huge pages while avoiding their drawbacks by detecting inefficient use of fast-tier memory and splintering huge pages when necessary. Our evaluation shows that Memtis outperforms state-of-the-art tiering systems by up to 169.0% and their best by up to 33.6%.
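  As a small illustration of distribution-aware placement (not Memtis' actual data structures), the sketch below keeps a histogram of pages per access-frequency bucket and picks the coldest bucket whose pages, together with all hotter ones, still fit in the fast tier; the bucketing scheme and names are assumptions.

  ```c
  #include <stdint.h>

  #define NBUCKETS 16   /* hypothetical: bucket i holds pages with ~2^i accesses */

  /* Given page counts per access-frequency bucket and the fast tier's capacity
   * in pages, return the coldest bucket that still fits: pages in this bucket
   * or hotter go to the fast tier, everything colder to the slow tier. */
  static int hot_threshold_bucket(const uint64_t hist[NBUCKETS],
                                  uint64_t fast_tier_pages) {
      uint64_t cumulative = 0;
      for (int b = NBUCKETS - 1; b >= 0; b--) {   /* walk hottest to coldest */
          if (cumulative + hist[b] > fast_tier_pages)
              return b + 1;                       /* bucket b no longer fits */
          cumulative += hist[b];
      }
      return 0;                                   /* everything fits in the fast tier */
  }
  ```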
- PACTIGHT: Tightly Seal Sensitive Pointers with Pointer Authentication
  Ismail, Mohannad A (Virginia Tech, 2021-12-02)
  ARM is becoming more popular in desktops and data centers. This opens a new realm of security attacks against ARM and increases the importance of having an effective and efficient defense mechanism for it. ARM has released Pointer Authentication, a hardware security feature intended to ensure pointer integrity with cryptographic primitives; recently, however, it has been found to be vulnerable. In this thesis, we use Pointer Authentication to build a novel scheme that completely prevents misuse of security-sensitive pointers. We propose PACTight to tightly seal these pointers against attacks that target Pointer Authentication itself as well as against control-flow hijacks. PACTight uses a strong and unique modifier that addresses the current issues with PAC and its implementations. We implement four defenses by fully integrating with the LLVM compiler toolchain. Through a robust and systematic security and performance evaluation, we show that the PACTight defenses are more efficient and secure than their counterparts. We evaluated PACTight on 30 different applications, including the NGINX web server, using real PAC instructions, with average performance and memory overheads of 4.28% and 23.2%, respectively, even when enforcing its strongest defense. As far as we know, PACTight is the first defense mechanism to demonstrate effectiveness and efficiency with real PAC instructions.
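  As an illustration of one way Pointer Authentication can seal a sensitive pointer (not PACTight's exact modifier design), the sketch below binds the signature to the address of the memory slot holding the pointer, so a correctly signed value copied into a different slot fails to authenticate. It assumes an ARMv8.3-A CPU with FEAT_PAuth; the names are hypothetical.

  ```c
  /* Illustrative only: seal a sensitive code pointer by signing it with a
   * modifier tied to its storage location, defeating pointer substitution. */
  #include <stdint.h>

  typedef void (*handler_t)(void);

  static inline void store_sealed(handler_t *slot, handler_t fn) {
      uint64_t mod = (uint64_t)(uintptr_t)slot;             /* bind to location */
      asm volatile("pacia %0, %1" : "+r"(fn) : "r"(mod));
      *slot = fn;
  }

  static inline handler_t load_sealed(handler_t *slot) {
      handler_t fn = *slot;
      uint64_t mod = (uint64_t)(uintptr_t)slot;
      asm volatile("autia %0, %1" : "+r"(fn) : "r"(mod));    /* forged PAC => faulting address */
      return fn;                                             /* safe to call if authentic */
  }
  ```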
- PRISM: Optimizing Key-Value Store for Modern Heterogeneous Storage Devices
  Song, Yongju; Kim, Wook-Hee; Monga, Sumit Kumar; Min, Changwoo; Eom, Young Ik (ACM, 2023-01-27)
  As data generation continues its upward trend, storing vast volumes of data cost-effectively and accessing it efficiently are paramount. At the same time, today's storage landscape continues to diversify, from high-bandwidth storage devices such as NVMe SSDs to low-latency non-volatile memory (e.g., Intel Optane DCPMM). These heterogeneous storage devices have the potential to deliver high bandwidth and low latency with cost efficiency, yet achieving the performance and cost targets together remains a challenging problem. We present PRISM, a novel key-value store for modern heterogeneous storage devices. PRISM uses the devices synergistically, harnessing the advantages of each while suppressing its downsides. We devise new techniques to balance the latency-bandwidth tradeoff when reading from SSD, and to ensure multicore scalability and crash consistency of data across heterogeneous storage media, PRISM introduces cross-storage concurrency control and cross-storage crash consistency protocols. Our evaluation shows that PRISM outperforms state-of-the-art key-value stores by up to 13.1× with significantly lower tail latency.
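  As an illustration of how heterogeneous devices can be split by role (not PRISM's actual layout), the sketch below keeps a small index entry in low-latency persistent memory and the value itself on a high-bandwidth SSD, reading the value with a single positioned read. The structure and field names are assumptions.

  ```c
  /* Illustrative only: index entry on persistent memory, value on SSD. */
  #include <stdint.h>
  #include <unistd.h>

  struct index_entry {          /* assumed to reside in persistent memory */
      uint64_t key_hash;
      uint64_t ssd_offset;      /* where the value starts on the SSD log  */
      uint32_t value_len;
  };

  /* Read a value whose location is described by an index entry. */
  static ssize_t read_value(int ssd_fd, const struct index_entry *e,
                            void *buf, size_t buf_len) {
      size_t n = e->value_len < buf_len ? e->value_len : buf_len;
      return pread(ssd_fd, buf, n, (off_t)e->ssd_offset);
  }
  ```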
- Protect the System Call, Protect (Most of) the World with BASTION
  Jelesnianski, Christopher; Ismail, Mohannad; Jang, Yeongjin; Williams, Dan; Min, Changwoo (ACM, 2023-03-25)
  System calls are a critical building block in many serious security attacks, such as control-flow hijacking and privilege escalation attacks. Security-sensitive system calls (e.g., execve, mprotect) play an especially important role in completing attacks, yet few defense efforts focus on ensuring their legitimate usage, allowing attackers to leverage system calls maliciously. In this paper, we propose System Call Integrity, a novel policy that enforces the correct use of system calls throughout runtime. We propose three new contexts enforcing (1) which system call is called and how it is invoked (Call Type), (2) how a system call is reached (Control Flow), and (3) that its arguments are not corrupted (Argument Integrity). Our defense mechanism thwarts attacks by breaking this critical building block in their attack chains. We implement Bastion, a compiler and runtime monitor system, to demonstrate the efficacy of the three system call contexts. Our security case study shows that Bastion effectively stops all the attacks, including real-world exploits and recent advanced attack strategies. Deploying Bastion on three popular system-call-intensive programs, NGINX, SQLite, and vsFTPd, we show that Bastion is secure and practical, demonstrating overheads of 0.60%, 2.01%, and 1.65%, respectively.
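  As a toy illustration of per-call-site system call checking (not Bastion's actual instrumentation or contexts), the sketch below shows a wrapper a compiler pass might emit: it verifies that the call site issues only its expected system call and that a simple integrity tag over one argument still matches before trapping into the kernel. The policy table and tag function are hypothetical.

  ```c
  #include <stdint.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Policy recorded at compile time for one call site. */
  struct callsite_policy {
      long     expected_nr;    /* the only syscall this site may issue        */
      uint64_t arg0_tag;       /* expected integrity tag for the 1st argument */
  };

  static uint64_t tag(uint64_t v) { return v * 0x9E3779B97F4A7C15ULL; }  /* toy tag */

  static long guarded_syscall(const struct callsite_policy *p,
                              long nr, uint64_t a0, uint64_t a1, uint64_t a2) {
      if (nr != p->expected_nr || tag(a0) != p->arg0_tag)
          abort();                          /* context violated: stop the attack */
      return syscall(nr, a0, a1, a2);
  }
  ```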
- Scaling RDMA RPCs with FLOCK
  Monga, Sumit Kumar (Virginia Tech, 2021-11-30)
  RDMA-capable networks are gaining traction in datacenter deployments due to their high throughput, low latency, CPU efficiency, and advanced features such as remote memory operations. However, efficiently utilizing RDMA in a common setting of high fan-in, fan-out asymmetric network topology is challenging. For instance, using RDMA's advanced programming features comes at the cost of connection scalability, which degrades as cluster size grows. To address this, several works forgo some RDMA features and focus only on conventional RPC APIs. In this work, we strive to exploit the full capability of RDMA while scaling the number of connections regardless of cluster size. We present FLOCK, a communication framework for RDMA networks that uses hardware-provided reliable connections. Using a partially shared model, FLOCK departs from conventional RDMA design by enabling connection sharing among threads, which provides significant performance improvements contrary to the widely held belief that connection sharing deteriorates performance. At its core, FLOCK uses a connection handle abstraction for connection multiplexing, a new coalescing-based synchronization approach for efficient network utilization, and a load-control mechanism for connections with symbiotic send-recv scheduling, which reduces the synchronization overheads of connection sharing while ensuring fair utilization of network connections.
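  As a sketch of the connection-handle idea only (not FLOCK's implementation), the code below lets threads share a small pool of connections through handles and flushes queued messages in batches to hint at coalescing; transport_post() is a hypothetical stand-in for the actual RDMA post operation, and the batching policy is an assumption.

  ```c
  #include <pthread.h>
  #include <stdio.h>

  #define NCONNS 4
  #define BATCH  8

  static void transport_post(int conn_id, const char *msg) {   /* stub */
      printf("conn %d <- %s\n", conn_id, msg);
  }

  struct conn {
      pthread_mutex_t lock;
      const char     *pending[BATCH];
      int             npending;
  };

  static struct conn pool[NCONNS];

  /* A thread borrows a connection handle instead of owning a private
   * connection, so the connection count stays bounded by NCONNS. */
  static int handle_for(unsigned long thread_id) { return (int)(thread_id % NCONNS); }

  /* Queue a message under the connection's lock; whichever caller fills the
   * batch posts the whole batch at once (the coalescing idea, simplified). */
  static void send_coalesced(int h, const char *msg) {
      struct conn *c = &pool[h];
      pthread_mutex_lock(&c->lock);
      c->pending[c->npending++] = msg;
      if (c->npending == BATCH) {
          for (int i = 0; i < c->npending; i++)
              transport_post(h, c->pending[i]);
          c->npending = 0;
      }
      pthread_mutex_unlock(&c->lock);
  }

  int main(void) {
      for (int i = 0; i < NCONNS; i++)
          pthread_mutex_init(&pool[i].lock, NULL);
      for (unsigned long i = 0; i < 32; i++)
          send_coalesced(handle_for(i), "request");
      return 0;
  }
  ```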
- Securely Sharing Randomized Code That Flies
  Jelesnianski, Christopher; Yom, Jinwoo; Min, Changwoo; Jang, Yeongjin (ACM, 2022-09-12)
  Address space layout randomization was a great role model: a lightweight defense technique that could prevent early return-oriented programming attacks. Simple yet effective, it was quickly and widely adopted. Today, by contrast, only a trickle of defense techniques are being integrated or adopted mainstream. As code reuse attacks have evolved in complexity, defenses have strived to keep up, but to do so many have had to accept unfavorable tradeoffs such as using background threads or protecting only a subset of sensitive code. In reality, these tradeoffs were unavoidable steps toward improving the strength of the state of the art. In this article, we present Mardu, an on-demand, system-wide runtime re-randomization technique capable of scalably protecting application code as well as the shared library code that most defenses have forgone. We achieve code sharing with diversification by implementing reactive and scalable, rather than continuous or one-time, diversification. Enabling code sharing further removes redundant computation such as tracking and patching, along with the memory overheads required by prior randomization techniques. In its baseline state, the code transformations needed for Mardu's security hardening incur a reasonable performance overhead of 5.5% on SPEC and minimal degradation of 4.4% in NGINX, demonstrating its applicability to both compute-intensive and scalable real-world applications. Even when under attack, Mardu adds only from less than 1% up to 15% overhead, depending on application size and complexity.
- Write-Light Cache for Energy Harvesting Systems
  Choi, Jongouk; Zeng, Jianping; Lee, Dongyoon; Min, Changwoo; Jung, Changhee (ACM, 2023-06-17)
  Energy harvesting systems have huge potential to enable battery-less Internet of Things (IoT) services. However, they have been designed without a cache because guaranteeing crash consistency with one is difficult, and this limits their performance. This paper introduces Write-Light Cache (WL-Cache), a specialized cache architecture with a new write policy for energy harvesting systems. WL-Cache combines the benefits of a write-back cache and a write-through cache while avoiding their downsides. Unlike a write-through cache, WL-Cache does not access non-volatile main memory (NVM) at every store; it holds dirty cache lines in the cache to exploit locality, saving energy and improving performance. Unlike a write-back cache, WL-Cache limits the number of dirty lines in the cache. When power is about to be cut off, WL-Cache flushes the bounded set of dirty lines to NVM in a failure-atomic manner by leveraging a just-in-time (JIT) checkpointing mechanism, achieving crash consistency across power failures. As an optimization, WL-Cache interacts with a runtime system that estimates the quality of the energy source during each power-on period and adaptively reconfigures the allowed number of dirty cache lines at boot time. Our experiments demonstrate that WL-Cache reduces hardware complexity and provides a significant speedup over the state-of-the-art volatile cache design with non-volatile backup. For two representative power outage traces, WL-Cache achieves 1.35x and 1.44x average speedups, respectively, across 23 benchmarks used in prior work.
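  As a behavioral sketch of the bounded-dirty-line write policy (a software model for illustration, not the hardware design), the code below lets a store leave a line dirty only while the number of dirty lines stays under a limit, so the power-failure handler always has a small, known amount of state to flush to NVM. The line indexing and eviction choice are assumptions.

  ```c
  #include <stdbool.h>

  #define NLINES      64
  #define DIRTY_LIMIT 8     /* reconfigured at boot in the described design */

  static bool dirty[NLINES];
  static int  ndirty;

  static void writeback_to_nvm(int line) { dirty[line] = false; ndirty--; }

  /* Called when a store hits cache line `line`. */
  static void on_store(int line) {
      if (dirty[line])
          return;                          /* already dirty: nothing to do */
      if (ndirty == DIRTY_LIMIT) {         /* keep the dirty set bounded:  */
          for (int i = 0; i < NLINES; i++) {
              if (dirty[i]) { writeback_to_nvm(i); break; }  /* write one back */
          }
      }
      dirty[line] = true;
      ndirty++;
  }

  /* Power-failure handler: flush the small, bounded dirty set just in time. */
  static void on_power_failure(void) {
      for (int i = 0; i < NLINES; i++)
          if (dirty[i])
              writeback_to_nvm(i);
  }
  ```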