VTechWorks Repository :: Browsing by Author "Wang, Xiaoguang"

CRIU-RTX: Remote Thread eXecution using Checkpoint/Restore in Userspace
Noor Mohamed, Mohamed Husain (Virginia Tech, 2023-07-21)
Scaling up application performance on single high-end machines is becoming increasingly difficult due to scalability challenges of processor interconnects, cache coherence protocols, and memory bandwidth. Significant prior work has addressed this problem by scaling-out application threads across multiple nodes to exploit resources outside the single machine boundary. Prior works have also leveraged heterogeneous instruction set architecture (ISA) systems to improve application performance as well as energy-efficiency, a major cost driver in datacenters, by augmenting high-end servers with power-efficient embedded boards. Existing works, however, suffer from deployability challenges due to dependencies on the operating system or programming models that require non-trivial application modifications. We introduce CRIU-RTX, a userspace framework to scale-out multi-threaded applications across multiple nodes. Integrated with HetMigrate, a prior work on migrating processes across heterogeneous-ISA systems, CRIU-RTX can suspend a subset of threads in a process and resume their execution on different nodes, including, but not limited to, heterogeneous-ISA nodes. CRIU-RTX implements distributed shared memory in userspace, thereby allowing application threads to access distributed memory transparently without any operating system dependency. Our experimental evaluations show 21% to 43% performance gains while scaling-out applications across x86-64 servers, and energy efficiency gains of up to 18% while scaling-out across a cluster of x86-64 server and ARM64 embedded boards. Since CRIU-RTX does not depend on operating system modifications, it can be easily deployed on a diverse set of machines, including, but not limited to, ISA-different machines running the stock Linux operating system.
DynaCut: A Framework for Dynamic and Adaptive Program Customization
Mahurkar, Abhijit; Wang, Xiaoguang; Zhang, Hang; Ravindran, Binoy (ACM, 2023-11-27)
Software is becoming increasingly complex and feature-rich, yet only part of any given codebase is frequently used. Existing software customization and debloating approaches target static binaries, focusing on feature discovery, control-flow analysis, and binary rewriting. As a result, the customized program binary has a smaller attack surface as well as less available functionality. This means that once a software’s use scenario changes, the customized binary may not be usable. This paper presents DynaCut, for dynamic software code customization. DynaCut can disable “not being used” code features during software runtime and re-enable them when required again. DynaCut works at the binary level; no source code is needed. To achieve its goal, DynaCut includes a dynamic process rewriting technique that seamlessly and transparently updates the image of a running process, with specific code features blocked or re-enabled. To help identify potentially unused code, DynaCut employs an execution trace-based differential analysis to pinpoint the code related to specific software features, which can be dynamically turned on/off based on user configuration. We also develop automatic methods to locate code that is only temporally used (e.g., initialization code), which can be dropped in a timely manner (e.g., after the initialization phase). We prototype DynaCut and evaluate it using 3 widely used server applications and the SPECint2017_speed benchmark suite. The result shows that, compared to existing static binary customization approaches, DynaCut removes an additional 10% of code on average and up to 56% of temporally executed code due to the dynamic code customization.
DynaCut: A Framework for Dynamic Code Customization
Mahurkar, Abhijit (Virginia Tech, 2021-09-03)
Software systems are becoming increasingly bloated to accommodate a wide array of features, platforms and users. This results not only in wastage of memory but also in an increase in their attack surface. Existing works broadly use binary-rewriting techniques to remove unused code, but this results in a binary that is highly customized for a given usage context. If the usage scenario of the binary changes, the binary has to be regenerated. We present DYNACUT, a framework for Dynamic and Adaptive Code Customization. DYNACUT provides the user with the capability to customize the application to changing usage scenarios at runtime without the need for the source code. DYNACUT achieves this customization by leveraging two techniques: 1) identifying the code to be removed by using execution traces of the application and 2) rewriting the process dynamically. The first technique uses traces of the wanted features and the unwanted features of the application and generates their diffs to identify the features to be removed. The second technique modifies the process image to add traps and fault-handling code to remove vulnerable but unused code. DYNACUT can also disable temporally unused code, that is, code that is used only during the initialization phase of the application. To demonstrate its effectiveness, we built a prototype of DYNACUT and evaluated it on 9 real-world applications including NGINX, Lighttpd and 7 applications of the SPEC Intspeed benchmark suite. DYNACUT removes up to 56% of executed basic blocks and up to 10% of the application code when used to remove initialization code. The total overhead is in the range of 1.63 seconds for Lighttpd, 4.83 seconds for NGINX and about 39 seconds for perlbench in the SPEC suite.
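The trace-diffing step described in the abstract above can be illustrated with a minimal sketch. It assumes each execution trace is simply a collection of basic-block addresses; the function name `feature_only_blocks` and the sample addresses are hypothetical and are not taken from DYNACUT itself, whose actual trace format and tooling are not described here.

```python
from typing import Iterable, Set


def feature_only_blocks(
    wanted_traces: Iterable[Iterable[int]],
    unwanted_traces: Iterable[Iterable[int]],
) -> Set[int]:
    """Return basic-block addresses that appear only in unwanted-feature traces.

    Assumption: a trace is an iterable of basic-block start addresses.
    Blocks that any wanted-feature trace also executes are kept, because
    removing them would break functionality the user still needs.
    """
    wanted = set().union(*wanted_traces)
    unwanted = set().union(*unwanted_traces)
    return unwanted - wanted


# Hypothetical traces: two runs exercising wanted features, one run
# exercising the feature to be disabled.
wanted_runs = [[0x1000, 0x1010, 0x1020], [0x1000, 0x1030]]
unwanted_runs = [[0x1000, 0x1010, 0x2000, 0x2010]]

candidates = feature_only_blocks(wanted_runs, unwanted_runs)
print(sorted(hex(addr) for addr in candidates))
```

A set difference is the simplest possible form of such a diff; a real analysis would also have to account for shared library code and for blocks reached only on rarely taken paths of the wanted features.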
HetMigrate: Secure and Efficient Cross-architecture Process Live Migration
Bapat, Abhishek Mandar (Virginia Tech, 2023-01-31)
The slowdown of Moore's Law opened a new era of computer research and development. Researchers started exploring alternatives to the traditional CPU design. A constant increase in consumer demands led to the development of CMPs, GPUs, and FPGAs. Recent research proposed the development of heterogeneous-ISA systems and implemented the necessary systems software to make such systems functional. Evaluations have shown that heterogeneous-ISA systems can offer better throughput and energy efficiency than homogeneous-ISA systems. Due to their low cost, ARM servers are now being adopted in data centers (e.g., AWS Graviton). While prior work provided the infrastructure necessary to run applications on heterogeneous-ISA systems, their dependency on a specialized kernel and a custom compiler increases deployment and maintenance costs. This thesis presents HetMigrate, a framework to live-migrate Linux processes over heterogeneous-ISA systems. HetMigrate integrates with CRIU, a Linux mechanism for process migration, and runs on stock Linux operating systems which improves its deployability. Furthermore, HetMigrate transforms the process's state externally without instrumenting state transformation code into the process binaries which has security benefits and also improves deployability. Our evaluations on Redis server and NAS Parallel Benchmarks show that HetMigrate takes an average of 720ms to fully migrate a process across ISAs while maintaining its state. Moreover, live-migrating with HetMigrate reduces the attack surface of a process by up to 72.8% compared to prior work. Additionally, HetMigrate is easier to deploy in real-world systems compared to prior work. To prove the deployability we ran HetMigrate on a variety of environments like cloud instances (e.g. Cloud Lab), local setups virtualized with QEMU/KVM, and a server-embedded board pair. 
Similar to works in the past, we also evaluated the energy and throughput benefits that heterogeneous-ISA systems can offer by connecting a Xeon server to three embedded boards over the network. We observed that selectively offloading compute-intensive workloads to embedded boards can increase energy efficiency by up to 39% and throughput by up to 52% while increasing the cost by just 10%.
Netswap: Network-based Swapping for Server-Embedded Board Clusters
Errabelly, Sandeep (Virginia Tech, 2023-07-05)
Capital equipment costs and energy costs are the major cost drivers in datacenters. Prior works have explored various techniques, like efficient scheduling algorithms and advanced power management techniques, to maximize resource utilization to reduce the capital and energy costs. The project HEXO has explored heterogeneous-Instruction Set Architecture (ISA) server-embedded clusters to minimize the cost. HEXO's key idea is to migrate stateful virtual machines from high-performance x86-based servers to low-power, low-cost ARM-based embedded boards, reducing the server's resource congestion and thereby improving throughput and energy efficiency. However, embedded boards generally have significantly lower onboard memory, typically in the range of 100MB to 4GB. Due to this limitation, high memory-demand applications cannot be migrated to embedded devices. This limits the scope of applications that can be used with heterogeneous-ISA server-embedded clusters such as HEXO. This thesis proposes Netswap, a mechanism that utilizes the server's free memory as remote memory for the embedded board. Netswap comprises three main components: the swap-out and swap-in mechanism, a bitmap-based Free Memory Manager, and the Netswap Remote Daemon. Experimental studies using micro- and macro-benchmarks reveal that Netswap improves the throughput and energy efficiency of server-embedded clusters by as much as 40% and 20%, respectively, over server-only baselines.
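The abstract above names a bitmap-based Free Memory Manager without describing its design. The sketch below shows the general bitmap-allocation technique only, under the assumption that remote memory is divided into fixed-size slots and one bit tracks each slot; the class name `BitmapFreeMemoryManager` and its methods are hypothetical and do not reflect Netswap's actual implementation.

```python
from typing import Optional


class BitmapFreeMemoryManager:
    """Track free/used remote-memory slots with one bit per slot.

    Assumption: memory is managed in fixed-size slots (for example,
    page-sized); a set bit means the slot is in use.
    """

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        self.bitmap = bytearray((num_slots + 7) // 8)

    def _is_used(self, slot: int) -> bool:
        return bool(self.bitmap[slot // 8] & (1 << (slot % 8)))

    def allocate(self) -> Optional[int]:
        """Return the lowest free slot index and mark it used, or None if full."""
        for slot in range(self.num_slots):
            if not self._is_used(slot):
                self.bitmap[slot // 8] |= 1 << (slot % 8)
                return slot
        return None

    def free(self, slot: int) -> None:
        """Mark a slot as free again so a later swap-out can reuse it."""
        if not 0 <= slot < self.num_slots:
            raise ValueError("slot index out of range")
        self.bitmap[slot // 8] &= ~(1 << (slot % 8)) & 0xFF


manager = BitmapFreeMemoryManager(num_slots=3)
first = manager.allocate()
second = manager.allocate()
third = manager.allocate()
exhausted = manager.allocate()   # no slot left
manager.free(second)
reused = manager.allocate()      # the freed slot is handed out again
```

A bitmap keeps the bookkeeping cost at one bit per slot, which matters when the manager itself runs on a memory-constrained board; the linear scan used here is the simplest search strategy and could be replaced by a word-at-a-time scan.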
Rave: A Modular and Extensible Framework for Program State Re-Randomization
Blackburn, Christopher; Wang, Xiaoguang; Ravindran, Binoy (ACM, 2022-11-11)
Dynamic software diversification is an effective way to boost software security. Existing diversification-based approaches often target a single node environment and leverage in-process agents to diversify code and data, resulting in an unnecessary attack surface on a fixed software/hardware stack. This paper presents Rave, a practical system designed to enable out-of-bound program state shuffling on a moving target environment, avoiding any sensitive agent code invoked within the running target. Rave relies on a userspace page fault handling mechanism introduced in the latest Linux kernel and seamlessly integrates with CRIU [10], the battle-tested process migration tool for Linux. Rave consists of two components: librave, a library for static binary analysis and instrumentation, and CRIU-Rave, a runtime that dynamically updates program execution states (e.g., internal stack data layout and the machine node the program runs on). We built a prototype of Rave and evaluated it with four real-world server applications and 13 applications from the SPEC CPU 2017 and the SNU C version of NAS Parallel Benchmarks (NPB) benchmark suites. We demonstrated that Rave can continuously re-randomize the program state (e.g., internal stack layout, instruction sequences, and machine node to run on). The evaluation shows that Rave increases the internal program state entropy with an additional ≈200 ms time overhead for each re-randomization epoch on average.
Remote Software Guard Extension (RSGX)
Sundarasamy, Abilesh (Virginia Tech, 2023-12-21)
With the constant evolution of hardware architecture extensions aimed at enhancing software security, a notable availability gap arises due to the proprietary nature and design-specific characteristics of these features, resulting in a CPU-specific implementation. This gap particularly affects low-end embedded devices that often rely on CPU cores with limited resources. Addressing this challenge, this thesis focuses on providing access to hardware-based Trusted Execution Environments (TEEs) for devices lacking TEE support. RSGX is a framework crafted to transparently offload security-sensitive workloads to an enclave hosted in a remote centralized edge server. Operating as clients, low-end TEE-lacking devices can harness the hardware security features provided by TEEs of either the same or different architecture. RSGX is tailored to accommodate applications developed with diverse TEE-utilizing SDKs, such as the Open Enclave SDK, Intel SGX SDK, and many others. This facilitates easy integration of existing enclave-based applications, and the framework allows users to utilize its features without requiring any source code modifications, ensuring transparent offloading behind the scenes. For the evaluation, we set up an edge computing environment to execute C/C++ applications, including two overhead micro-benchmarks and four popular open-source applications. This evaluation of RSGX encompasses an analysis of its security benefits and a measurement of its performance overhead. We demonstrate that RSGX has the potential to mitigate a range of Common Vulnerabilities and Exposures (CVEs), ensuring the secure execution of confidential computations on hybrid and distributed machines with an acceptable performance overhead.
sMVX: Multi-Variant Execution on Selected Code Paths
Yeoh, Sengming; Wang, Xiaoguang; Jang, Jae-Won; Ravindran, Binoy (ACM, 2024-12-02)
Multi-Variant Execution (MVX) is an effective way to detect memory corruption vulnerabilities, intrusions, or live software updates. A traditional MVX system concurrently runs multiple copies of functionally identical, layout-different program variants. Therefore, a typical memory corruption attack that forges pointers can succeed on at most one variant, leading the other variant(s) to crash. The replicated execution adds software security and reliability but also multiplies CPU and memory usage. This paper presents sMVX, a flexible multi-variant execution system replicating variants only on the selected code paths. sMVX allows end-users to annotate a target program and indicate sensitive code regions for multi-variant execution. Such code regions can be authentication-related code or sensitive functions that handle potentially malicious input data. An sMVX runtime only replicates the sensitive functions and executes them in lockstep. We have implemented a prototype of sMVX using an in-process code monitor. The sMVX monitor supports MVX on the selected code paths from within the target program’s address space, but the monitor is isolated from the target’s code by the Intel Memory Protection Keys (MPK). We evaluated sMVX using a benchmark suite and two server applications. The evaluation demonstrates that sMVX exhibits a comparable performance overhead to state-of-the-art MVX systems but requires 20% fewer CPU cycles and 49% less memory consumption on server applications.
Understanding the Security of Linux eBPF Subsystem
Noor Mohamed, Mohamed Husain; Wang, Xiaoguang; Ravindran, Binoy (ACM, 2023-08-24)
Linux eBPF allows a userspace application to execute code inside the Linux kernel without modifying the kernel code or inserting a kernel module. An in-kernel eBPF verifier preverifies any untrusted eBPF bytecode before running it in kernel context. Currently, users trust the verifier to block malicious bytecode from being executed. This paper studied the potential security issues from existing eBPF-related CVEs. Next, we present a generation-based eBPF fuzzer that generates syntactically and semantically valid eBPF programs to find bugs in the verifier component of the Linux kernel eBPF subsystem. The fuzzer extends the Linux Kernel Library (LKL) project to run multiple lightweight Linux instances simultaneously, with inputs from the automatically generated eBPF instruction sequences. Using this fuzzer, we can outperform the bpf-fuzzer [10] from the iovisor GitHub repository regarding fuzzing speed and the success rate of passing the eBPF verifier (valid generated code). We also found two existing ALU range-tracking bugs that appeared in an older Linux kernel (v5.10).
