Browsing by Author "Nikolaev, Ruslan"
- Adelie: Continuous Address Space Layout Re-randomization for Linux Drivers
  Nikolaev, Ruslan; Nadeem, Hassan; Stone, Cathlyn; Ravindran, Binoy (ACM, 2022-02-28)
  While address space layout randomization (ASLR) has been extensively studied for user-space programs, the corresponding OS kernel's KASLR support remains very limited, making the kernel vulnerable to just-in-time (JIT) return-oriented programming (ROP) attacks. Furthermore, commodity OSs such as Linux restrict their KASLR range to 32 bits due to architectural constraints (e.g., x86-64 only supports 32-bit immediate operands for most instructions), which makes them vulnerable to even unsophisticated brute-force ROP attacks due to low entropy. Most in-kernel pointers remain static, exacerbating the problem when pointers are leaked. Adelie, our kernel defense mechanism, overcomes KASLR limitations, increases KASLR entropy, and makes successful ROP attacks on the Linux kernel much harder to achieve. First, Adelie enables the position-independent code (PIC) model so that the kernel and its modules can be placed anywhere in the 64-bit virtual address space, at any distance apart from each other. Second, Adelie implements stack re-randomization and address encryption on modules. Finally, Adelie enables efficient continuous KASLR for modules by using the PIC model to make it (almost) impossible to inject ROP gadgets through these modules regardless of the gadgets' origin. Since device drivers (typically compiled as modules) are often developed by third parties and are typically less tested than core OS parts, they are also often more vulnerable. By fully re-randomizing device drivers, the last two contributions together prevent most JIT ROP attacks, since vulnerable modules are very likely to be the starting point of an attack. Furthermore, some OS instances in virtualized environments are specifically designated to run device drivers, where drivers are the primary target of JIT ROP attacks. Using a GCC plugin that we developed, we automatically modify different kinds of kernel modules. Since the prior art tackles only user-space programs, we solve many challenges unique to kernel code. Our evaluation shows the high efficiency of Adelie's approach: the overhead of the PIC model is completely negligible, and the re-randomization cost remains reasonable for typical use cases.
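The abstract mentions a GCC plugin that automatically rewrites kernel modules. As background only, the skeleton below shows where such a compiler pass hooks in; it is a generic, hypothetical plugin stub (the file name, pass behavior, and diagnostic message are made up), not Adelie's actual plugin.

```c
/* Minimal GCC plugin skeleton (illustrative only, not Adelie's plugin).
 * Build (roughly, for recent GCC): g++ -shared -fPIC -fno-rtti \
 *   -I`gcc -print-file-name=plugin`/include adelie_sketch.c -o adelie_sketch.so
 * Use: gcc -fplugin=./adelie_sketch.so -c module.c */
#include <gcc-plugin.h>
#include <plugin-version.h>
#include <stdio.h>

int plugin_is_GPL_compatible;   /* required by the GCC plugin loader */

/* Hypothetical callback: runs once per translation unit. A real
 * instrumentation plugin would register a GIMPLE/RTL pass here to
 * rewrite how addresses are formed in the module's code. */
static void on_finish_unit(void *gcc_data, void *user_data)
{
    (void)gcc_data;
    (void)user_data;
    fprintf(stderr, "adelie-sketch: finished a translation unit\n");
}

int plugin_init(struct plugin_name_args *info,
                struct plugin_gcc_version *version)
{
    if (!plugin_default_version_check(version, &gcc_version))
        return 1;               /* built against a different GCC version */
    register_callback(info->base_name, PLUGIN_FINISH_UNIT,
                      on_finish_unit, NULL);
    return 0;
}
```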
- Brief announcement: Crystalline: Fast and memory efficient wait-free reclamation
  Nikolaev, Ruslan; Ravindran, Binoy (2021-10-01)
  We present a new wait-free memory reclamation scheme, Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline guarantees complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures an (almost) balanced reclamation workload across all threads. The latter two properties result in Crystalline's high performance and high memory efficiency, a difficult trade-off for most existing schemes. Our evaluations show that Crystalline exhibits outstanding scalability and memory efficiency, and achieves higher throughput than state-of-the-art reclamation schemes as the number of threads grows.
- Design and Implementation of a Network Server in LibrettOS
  Sung, Mincheol (Virginia Tech, 2018-12-13)
  Traditional network stacks in monolithic kernels have reliability and security concerns. Any fault in a network stack affects the entire system owing to the lack of isolation in the monolithic kernel. Moreover, the large code size of the network stack enlarges the attack surface of the system. A multiserver OS design solves this problem. In contrast to the traditional network stack, a multiserver OS pushes the network stack into a network server running as a user process, which provides three benefits: (i) the network server runs in user mode with its own address space, isolating any fault that occurs in it; (ii) the attack surface of the system is minimized because the trusted computing base shrinks; (iii) failure recovery becomes possible, an important feature supported by a multiserver OS. This thesis proposes a network server for LibrettOS, an operating system developed at Virginia Tech based on rumprun unikernels and the Xen hypervisor. The proposed network server is a service domain that provides an L2 frame forwarding service for application domains; it is based on rumprun so that NetBSD's existing device drivers can be leveraged with little modification. In this model, the TCP/IP stack runs directly in the address space of applications. This allows the client state to be retained even if the network server crashes and makes it possible to recover from a network server failure. We leverage Xen PCI passthrough to access a NIC (Network Interface Controller) from the network server. Our experimental evaluation demonstrates that the performance of the network server is comparable to that of Linux and NetBSD. We also demonstrate successful recovery after a failure.
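The core service described here is L2 frame forwarding between application domains and the NIC. As a rough, host-level illustration of that idea (not LibrettOS code, which runs on Xen and rumprun rather than Linux sockets), the sketch below copies raw Ethernet frames between two interfaces; the interface names are hypothetical.

```c
/* Toy L2 forwarder using Linux AF_PACKET sockets (illustration only). */
#include <stdio.h>
#include <unistd.h>
#include <net/if.h>
#include <netinet/if_ether.h>
#include <netpacket/packet.h>
#include <sys/socket.h>
#include <arpa/inet.h>

static int open_l2_socket(const char *ifname)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;
    struct sockaddr_ll addr = {0};
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ALL);
    addr.sll_ifindex  = if_nametoindex(ifname);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

int main(void)
{
    /* "eth-front" and "eth-back" are placeholder interface names. */
    int in  = open_l2_socket("eth-front");
    int out = open_l2_socket("eth-back");
    if (in < 0 || out < 0) { perror("socket"); return 1; }

    unsigned char frame[2048];
    for (;;) {
        ssize_t n = recv(in, frame, sizeof(frame), 0);  /* read one L2 frame */
        if (n > 0)
            (void)send(out, frame, (size_t)n, 0);       /* forward it unchanged */
    }
}
```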
- Design and Implementation of the VirtuOS Operating System
  Nikolaev, Ruslan (Virginia Tech, 2014-01-21)
  Most operating systems provide protection and isolation to user processes, but not to critical system components such as device drivers or other systems code. Consequently, failures in these components often lead to system failures. VirtuOS is an operating system that exploits a new method of decomposition to protect against such failures. VirtuOS exploits virtualization to isolate and protect vertical slices of existing OS kernels in separate service domains. Each service domain represents a partition of an existing kernel, which implements a subset of that kernel's functionality. Service domains directly service system calls from user processes. VirtuOS exploits an exceptionless model, avoiding the cost of a system call trap in many cases. We illustrate how to apply exceptionless system calls across virtualized domains. To demonstrate the viability of VirtuOS's approach, we implemented a prototype based on the Linux kernel and Xen hypervisor. We created and evaluated a network and a storage service domain. Our prototype retains compatibility with existing applications, can survive the failure of individual service domains while outperforming alternative approaches such as isolated driver domains, and even exceeds the performance of native Linux for some multithreaded workloads. The evaluation of VirtuOS revealed costs due to decomposition, memory management, and communication, which necessitated a fine-grained analysis to understand their impact on the system's performance. The interaction of virtual machines with multiple underlying software and hardware layers in a virtualized environment makes this task difficult. Moreover, performance analysis tools commonly used in native environments were not available in virtualized environments. Our work addresses this problem to enable an in-depth performance analysis of VirtuOS. Our Perfctr-Xen framework provides capabilities for per-thread analysis with both accumulative event counts and interrupt-driven event sampling. Perfctr-Xen is a flexible and generic tool, supports different modes of virtualization, and can be used for many applications outside of VirtuOS.
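The "exceptionless" model means a system call is posted to a service domain through shared memory instead of trapping into the kernel. The toy sketch below illustrates that pattern with a polled request ring and a worker thread standing in for a service domain; it is not VirtuOS code, a single caller thread is assumed, and all names are hypothetical.

```c
/* Toy sketch of an exceptionless call path: the caller fills a shared-memory
 * slot and a polling worker completes it -- no trap on the request path.
 * Illustration only; not VirtuOS code. Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define RING_SIZE 64

struct request {
    int opcode;               /* which "system call" to perform */
    int arg;
    int result;
    _Atomic int done;         /* set by the worker when the call completes */
};

static struct request ring[RING_SIZE];
static _Atomic unsigned head; /* next slot the worker will consume */
static _Atomic unsigned tail; /* next slot the caller will fill   */

static void *service_domain(void *unused)
{
    (void)unused;
    for (;;) {
        unsigned h = atomic_load(&head);
        if (h == atomic_load(&tail))
            continue;                       /* nothing pending; real code would sleep */
        struct request *r = &ring[h % RING_SIZE];
        r->result = r->arg * 2;             /* stand-in for real syscall work */
        atomic_store(&r->done, 1);
        atomic_store(&head, h + 1);
    }
    return NULL;
}

static int exceptionless_call(int opcode, int arg)
{
    unsigned t = atomic_load(&tail);
    struct request *r = &ring[t % RING_SIZE];
    r->opcode = opcode;
    r->arg = arg;
    atomic_store(&r->done, 0);
    atomic_store(&tail, t + 1);             /* publish only after the slot is filled */
    while (!atomic_load(&r->done))
        ;                                   /* spin; real code would block or batch */
    return r->result;
}

int main(void)
{
    pthread_t worker;
    pthread_create(&worker, NULL, service_domain, NULL);
    printf("result = %d\n", exceptionless_call(1, 21));
    return 0;
}
```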
- A Family of Fast and Memory Efficient Lock- and Wait-Free Reclamation
  Nikolaev, Ruslan; Ravindran, Binoy (ACM, 2024-06-20)
  Historically, memory management based on lock-free reference counting was very inefficient, especially for read-dominated workloads. Thus, approaches such as epoch-based reclamation (EBR), hazard pointers (HP), or a combination thereof have received significant attention. EBR exhibits excellent performance but is blocking due to potentially unbounded memory usage. In contrast, HP are non-blocking and achieve good memory efficiency but are much slower. Moreover, HP are only lock-free in the general case. Recently, several new memory reclamation approaches such as WFE and Hyaline have been proposed. WFE achieves wait-freedom, but is less memory efficient and performs suboptimally in oversubscribed scenarios; Hyaline achieves higher performance and memory efficiency, but lacks wait-freedom. We present a family of non-blocking memory reclamation schemes, called Crystalline, that simultaneously addresses the challenges of high performance, high memory efficiency, and wait-freedom. Crystalline can guarantee complete wait-freedom even when threads are dynamically recycled, asynchronously reclaims memory in the sense that any thread can reclaim memory retired by any other thread, and ensures an (almost) balanced reclamation workload across all threads. The latter two properties result in Crystalline's high performance and memory efficiency. Simultaneously ensuring all three properties requires overcoming unique challenges. Crystalline supports the ubiquitous x86-64 and ARM64 architectures, while achieving higher throughput than prior fast schemes such as EBR as the number of threads grows. We also emphasize that many recent approaches, unlike HP, lack strict non-blocking guarantees when used with multiple data structures. By providing full wait-freedom, Crystalline addresses this problem as well.
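For context on the trade-off the abstract describes, below is a textbook-style epoch-based reclamation (EBR) sketch: readers announce the epoch they entered, and a retired node is freed only once the global epoch has advanced far enough that no reader can still hold it. This is not Crystalline (or WFE/Hyaline); it deliberately shows EBR's weakness that a single stalled reader pins its epoch and blocks reclamation. A fixed thread count is assumed.

```c
/* Minimal EBR sketch -- illustrative only, not Crystalline. */
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_THREADS 64
#define EPOCHS      3   /* nodes retired in epoch e are free-able once the
                           global epoch reaches e + 2 */

struct node { struct node *next_retired; /* payload omitted */ };

static _Atomic unsigned long global_epoch = 0;
/* ~0UL means "this thread is not in a read-side critical section". */
static _Atomic unsigned long reservations[MAX_THREADS];
static struct node *retired[MAX_THREADS][EPOCHS];

void ebr_init(void)
{
    for (int i = 0; i < MAX_THREADS; i++)
        atomic_store(&reservations[i], ~0UL);
}

/* Call before reading shared nodes. */
void ebr_enter(int tid)
{
    atomic_store(&reservations[tid], atomic_load(&global_epoch));
}

/* Call once no shared pointers are held anymore. */
void ebr_exit(int tid)
{
    atomic_store(&reservations[tid], ~0UL);
}

/* Retire a node that was unlinked from the data structure. */
void ebr_retire(int tid, struct node *n)
{
    unsigned long e = atomic_load(&global_epoch);
    n->next_retired = retired[tid][e % EPOCHS];
    retired[tid][e % EPOCHS] = n;
}

/* Try to advance the epoch and free nodes no reader can still see. */
void ebr_collect(int tid)
{
    unsigned long e = atomic_load(&global_epoch);
    for (int i = 0; i < MAX_THREADS; i++) {
        unsigned long r = atomic_load(&reservations[i]);
        if (r != ~0UL && r != e)
            return;             /* a reader is still in an older epoch */
    }
    atomic_compare_exchange_strong(&global_epoch, &e, e + 1);
    /* Nodes retired two epochs ago are now unreachable by any reader. */
    struct node *list = retired[tid][(e + 2) % EPOCHS];
    retired[tid][(e + 2) % EPOCHS] = NULL;
    while (list) {
        struct node *next = list->next_retired;
        free(list);
        list = next;
    }
}
```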
- Improving Operating System Security, Reliability, and Performance through Intra-Unikernel Isolation, Asynchronous Out-of-kernel IPC, and Advanced System Servers
  Sung, Mincheol (Virginia Tech, 2023-03-28)
  Computer systems are vulnerable to security exploits, and the security of the operating system (OS) is crucial as it is often a trusted entity that applications rely on. Traditional OSs have a monolithic design where all components are executed in a single privilege layer, but this design is increasingly inadequate as OS code sizes have become larger and expose a large attack surface. Microkernel OSs and multiserver OSs improve security and reliability through isolation, but they come at a performance cost due to crossing privilege layers through IPCs, system calls, and mode switches. Library OSs, on the other hand, implement kernel components as libraries, which avoids crossing privilege layers in performance-critical paths and thereby improves performance. Unikernels are a specialized form of library OSs that consist of a single application compiled with the necessary kernel components, and execute in a single address space, usually atop a hypervisor for strong isolation. Unikernels have recently gained popularity in various application domains due to their better performance and security. Although unikernels offer strong isolation between instances due to virtualization, there is no isolation within a unikernel. Since the model eliminates the traditional separation between kernel and user parts of the address space, the subversion of a kernel or application component will result in the subversion of the entire unikernel. Thus, a unikernel must be viewed as a single unit of trust, reducing security. The dissertation's first contribution is intra-unikernel isolation: we use Intel's Memory Protection Keys (MPK) primitive to provide per-thread permission control over groups of virtual memory pages within a unikernel's single address space, allowing different areas of the address space to be isolated from each other. We implement our mechanisms in RustyHermit, a unikernel written in Rust. Our evaluations show that the mechanisms have low overhead and retain the unikernel's low system call latency: a 0.6% slowdown on applications including memory/compute intensive benchmarks as well as micro-benchmarks. Multiserver OSs, a type of microkernel OS, have high parallelism potential due to their inherent compartmentalization. However, the model suffers from inferior performance. This is due to inter-process communication (IPC) client-server crossings, which require context switches on single-core systems and are more expensive than traditional system calls; on multi-core systems (now ubiquitous), they have poor resource utilization. The dissertation's second contribution is Aoki, a new approach to IPC design for microkernel OSs. Aoki incorporates non-blocking concurrency techniques to eliminate in-kernel blocking synchronization, which causes performance challenges for state-of-the-art microkernels. Aoki's non-blocking (i.e., lock-free and wait-free) IPC design not only improves performance and scalability, but also enhances reliability by preventing thread starvation. In a multiserver OS setting, the design also enables the reconnection of stateful servers after failure without loss of IPC states. Aoki solves two problems that have plagued previous microkernel IPC designs: reducing excessive transitions between user and kernel modes and enabling efficient recovery from failures.
  We implement Aoki in the state-of-the-art seL4 microkernel. Results from our experiments show that Aoki outperforms the baseline seL4 in both fastpath IPC and cross-core IPC, with improvements of 2.4x and 20x, respectively. The Aoki IPC design enables the design of system servers for multiserver OSs with higher performance and reliability. The dissertation's third and final contribution is the design of a fault-tolerant storage server and a copy-free file system server. We build both servers using NetBSD OS's rumprun unikernel, which provides robust isolation through hardware virtualization and is capable of handling a wide range of storage devices, including NVMe. Both servers communicate with client applications using Aoki's IPC design, which yields scalable IPC. In the case of the storage server, the IPC also enables the server to transparently recover from server failures and reconnect to client applications, with no loss of IPC state and no significant overhead. In the copy-free file system server's design, applications grant the server direct memory access to file I/O data buffers for high performance. The performance problems solved in these server designs have challenged all prior multiserver/microkernel OSs. Our evaluations show that both servers achieve performance comparable to Linux and the rumprun baseline.
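The intra-unikernel isolation contribution relies on Intel MPK (protection keys). For background, the sketch below shows the user-space primitives involved on Linux (pkey_alloc, pkey_mprotect, and the per-thread access-rights update via pkey_set); it is a plain Linux illustration, not the dissertation's Rust implementation inside RustyHermit, and the "secret" region is made up.

```c
/* Background sketch of Intel MPK on Linux (illustrative only).
 * Requires an MPK-capable CPU and kernel; glibc >= 2.27 for pkey_* wrappers. */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t len = 4096;
    /* A hypothetical sensitive region to fence off from most code. */
    char *secret = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (secret == MAP_FAILED) { perror("mmap"); return 1; }

    int key = pkey_alloc(0, 0);             /* new protection key, full access */
    if (key < 0) { perror("pkey_alloc"); return 1; }

    /* Tag the region's pages with the key. */
    if (pkey_mprotect(secret, len, PROT_READ | PROT_WRITE, key)) {
        perror("pkey_mprotect"); return 1;
    }

    strcpy(secret, "accessible");           /* allowed: the key grants access */

    /* Flip this thread's rights for the key: loads/stores now fault,
     * with no mprotect() call and no TLB shootdown. */
    pkey_set(key, PKEY_DISABLE_ACCESS);
    /* secret[0] = 'x';                        would SIGSEGV here */

    pkey_set(key, 0);                        /* grant access back */
    printf("%s again\n", secret);
    pkey_free(key);
    return 0;
}
```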
- Improving Security of Edge Devices by Offloading Computations to Remote, Trusted Execution Environments
  Bilbao Munoz, Carlos (Virginia Tech, 2022-01-11)
  In this thesis we aim to push forward the state of the art in security on instruction set architecture (ISA) heterogeneous systems by adopting an edge-computing approach. As the embedded devices market grows, such systems remain affected by a wide range of attacks and are particularly vulnerable to techniques that render the operating system or hypervisor untrusted. The usage of Trusted Execution Environments (TEEs) can mitigate such threat models immensely, but embedded devices rarely have the required hardware support. To address this situation and enhance security on embedded devices, we present the RemoteTrust framework, which allows modest devices to offload secure computations to a remote server with hardware-level TEEs. To ease portability, we develop the framework on top of the open-source, hardware-agnostic Open Enclave SDK. We evaluate the framework from security and performance perspectives on a realistic infrastructure. In terms of security, we provide a list of CVEs that could potentially be mitigated by RemoteTrust, and we prevent the Heartbleed attack on a vulnerable server. From a performance perspective, we port C/C++ benchmarks of SPEC CPU 2017, two overhead microbenchmarks, and five open-source applications, demonstrating small communication overhead (averaging less than 1 second per 100 remote single-parameter enclave calls).
- Kite: Lightweight Critical Service Domains
  Mehrab, A K M Fazla; Nikolaev, Ruslan; Ravindran, Binoy (ACM, 2022-03-28)
  Converged multi-level secure (MLS) systems, such as Qubes OS or SecureView, heavily rely on virtualization and service virtual machines (VMs). Traditionally, driver domains (isolated VMs that run device drivers) and daemon VMs use full-blown general-purpose OSs. It seems that specialized lightweight OSs, known as unikernels, would be a better fit for those. Surprisingly, to this day, driver domains can only be built from Linux. We discuss how unikernels can be beneficial in this context: they improve security and isolation, reduce memory overheads, and simplify software configuration and deployment. We specifically propose to use unikernels that borrow device drivers from existing general-purpose OSs. We present Kite, which implements unikernel-based network and storage VMs that serve these two essential classes of devices. We compare our approach against Linux using a number of typical micro- and macrobenchmarks for networking and storage. Our approach achieves performance similar to that of Linux. However, we demonstrate that the number of system calls and ROP gadgets can be greatly reduced with our approach compared to Linux. We also demonstrate that our approach offers resilience to an array of CVEs (e.g., CVE-2021-35039, CVE-2016-4963, and CVE-2013-2072), a smaller image size, and improved startup time. Finally, unikernelization is feasible for the remaining (non-driver) service VMs, as evidenced by our unikernelized DHCP server.
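Driver domains like Kite's are normally launched as ordinary Xen guests with a physical device passed through. The xl-style configuration below is a hedged sketch of what such a unikernel network driver domain might look like; the domain name, image path, memory size, and PCI address are placeholders, not Kite's actual configuration.

```
# Hypothetical xl config for a unikernel network driver domain (sketch only).
name    = "kite-net"                   # placeholder domain name
kernel  = "/var/lib/xen/kite-net.img"  # placeholder unikernel image
memory  = 64                           # unikernels typically need little RAM
vcpus   = 1
pci     = [ '0000:03:00.0' ]           # NIC passed through to the driver domain
```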
- Linux Kernel Module Continuous Address Space Re-Randomization
  Nadeem, Muhammad Hassan (Virginia Tech, 2020-02-28)
  Address space layout randomization (ASLR) is a technique employed to prevent exploitation of memory corruption vulnerabilities in user-space programs. While this technique is widely studied, its kernel-space counterpart, known as kernel address space layout randomization (KASLR), has received less attention in the research community. KASLR, as it is implemented today, is limited in the entropy of its randomization. Specifically, the kernel image and its modules can only be randomized within a narrow 1GB range. Moreover, KASLR does not protect against memory disclosure vulnerabilities, the presence of which reduces or completely eliminates the benefits of KASLR. In this thesis, we make two major contributions. First, we add support for position-independent kernel modules to Linux so that the modules can be placed anywhere in the 64-bit virtual address space and at any distance apart from each other. Second, we enable continuous KASLR re-randomization for Linux kernel modules by leveraging the position-independent model. Both contributions increase the entropy and reduce the chance of successful ROP attacks. Since prior art tackles only user-space programs, we also solve a number of challenges unique to kernel code. Our experimental evaluation shows that the overhead of position-independent code is very low. Likewise, the cost of re-randomization is small even at very high re-randomization frequencies.
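To make the entropy limitation concrete, the snippet below counts possible load addresses under a 1 GB randomization window versus a much larger window. The 2 MB alignment and the 64 TB "wide" window are illustrative assumptions, not numbers taken from the thesis.

```c
/* Rough KASLR entropy arithmetic (illustrative assumptions only):
 * a 1 GB window with 2 MB-aligned placement gives 512 slots, i.e.
 * log2(512) = 9 bits of entropy -- easy to brute-force. A 64 TB window
 * at the same alignment gives 2^25 slots (25 bits).
 * Compile with: gcc entropy.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double align  = 2.0 * 1024 * 1024;                 /* 2 MB slots (assumed) */
    const double narrow = 1024.0 * 1024 * 1024;               /* 1 GB window */
    const double wide   = 64.0 * 1024 * 1024 * 1024 * 1024;   /* 64 TB window (assumed) */

    printf("1 GB window : %.0f slots, %.1f bits of entropy\n",
           narrow / align, log2(narrow / align));
    printf("64 TB window: %.0f slots, %.1f bits of entropy\n",
           wide / align, log2(wide / align));
    return 0;
}
```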
- On Improving the Security of Virtualized Systems through Unikernelized Driver Domain and Virtual Machine Monitor Compartmentalization and Specialization
  Mehrab, A. K. M. Fazla (Virginia Tech, 2023-03-31)
  Virtualization is the backbone of cloud infrastructures. Its core subsystems include hypervisors and virtual machine monitors (VMMs). They ensure the isolation and security of co-existent virtual machines (VMs) running on the same physical machine. Traditionally, driver domains -- isolated VMs in a hypervisor such as Xen that run device drivers -- use general-purpose full-featured OSs (e.g., Linux), which have a large attack surface, as evidenced by the increasing number of their common vulnerabilities and exposures (CVEs). We argue for using the unikernel operating system (OS) model for driver domains. In this model, a single application is statically compiled together with the minimum necessary kernel code and libraries to produce a single address-space image, reducing code size by as much as one order of magnitude, which yields security benefits. We develop a driver domain OS, called Kite, using NetBSD OS's rumprun unikernel. Since rumprun is directly based on NetBSD's code, it allows us to leverage NetBSD's large collection of device drivers, including highly specialized ones such as Amazon ENA. Kite's design overcomes several significant challenges, including Xen's limited para-virtualization (PV) I/O support in rumprun, the lack of Xen backend drivers (which prevents rumprun from being used as a driver domain OS), and NetBSD's lack of support for running driver domains in Xen. We instantiate Kite for the two most widely used I/O devices, storage and network, by designing and implementing the storage backend and network backend drivers. Our evaluations reveal that Kite achieves performance competitive with a Linux-based driver domain while using 10x fewer system calls, mitigates a set of CVEs, and retains all the benefits of unikernels, including a reduced number of return-oriented programming (ROP) gadgets and improved advanced gadget-related metrics. General-purpose VMMs include a large number of components that may not be used in many VM configurations, resulting in a large attack surface. In addition, they lack intra-VMM isolation, which degrades security: vulnerabilities in one VMM component can be exploited to compromise other components, the host OS, or other VMs (via privilege escalation). To mitigate these security challenges, we develop principles for VMM compartmentalization and specialization. We construct a prototype, called Redwood, embodying those principles. Redwood is built by extending Cloud Hypervisor and compartmentalizes thirteen critical components (i.e., virtual I/O devices) using Intel MPK, a hardware primitive available in Intel CPUs. Redwood has fifteen fine-grained modules, each representing a single feature, which increases its configurability and flexibility. Our evaluations reveal that Redwood is as performant as the baseline Cloud Hypervisor, has a 50% smaller VMM image size and 50% fewer ROP gadgets, and is resilient to an array of CVEs. I/O acceleration architectures, such as the Data Plane Development Kit (DPDK), enhance VM performance by moving the data plane from the VMM to a separate userspace application. Since the VMM must share its VMs' sensitive information with accelerated applications, this can potentially degrade security.
  The dissertation's final contribution is the compartmentalization of a VM's sensitive data within an accelerated application using the Intel MPK hardware primitive. Our evaluations reveal that the technique does not cause any degradation in I/O performance and mitigates potential attacks and a class of CVEs.
- POSTER: wCQ: A Fast Wait-Free Queue with Bounded Memory Usage
  Nikolaev, Ruslan; Ravindran, Binoy (ACM, 2022-04-02)
  The concurrency literature presents a number of approaches for building non-blocking, FIFO, multiple-producer and multiple-consumer (MPMC) queues. However, existing wait-free queues are either not very scalable or suffer from potentially unbounded memory usage. We present a wait-free queue, wCQ, which uses its own variation of the fast-path-slow-path methodology to attain wait-freedom and bound memory usage. wCQ is memory efficient and its performance is often on par with the best known concurrent queue designs.
- rave: A Framework for Code and Memory Randomization of Linux Containers
  Blackburn, Christopher Nogueira (Virginia Tech, 2021-07-23)
  Memory corruption continues to plague modern software systems, as it has for decades. With the emergence of code-reuse attacks that take advantage of these vulnerabilities, like Return-Oriented Programming (ROP), or non-control-data attacks like Data-Oriented Programming (DOP), defenses against them are growing thin. These attacks, and more advanced variations of them, are becoming more difficult to detect and to mitigate. In this arms race, it is critical not only to develop mitigation techniques, but also ways to deploy those techniques effectively. In this work, we present rave, a framework which takes common design features of defenses against memory corruption and code reuse and puts them in a real-world setting. Rave consists of two components: librave, the library responsible for static binary analysis and instrumentation, and CRIU-rave, an extended version of the battle-tested process migration tool available for Linux. In our prototype of this framework, we have shown that these tools can be used to rewrite live applications, like NGINX, with enough randomization to disrupt memory corruption attacks. This work is supported in part by ONR under grant N00014-18-1-2022 and NAVSEA/NEEC/NSWC Dahlgren under grant N00174-20-1-0009.
- Secure and Efficient In-Process Monitor and Multi-Variant Execution
  Yeoh, SengMing (Virginia Tech, 2021-02-01)
  Control-flow hijacking attacks such as Return Oriented Programming (ROP) and data-oriented attacks like Data Oriented Programming (DOP) are problems still plaguing modern software today. While there have been many attempts at hardening software against these attacks, the heavy performance cost of running these defenses and the intrusive modifications they require have proven to be a barrier to adoption. In this work, we present Monguard, a high-performance, hardware-assisted in-process monitor protection system that uses Intel Memory Protection Keys (MPK) to enforce execute-only memory, combined with code randomization and runtime binary patching, to effectively protect and hide in-process monitors. Next, we introduce L-MVX, a flexible, lightweight Multi-Variant Execution (MVX) system that runs inside the in-process monitor and aims to solve some of the performance problems of recent MVX defenses through selective program call-graph protection and in-process monitoring. L-MVX maintains security guarantees either by breaking attacker assumptions or by creating a scenario where a particular attack only works on a single variant.
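Execute-only memory via MPK works because protection keys gate data loads and stores but not instruction fetches: revoking access on a code page leaves it executable while making it unreadable. The standalone sketch below demonstrates that general trick on Linux with made-up code bytes; it is background only, not Monguard's implementation.

```c
/* Execute-only memory via MPK (background sketch, not Monguard code). */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    /* Tiny function body: mov eax, 42 ; ret */
    static const unsigned char stub[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(page, stub, sizeof(stub));

    int key = pkey_alloc(0, 0);
    if (key < 0) { perror("pkey_alloc"); return 1; }

    /* Make the page executable and tag it with the key. */
    if (pkey_mprotect(page, 4096, PROT_READ | PROT_EXEC, key)) {
        perror("pkey_mprotect"); return 1;
    }

    /* Revoke data access for this thread: the page becomes execute-only;
     * instruction fetches are not subject to protection keys. */
    pkey_set(key, PKEY_DISABLE_ACCESS);

    int (*fn)(void) = (int (*)(void))page;
    printf("executed hidden code, got %d\n", fn()); /* still executes */
    /* printf("%d\n", *(int *)page);                   reading would SIGSEGV */

    pkey_set(key, 0);
    pkey_free(key);
    return 0;
}
```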
- wCQ: A Fast Wait-Free Queue with Bounded Memory Usage
  Nikolaev, Ruslan; Ravindran, Binoy (ACM, 2022-07-11)
  The concurrency literature presents a number of approaches for building non-blocking, FIFO, multiple-producer and multiple-consumer (MPMC) queues. However, only a fraction of them have high performance. In addition, many queue designs, such as LCRQ, trade memory usage for better performance. The recently proposed SCQ design achieves both memory efficiency and excellent performance. Unfortunately, both LCRQ and SCQ are only lock-free. On the other hand, existing wait-free queues are either not very performant or suffer from potentially unbounded memory usage. Strictly speaking, the latter queues, such as Yang & Mellor-Crummey's (YMC) queue, forfeit wait-freedom, as they are blocking when memory is exhausted. We present a wait-free queue, called wCQ. wCQ is based on SCQ and uses its own variation of the fast-path-slow-path methodology to attain wait-freedom and bound memory usage. Our experimental studies on x86 and PowerPC architectures validate wCQ's great performance and memory efficiency. They also show that wCQ's performance is often on par with the best known concurrent queue designs.
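As background for the bounded-memory MPMC ring queues this line of work improves upon, below is a common baseline: a fixed-capacity ring where each cell carries a sequence number telling producers and consumers whether the cell is free or full for the current lap. This is the well-known Vyukov-style bounded MPMC pattern, not SCQ or wCQ, and it is neither lock-free in the strict sense nor wait-free; it only illustrates the bounded-memory structure.

```c
/* Baseline bounded MPMC ring (Vyukov-style), shown for context only. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QCAP 1024                      /* must be a power of two */

struct cell {
    _Atomic size_t seq;                /* encodes empty/full and the ring lap */
    void *data;
};

struct mpmc_queue {
    struct cell cells[QCAP];
    _Atomic size_t enqueue_pos;
    _Atomic size_t dequeue_pos;
};

void mpmc_init(struct mpmc_queue *q)
{
    for (size_t i = 0; i < QCAP; i++)
        atomic_store(&q->cells[i].seq, i);   /* cell i is empty for lap 0 */
    atomic_store(&q->enqueue_pos, 0);
    atomic_store(&q->dequeue_pos, 0);
}

bool mpmc_enqueue(struct mpmc_queue *q, void *item)
{
    size_t pos = atomic_load(&q->enqueue_pos);
    for (;;) {
        struct cell *c = &q->cells[pos & (QCAP - 1)];
        size_t seq = atomic_load(&c->seq);
        intptr_t dif = (intptr_t)seq - (intptr_t)pos;
        if (dif == 0) {                      /* cell is empty for this lap */
            if (atomic_compare_exchange_weak(&q->enqueue_pos, &pos, pos + 1)) {
                c->data = item;
                atomic_store(&c->seq, pos + 1);   /* publish: cell now full */
                return true;
            }                                /* CAS failure reloaded pos */
        } else if (dif < 0) {
            return false;                    /* queue is full */
        } else {
            pos = atomic_load(&q->enqueue_pos);   /* another thread moved on */
        }
    }
}

bool mpmc_dequeue(struct mpmc_queue *q, void **item)
{
    size_t pos = atomic_load(&q->dequeue_pos);
    for (;;) {
        struct cell *c = &q->cells[pos & (QCAP - 1)];
        size_t seq = atomic_load(&c->seq);
        intptr_t dif = (intptr_t)seq - (intptr_t)(pos + 1);
        if (dif == 0) {                      /* cell holds data for this lap */
            if (atomic_compare_exchange_weak(&q->dequeue_pos, &pos, pos + 1)) {
                *item = c->data;
                atomic_store(&c->seq, pos + QCAP);  /* empty for next lap */
                return true;
            }
        } else if (dif < 0) {
            return false;                    /* queue is empty */
        } else {
            pos = atomic_load(&q->dequeue_pos);
        }
    }
}
```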
- ZxOS: Zephyr-based Guest Operating System for Heterogeneous ISA Machines
  Krishnakumar, Ashwin (Virginia Tech, 2022-03-04)
  With the fast-approaching limits of single-threaded CPU performance, chip vendors are manufacturing an array of radically different computing architectures, including multicore and heterogeneous architectures, to continue to accelerate computer performance. An important emerging data point in the heterogeneous architecture design space is heterogeneity in instruction-set architecture (ISA). ISA heterogeneity is emerging in many forms. An exemplar case is smart I/O devices, such as SmartNICs and SmartSSDs, that incorporate CPUs of the RISC ISA family (e.g., ARM64, RISC-V); when integrated with a high-performance server with CPUs of the CISC ISA family (e.g., x86-64), they yield a single machine with heterogeneous-ISA CPUs. This thesis presents the design of a shared memory OS for cache-coherent, shared memory heterogeneous-ISA hardware. The OS, called ZxOS, is built by modifying the open-source ZephyrOS, including its architecture-specific code and page mapping mechanism, to create a memory region that can be shared across heterogeneous-ISA CPUs. Since existing heterogeneous-ISA hardware has physically discrete memory for ISA-heterogeneous CPUs, ZxOS targets a software emulation environment that emulates cache-coherent, shared memory heterogeneous-ISA hardware. Our experimental evaluation using a set of micro- and macro-benchmarks demonstrates ZxOS's functionality. In particular, it shows that a multithreaded application's threads can be split across (simulated) ISA-heterogeneous cores for parallel execution and that the threads' concurrent accesses to shared memory variables are consistent.
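ZxOS's target is a cache-coherent region shared by ISA-heterogeneous cores. As a loose host-level analogue only (not ZxOS or Zephyr code), the sketch below shares an atomic counter between two processes through a POSIX shared-memory mapping, standing in for two "cores" updating a shared variable coherently; the shm name is a placeholder.

```c
/* Host-level analogue of the shared-memory model (illustration only).
 * Link with -lrt on older glibc. */
#include <fcntl.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const char *name = "/zxos_sketch";          /* placeholder shm name */
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, sizeof(_Atomic long)) != 0) {
        perror("shm"); return 1;
    }
    _Atomic long *counter = mmap(NULL, sizeof(*counter),
                                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (counter == MAP_FAILED) { perror("mmap"); return 1; }
    atomic_store(counter, 0);

    pid_t child = fork();                       /* second "core" */
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(counter, 1);           /* coherent shared update */

    if (child == 0)
        return 0;
    waitpid(child, NULL, 0);
    printf("counter = %ld (expected 200000)\n", atomic_load(counter));
    shm_unlink(name);
    return 0;
}
```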