Browsing by Author "Le, Michael V."
Now showing 1 - 4 of 4
Results Per Page
Sort Options
- Kernel extension verification is untenableJia, Jinghao; Sahu, Raj; Oswald, Adam; Williams, Dan; Le, Michael V.; Xu, Tianyin (ACM, 2023-06-22)The emergence of verified eBPF bytecode is ushering in a new era of safe kernel extensions. In this paper, we argue that eBPF’s verifier—the source of its safety guarantees—has become a liability. In addition to the well-known bugs and vulnerabilities stemming from the complexity and ad hoc nature of the in-kernel verifier, we highlight a concerning trend in which escape hatches to unsafe kernel functions (in the form of helper functions) are being introduced to bypass verifier-imposed limitations on expressiveness, unfortunately also bypassing its safety guarantees. We propose safe kernel extension frameworks using a balance of not just static but also lightweight runtime techniques. We describe a design centered around kernel extensions in safe Rust that will eliminate the need of the in-kernel verifier, improve expressiveness, allow for reduced escape hatches, and ultimately improve the safety of kernel extensions.
- Practical and Flexible Kernel CFI Enforcement using eBPFJia, Jinghao; Le, Michael V.; Ahmed, Salman; Williams, Dan; Jamjoom, Hani (ACM, 2023-09-10)Enforcing control flow integrity (CFI) in the kernel (kCFI) can prevent control-flow hijack attacks. Unfortunately, current kCFI approaches have high overhead or are inflexible and cannot support complex context-sensitive policies. To overcome these limitations, we propose a kCFI approach that makes use of eBPF (eKCFI) as the enforcement mechanism. The focus of this work is to demonstrate through implementation optimizations how to overcome the enormous performance overhead of this approach, thereby enabling the potential benefits with only modest performance tradeoffs.
- Re-thinking termination guarantee of eBPFSahu, Raj (Virginia Tech, 2024-06-10)In the rapidly evolving landscape of BPF as kernel extensions, where the industry is deploying an increasing count of simultaneously running BPF programs, the need for accounting BPF- induced overhead on latency-sensitive kernel functions is becoming critical. We also find that eBPF's termination guarantee is insufficient to protect systems from BPF programs running extraordinarily long due to compute-heavy operations and runtime factors such as contention. Operators lack a crucial mechanism to identify and avoid installing long-running BPF programs while also requiring a mechanism to abort such BPF programs when found to be adding high latency overhead on performance-critical kernel functions. In this work, we propose a runtime estimator and a dynamic termination mechanism to solve these two issues, respectively. We use a hybrid of static and dynamic analysis to provide a runtime range that we demonstrate to encompass the actual runtime of the BPF program. For safe BPF termination, we propose a short-circuiting approach to skip all costly operations and quickly reach completion. We evaluate the proposed solutions to find the obtained performance estimate as too broad, but when paired with the dynamic termination, can be used by a BPF Orchestrator to impose policies on the overhead due to BPF programs in a call path. The proposed dynamic termination solution has zero overhead on BPF programs for no-termination cases while having a verification overhead proportional to the number of helper calls in a BPF program. In the future, we aim to make BPF execution atomic to guarantee that kernel objects modified within a BPF program are always left in a consistent state in the event of program termination.
- Securing Container-based Clouds with Syscall-aware SchedulingLe, Michael V.; Ahmed, Salman; Williams, Dan; Jamjoom, Hani (ACM, 2023-07-10)Container-based clouds—in which containers are the basic unit of isolation—face security concerns because, unlike Virtual Machines, containers directly interface with the underlying highly privileged kernel through the wide and vulnerable system call interface. Regardless of whether a container itself requires dangerous system calls, a compromised or malicious container sharing the host (a bad neighbor) can compromise the host kernel using a vulnerable syscall, thereby compromising all other containers sharing the host. In this paper, rather than attempting to eliminate host compromise, we limit the effectiveness of attacks by bad neighbors to a subset of the cluster. To do this, we propose a new metric dubbed Extraneous System call Exposure (ExS). Scheduling containers to minimize ExS reduces the number of nodes that expose a vulnerable system call and as a result the number of affected containers in the cluster. Experimenting with 42 popular containers on SySched, our greedy scheduler implementation in Kubernetes, we demonstrate that SySched can reduce up to 46% more victim nodes and up to 48% more victim containers compared to the Kubernetes default scheduling while also reducing overall host attack surface by 20%.