Browsing by Author "Barbalace, Antonio"
Now showing 1 - 7 of 7
Results Per Page
Sort Options
- Aggregate VM: Why Reduce or Evict VM's Resources When You Can Borrow Them From Other Nodes?Chuang, Ho-Ren; Manaouil, Karim; Xing, Tong; Barbalace, Antonio; Olivier, Pierre; Heerekar, Balvansh; Ravindran, Binoy (ACM, 2023-05-08)Hardware resource fragmentation is a common issue in data centers. Traditional solutions based on migration or overcommitment are unacceptably slow, and modern commercial or research solutions like Spot VM may reduce or evict VM’s resources anytime.We propose an alternative solution that does not suffer from these drawbacks, the Aggregate VM.We introduce a new distributed hypervisor design, the resource-borrowing hypervisor, which creates Aggregate VMs: distributed VMs that temporarily aggregate fragmented resources belonging to different host machines, which require mobility of virtual CPUs, memory and IO devices.We implement a prototype, FragVisor, which runs guest software transparently.We also propose minimal modifications to the guest OS that can enable significant performance gains. We evaluate FragVisor over a set of microbenchmarks and IaaS-style real applications. Although Aggregate VMs are not a perfect fit for every type of applications, some workloads enjoy significant speedups compared to overcommitted scenarios (up to 3.9x with 4 distributed vCPUs).We further demonstrate that FragVisor is faster than a state-of-the-art competitor, GiantVM (up to 2.5x).
- A Compiler Framework to Support and Exploit Heterogeneous Overlapping-ISA Multiprocessor PlatformsJelesnianski, Christopher Stanisław (Virginia Tech, 2015-10-28)As the demand for ever increasingly powerful machines continues, new architectures are sought to be the next route of breaking past the brick wall that currently stagnates the performance growth of modern multi-core CPUs. Due to physical limitations, scaling single-core performance any further is no longer possible, giving rise to modern multi-cores. However, the brick wall is now limiting the scaling of general-purpose multi-cores. Heterogeneous-core CPUs have the potential to continue scaling by reducing power consumption through exploitation of specialized and simple cores within the same chip. Heterogeneous-core CPUs join fundamentally different processors each which their own peculiar features, i.e., fast execution time, improved power efficiency, etc; enabling the building of versatile computing systems. To make heterogeneous platforms permeate the computer market, the next hurdle to overcome is the ability to provide a familiar programming model and environment such that developers do not have to focus on platform details. Nevertheless, heterogeneous platforms integrate processors with diverse characteristics and potentially a different Instruction Set Architecture (ISA), which exacerbate the complexity of the software. A brave few have begun to tread down the heterogeneous-ISA path, hoping to prove that this avenue will yield the next generation of super computers. However, many unforeseen obstacles have yet to be discovered. With this new challenge comes the clear need for efficient, developer-friendly, adaptable system software to support the efforts of making heterogeneous-ISA the golden standard for future high-performance and general-purpose computing. To foster rapid development of this technology, it is imperative to put the proper tools into the hands of developers, such as application and architecture profiling engines, in order to realize the best heterogeneous-ISA platform possible with available technology. In addition, it would be in the best interest to create tools to be as "timeless" as possible to expose fundamental concepts industry could benefit from and adopt in future designs. We demonstrate the feasibility of a compiler framework and runtime for an existing heterogeneous-ISA operating system (Popcorn Linux) for automatically scheduling compute blocks within an application on a given heterogeneous-ISA high-performance platform (in our case a platform built with Intel Xeon - Xeon Phi). With the introduced Profiler, Partitioner, and Runtime support, we prove to be able to automatically exploit the heterogeneity in an overlapping-ISA platform, being faster than native execution and other parallelism programming models. Empirically evaluating our compiler framework, we show that application execution on Popcorn Linux can be up to 52% faster than the most performant native execution for Xeon or Xeon Phi. Using our compiler framework relieves the developer from manual scheduling and porting of applications, requiring only a single profiling run per application.
- A Flattened Hierarchical Scheduler for Real-Time Virtual MachinesDrescher, Michael Stuart (Virginia Tech, 2015-05-05)The recent trend of migrating legacy computer systems to a virtualized, cloud-based environment has expanded to real-time systems. Unfortunately, modern hypervisors have no mechanism in place to guarantee the real-time performance of applications running on virtual machines. Past solutions to this problem rely on either spatial or temporal resource partitioning, both of which under-utilize the processing capacity of the host system. Paravirtualized solutions in which the guest communicates its real-time needs have been proposed, but they cannot support legacy operating systems. This thesis demonstrates the shortcomings of resource partitioning using temporally-isolated servers, presents an alternative solution to the scheduling problem called the KairosVM Flattening Scheduling Algorithm, and provides an implementation of the algorithm based on Linux and KVM. The algorithm is analyzed theoretically and an exact schedulability test for the algorithm is derived. Simulations show that the algorithm can schedule more than 90% of all randomly generated tasksets with a utilization less than 0.95. In comparison to the state-of-the-art server based approach, the KairosVM Flattening Scheduling Algorithm is able to schedule more than 20 times more tasksets with utilization of 0.95. Experimental results demonstrate that the Linux-based implementation is able to match the deadline satisfaction ratio of a state-of-the-art server-based approach when the taskset is schedulable using the state-of-the-art approach. When tasksets are unschedulable, the implementation is able to increase the deadline satisfaction ratio of Vanilla KVM by up to 400%. Furthermore, unlike paravirtualized solutions, the implementation supports legacy systems through the use of introspection.
- High Performance Inter-kernel Communication and Networking in a Replicated-kernel Operating SystemAnsary, B M Saif (Virginia Tech, 2016-01-20)Modern computer hardware platforms are moving towards high core-count and heterogeneous Instruction Set Architecture (ISA) processors to achieve improved performance as single core performance has reached its performance limit. These trends put the current monolithic SMP operating system (OS) under scrutiny in terms of scalability and portability. Proper pairing of computing workloads with computing resources has become increasingly arduous with traditional software architecture. One of the most promising emerging operating system architectures is the Multi-kernel. Multi-kernels not only address scalability issues, but also inherently support heterogeneity. Furthermore, provide an easy way to properly map computing workloads to the correct type of processing resources in presence of heterogeneity. Multi-kernels do so by partitioning the resources and running independent kernel instances and co-operating amongst themselves to present a unified view of the system to the application. Popcorn is one the most prominent multi-kernels today, which is unique in the sense that it runs multiple Linux instances on different cores or group of cores, and provides a unified view of the system i.e., Single System Image (SSI). This thesis presents four contributions. First, it introduces a filesystem for Popcorn, which is a vital part to provide a SSI. Popcorn supports thread/process migration that requires migration of file descriptors which is not provided by traditional filesystems as well as popular distributed file systems, this work proposes a scalable messaging based file descriptor migration and consistency protocol for Popcorn. Second, multi-kernel OSs rely heavily on a fast low latency messaging layer to be scalable. Messaging is even more important in heterogeneous systems where different types of cores are on different islands with no shared memory. Thus, another contribution proposes a fast-low latency messaging layer to enable communication among heterogeneous processor islands for Heterogeneous Popcorn. With advances in networking technology, newest Ethernet technologies are able to support up to 40 Gbps bandwidth, but due to scalability issues in monolithic kernels, the number of connections served per second does not scale with this increase in speed.Therefore, the third and fourth contributions try to address this problem with Snap Bean, a virtual network device and Angel, an opportunistic load balancer for Popcorn's network system. With the messaging layer Popcorn gets over 30% performance benefit over OpenCL and Intel Offloading technique (LEO). And with NetPopcorn we achieve over 7 to 8 times better performance over vanilla Linux and 2 to 5 times over state-of-the-art Affinity Accept.
- Offloading Datacenter Jobs to RISC-V Hardware for Improved Performance and Power EfficiencyHeerekar, Balvansh; Philippidis, Cesar; Chuang, Ho-Ren; Olivier, Pierre; Barbalace, Antonio; Ravindran, Binoy (ACM, 2024-09-16)The end of Moore’s Law has brought significant changes in the architecture of servers used in data centers, increasingly incorporating new ISAs beyond x86-64 as well as diverse accelerators. Further, single-board computers have become increasingly efficient and can run certain Linux applications at significantly lower equipment and energy costs compared to traditional servers. Past research has demonstrated that offloading applications at runtime from x86-based servers to ARM-based single-board computers can result in increases in throughput and energy efficiency. The RISC-V architecture has recently gained significant commercial interest, and OS-capable single-board computers with RISC-V cores are increasingly available at the commodity scale. In this paper we propose a system that offloads jobs from an x86 server to a RISC-V single-board computer at runtime, with the goals of improving job throughput and energy saved. Towards this, we port the Popcorn Linux multi-ISA toolchain and runtime framework to RISC-V, enabling the live migration of applications between an x86 Xeon server and a SiFive HiFive RISC-V board. We further propose a scheduling policy, Lowest Slowdown First (LSF) that drives the offloading of long-running and stateful datacenter background jobs from the server to the board, to alleviate workload congestion on the server. LSF’s policy relies on monitoring jobs’ performance on the server, predicting the slowdown they would suffer if running on the board, and migrating the jobs with the lowest estimated slowdown. Our evaluation shows that LSF yields up to 20% increase in throughput while also gaining 16% more energy efficiency for compute-intensive workloads.
- Single System Image in a Linux-based Replicated Operating System KernelRavichandran, Akshay Giridhar (Virginia Tech, 2015-09-15)Recent trends in the computer market suggest that emerging computing platforms will be increasingly parallel and heterogeneous, in order to satisfy the user demand for improved performance and superior energy savings. Heterogeneity is a promising technology to keep growing the number of cores per chip without breaking the power wall. However, existing system software is able to cope with homogeneous architectures, but it was not designed to run on heterogeneous architectures, therefore, new system software designs are necessary. One innovative design is the multikernel OS deployed by the Barrelfish operating system (OS) which partitions hardware resources to independent kernel instances that communicate exclusively by message passing, without exploiting the shared memory available amongst different CPUs in a multicore platform. Popcorn Linux implements an extension of the multikernel OS design, called replicated-kernel OS, with the goal of providing a Linux-based single system image environment on top of multiple kernels, which can eventually run on different ISA processors. A replicated-kernel OS replicates the state of various OS sub-systems amongst kernels that cooperate using message passing to distribute or access various services uniquely available on each kernel. In this thesis, we present mechanisms to distribute signals, namespaces, inter-thread synchronizations and socket state replication. These features are built on top of the existing messaging layer, process or thread migration and address space consistency protocol to provide the application with an illusion of single system image and developers with the SMP programming environment they are most familiar with. The mechanisms developed were unit tested with micro benchmarks to validate their correctness and to measure the gained speed up or additive overhead. Real-world applications were also used to benchmark the developed mechanisms on homogeneous and on heterogeneous architectures. It is found that the contributed Popcorn synchronization mechanism exhibits overhead compared to vanilla Linux on multicore as Linux's equivalent mechanisms are tightly coupled with underlying hardware cache coherency protocol, therefore, faster than software message passing. On heterogeneous platforms, the developed mechanisms allow to transparently map each portion of the application to the processor in the platform on which the execution is faster. Optimizations are recommended to further improve the performance of the proposed synchronization mechanism. However, optimizations may force the userspace application and libraries to be rewritten in order to decouple their synchronization mechanisms of shared memory, therefore losing transparency, which is one of the primary goals of this work.
- Xar-Trek: Run-time Execution Migration among FPGAs and Heterogeneous-ISA CPUsHorta, Edson; Chuang, Ho-Ren; VSathish, Naarayanan Rao; Philippidis, Cesar; Barbalace, Antonio; Olivier, Pierre; Ravindran, Binoy (ACM, 2021-12-06)Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits dynamic and transparent migration. We present Xar-Trek, a new compiler and run-time software framework that overcomes this limitation. Xar-Trek compiles an application for several CPU ISAs and select application functions for acceleration on an FPGA, allowing execution migration between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek’s run-time monitors server workloads and migrates application functions to an FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a heuristic policy that uses application workload profiles to make scheduling decisions. Our evaluations conducted on a system with x86-64 server CPUs, ARM64 server CPUs, and an Alveo accelerator card reveal 88%-1% performance gains over no-migration baselines.