Resilire: Achieving High Availability Through Virtual Machine Live Migration
High availability is a critical feature of data centers, cloud, and cluster computing environments. Replication is a classical approach to increase service availability by providing redundancy. However, traditional replication methods are increasingly unattractive for deployment due to several limitations such as application-level non-transparency, non-isolation of applications (causing security vulnerabilities), complex system management, and high cost. Virtualization overcomes these limitations through another layer of abstraction, and provides high availability through virtual machine (VM) live migration: a guest VM image running on a primary host is transparently check-pointed and migrated, usually at a high frequency, to a backup host, without pausing the VM; the VM is resumed from the latest checkpoint on the backup when a failure occurs. A virtual cluster (VC) generalizes the VM concept for distributed applications and systems: a VC is a set of multiple VMs deployed on different physical machines connected by a virtual network. This dissertation presents a set of VM live migration techniques, their implementations in the Xen hypervisor and Linux operating system kernel, and experimental studies conducted using benchmarks (e.g., SPEC, NPB, Sysbench) and production applications (e.g., Apache webserver, SPECweb). We first present a technique for reducing VM migration downtimes called FGBI. FGBI reduces the dirty memory updates that must be migrated during each migration epoch by tracking memory at block granularity. Additionally, it determines memory blocks with identical content and shares them to reduce the increased memory overheads due to block-level tracking granularity, and uses a hybrid compression mechanism on the dirty blocks to reduce the migration traffic. We implement FGBI in the Xen hypervisor and conduct experimental studies, which reveal that the technique reduces the downtime by 77% and 45% over competitors including LLM and Remus, respectively, with a performance overhead of 13%. We then present a lightweight, globally consistent checkpointing mechanism for virtual cluster, called VPC, which checkpoints the VC for immediate restoration after (one or more) VM failures. VPC predicts the checkpoint-caused page faults during each checkpointing interval, in order to implement a lightweight checkpointing approach for the entire VC. Additionally, it uses a globally consistent checkpointing algorithm, which preserves the global consistency of the VMs' execution and communication states, and only saves the updated memory pages during each checkpointing interval. Our Xen-based implementation and experimental studies reveal that VPC reduces the solo VM downtime by as much as 45% and reduces the entire VC downtime by as much as 50% over competitors including VNsnap, with a memory overhead of 9% and performance overhead of 16%. The dissertation's third contribution is a VM resumption mechanism, called VMresume, which restores a VM from a (potentially large) checkpoint on slow-access storage in a fast and efficient way. VMresume predicts and preloads the memory pages that are most likely to be accessed after the VM's resumption, minimizing otherwise potential performance degradation due to cascading page faults that may occur on VM resumption. Our experimental studies reveal that VM resumption time is reduced by an average of 57% and VM's unusable time is reduced by 73.8% over native Xen's resumption mechanism. Traditional VM live migration mechanisms are based on hypervisors. However, hypervisors are increasingly becoming the source of several major security attacks and flaws. We present a mechanism called HSG-LM that does not involve the hypervisor during live migration. HSG-LM is implemented in the guest OS kernel so that the hypervisor is completely bypassed throughout the entire migration process. The mechanism exploits a hybrid strategy that reaps the benefits of both pre-copy and post-copy migration mechanisms, and uses a speculation mechanism that improves the efficiency of handling post-copy page faults. We modify the Linux kernel and develop a new page fault handler inside the guest OS to implement HSG-LM. Our experimental studies reveal that the technique reduces the downtime by as much as 55%, and reduces the total migration time by as much as 27% over competitors including Xen-based pre-copy, post-copy, and self-migration mechanisms. In a virtual cluster environment, one of the main challenges is to ensure equal utilization of all the available resources while avoiding overloading a subset of machines. We propose an efficient load balancing strategy using VM live migration, called DCbalance. Differently from previous work, DCbalance records the history of mappings to inform future placement decisions, and uses a workload-adaptive live migration algorithm to minimize VM downtime. We improve Xen's original live migration mechanism and implement the DCbalance technique, and conduct experimental studies. Our results reveal that DCbalance reduces the decision generating time by 79%, the downtime by 73%, and the total migration time by 38%, over competitors including the OSVD virtual machine load balancing mechanism and the DLB (Xen-based) dynamic load balancing algorithm. The dissertation's final contribution is a technique for VM live migration in Wide Area Networks (WANs), called FDM. In contrast to live migration in Local Area Networks (LANs), VM migration in WANs involve migrating disk data, besides memory state, because the source and the target machines do not share the same disk service. FDM is a fast and storage-adaptive migration mechanism that transmits both memory state and disk data with short downtime and total migration time. FDM uses page cache to identify data that is duplicated between memory and disk, so as to avoid transmitting the same data unnecessarily. We implement FDM in Xen, targeting different disk formats including raw and Qcow2. Our experimental studies reveal that FDM reduces the downtime by as much as 87%, and reduces the total migration time by as much as 58% over competitors including pre-copy or post-copy disk migration mechanisms and the disk migration mechanism implemented in BlobSeer, a widely used large-scale distributed storage service.
- Doctoral Dissertations