A Scalable Leader Based Consensus Algorithm

dc.contributor.authorGulati, Ishaanen
dc.contributor.committeechairNikolopoulos, Dimitrios S.en
dc.contributor.committeememberJi, Boen
dc.contributor.committeememberBack, Godmaren
dc.contributor.departmentComputer Scienceen
dc.description.abstractPresent-day commonly used systems like Cassandra, Spanner, and CockroachDB require high availability and strict consistency guarantees. High availability is attained through redundancy. In the field of computing, redundancy is attained through state machine repli- cation. Protocols like Raft, Multi-Paxos, ZAB, or other variants of Paxos are commonly used to achieve state machine replication. These protocols choose one of the processes from multiple processes running on various machines in a distributed setting as the leader. The leader is responsible for client interactions, replicating client operations on all the followers, and maintaining a consistent view across the system. In these protocols, the leader is more loaded than other nodes or followers in the system, making the leader a significant scalabil- ity bottleneck for multi-datacenter and edge deployments. The overall commit throughput and latency are further exacerbated in majority agreement with the hardware and network heterogeneity. This work aims to reduce the load on the leader by using reduced dynamic latency-aware flexible quorums while maintaining strict correctness guarantees like linearizability. In this thesis, we implement dynamic reduced-size commit quorums to reduce the leader’s load and improve throughput and latency, called FDRaft. The commit quorums are computed based on an exponentially moving weighted average of the followers’ time to respond to the leader, accounting for the heterogeneity in hardware and network. The reduced commit quorum requires a bigger election quorum, but elections rarely happen, and a single leader can serve for significant durations. We evaluate this protocol using a key-value store built on FDRaft and Raft and compare multi-datacenter and edge deployments. The evaluation shows 2x improved throughput and around 55% improved latency over Raft during normal operations and 45% improvement over Raft with vanilla flexible-quorums under failure conditions.en
dc.description.abstractgeneralIn our day-to-day life, we rely heavily on different internet applications, be it Instagram for sharing pictures, Amazon for our shopping, Doordaash for our food orders, Spotify for listening to music, or Uber for traveling. These applications share many commonalities, like the scale at which they operate, maintaining strict latency guarantees, high availability to serve the users, and using databases to maintain shared states. The data is replicated across multiple servers to provide fault tolerance against failures. The replication across multiple servers is achieved through state-machine replication. In state-machine replication, multiple servers start with the same initial state and perform operations in the same order to reach the same final state. This process of replication in computing is achieved through a consensus algorithm. Con- sensus means agreement, and consensus algorithms are used to reach an agreement for a particular value. Raft, Multi-Paxos, or any other variant of Paxos are the commonly used consensus algorithms to achieve agreement on a particular value in a distributed setting. In these algorithms, one of the servers is chosen as the leader responsible for client interactions, replicating and maintaining the same state across all the servers, even when faced with server and network failures. Every time the leader receives a client operation, it starts the consensus process by forwarding the client request to all the servers and committing the client request after receiving an agreement from the majority. As the leader does most of the work, it is more loaded than other servers and becomes a significant scalability bottleneck. The leader bottleneck becomes more evident in multi-datacenters and edge deployments. The hardware and network heterogeneity also severely affects the overall commit throughput and latency in majority agreement. In this thesis, we reduce the load on the leader by building a smaller-sized dynamic commit quorum with latency-aware server selection based on an exponentially weighted moving av- erage of the followers’ response time to the leader’s requests without compromising safety and liveness properties. Our design also provides a higher efficiency for throughput and commit latency. We evaluate this protocol against multiple workloads and failure conditions and find that it outperforms Raft by 2x in terms of throughput and around 55% in latency over Raft during normal operations. It also shows improvement in throughput and latency by 45% over Raft with vanilla flexible-quorums under failure conditions.en
dc.publisherVirginia Techen
dc.rightsCC0 1.0 Universalen
dc.subjectDistributed Systemsen
dc.subjectFault Toleranceen
dc.subjectState Machine Replicationen
dc.subjectFlexible Commit Quorumsen
dc.titleA Scalable Leader Based Consensus Algorithmen
thesis.degree.disciplineComputer Scienceen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
Original bundle
Now showing 1 - 1 of 1
Thumbnail Image
1.76 MB
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
1.5 KB
Item-specific license agreed upon to submission