A Scalable Leader Based Consensus Algorithm
Present-day commonly used systems like Cassandra, Spanner, and CockroachDB require high availability and strict consistency guarantees. High availability is attained through redundancy. In the field of computing, redundancy is attained through state machine repli- cation. Protocols like Raft, Multi-Paxos, ZAB, or other variants of Paxos are commonly used to achieve state machine replication. These protocols choose one of the processes from multiple processes running on various machines in a distributed setting as the leader. The leader is responsible for client interactions, replicating client operations on all the followers, and maintaining a consistent view across the system. In these protocols, the leader is more loaded than other nodes or followers in the system, making the leader a significant scalabil- ity bottleneck for multi-datacenter and edge deployments. The overall commit throughput and latency are further exacerbated in majority agreement with the hardware and network heterogeneity. This work aims to reduce the load on the leader by using reduced dynamic latency-aware flexible quorums while maintaining strict correctness guarantees like linearizability. In this thesis, we implement dynamic reduced-size commit quorums to reduce the leader’s load and improve throughput and latency, called FDRaft. The commit quorums are computed based on an exponentially moving weighted average of the followers’ time to respond to the leader, accounting for the heterogeneity in hardware and network. The reduced commit quorum requires a bigger election quorum, but elections rarely happen, and a single leader can serve for significant durations. We evaluate this protocol using a key-value store built on FDRaft and Raft and compare multi-datacenter and edge deployments. The evaluation shows 2x improved throughput and around 55% improved latency over Raft during normal operations and 45% improvement over Raft with vanilla flexible-quorums under failure conditions.