On Improving Distributed Transactional Memory through Nesting, Partitioning and Ordering

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

Distributed Transactional Memory (DTM) is an emerging, alternative concurrency control model that aims to overcome the challenges of distributed-lock based synchronization. DTM employs transactions in order to guarantee consistency in a concurrent execution. When two or more transactions conflict, all but one need to be delayed or rolled back.

Transactional Memory supports code composability by nesting transactions. Nesting how- ever can be used as a strategy to improve performance. The closed nesting model enables partial rollback by allowing a sub-transaction to abort without aborting its parent, thus reducing the amount of work that needs to be retried. In the open nesting model, sub- transactions can commit to the shared state independently of their parents. This reduces isolation and increases concurrency.

Our first main contribution in this dissertation are two extensions to the existing Transac- tional Forwarding Algorithm (TFA). Our extensions are N-TFA and TFA-ON, and support closed nesting and open nesting, respectively. We additionally extend the existing SCORe algorithm with support for open nesting (we call the result SCORe-ON). We implement these algorithms in a Java DTM framework and evaluate them. This represents the first study of transaction nesting in the context of DTM, and contributes the first DTM implementation which supports closed nesting or open nesting.

Closed nesting through our N-TFA implementation proved insufficient for any significant throughput improvements. It ran on average 2% faster than flat nesting, while performance for individual tests varied between 42% slowdown and 84% speedup. The workloads that benefit most from closed nesting are characterized by short transactions, with between two and five sub-transactions.

Open nesting, as exemplified by our TFA-ON and SCORe-ON implementations, showed promising results. We determined performance improvement to be a trade-off between the overhead of additional commits and the fundamental conflict rate. For write-intensive, high- conflict workloads, open nesting may not be appropriate, and we observed a maximum speedup of 30%. On the other hand, for lower fundamental-conflict workloads, open nesting enabled speedups of up to 167% in our tests.

In addition to the two nesting algorithms, we also develop Hyflow2, a high-performance DTM framework for the Java Virtual Machine, written in Scala. It has a clean Scala API and a compatibility Java API. Hyflow2 was on average two times faster than Hyflow on high-contention workloads, and up to 16 times faster in low-contention workloads.

Our second main contribution for improving DTM performance is automated data partition- ing. Modern transactional processing systems need to be fast and scalable, but this means many such systems settled for weak consistency models. It is however possible to achieve all of strong consistency, high scalability and high performance, by using fine-grained partitions and light-weight concurrency control that avoids superfluous synchronization and other over- heads such as lock management. Independent transactions are one such mechanism, that rely on good partitions and appropriately defined transactions. On the downside, it is not usually straightforward to determine optimal partitioning schemes, especially when dealing with non-trivial amounts of data. Our work attempts to solve this problem by automating the partitioning process, choosing the correct transactional primitive, and routing transactions appropriately.

Our third main contribution is Alvin, a system for managing concurrently running trans- actions on a geographically replicated data-store. Alvin supports general-purpose transactions, and guarantees strong consistency criteria. Through a novel partial order broadcast protocol, Alvin maximizes the parallelism of ordering and local transaction processing, resulting in low client-perceived latency. Alvin can process read-only transactions either lo- cally or globally, according to the desired consistency criterion. Conflicting transactions are ordered across all sites. We built Alvin in the Go programming language. We conducted our evaluation study on Amazon EC2 infrastructure and compared against Paxos- and EPaxos- based state machine replication protocols. Our results reveal that Alvin provides significant speed-up for read-dominated TPC-C workloads: as much as 4.8x when compared to EPaxos on 7 datacenters, and up to 26% in write-intensive workloads.

Our fourth and final contribution is M2Paxos, a multi-leader implementation of Generalized Consensus. Single leader-based consensus protocols are known to stop scaling once the leader reaches its saturation point. Ordering commands based on conflicts is appealing due to the potentially higher parallelism, but is imperfect due to the higher quorum sizes required for fast decisions and the need to compare commands and track their dependencies. M2Paxos on the other hand exploits fast decisions (i.e., delivery of a command in two communication delays) by leveraging a classic quorum size, matching a majority of nodes deployed. M2Paxos does not establish command dependencies based on conflicts, but it binds accessed objects to nodes, making sure commands operating on the same object will be ordered by the same node. Our evaluation study of M2Paxos (also built in Go) confirms the effectiveness of this approach, getting up to 7⨉ improvements in performance over state- of-the-art consensus and generalized consensus algorithms.

Distributed Transactional Memory, Distributed Systems, Nested Transactions, Automated Partitioning, Consensus