On Partial Aborts and Reducing Validation Costs in Fault-tolerant Distributed Transactional Memory

TR Number
Journal Title
Journal ISSN
Volume Title
Virginia Tech

Distributed Transactional Memory (DTM) is an emerging synchronization abstraction thatpromises to alleviate the scalability, programmability, and composability challenges of lock-based distributed synchronization. With DTM, programmers organize code that read/writeshared memory objects, both local and remote, as memory transactions. An underlying DTMframework guarantees atomicity, isolation, and consistency properties for those transactionsthrough speculative concurrency control. In DTM, restarting an aborted transaction from the beginning can degrade performance astransactional conflicts may have occurred in the later part of the transaction, wasting work.The partial abort method alleviates this difficulty by enabling a transaction to be rolled backto the point where objects in the transaction's read-set and write-set are still consistent.In this thesis, we present protocols for supporting partial aborts in QR-DTM, a fault-tolerant DTM that uses quorum protocols for distributed concurrency control in the presence offailures. We focus on two partial abort models: closed nesting, which allows transactions to be nested and inner transactions to be rolled back without rolling back outer transactions;and checkpointing, which allows the transaction state to be saved in checkpoints throughout execution and transactions to be rolled back to those checkpoints. We present protocols that support these partial abort models in QR-DTM, called QR-CN and QR-CHK. We implemented these protocols and conducted experimental studies using macro-benchmarks(e.g., distributed versions of STAMP benchmark), and micro-benchmarks (e.g., distributed data structures). Our studies reveal that QR-CN improves throughput by as much as 101% over flat nesting in specific cases, with an average improvement of 53%. We also develop QR-ACN, a framework that automatically decomposes programmer-writtentransactions into closed nested transactions, at run-time, to improve performance. The composition of a closed nested transaction depends on the contention of objects, which can change at run-time depending upon the workload at hand. Our implementation and experimental studies reveal that QR-ACN consistently outperforms flat nesting by an averageof 51% on benchmarks including TPC-C. False conflicts occur when high-level operations, even though semantically independent, traverse the same set of objects during transaction execution. Such conflicts can lead torepeated aborts, increasing transaction execution time and degrading performance, which can be significant in DTM, since transaction execution also includes network communication. ii We consider the approach of reducing validation cost for resolving false conflicts. We presentthree protocols for reducing validation cost in DTM. Our first protocol, QR-ON, incorporatesthe open nesting model into QR-DTM. Open nesting allows inner-nested transactions tocommit independently of their parent transaction, releasing objects in the transaction read-set and write-set early, minimizing aborts due to false conflicts. We then present QR-OON,in which open-nested transactions commit asynchronously so that subsequent transactionscan proceed without waiting for the commit of previous transactions. Finally, we presentan early release methodology, QR-ER, in which objects that do not affect the final state ofthe shared data are dropped from the transaction's read-set, which avoids false conflicts andreduces communication costs. Our implementation and experimental studies revealed that QR-OON outperforms QR-ON by up to 43%, and that QR-ER outperforms QR-ON and QR-OON by up to 10% on micro- and macro-benchmarks.

Software Transactional Memory, Partial Abort, Validation, Closed Nesting, Checkpointing