Supporting Distributed Fault Tolerance In A Real-Time Micro-Kernel
Files
TR Number
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
Research into modular approaches for constructing power electronics control systems has provided a number of benefits, as well as new opportunities. Control systems composed of an interconnected collection of standardized parts makes distributed processing a realistic possibility. Unfortunately, current strategies to supporting software on such systems have a number of critical drawbacks. Many existing approaches rely on centralized control strategies, fail to support fault tolerance in the face of failures among processing nodes or communications links, and fail to robustly support live addition or removal of nodes from a running network. In this context, failure of a single element means failure of the entire system.
This thesis describes research to extend the Dataflow Architecture Real-time Kernel (DARK) to support distributed, fault-tolerant execution of control algorithms for power electronics control systems. An appropriate scheme for fault-tolerant scheduling of processes on distributed processing nodes is described, added to DARK, and evaluated. Literature indicates that fault-tolerant multiprocessor scheduling for hard real-time tasks with task precedence constraints is an NP-hard problem. The new system is based on an off-line fault-tolerant scheduling strategy that generates a static schedule of tasks for each processing unit to follow. This algorithm handles both the task precedence constraints and the constraints imposed by the underlying network protocol(DRPESNET). Modifications to the underlying daisy-chained, packet-switched, time-triggered ring network protocol to support communications fault tolerance and plug-and-play addition or removal of live nodes from an existing control system are also described.