Distributed reconfiguration and fault diagnosis in cellular processing arrays

Date: 1993-01-05
Publisher: Virginia Tech
Abstract

An overview of an existing hierarchical reconfiguration scheme for a fault-tolerant two-dimensional cellular architecture is presented, in which an array of finite state machine cells controls the processing and switching elements. This allows the array either to reconfigure in the presence of faults or to perform different processing functions. Because the control mechanism is distributed, the system is not subject to the single-point "hard core" failures to which a global control mechanism is vulnerable. Unlike other fault-tolerant systems, the proposed method does not assume the existence of components that never fail.

The processing elements in the array are logically connected in a mesh pattern and are provided with additional physical connections to other cells. A local reconfiguration scheme allows faulty cells to be bypassed via these additional connections, so that the logical mesh can be restored. This technique allows the array to reconfigure quickly in the presence of up to triple faults.
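The bypass idea can be illustrated, under heavy simplification, by reducing the array to a single row. The sketch below assumes a hypothetical link model that is not taken from the report: each cell is wired to its immediate neighbour and has one extra bypass link spanning two positions, so a single faulty cell can be skipped but two adjacent faulty cells cannot. The function name `bypass_row` and its parameters are illustrative only; the report's actual scheme operates on a two-dimensional mesh and tolerates up to triple faults, which this sketch does not model.

```python
from typing import List, Optional


def bypass_row(faulty: List[bool], needed: int) -> Optional[List[int]]:
    """Map `needed` logical positions onto healthy physical cells of one row.

    Hypothetical link model (an assumption, not the report's design):
    a cell reaches the next cell (distance 1) directly and the cell after
    that (distance 2) through a bypass link. Returns the physical index
    used for each logical position, or None if the row cannot be restored.
    """
    mapping: List[int] = []
    prev = -1  # virtual input port sitting just left of physical cell 0
    for phys, bad in enumerate(faulty):
        if bad:
            continue  # faulty cell: try to jump over it
        if phys - prev > 2:
            return None  # gap wider than the bypass link can span
        mapping.append(phys)
        prev = phys
        if len(mapping) == needed:
            return mapping
    return None  # ran out of healthy cells before filling the logical row
```

For example, a row whose second cell is faulty, `[False, True, False, False]`, still yields a three-cell logical row on physical cells 0, 2 and 3, whereas two adjacent faults break the chain under this link model.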

When local reconfiguration fails, the array can still reconfigure by using a global reconfiguration scheme, in which the functional part of the array relocates itself to a fault-free area. The process is "global" in the sense that the entire functional part of the array takes part in it, but the mechanism that accomplishes it is still distributed.
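The relocation step can be pictured as finding a fault-free region of the physical array large enough to host the functional part. The sketch below is a centralized stand-in, not the report's distributed mechanism: it assumes the fault map is known as a grid of booleans and that the functional part occupies a rectangular `rows` x `cols` block, and it uses 2-D prefix sums to count the faulty cells in each candidate window. The function `find_fault_free_window` is hypothetical.

```python
from typing import List, Optional, Tuple


def find_fault_free_window(
    faulty: List[List[bool]], rows: int, cols: int
) -> Optional[Tuple[int, int]]:
    """Return (top, left) of the first rows x cols block with no faulty cell.

    Centralized illustration only; the report's global reconfiguration
    is carried out by the distributed control cells themselves.
    """
    h = len(faulty)
    w = len(faulty[0]) if h else 0
    # prefix[i][j] = number of faulty cells in the sub-grid faulty[0:i][0:j]
    prefix = [[0] * (w + 1) for _ in range(h + 1)]
    for i in range(h):
        for j in range(w):
            prefix[i + 1][j + 1] = (
                prefix[i][j + 1] + prefix[i + 1][j] - prefix[i][j] + int(faulty[i][j])
            )
    # Slide the window in row-major order; each fault count is O(1).
    for top in range(h - rows + 1):
        for left in range(w - cols + 1):
            count = (
                prefix[top + rows][left + cols]
                - prefix[top][left + cols]
                - prefix[top + rows][left]
                + prefix[top][left]
            )
            if count == 0:
                return (top, left)
    return None  # no fault-free area is large enough
```

With cells (0, 0) and (1, 1) faulty in a 4 x 4 array, a 2 x 2 functional block first fits at (0, 2). Building the prefix table and scanning the windows are each linear in the number of cells, so the whole search is O(h·w).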

With the framework of the system established, the results of this research are presented. The hardware complexities of the existing global reconfiguration scheme are analyzed and compared with those of previous work in this area. A distributed diagnosis algorithm is also developed that works in conjunction with the local reconfiguration mechanism to detect and isolate faults in the array quickly. These results lay the foundation for a totally self-checking implementation of the control cells, which allows online concurrent fault detection in the array.
