Recovery from transient faults in wavefront processor arrays

Date

1993

Publisher

Virginia Tech

Abstract

A transient fault in an array of processing elements leaves the affected processing element in an inconsistent or incorrect state. If the erroneous information propagates before the fault is detected, neighboring processing elements can also end up in an incorrect state. Restarting the computation from the beginning every time a transient fault occurs is very inefficient and, in real-time computations, may not be possible. This thesis proposes "rollback" as a way to recover from transient faults. Rollback works by saving the state of each processing element at different instants of time. When an error is detected, the computation backtracks to a consistent state and resumes from that state. The rollback algorithm is distributed, so the fault recovery mechanism has no single point of failure.
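The checkpoint-and-rollback idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it models a single processing element only and leaves out the distributed coordination with neighboring elements. The names `ProcessingElement`, `run_with_fault`, `fault_step`, and `detect_delay` are hypothetical. The running-sum computation, the bit-flip fault model, and the fixed detection delay are assumptions chosen for illustration.

```python
from collections import deque


class ProcessingElement:
    """Hypothetical processing element that accumulates a running sum."""

    def __init__(self, max_checkpoints=4):
        self.state = 0
        self.step = 0
        # Bounded history of (step, state) pairs; the oldest entries are dropped.
        self.checkpoints = deque(maxlen=max_checkpoints)

    def checkpoint(self):
        # Save the state as it is before the next step executes.
        self.checkpoints.append((self.step, self.state))

    def compute(self, value):
        self.state += value
        self.step += 1

    def rollback(self, to_step):
        # Discard checkpoints taken after the last step known to be good.
        while self.checkpoints and self.checkpoints[-1][0] > to_step:
            self.checkpoints.pop()
        if not self.checkpoints:
            raise RuntimeError("no checkpoint old enough to recover from")
        self.step, self.state = self.checkpoints[-1]
        return self.step


def run_with_fault(inputs, fault_step, detect_delay=1):
    """Process inputs, inject one transient fault, recover by rollback."""
    pe = ProcessingElement()
    fault_pending = True
    detect_at = None
    i = 0
    while i < len(inputs):
        pe.checkpoint()
        pe.compute(inputs[i])
        if fault_pending and pe.step - 1 == fault_step:
            pe.state ^= 0xFF  # assumed fault model: flip the low byte
            fault_pending = False
            detect_at = pe.step + detect_delay
        i = pe.step
        detected = detect_at is not None and (
            pe.step >= detect_at or pe.step == len(inputs)
        )
        if detected:
            # Restore the state saved just before the faulty step, then replay.
            i = pe.rollback(fault_step)
            detect_at = None
    return pe.state
```

For example, `run_with_fault([1, 2, 3, 4], fault_step=1)` corrupts the state during the second step, detects the error one step later, restores the checkpoint taken before the faulty step, and replays the remaining inputs instead of restarting from the beginning. The bounded checkpoint history means a fault detected too late cannot be recovered, which is why `rollback` raises an error in that case.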
