High-level strategy for detection of transient faults in computer systems
A major portion of digital system malfunctions is due to temporary faults, which are either intermittent or transient. An intermittent fault manifests itself at regular intervals, while a transient fault causes a temporary change in the state of the system without damaging any of its components. Transient faults are difficult to detect and isolate and are therefore a major concern, especially in critical real-time applications.
Since satellite systems are particularly susceptible to transient faults induced by the radiation environment, a satellite communications protocol model has been developed for experimental research purposes. The model implements the MIL-STD-1553B protocol, which dictates the modes of communication between several satellite systems. The model has been developed using the structural and behavioral modeling capabilities of the HILO simulation system.
Single event upsets (SEUs) are injected into the protocol model and their effects on program flow are investigated. A two-tier detection scheme based on signature analysis is developed. The performance of the detection mechanisms is evaluated and the results are presented.
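The general idea behind signature analysis can be illustrated with a minimal sketch. It is not the two-tier scheme developed in this work, whose details are not given here. The sketch assumes that a block of 16-bit words is compressed into a 16-bit signature by a serial linear feedback shift register, and that the run-time signature is compared with a reference computed in advance. The feedback polynomial, the word values, and the function names are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: the polynomial, word values and names are assumptions.
POLY = 0x1021   # feedback taps (the CRC-16-CCITT polynomial, chosen for illustration)
MASK = 0xFFFF   # keep the signature register at 16 bits


def update_signature(sig, word, width=16):
    """Shift one word into the 16-bit LFSR, most significant bit first."""
    for i in reversed(range(width)):
        bit = (word >> i) & 1
        feedback = ((sig >> 15) & 1) ^ bit
        sig = (sig << 1) & MASK
        if feedback:
            sig ^= POLY
    return sig


def block_signature(words):
    """Compress a sequence of 16-bit words into a single signature."""
    sig = 0
    for w in words:
        sig = update_signature(sig, w)
    return sig


def inject_seu(words, index, bit):
    """Return a copy of the words with one bit flipped, modelling a single upset."""
    flipped = list(words)
    flipped[index] ^= 1 << bit
    return flipped


# Hypothetical word stream for one branch-free block of the program.
block = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]
reference = block_signature(block)            # computed ahead of time

faulty = inject_seu(block, index=2, bit=5)    # a single-bit upset in the stream
detected = block_signature(faulty) != reference
```

Because the register is linear and the chosen polynomial has a nonzero constant term, any single-bit flip in the stream changes the signature, so a comparison against the reference exposes it. Multiple-bit errors can alias to the correct signature, which is one reason practical schemes combine more than one check.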