Fault-Injection in FPGAs

To inject, detect and fix faults in the configuration memory of the FPGA.
Keywords: Safety, Fault injection, Fault insertion, Failure injection, Safety validation, Safety verification, Fault modelling, Fault handling, Fault propagation, Soft-error mitigation
To explore and evaluate the results of fault injection in an FPGA-based Hardware Platform and the propagation of these faults to other system layers. To relate fault injection at the low-level hardware layer to potential faults at the higher layers in order to reduce the test space.

The Configuration RAM (CRAM) of an FPGA is susceptible to Single-Event Upsets (SEUs). If a bit in the CRAM is flipped, the functionality of the FPGA changes. To counteract this, the CRAM must be scrubbed continually by a Soft-Error Mitigation (SEM) core, which flips changed bits back to their original design state. The same mechanism can be used to flip bits intentionally and thereby inject errors into the CRAM.
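
The scrub-and-inject principle can be sketched with a toy model. The sketch assumes a simplified readback-and-compare scrubber that keeps a golden copy of every configuration frame. The class name, method names and frame layout are hypothetical. Real SEM cores typically detect upsets through the device's built-in frame ECC and CRC checks instead of comparing against a full golden image.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConfigMemory:
    """Toy model of FPGA configuration RAM (hypothetical, for illustration only).

    golden: the design-state bits that scrubbing restores, as frames of 32-bit words.
    live:   the bits currently held in CRAM, which upsets or injections may change.
    """
    golden: List[List[int]]
    live: List[List[int]] = field(init=False)

    def __post_init__(self) -> None:
        # Start with CRAM identical to the golden design state.
        self.live = [list(frame) for frame in self.golden]

    def flip_bit(self, frame: int, word: int, bit: int) -> None:
        """Emulate an SEU, or a deliberate injection, by XOR-ing a single bit."""
        self.live[frame][word] ^= 1 << bit

    def scrub(self) -> int:
        """Rewrite every frame that differs from the golden copy.

        Returns the number of bits that were restored.
        """
        restored = 0
        for f, (g_frame, l_frame) in enumerate(zip(self.golden, self.live)):
            diff_bits = sum(bin(g ^ l).count("1") for g, l in zip(g_frame, l_frame))
            if diff_bits:
                self.live[f] = list(g_frame)
                restored += diff_bits
        return restored


# Inject one upset, then scrub it away again.
mem = ConfigMemory(golden=[[0] * 4 for _ in range(3)])
mem.flip_bit(frame=1, word=2, bit=7)
print(mem.scrub(), mem.live == mem.golden)
```

Injection and mitigation use the same write path here. This mirrors the observation above that a scrubbing mechanism can also be used to inject faults.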

The Healing Core IP (HC IP) is an IP core that can inject, detect and fix faults at any desired location in an FPGA design. It uses Soft-Error Mitigation (SEM) cores to detect errors and Run-Time Reconfiguration (RTR) techniques to correct Single- and Multiple-Event Upsets (bit-flips) in the FPGA's configuration memory. It also has a classification system that can report some faults and initiate appropriate countermeasures for them. It can therefore be used to implement self-repairing functionality in an FPGA system.

By using the HC IP to flip bits in the configuration RAM, faults can be injected into the hardware designs resident in the FPGA. We can then observe how the system reacts, how the fault manifests at the system level, and how it affects the up-time, robustness and availability of the component. Finally, we can evaluate what this implies with respect to the relevant safety standards. Based on historical data, the V&V process should minimize the steps needed for (re-)certification.
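
The detect, classify and repair flow can be illustrated with the following minimal sketch. It assumes that a fault is classified per frame by the number of flipped bits and that both single- and multiple-bit upsets are repaired by rewriting the frame through RTR. It also assumes that frames which cannot be healed in place (for example, those holding the healing logic itself) force a full reset. The enum values, function names and the `unrepairable_frames` parameter are hypothetical. They are not the HC IP's actual interface.

```python
from enum import Enum
from typing import Sequence, Set


class FaultClass(Enum):
    # Hypothetical fault classes, assigned per configuration frame.
    NONE = "no upset"
    SBU = "single-bit upset"
    MBU = "multiple-bit upset"


class Action(Enum):
    # Hypothetical countermeasures.
    NOTHING = "no action"
    REWRITE_FRAME = "rewrite frame via run-time reconfiguration"
    RESET_FPGA = "full reset and reconfiguration"


def classify(golden: Sequence[int], readback: Sequence[int]) -> FaultClass:
    """Classify a frame by how many bits differ from its golden contents."""
    flipped = sum(bin(g ^ r).count("1") for g, r in zip(golden, readback))
    if flipped == 0:
        return FaultClass.NONE
    return FaultClass.SBU if flipped == 1 else FaultClass.MBU


def countermeasure(fault: FaultClass, frame: int, unrepairable_frames: Set[int]) -> Action:
    """Map a classified fault to a countermeasure."""
    if fault is FaultClass.NONE:
        return Action.NOTHING
    if frame in unrepairable_frames:
        # Assumed: upsets in these frames cannot be healed in place.
        return Action.RESET_FPGA
    return Action.REWRITE_FRAME


fault = classify([0x0, 0x0], [0x3, 0x0])  # two flipped bits in one word
print(fault, countermeasure(fault, frame=4, unrepairable_frames={7}))
```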

Fault injection in FPGAs makes it possible to:

  • Inject, Detect and Heal Faults caused by Single-Event Upsets in the FPGA fabric. This makes it possible to assess the effects of Single-Event Upsets in the Hardware Platform and to see how they manifest, both in Software and at System Level.
  • Inject, Detect and Heal Faults during Run-Time. This is useful in V&V when assessing the Safety of the FPGA System.
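
A run-time injection campaign of the kind described above can be sketched as a loop: inject a flip at one target bit, observe the system-level outcome, then heal before the next experiment. In this sketch, `inject`, `observe` and `repair` are hypothetical callbacks standing in for the injection, monitoring and healing mechanisms, and the outcome labels are illustrative. The toy system used to exercise the loop is invented: it only fails when one of two "used" bits is flipped.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Target = Tuple[int, int, int]  # (frame, word, bit): hypothetical addressing scheme


def run_campaign(
    targets: Iterable[Target],
    inject: Callable[[Target], None],
    observe: Callable[[], str],
    repair: Callable[[], None],
) -> Counter:
    """Inject one fault per target, record the observed outcome, then heal it."""
    outcomes: Counter = Counter()
    for t in targets:
        inject(t)
        outcomes[observe()] += 1
        repair()  # restore the golden state so experiments stay independent
    return outcomes


# Toy system (invented): only these configuration bits affect the output.
used_bits = {(0, 0, 3), (1, 2, 0)}
state = {"flipped": None}


def inject(t: Target) -> None:
    state["flipped"] = t


def observe() -> str:
    return "failure" if state["flipped"] in used_bits else "masked"


def repair() -> None:
    state["flipped"] = None


targets = [(f, w, b) for f in range(2) for w in range(4) for b in range(8)]
print(run_campaign(targets, inject, observe, repair))
```

Restricting `targets` to the bits that map to the module under test is one way in which mapping low-level faults to higher layers could shrink the test space. How that mapping is obtained is outside this sketch.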

Fault injection in FPGAs paired with the Healing Core has the potential to:

  • Enable the use of FPGAs in Safety-Critical Functions of a System. This could reduce the Time, Cost and Effort needed to develop Safe Hardware Platforms.

The method has the following limitations:

  • Using the Healing Core incurs some Time, Cost and Effort overhead. At present it is unclear whether the costs of using it outweigh the costs of not using it.
  • The Healing Core is itself subject to Faults. If the voter that reads or writes the ICAP (Internal Configuration Access Port) is hit, the Fault can be neither detected nor corrected. The FPGA then has to be reset, resulting in system downtime.
  • There may be faults in the FPGA that cannot be healed without resetting and rebooting the entire FPGA, resulting in system downtime.
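
The voter limitation can be illustrated with a bitwise two-out-of-three majority vote. The sketch assumes that the Healing Core triplicates its ICAP access path and votes on the three copies. The source only states that a voter exists, so this triple-modular-redundancy arrangement and the `majority` helper are assumptions. The vote masks an upset in any single replica, but an upset in the voter's own output has no redundancy behind it.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit equals the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


word = 0b1011_0010
corrupted = word ^ 0b0000_1000  # upset in one replica only

# The upset in a single replica is outvoted by the two healthy copies.
print(majority(word, corrupted, word) == word)

# A hypothetical upset in the voter output itself cannot be outvoted.
voter_out = majority(word, word, word) ^ 0b1
print(voter_out == word)
```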

Method Dimensions
In-the-lab environment
Experimental - Testing, Experimental - Monitoring, Experimental - Simulation
Hardware, Software
Implementation, Architecture Design, Other, System Design, Detail Design
Thinking, Acting, Sensing
Non-Functional - Safety
V&V process criteria, SCP criteria