FMEA – A powerful method, but not for software!
In functional safety, there is one method that is used almost everywhere: the FMEA (Failure Mode and Effects Analysis).
The FMEA supports systematic analysis, particularly at system and hardware level. There are also variants such as the FMECA and the FMEDA. In this blog post I use only the term FMEA.
In project practice, the question is often raised whether a software FMEA is also needed.
I will first explain the meaning and purpose of the system and hardware FMEA in fulfilling the requirements of functional safety standards such as ISO 26262 or IEC 61508. After that I will consider whether there is a need to perform a software FMEA.
System Level:
The name Failure Mode and Effects Analysis (FMEA) is almost self-explanatory. In the first step, all conceivable faults of a system are systematically written down in tables. In the second step, the impact of these faults on the system under consideration is analysed and documented in the table. In the third step, appropriate countermeasures are defined for unacceptable effects.
In a second analysis the system architecture is analysed. The goal is to identify weaknesses of the design.
Both analyses result in countermeasures that must be implemented to reduce the risk to a tolerable level. In some cases there are also error conditions for which no countermeasures can be implemented. These cases require more extensive risk management, which is not the focus of this blog post.
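The three-step table described above can be sketched in a few lines of Python. This is only an illustration: the class name FmeaRow, the helper open_items and all example rows are hypothetical and not taken from any standard or tool.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    failure_mode: str         # step 1: a conceivable fault
    effect: str               # step 2: its impact on the system
    acceptable: bool          # step 2: is that impact tolerable?
    countermeasure: str = ""  # step 3: only needed for unacceptable effects


def open_items(rows):
    """Return rows with an unacceptable effect and no countermeasure yet."""
    return [r for r in rows if not r.acceptable and not r.countermeasure]


# Hypothetical example rows, invented for illustration only.
rows = [
    FmeaRow("Sensor signal stuck at last value", "Stale value displayed",
            False, "Plausibility check with timeout"),
    FmeaRow("Supply voltage drops briefly", "Controlled restart", True),
    FmeaRow("Bus message lost", "Actuator command missing", False),
]

for row in open_items(rows):
    print(f"No countermeasure yet: {row.failure_mode}")
```

Rows that stay in the open list after the analysis are the cases that have to be handed over to risk management.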
Hardware Level:
At the hardware level, the analysis focuses on the failures of individual components. It yields two main results:
1) Failure rates
2) Requirements for the Power-On Built-In Test, the Continuous Built-In Test and the Maintenance Built-In Test
This analysis thus makes a significant contribution to controlling random errors in a system.
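As a rough illustration of how the two results relate, the following Python sketch adds up component failure rates and estimates the share that the built-in tests do not detect. It assumes constant failure rates and that any component failure leads to a system failure, so the rates simply add up. The component names, FIT values and coverage figures are invented for the example.

```python
FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device hours

# (component, failure rate in FIT, fraction of failures detected by built-in tests)
# All values are hypothetical and only serve as an illustration.
components = [
    ("voltage regulator", 20.0, 0.90),
    ("microcontroller", 100.0, 0.99),
    ("bus transceiver", 30.0, 0.60),
]


def total_fit(parts):
    """Sum of all component failure rates (series assumption)."""
    return sum(fit for _, fit, _ in parts)


def undetected_fit(parts):
    """Failure rate that is not covered by the built-in tests."""
    return sum(fit * (1.0 - coverage) for _, fit, coverage in parts)


total = total_fit(components)
residual = undetected_fit(components)
print(f"Total failure rate: {total:.1f} FIT ({total / FIT_HOURS:.2e} per hour)")
print(f"Not covered by built-in tests: {residual:.1f} FIT")
```

A component with a high undetected share, like the transceiver in this made-up example, is where additional requirements for the built-in tests would be derived.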
Software Level:
What does it look like at the software level? Is an FMEA useful, or even required by the functional safety standards?
A demand for a software FMEA cannot be derived from the standards in any way. Why is that? As already described for the hardware and system level, the goal of an FMEA is to systematically explore and analyze possible errors and their impact. This is no problem if the functionality is analyzed. At hardware level this analysis also makes sense, because the failure of components is based on physical effects, and fault probabilities can therefore be determined using statistical methods.
At software level it is different. Software errors are introduced unintentionally by the programmer. All faults that lead to such incorrect programming are caused by humans. To this day there is no known way to calculate the error probabilities of humans in software development.
Of course it is possible to analyze software systematically using the FMEA as a method. This would mean that, for example, requirements and source code would be analyzed accordingly.
In fact, the functional safety standards require such analyses. They are called requirements reviews and code reviews.
However, the complexity and scope of requirements reviews and code reviews are very high. It is simply neither practical nor wise to carry out an FMEA for requirements and code reviews.
In addition, the standards require further measures, such as a variety of dynamic tests, that by definition cannot be covered by an analysis. For these reasons, an FMEA at software level is not useful.
On top of this, it is impossible to define countermeasures for software errors. This is because software errors are introduced into the system by human error and are not, as in hardware, based on physical effects. There are still no sufficiently good mathematical models for predicting the number, frequency and severity of human errors.
Conclusion:
In summary, a software FMEA is not effective if it is intended to replace the methods and processes of software development that are defined in the functional safety standards.
In my experience, the term software FMEA is often used when a functional system FMEA is actually meant. This is because modern embedded systems realise most of their functionality in software. You can therefore get the feeling that you are carrying out a software FMEA when you are actually analysing the system functions.
In addition, an FMEA can also be used to analyse processes. If you want to analyse the vulnerability of a software development process, you can perform a so-called process FMEA. In this case, the goal is “only” to identify the weak points of the process. Such an FMEA does not replace the methods and processes of the functional safety standards.
If you are ready for a functional safety workshop to analyse the improvement potential in your development process, send a mail to: info[at]heicon-ulm.de or call +49 (0) 7353 981 781.
Good explanation
A software FMEA is a test of whether the functions of the code hold up when changes to specifications occur, just as with hardware. For example:
a) A mechanical attaching part with the fatigue failure mode may result in a reduced probability of securing the attached panel. Whereas:
b) A software function with a ‘delayed query time’ failure mode may result in a function not being applied at all, or in information being displayed incorrectly.
Errors in the code, bugs really, are a result of improper specification and in my opinion no different to a mechanical part design with insufficient strength to cope with the intended stress.
Bugs in software are analogous to mishaps during the manufacturing process of a mechanical part.
Let me disagree with your opinion. An FMEA of software combined with STPA rules and statistical analysis is a very powerful method for finding potential failure causes. The points of interest are not “simple” causes like “data not available” but aspects of software design stability and protection against butterfly effects. You are right in the case of simple control functions, but for highly complicated algorithms it is irreplaceable. According to my analysis results, software complexity has increased about 50 times in the last 20 years, and this trend is not going to stop.
Kind regards
Gregor