Catastrophic Single-Point Failures

by Ted W. Yellman
Bellevue, Washington


Examples of Soft CSPFs

These are not just academic issues. Here are some practical examples of soft CSPFs:

(1) If an airplane’s anti-icing system fails, ice can build up, which causes the airplane to lose lift – but only during the 5% or so of flights that encounter atmospheric icing conditions.

(2) A short circuit can cause a spark in a fuel tank, which can cause it to explode – but only if the fuel-air mixture in the tank is flammable (primarily, rich enough), which may be the case only 1% of the time.

(3) If an uncontained engine failure occurs on one engine on a two-engine airplane, a fragment can damage the other engine – but only if the direction of the fragment is within a particular small (e.g., 3-degree) angle.

(4) All propulsive thrust can be lost if an airplane’s engines are starved of fuel – but that will cause a catastrophic accident only if the airplane is far enough from a suitable airport that the crew cannot glide to a successful landing. (Airplanes have been able to do that successfully in a surprisingly high percentage of cases. Nevertheless, to my knowledge at least, that benign possibility has always been neglected in probabilistic analyses.)

Many more examples of soft CSPFs could be cited. In each case, only one primary material failure is involved, but the probability of a resulting catastrophic accident is significantly reduced by the necessity of either an unfavorable condition or an additional failure with a probability substantially below 100%.
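To make the arithmetic behind these examples concrete, here is a minimal sketch in Python. It assumes that the probability of a catastrophic accident is simply the probability of the primary failure multiplied by the conditional probability of the additional condition. The per-flight failure probability of 1e-6 is a hypothetical placeholder, not a value from any analysis. Treating the 3-degree angle as 3/360 assumes fragment directions are uniformly distributed in a plane, which is also only an illustration. All names are illustrative.

```python
# Hypothetical per-flight probability of the primary failure (placeholder value).
P_FAILURE = 1e-6

# Conditional probability of the additional condition, given the failure.
# The 5% and 1% figures come from examples (1) and (2); the third entry
# assumes fragment directions are uniform over 360 degrees in a plane.
conditional = {
    "anti-icing failure, icing encountered": 0.05,
    "tank spark, mixture flammable": 0.01,
    "engine fragment within 3-degree angle": 3 / 360,
}


def catastrophe_probability(p_failure, p_condition_given_failure):
    """Probability of the failure AND the condition needed for a catastrophe."""
    return p_failure * p_condition_given_failure


results = {
    name: catastrophe_probability(P_FAILURE, p)
    for name, p in conditional.items()
}

for name, p in results.items():
    print(f"{name}: {p:.2e} per flight")
```

Under the hard-CSPF assumption, every entry would instead be taken at the full failure probability, as if the condition were always present.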

Whether soft CSPFs should fall under the same prohibitions as hard CSPFs depends primarily on the conditional probability (that is, the probability given the failure of interest) of the additional event or events required to cause a catastrophic accident. There is no generally agreed-upon threshold above which this probability should simply be assumed to be the worst-case 100%. The determination must rest on a judgment about whether the conditional probability is far enough below 100% to make taking credit for the difference worthwhile. If the probability is 90%, it may not be, and the failure of interest can be allowed to fall under the CSPF prohibition. But if the probability is 1%, it is probably worth taking credit for: one can argue that the CSPF prohibition shouldn’t apply and proceed with a probabilistic risk analysis.
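One way to put the 90%-versus-1% comparison on a common scale is to express the credit in orders of magnitude relative to the worst-case 100% assumption. The sketch below is only an illustration: the function name is hypothetical, and it does not supply a threshold, which remains a matter of judgment.

```python
import math


def orders_of_magnitude_credit(p_conditional):
    """Orders of magnitude by which the catastrophe probability drops
    when p_conditional is used instead of the worst-case 100%."""
    if not 0 < p_conditional <= 1:
        raise ValueError("conditional probability must be in (0, 1]")
    return -math.log10(p_conditional)


for p in (0.90, 0.01):
    credit = orders_of_magnitude_credit(p)
    print(f"conditional probability {p:.0%}: credit of {credit:.2f} orders of magnitude")
```

At 90% the credit is less than a twentieth of an order of magnitude, which is usually lost in the noise of the failure-probability estimate itself. At 1% it is two full orders of magnitude.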

Incidentally, similar prohibitions have sometimes been encouraged or imposed on so-called "latent" failures, meaning failures that can remain undetected for a long time. Nobody is saying that such failures are a good thing, but to simply assume that all possible undetectable failures already exist can also lead to extremely pessimistic analysis results. As is the case for CSPFs, sound general risk-analysis principles and criteria should preclude the need for "special case" assumptions for undetectable failures.
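To illustrate how pessimistic the "already exists" assumption can be, here is a minimal sketch comparing it with a time-averaged probability. It rests on assumptions that are mine, not part of the discussion above: the latent failure occurs at a constant rate, and it is found and corrected at each periodic check. The rate and check interval are hypothetical values, and the function name is illustrative.

```python
import math


def mean_latent_probability(failure_rate, check_interval):
    """Average probability that a latent failure is present at a random
    time between checks, assuming a constant failure rate and a full
    repair at every check."""
    x = failure_rate * check_interval
    if x == 0:
        return 0.0
    # Average of 1 - exp(-rate * t) over t in [0, check_interval].
    return 1.0 + math.expm1(-x) / x


# Hypothetical values: 1e-5 failures per hour, checked every 1000 hours.
average = mean_latent_probability(1e-5, 1000.0)
print(f"assumed always present: 1.0, time-averaged: {average:.4f}")
```

With these placeholder numbers the time-averaged probability is about 0.5%, roughly 200 times lower than assuming the failure is always present.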

Conclusion

Prohibitions against catastrophic single-point failures are not based on the probabilities of those failures, or the expected losses they cause, being essentially different from those of multiple failures. Rather, they are based on greater uncertainty (usually in the minds of government regulators and contracting agencies) about whether claimed catastrophic single-point failure probabilities, and the expected-loss estimates derived from them, will be correct.

Ideally, the same general principles, including uncertainty considerations, should be applied to single-point failures (catastrophic or not) as to multiple-point failures. However, that can come about only if risk analysts and managers become more aware and respectful of the uncertainty facet of risk. As a practical matter, in the case of catastrophic single-point failures, that will mean recognizing uncertainties as part of the process of determining probabilities – and appropriately adjusting those probabilities upward as uncertainties increase.
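As one illustration of what adjusting a probability upward for uncertainty could look like, the sketch below treats the analyst's point estimate as the median of a lognormal distribution and uses the mean of that distribution as the adjusted value. This is a common convention in probabilistic risk assessment, offered here only as an assumption and not as a prescribed method. The "error factor" (ratio of the 95th percentile to the median), the numbers, and the function name are all hypothetical.

```python
import math

Z_95 = 1.645  # approximate 95th-percentile point of the standard normal


def uncertainty_adjusted_probability(median, error_factor):
    """Mean of a lognormal with the given median and error factor
    (95th percentile divided by median), capped at 1.0."""
    if error_factor < 1:
        raise ValueError("error factor must be at least 1")
    sigma = math.log(error_factor) / Z_95
    return min(1.0, median * math.exp(sigma ** 2 / 2))


# Hypothetical point estimate of 1e-9 with increasing uncertainty.
for ef in (1.0, 3.0, 10.0):
    adjusted = uncertainty_adjusted_probability(1e-9, ef)
    print(f"error factor {ef:>4}: adjusted probability {adjusted:.2e}")
```

With no uncertainty (error factor 1) the estimate is unchanged, and the adjusted value grows as the error factor grows, which is the behavior the paragraph above calls for.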

I have to say, however, that I suspect the prohibition on catastrophic single-point failures will be around for a long time to come. Perhaps the best we can do in the near term is to try to prevent it from causing too many unrealistic and wasteful decisions.

About the Author

Ted Yellman has 43 years of experience in system design, reliability, components, safety, regulatory engineering, and engineering-assurance management in several aerospace and electronics companies. His primary interests are in the areas of risk criteria, risk analysis, and risk management.

Yellman is the originator of Event-Sequence Analysis. Offering an alternative to fault-tree and Markov-chain analyses, this method makes it easier to understand and account for events that may have a common cause or that may cause one another.

Article References


1. Yellman, Ted W. "Failures and Related Topics," IEEE Transactions on Reliability, Vol. 48, No. 1, March 1999, pp. 6-8.

2. System Design and Analysis. Federal Aviation Administration Advisory Circular (AC) 25.1309-1A, June 21, 1988.

3. System Design and Analysis. Draft for Advisory Joint Material AC/AMJ No. 25.1309, November 23, 1997.

4. Code of Federal Regulations, Title 14 (Aeronautics and Space), Part 25 (Airworthiness Standards: Transport Category Airplanes), January 1, 2000 Revision.

5. Clemens, Pat L. "From Our Readers," Journal of System Safety, Q3 2002, p. 5.

6. Yellman, Ted W. "The Three Facets of Risk," SAE/AIAA 2000 World Aviation Conference, San Diego, October 10-12, 2000.

7. Yellman, Ted W. "Redundancy Killers," Proceedings of the SAE 1998 Advances in Aviation Safety Conference, April 6-8, 1998.