Design for Safety

by Dev Raheja and Brian Moriarty
Washington DC
 


Typical safety analysis is task-oriented rather than knowledge-oriented, which leads to many overlooked interactions and higher risks. A poor understanding of the system and inadequate specifications also contribute. This article uncovers some of the root causes of unsafe systems and describes how to use this knowledge to develop a robust design for safety at lower cost. It also describes how to perform key hazard analyses to reduce the execution risks and how to analyze for unknowns to reduce the white-space risks. The emphasis is on eliminating hazards by design, including hazards created in production.

Design for Safety is Intuitive
Have you ever wondered why some very good designs failed to deliver safety? We can ask the same question about the more than 100 automobile recalls each year. The reason is that hardly any data exists on these risks, so they can be analyzed only by intuition based on individual experience. Therefore, the standard logical analysis, even though it seems acceptable, is an illusion. It leaves out many potential risks that do not show up in the reported data.

This is not to say that logical analysis never represents the reported risks. What is required is a balance between intuition and logic. However, balance does not mean equal weight. It means weighting each according to the need to analyze the right risks. For example, a joint effort by the FAA and a major airline revealed that the airline’s employees knew about 13,000 safety-related hazards. The airline reports showed only 2% of them, and FAA documents showed only 1%. If we had relied on logical analysis alone, we would have covered, at most, 2% of the risks. In this case, intuitive analysis has to cover far more of the potential risks than the 2% captured in the reports. The risk of such failures, which are overlooked because of insufficient data or analysis, is called the white-space risk. To minimize it, we must use several methods in hazard analysis, involve cross-functional teams, and conduct thorough brainstorming.
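The arithmetic behind the airline example can be made explicit. The sketch below is illustrative only: the names are invented here, and it takes the rounded figures above at face value, including the assumption that documented coverage tops out at the airline's 2%.

```python
# Rounded figures quoted in the airline example.
KNOWN_HAZARDS = 13_000      # safety-related hazards known to employees
AIRLINE_REPORT_PCT = 2      # percent appearing in airline reports
FAA_DOCUMENT_PCT = 1        # percent appearing in FAA documents


def hazards_at(percent: int, total: int = KNOWN_HAZARDS) -> int:
    """Return how many hazards a whole-number percentage represents."""
    return total * percent // 100


in_airline_reports = hazards_at(AIRLINE_REPORT_PCT)
in_faa_documents = hazards_at(FAA_DOCUMENT_PCT)

# Assumption taken from the text: documented coverage is at most the airline's 2%.
documented_at_most = in_airline_reports
white_space_at_least = KNOWN_HAZARDS - documented_at_most

print(f"In airline reports: {in_airline_reports}")
print(f"In FAA documents: {in_faa_documents}")
print(f"Undocumented (white space), at least: {white_space_at_least}")
```

On these figures, at least 12,740 of the 13,000 known hazards would never appear in an analysis driven only by reported data.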

Task Oriented Instead of Knowledge Oriented
Most safety projects are judged by whether the analysis was completed. If the analysis is insufficient or the right process was not followed, the shortfall is frequently overlooked when the evaluators lack deeper knowledge. For example, many large organizations perform fault tree analysis primarily to conduct probabilistic risk analysis, while a goldmine of clues on design improvements is not even discussed. This is true of many other hazard analyses. Often a task is scheduled too late, when major design changes would require drastic budget changes, so the risks end up being accepted. The lesson is that we should rely on a thorough analysis to make design improvements at the concept stage of the system. Design changes are most cost-effective this early in the cycle.

Some Hazards May Be Acceptable
If a brake on a truck fails, is it okay? It may be, as long as the brake system gives advance warning (such as audibly degraded performance) and lets the driver complete the mission safely. The warning should come early enough that the driver can have the brake fixed before the next trip. The cost of replacing a failed component is minor, whereas the cost of a mission failure is often enormous. NASA's shuttle flights were grounded for almost three years after the Space Shuttle Challenger accident. The cost of the seal that failed was minimal, but the cost of losing the entire shuttle, along with the astronauts, and the cost of shutting down the entire program were beyond the reach of the logical analysis.

Poor Understanding of the System is Not an Excuse
Experience shows that almost all performance specifications are incomplete or vague, and product functions are often poorly defined. Often there is nothing in the specification about modularity, complex safety interactions, serviceability hazards, logistics hazards, human errors, and diagnostics hazards. Very few specifications address requirements such as internal interface, external interface, user-hardware interface, user-software interface, or how the product should behave when a sneak failure occurs. Those who try to build safety around a faulty specification can only guarantee frequent mission failures. Unfortunately, most organizations think of hazard analysis only after the design is already approved. At this stage, there is no budget or time for major changes. Therefore, writing an accurate performance specification and performing a thorough requirements analysis are the prerequisites for a safe design.

Safety Definition Must Include “Shall Not”
To ensure that a specification is accurate and complete, only those who know what makes a good specification should approve it. They must ensure that the specification is clear on what the product should never do. Examples are “The SUV shall not roll over in case of a component failure or low tire pressure,” and “There shall be no sudden acceleration in the cruise control.”
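One way to make this review step concrete is to scan a draft specification for prohibited-behavior statements before approval. The following is a minimal sketch, not an established tool: the function name, the simple text match on "shall not" and "shall be no", and the first sample requirement are assumptions made for illustration. The other two sample requirements are the examples quoted above.

```python
import re

# Assumed phrasing for negative requirements; real specifications may word them differently.
NEGATIVE_PATTERN = re.compile(r"\bshall\s+(?:not\b|be\s+no\b)", re.IGNORECASE)


def negative_requirements(requirements: list[str]) -> list[str]:
    """Return the requirements that state what the product must never do."""
    return [req for req in requirements if NEGATIVE_PATTERN.search(req)]


draft_spec = [
    # Hypothetical positive requirement, invented for contrast.
    "The cruise control shall hold the speed set by the driver.",
    "The SUV shall not roll over in case of a component failure or low tire pressure.",
    "There shall be no sudden acceleration in the cruise control.",
]

found = negative_requirements(draft_spec)
if not found:
    print("Specification has no 'shall not' requirements; return it for revision.")
else:
    for req in found:
        print("Negative requirement:", req)
```

A text match like this can only show that prohibited behaviors are mentioned at all. Judging whether they are the right ones still depends on the knowledgeable reviewers described above.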

In addition, marketing and service experts should participate in the process to make sure that “unknown unknowns” are identified through brainstorming and interviews with operating personnel. Attention should be paid to modularity to minimize complexity, because complexity creates more unknown hazards. For example, GM is designing a hydrogen car to have one chassis for all models, instead of the 80 different chassis in current production. Service engineers should also perform hazard analysis to ensure the safety of the repair crew and to confirm that no workmanship hazards (such as installing a wrong component or leaving out a lock-washer) are introduced. The specification should be critiqued for quick serviceability and ease of access, so that service tasks are pleasant and less prone to errors. Until the specification is thoroughly written, no design work should begin.