Paradigm 3: Safety Must Always Be a Business Case
Our experience with safety shows that the pay-off on safety investment is always high. An organization designing transmissions for heavy-duty trucks requires a 500 percent return on investment, even on safety, to encourage robust designs. It performs a system hazard analysis before approving the specifications and requires design mitigation for all catastrophic hazards [Ref. 1].
The trick is to make all big safety changes early, during the concept stage, when the cost of change is the lowest.
Paradigm 4: Don't Pay Attention to Probability of Catastrophic Accidents — Just Design Them Out
This organization designs all critical components for at least twice the expected life and safety-related components for three times the life. The need for computing the probability of catastrophic accidents does not arise. Of course, sometimes there are exceptions for new technologies. In the case of software, it tries to mitigate risk through redundancy or prognostics.
Why twice the life? The simple answer is that it is cheaper than designing for one life, a point that becomes clear once lifecycle costs are understood. Imagine a bridge designed for 20-ton trucks. It may have no problems in the beginning, but it degrades over time. After five years, it may not be strong enough to take even 15 tons, and it is then very likely to collapse. If it had been designed for 40 tons, it would still be safe. This is the same as the 100 percent safety margin we were taught in engineering long ago. For the same reason, components in the aerospace industry are de-rated 50 percent.
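The bridge argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the constant 6 percent annual loss of capacity is an assumed figure chosen so that a 20-ton design falls below 15 tons after five years, and the function names are hypothetical.

```python
def remaining_capacity(design_tons, annual_loss_fraction, years):
    """Capacity left after `years`, assuming a constant fractional loss per year."""
    return design_tons * (1.0 - annual_loss_fraction) ** years


def is_safe(design_tons, load_tons, annual_loss_fraction, years):
    """True if the degraded structure can still carry the required load."""
    return remaining_capacity(design_tons, annual_loss_fraction, years) >= load_tons


ANNUAL_LOSS = 0.06   # assumed degradation rate, for illustration only
REQUIRED_LOAD = 20   # tons, the truck weight from the bridge example

for design in (20, 40):
    capacity = remaining_capacity(design, ANNUAL_LOSS, years=5)
    verdict = "safe" if is_safe(design, REQUIRED_LOAD, ANNUAL_LOSS, 5) else "unsafe"
    print(f"designed for {design} tons: {capacity:.1f} tons left after 5 years, {verdict}")
```

Under this assumed rate, the 20-ton design can no longer carry its rated load after five years, while the 40-ton design still has margin to spare.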
The challenge is to get twice the life without increasing the size or weight of the components, which are the main cost drivers. There are many examples of twice-the-life design without changing the size or weight. In a shift-key assembly for an automotive transmission, the life was increased several-fold by using a different method of heat treating. The life increased further by using the cheaper round key instead of the rectangular key. The round key has practically no stress concentration points. In another case, twice-the-life was achieved by molding two parts as a single piece, preventing the stresses at the joint. The cost was lower because no assembly was required, there were fewer part numbers in the inventory, there were no failures, and there was no downtime for the customers. Such a robust mitigation of risks always makes safety a good business case.
The precedence for mitigating risk during the concept stage should be:
- Change the requirements to avoid the hazard
- Introduce fault tolerance
- Design to complete the mission safely
- Provide early prognostic warning
Paradigm 5: Perform Production Hazard Analysis
More than 80 percent of problems in the first year of deploying a new system come from production errors and variation, sometimes referred to in reliability engineering as infant mortality. Production hazard analysis is not the same as process hazard analysis, in which most analysts look for hazards to the process. What we need is something similar to a Process FMEA, tailored to discovering the hazards that can result in accidents later in actual use, repair, logistic support, and maintenance.
Paradigm 6: Design for Safety Prognostics
In complex systems such as telecommunications and fly-by-wire systems, many system failures do not result from component failures. They often result from complex interactions and sneak circuits, as well as from interactions within a system-of-systems. Failure rates are difficult to predict. We need innovative tools for discovering hidden problems, which usually turn up in rare events, such as the deployment of air bags, or a scenario in which a fireman may come in contact with high-voltage battery terminals. Typical questions include: "Will the air bag open when it is supposed to?" "Will it open at the wrong time?" "Will the system give a false warning?" and "Will the system behave fail-safe in the event of an unknown fault?" The bottom line is that no matter how much analysis we do, it is impossible to analyze millions of combinations. The following data on a major airline, announced at an FAA/NASA workshop [Ref. 2], show the extent of unpredicted failures:
- Problems reported confidentially: Approximately 13,000
- Number of problems actually in airline files: Approximately two percent, or 260
- Number of problems known to FAA: Approximately one percent, or 130
Sneak failures are more likely to exist in embedded software, where it is impractical to do a thorough analysis. Frequently, the specifications are faulty because they are not derived from the system performance specification. Peter Neumann, a computer scientist at SRI International, highlights the nature of damage from software defects in the last 15 years [Ref. 3], including:
- The wreck of a European satellite launch
- The delay in the opening of the new Denver airport by one year
- The destruction of a NASA Mars mission
- The inducement of a U.S. Navy ship to destroy an airliner
- The shutdown of ambulance systems in London, leading to several deaths
To counter such risks, we need an early prognostic warning, early enough to prevent a major mishap. This would consist of identifying all the possible mishaps and designing intelligence to detect unusual behavior of a system. The intelligence may consist of measuring important features and making decisions about their impact. For example, suppose a sensor input occasionally arrives after 30 milliseconds instead of the 20 milliseconds that the timing requirement specifies. The question is, "Is this an indication of a pending disaster?" If so, the sensor should be replaced before the fault escalates to a critical state.
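The timing example can be sketched as a simple monitor that watches recent sensor latencies and raises a warning when late arrivals become too frequent. This is a minimal illustration under stated assumptions, not a method described in the article: the class name `TimingMonitor`, the sliding-window size, and the violation threshold are all hypothetical choices. Only the 20-millisecond limit comes from the text.

```python
from collections import deque


class TimingMonitor:
    """Flags a possible pending failure when late samples accumulate.

    The window size and violation threshold are illustrative assumptions;
    only the 20 ms timing requirement comes from the example in the text.
    """

    def __init__(self, limit_ms=20.0, window=100, max_violations=3):
        self.limit_ms = limit_ms
        self.max_violations = max_violations
        # One boolean per recent sample: True if that sample arrived late.
        self.recent = deque(maxlen=window)

    def record(self, latency_ms):
        """Record one sample; return True if a prognostic warning is due."""
        self.recent.append(latency_ms > self.limit_ms)
        return sum(self.recent) >= self.max_violations


monitor = TimingMonitor(limit_ms=20.0, window=10, max_violations=2)
samples = [18, 19, 30, 18, 17, 30, 19]  # latencies in milliseconds
alerts = [monitor.record(s) for s in samples]
print(alerts)  # [False, False, False, False, False, True, True]
```

A single late sample is tolerated here, while a second one within the window triggers the warning. Where to set that threshold is exactly the engineering judgment the question "Is this an indication of a pending disaster?" calls for.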