The Columbia Disaster: A Case Study for System Safety Analyses?

by Niles T. Welch, CSP
 

All of us mourn the loss of the Columbia crew - so close to touchdown, yet an eternity away from landing. No one thought there was any real danger to the shuttle. No human lives had ever been lost on re-entry before. This time wouldn't be any different - or so we thought.

Now that the cause of the accident is being investigated, theories abound as to which less-than-optimal decisions may eventually have contributed to this catastrophe and the loss of seven lives. One such theory concerns the original choice of a heat protection method to protect the shuttle upon re-entry into the Earth's atmosphere at Mach 20. According to television reports, the options originally considered were ceramic tiles, a thermal blanket, and titanium, and tiles were chosen because they were the least costly.

One of the major drawbacks considered in the use of tiles was that any damage to the tiles incurred during or after takeoff could not be repaired in space before re-entry. Presumably, the other two options, although more expensive, would have been less susceptible to non-repairable damage, and less vulnerable to cascading into a catastrophic failure.

In hindsight, perhaps one of the other choices would have provided a better, safer environment. As system safety engineers and practitioners, we know that a low probability of a catastrophe occurring should not outweigh its end consequences. Even if the probability of the event happening were 1 in 10^12, a failure with such high consequences - the failure of the heat protection system, for example, followed by the failure of the shuttle and the loss of all lives on board - should not be allowed to happen. The design must be weighed against the consequences. In retrospect, the fault tree examining and evaluating the thermal protection design should have had a branch for each of the three options considered: ceramic tiles, a thermal blanket, and titanium.

These three design options should have been carefully evaluated to determine all the ramifications of one design choice over the others for performance, producibility, repairability and safety. One additional criterion should be included:

Consequences - and not just the probability of an occurrence - must be factored into every system safety and design decision. "Highly improbable" should not be the reason to certify safety. The final consequences must be weighed before a design is considered safe. 
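The point above can be illustrated with a minimal sketch, offered only as one possible reading of the argument and not as any agency's actual certification rule. The names Hazard, Severity, acceptable_on_probability_alone and acceptable_with_consequences, the 1e-6 threshold, and the "mitigated" flag are all hypothetical; the 1 in 10^12 figure is the one used in the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical consequence categories, ordered from least to most severe."""
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


@dataclass(frozen=True)
class Hazard:
    name: str
    probability: float       # estimated chance of occurrence (illustrative)
    severity: Severity       # worst credible end consequence
    mitigated: bool = False  # True if the design removes or contains the consequence


def acceptable_on_probability_alone(hazard: Hazard, threshold: float = 1e-6) -> bool:
    """The reasoning the article warns against: 'highly improbable' means safe."""
    return hazard.probability <= threshold


def acceptable_with_consequences(hazard: Hazard, threshold: float = 1e-6) -> bool:
    """Consequence-gated check: an unmitigated catastrophic outcome is never
    accepted on low probability alone (assumed rule, for illustration)."""
    if hazard.severity == Severity.CATASTROPHIC and not hazard.mitigated:
        return False
    return hazard.probability <= threshold


# Illustrative hazard: thermal protection damage that cannot be repaired in space.
tile_damage = Hazard(
    name="unrepairable thermal protection damage",
    probability=1e-12,
    severity=Severity.CATASTROPHIC,
)

print(acceptable_on_probability_alone(tile_damage))
print(acceptable_with_consequences(tile_damage))
```

Under these assumptions, the same hazard passes the probability-only check and fails the consequence-gated one, which is the distinction the article draws.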

Navy to Hold Software Safety Summit

 

The Naval Ordnance Safety and Security Activity is planning to host a Software Safety Summit on April 1 - 3 in Dam Neck, Virginia Beach, Virginia, U.S.A. Intended for software safety professionals in both government and industry, this summit plans to:

  • Review software safety methodologies that are currently in use
  • Establish a methodology for standardized software safety certification within the U.S. Navy
  • Ensure that, when the methodology is used correctly, software can be confidently certified as safe
  • Receive endorsement from the Navy that the proper level of safety has been applied to software development

The second and third days of this Summit will feature workshops geared toward developing a software safety methodology framework that can become the Navy's recognized standard for software safety.

Location
The Summit will be held at the Sea Breeze Officer's Club, Fleet Combat Training Center Atlantic, Dam Neck, Virginia Beach, Virginia.

Registration
The summit is free, but registration is required. Registrations must be received by March 12, 2003. Interested persons are invited to register at the Summit's web site:
http://www.ih.navy.mil/summit/

For more information, contact the Event Coordinator, A. Walko:
Phone: 301-753-5600 x105
E-mail: AWalko@aot.com