What went wrong with Challenger, Mars Climate Orbiter and Columbia.
It’s an unfortunate truth in quality assurance and control (QA/QC) that the only time most people recognize the importance of the job is after something has gone horribly wrong. Like air traffic controllers and IT departments, quality professionals tend to garner the most attention when disaster strikes and everyone is looking for someone to blame.
But some of the biggest engineering failures in history had nothing to do with QA/QC. If you’re looking for examples, NASA has plenty.
1. Space Shuttle Challenger
On Jan. 28, 1986, the Space Shuttle Challenger broke apart 73 seconds into its flight, resulting in the deaths of all seven crew members. The spacecraft’s disintegration was caused by the failure of an O-ring seal in its right solid rocket booster at liftoff, which took place under unusually cold conditions for which the shuttle had not been certified.
In July 1985, one of the engineers working at Morton Thiokol, the company that manufactured the solid rocket boosters, wrote a letter to the company’s vice president anticipating the disaster.
“The mistakenly accepted position on the joint problem was to fly without fear of failure and to run a series of design evaluations which would ultimately lead to a solution or at least a significant reduction of the erosion problem,” wrote Roger Boisjoly. “This position is now drastically changed as a result of the SRM 16A nozzle joint erosion which eroded a secondary O-ring with the primary O-ring never sealing.”
“If the same scenario should occur in a field joint (and it could), then it is a jump ball as to the success or failure of the joint because the secondary O-ring cannot respond to the clevis opening rate and may not be capable of pressurization. The result would be a catastrophe of the highest order – loss of human life.”
Boisjoly’s warning went unheeded.
He later published a report entitled Ethical Decisions - Morton Thiokol and the Challenger Disaster. In it, Boisjoly describes the engineering presentation made during a teleconference between Morton Thiokol, the Kennedy Space Center and the Marshall Space Flight Center the night before the launch.
“[After the presentation,] Joe Kilminster [vice president of the rocket booster program] asked for a five-minute, off-line caucus to re-evaluate the data and as soon as the mute button was pushed, our general manager, Jerry Mason, said in a soft voice, ‘We have to make a management decision.’ I became furious when I heard this, because I sensed that an attempt would be made by executive-level management to reverse the no-launch decision,” wrote Boisjoly.
Boisjoly’s intuition proved correct, but the moral of this story is not about a team of engineers being overridden by a single manager. Boisjoly actually emphasized that it was the managerial caucus responding to NASA’s pressure to launch which led to the reversal of the recommendation.
“The caucus constituted the unethical decision-making forum resulting from intense customer intimidation,” Boisjoly wrote. “NASA placed MTI in the position of proving that it was not safe to fly instead of proving that it was safe to fly. Also, note that NASA immediately accepted the new decision to launch because it was consistent with their desires and please note that no probing questions were asked.”
This disaster didn’t happen because of a faulty O-ring. The part performed exactly according to its design specifications, and if the shuttle had been launched on a warmer day it wouldn’t have been a problem. In other words, the Challenger was not a QA/QC failure.
Engineers are under enormous pressure to complete jobs under budget and ahead of schedule, which can lead to the temptation to think like a manager first and an engineer second, or not at all. The Challenger represents the severe consequences of giving in to that temptation.
2. Mars Climate Orbiter
Sometimes the stakes aren’t quite as high as the loss of human life.
In the case of the Mars Climate Orbiter, the cost was $327.6 million. The robotic space probe was launched on Dec. 11, 1998 to study the Martian climate, atmosphere and surface changes as well as act as a relay for the Mars Polar Lander.
On Sept. 23, 1999, communication with the spacecraft was lost during its orbital insertion.
An investigation revealed that the spacecraft’s altitude was significantly lower than the intended 150-170 km. Post-failure calculations demonstrated that the spacecraft’s trajectory would have taken it within 57 km of the surface, where it most likely disintegrated from atmospheric stresses.
The cause of the error turned out to be a discrepancy between Lockheed Martin’s software, which generated results in U.S. customary units, and NASA’s, which was designed to accept metric units. Consequently, outputs in pound-force seconds were read as inputs in newton-seconds, with no conversion applied.
This discrepancy in calculation resulted in a discrepancy between the target and the actual orbit insertion altitudes. And so a spacecraft that cost $193.1 million to develop, $91.7 million to launch and $42.8 million to operate burned up in the Martian atmosphere because of a failure to convert one unit of measurement to another.
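To make the size of that mismatch concrete, here is a minimal Python sketch. It is an illustration only: the `Impulse` type, the helper names and the sample value are hypothetical and are not taken from the Lockheed Martin or NASA software. The one hard fact it relies on is that one pound-force second equals roughly 4.448 newton-seconds.

```python
from dataclasses import dataclass

# Newton-seconds per pound-force second (0.45359237 kg * 9.80665 m/s^2).
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical value type: an impulse always stored in newton-seconds."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion happens once, at the boundary between the two systems.
        return cls(value * LBF_S_TO_N_S)


def misread_as_newton_seconds(value_lbf_s: float) -> float:
    """Reproduce the mismatch: an lbf*s number consumed as if it were N*s."""
    return value_lbf_s


reported = 10.0  # hypothetical upstream output, in lbf*s
correct = Impulse.from_pound_force_seconds(reported).newton_seconds
misread = misread_as_newton_seconds(reported)

print(f"correct: {correct:.2f} N*s, misread: {misread:.2f} N*s")
print(f"the misread value is too small by a factor of {correct / misread:.2f}")
```

Carrying the unit in the type, rather than passing bare numbers between systems, is one common way to make this kind of interface mismatch fail loudly instead of silently.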
The lesson here (aside from the obvious, which is to always check your math) lies in the importance of oversight and the value of communication between customers and providers. After countless cycles of design, development and testing, it’s easy to assume that all the obvious errors have been caught. As the Mars Climate Orbiter illustrates, sometimes it’s the simplest errors that slip through.
However, despite the simplicity of the error, it did not occur because of a failure in QA/QC. The quality professionals who reviewed the Lockheed and NASA software would have found that both were working perfectly. The real problem was interoperability, an issue that remains a major concern across industries today.
3. Space Shuttle Columbia
The second space shuttle disaster occurred when the Columbia disintegrated over Texas and Louisiana during re-entry on Feb. 1, 2003. During the shuttle’s launch, a piece of foam insulation from the external fuel tank broke off and struck its left wing. When the Columbia re-entered the Earth’s atmosphere, the damage allowed plasma and superheated gases to penetrate the internal wing structure, destroying it and destabilizing the spacecraft.
Post-accident investigations concluded that mistakes during installation were the likely cause of the foam breaking. This resulted in the employees at the Michoud Assembly Facility in Louisiana being retrained in how to apply foam to the fuel tanks.
However, the cause of the accident goes much deeper than incorrectly applied insulation. The Columbia Accident Investigation Board (CAIB) identified fundamental organizational and structural issues in NASA which compromised the safety of shuttle missions.
For example, the shuttle program manager was responsible for achieving safe, timely launches at acceptable costs. The fact that these goals are rarely complementary indicates another source of failure that goes beyond QA/QC.
Safety, timeliness and cost-effectiveness are universal engineering values, as important to manufacturers as they are to NASA. However, these values are often in tension with one another. Balancing them requires careful negotiation, and putting that balancing act in the hands of a single person is, more often than not, a recipe for disaster.
From the Launch Pad to the Shop Floor
The stories of the Challenger, Mars Climate Orbiter and Columbia illustrate how high the price of failure can be in engineering. More importantly, they show that engineering catastrophes are not always the result of failures in QA/QC.
Do you have other examples of engineering disasters that had nothing to do with quality? Comment below.