The planemaker is completely redesigning its software in light of the new problem.
The Federal Aviation Administration (FAA) has found a new complication in the 737 MAX during its rigorous testing of Boeing’s proposed fixes to the aircraft. Boeing is responding by making a fundamental change to its software.
Two flight control computers are standard on the 737 MAX under the current system, which is decades old. But the automated flight control system pulls data from only one computer at a time: it takes input from one computer on one flight and switches to the other computer for the next flight.
Boeing is altering the system’s software so that it will take input from both flight control computers simultaneously. The new software architecture will employ a two-channel “fail-safe” system where each flight computer will operate using input from an independent set of sensors that measure angle of attack, altitude, air speed and other variables. This way, should one computer or sensor fail or malfunction, the other computer and sensors can continue operating.
The aircraft manufacturer’s proposed fix will not only address the new glitch—it will also strengthen the safety and reliability of the upgraded Maneuvering Characteristics Augmentation System (MCAS). A malfunctioning MCAS caused the two deadly crashes that have grounded the 737 MAX worldwide since March.
The FAA found the new glitch while testing Boeing’s proposed solutions for the flawed MCAS—specifically, by running simulations that replicated the scenarios that led to the fatal crashes.
The glitch occurs when bits in the microprocessor randomly flip between 0 and 1. This is a known phenomenon. Some have attributed it to cosmic rays striking the circuitry, especially at high altitudes, where the rays are stronger and could affect sensitive airplane electronics. When a neutron hits the microprocessor, the collision can deposit a minute electric charge that flips a bit from off to on (or vice versa). So, even though the software code is still correct, the computer’s output is altered by the wrong bit. For example, a single bit might tell the computer whether the MCAS is engaged, so a flipped bit would report the opposite of its true state.
Plane manufacturers and regulators are aware of this phenomenon, and aircraft makers are supposed to plan for it when developing airplane electronics. “There are active means of protecting against bit flips,” said former Boeing electronics manager Dwight Schaeffer. “We always built it into our own software.”
How likely is it that a cosmic ray can bring down a plane? It’s highly unlikely that such an event will occur. In over 200 million flight hours on the same computer system in older 737 NGs, it never happened. But those planes weren’t equipped with the MCAS.
FAA test pilots ran simulations that deliberately assigned corrupted bits to the MCAS—telling the computer that the MCAS was engaged when it actually wasn’t. One of the three pilots testing the system was unable to restore steady flight and lost the aircraft.
That result raised the classification of the failure from a “major fault” that a flight crew could handle to “catastrophic.” FAA regulations require that no single fault may lead to a catastrophic result, which meant that Boeing had to fix the cosmic ray problem.
To Boeing’s credit, it has taken extra steps to address the problem. Boeing could have just rewritten the software while maintaining the one-computer-per-flight approach. This was Boeing’s original fix for the MCAS: the improved MCAS would have taken input from two angle of attack sensors instead of one—but that information would still have funneled through only the one computer.
Instead, company engineers have redesigned the software entirely, creating a new design that not only safeguards the system against random bit flips but also improves the MCAS software. By requiring that the flight control system take simultaneous input from both computers and compare their outputs, the new design builds redundancy into the system.
If the outputs disagree, the system would treat that as a computer fault. Instead of taking automatic action, as the MCAS did during the fatal Lion Air and Ethiopian Airlines flights, it would notify the pilots of the fault and leave them to fly the aircraft manually.
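The comparison logic can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Boeing's implementation: the `ChannelOutput` fields, the `cross_check` function and the tolerance value are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ChannelOutput:
    """Hypothetical output of one flight control computer."""
    mcas_engaged: bool
    trim_command: float  # illustrative units, not a real specification


def cross_check(
    a: ChannelOutput, b: ChannelOutput, tolerance: float = 0.01
) -> Tuple[Optional[ChannelOutput], bool]:
    """Compare two channels; return (command to apply, fault flag).

    If the channels agree, the shared command is passed through.
    If they disagree, no automatic command is issued and the fault
    flag is raised so the crew can be alerted and fly manually.
    """
    agree = (
        a.mcas_engaged == b.mcas_engaged
        and abs(a.trim_command - b.trim_command) <= tolerance
    )
    if agree:
        return a, False
    return None, True


# Both computers agree: the command goes through, no fault is reported.
ok_cmd, ok_fault = cross_check(ChannelOutput(False, 0.0), ChannelOutput(False, 0.0))

# A flipped bit makes one computer report MCAS as engaged: the system
# withholds automatic action and flags a fault instead.
bad_cmd, bad_fault = cross_check(ChannelOutput(True, 2.5), ChannelOutput(False, 0.0))
```

With only two channels, a disagreement shows that something is wrong but not which computer is at fault, which is why the sketch disables the automatic command rather than picking a side.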
But while Boeing is going above and beyond what is required by regulators, it should have discovered this flaw when the 737 MAX was originally being designed. It also means that for years the 737 MAX was flying with no redundancy for its flight control computers, and it took two crashes and the global grounding of the fleet to finally diagnose and fix the problem.
Boeing aims to have the new software architecture ready for testing in late September, and the company hopes to regain certification in October. In the meantime, the aerospace giant and its client airlines continue to fret over lost business. Boeing has already slowed production of the 737 MAX, and is considering stopping production entirely as it runs out of parking space on company tarmac. CEO Dennis Muilenburg admitted on the company’s most recent earnings call back in July that those delays are likely to continue even after the 737 takes to the air again.
Encouragingly, though, it looks like Boeing may have learned its lesson and is taking the steps necessary to make the 737 MAX safe to fly again.
“This is a huge deal,” said Peter Lemme, a former flight controls engineer at Boeing. “I’m overjoyed to hear Boeing is doing this. It’s absolutely the right thing to do.”