The real reason airline computers crash
Note: this is a reprint of an August 8, 2016, CNN article. The original article can be found here.
Why do computer crashes keep bringing major airlines to their knees, leaving hundreds of thousands of passengers stranded at airports? Human error. Mistakes. Good old-fashioned screw-ups.
That's the explanation offered by airline computer experts Monday, after Delta Air Lines scrambled to deal with a huge computer snafu. The world's second largest airline was forced to delay all of its flights on the ground for at least six hours worldwide.
Delta (DAL) blamed the problem on a power outage. But that alone should not have brought the system down -- there are backups that should have kept Delta's system up and running.
"You're basically saying, 'We had power failure in a location but unfortunately we were unable to continue operations from a secondary data center despite the fact that we spent hundreds of millions of dollars on it,'" said Gil Hecht, founder and chief executive of Continuity Software, an expert in computer disaster recovery.
That is essentially an admission of human error, added Hecht.
A Delta spokesman wouldn't comment on whether the airline had a backup independent power supply -- but experts told CNN that it's certain the company has one.
It's not just Delta. These glitches happen a lot.
Monday's computer crash came about three weeks after a computer outage at Southwest Airlines (LUV), which led to the cancellation of more than 1,000 flights. In May, JetBlue (JBLU) computer issues forced passengers to be checked in manually at some airports. Computer problems delayed United Airlines flights worldwide in 2015.
Why do these airline computer failures keep happening?
"Complexity in the data center gets out of hand," Hecht said.
The airlines, by building layers upon layers of systems -- each one with a different configuration and a different purpose -- accidentally create the threat of something going down in their computer networks.
"Somehow, someone created a threat in the Delta Air Lines situation that caused their disaster recovery not to work. How do I know it? Because their disaster recovery system should have worked. And it didn't."
Airline experts say there are three reasons why systems go down.
1) No redundancy. An airline might have chosen not to protect itself with a backup system. That's unlikely for a major carrier like Delta.
2) Hacking. The crash was caused by a malicious attacker. That's not likely the cause of Monday's Delta computer failure, said Hecht, because a malicious hack into Delta's system would probably have been isolated and the system would have been brought back to life more quickly.
3) Human error. Layers and layers of systems that pile up over time create some kind of glitch and suddenly the whole thing comes crashing down. That's the most likely explanation for Delta.
Meanwhile we can't go back to paper check-ins during an emergency like this. It's just not feasible anymore, experts say -- especially on international flights -- because airline computers are linked to security networks like government No Fly Lists and visa document systems.
So what can airlines do to prevent these computer failures from happening so often?
They can install more automated checkup systems. They can perform emergency drills by taking their systems offline during slow periods and going to their secondary and backup systems to make sure they are working properly.
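The kind of emergency drill described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Delta or any airline actually tests its systems: the DataCenter class, the run_failover_drill function, the misconfigured flag and the probe names are all hypothetical stand-ins. The sketch takes a simulated primary site offline, sends test requests to the secondary site, records any failures, and always restores the primary afterward.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """Hypothetical stand-in for a site that serves check-in requests."""
    name: str
    online: bool = True
    misconfigured: bool = False  # simulates a hidden configuration error

    def handle(self, request: str) -> str:
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        if self.misconfigured:
            raise RuntimeError(f"{self.name} cannot serve {request!r}")
        return f"{self.name} served {request}"


def run_failover_drill(primary: DataCenter, secondary: DataCenter,
                       probes: list[str]) -> dict:
    """Take the primary offline, probe the secondary, then restore the primary."""
    failures = []
    primary.online = False  # simulate the outage during a slow period
    try:
        for probe in probes:
            try:
                secondary.handle(probe)
            except Exception as exc:
                # Record the problem instead of stopping, so the drill
                # reports every probe that the backup could not serve.
                failures.append(f"{probe}: {exc}")
    finally:
        primary.online = True  # always bring the primary back
    return {"passed": not failures, "failures": failures}


if __name__ == "__main__":
    report = run_failover_drill(
        DataCenter("primary"),
        DataCenter("secondary", misconfigured=True),
        ["check-in", "boarding-pass"],
    )
    print(report)
```

Run on a schedule, a drill like this would surface a backup that silently stopped working before a real power outage does.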