Amazon's massive AWS outage was caused by human error


Though the majority of sites affected have since gone back online, some appear to still be facing issues.

According to a postmortem that the company's cloud services business published Thursday, at around 9:37 a.m. PT on Tuesday an Amazon employee entered a command incorrectly while trying to debug an issue.

The Amazon Simple Storage Service (S3), part of Amazon Web Services (AWS), powers much of the internet as you know it.

When the system was down, websites could not access the photos, logos, lists or data they would normally have pulled from the cloud. Used by more than 150,000 websites, S3 is designed for 99.99 percent availability. "While these subsystems were being restarted, S3 was unable to service requests," Amazon wrote. The command itself was part of an established Amazon playbook.

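AWS said it has modified the tool involved so that it removes capacity more slowly and cannot take any subsystem below its minimum required capacity. "This will prevent an incorrect input from triggering a similar event in the future," AWS said in the post.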


"Unfortunately," as AWS put it in its lengthy mea culpa, a technician made a mistake when entering the command, taking out more servers than intended, some of which were critical to the functioning of S3 across the entire region.

One of the two affected subsystems was an index subsystem that "manages the metadata and location information of all S3 objects in the [Virginia data center] region," Amazon wrote. The other, the placement subsystem, manages where new objects are stored. Removing a significant portion of their capacity forced each of these systems into a full restart, something AWS says it had not had to do in many years.

That restart is where S3 became a victim of its own success. "S3 has experienced massive growth over the last several years and the process of restarting these services and running the necessary safety checks to validate the integrity of the metadata took longer than expected," the company said.

S3, whose launch in early 2006 helped start the cloud-computing revolution, is Amazon's largest and most-utilized service. The 150,000-site figure doesn't include the countless other businesses that rely on S3, on other AWS services that rely on S3, or on service providers that built their services on Amazon's cloud.

The mistyped command had been meant to help fix a problem that was causing S3's billing system to run more slowly than expected.

In its most recent quarterly financial report, issued in early February, Amazon said AWS operating income for the 12 months ending December 31 amounted to $3.1 billion, compared with $1.5 billion for the same period in 2015.