With three outages in December, AWS has suffered some real setbacks recently. These outages do not just affect the small business owners who traditionally sell on Amazon; they also affect aggregators and large businesses like McDonald’s and Slack.
In what has become a weekly occurrence in December 2021, AWS experienced another outage that brought down major apps and services like Slack, the Life360 location tracking platform, Grindr, the McDonald’s app, and the Epic Games Store, along with popular games like Fall Guys.
The timing of the outage couldn’t have been much worse for East Coast Slack users, as reports began spiking just before 7 AM EST, when many were logging on for the day. Within minutes, AWS’ status page was updated to confirm that the company had detected a power outage “within a single data center within a single Availability Zone (USE1-AZ4) in the US-EAST-1 Region.”
Despite the relatively localized nature of the blackout, the short-term impact on Amazon’s Elastic Compute Cloud (EC2) service was fairly catastrophic, and AWS recommended “failing away” to unaffected Availability Zones for any customers able to do so.
Restoration efforts got underway quickly, and AWS reported progress on restoring power within 18 minutes of its initial confirmation. However, the restoration process continued for several more hours, until AWS noted that power had been fully restored at 9:51 AM EST. Even then, the company’s support page warned that network connectivity issues lingered for a portion of the impacted EC2 instances. It also revealed that some customers of its EBS storage service had experienced “degraded IO performance” during the outage.
Despite the apparent pattern of outages cropping up during December, the three incidents have very little in common. The first, on December 7, also impacted the US-EAST-1 Region, but it stemmed from an automated network capacity scaling issue rather than a power outage. The second happened all the way across the country in the US-WEST-1 and US-WEST-2 Regions and was related to a network connectivity issue.
Whether there is an underlying reason for the sudden spike in AWS downtime or the company has simply had a run of extremely bad luck remains to be seen. However, the apps, services, games, and websites that have come to rely on AWS for their own stability are almost certainly beginning to take a long, hard look at the impact these outages are having on their bottom lines.