Apps like Snapchat, Reddit, and Venmo among thousands affected as Amazon’s cloud arm suffers latest US-EAST-1 data center failure
Amazon.com’s cloud computing unit, Amazon Web Services (AWS), said on Monday that its systems had returned to normal operations after a major outage disrupted thousands of websites and applications worldwide, halting operations for businesses and individuals from London to Tokyo.
The outage — one of the most significant since last year’s CrowdStrike malfunction crippled hospitals, banks, and airports — exposed once again the fragility of global internet infrastructure dependent on a handful of major cloud providers.
AWS, which hosts critical services for companies and governments around the world, said some systems such as AWS Config, Redshift, and Connect still had a backlog of messages that would take several hours to clear even after full restoration.
“All AWS services returned to normal operations,” the company said in a statement issued shortly after 3 p.m. PT (2200 GMT). “Some services continue to process a backlog of messages over the next few hours.”
Root Cause: Network Health Monitoring Subsystem Failure
AWS traced the root cause of the outage to a fault in an internal network health monitoring subsystem tied to its Elastic Compute Cloud (EC2) service. The issue disrupted traffic routing through network load balancers, preventing applications from finding the correct addresses for AWS’s DynamoDB API, a key database service that stores user and transaction data.
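The failure mode described here, an application unable to find the address of a service endpoint, can be checked from the client side. The following is a minimal sketch, assuming the lookup in question is an ordinary DNS query. The hostname follows AWS's public regional endpoint naming; the `can_resolve` helper and its injectable `resolver` parameter are hypothetical names introduced for illustration.

```python
import socket

# Public regional endpoint name for DynamoDB in northern Virginia.
DYNAMODB_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` is injectable (a hypothetical convenience) so the check
    can be exercised without touching the network.
    """
    try:
        return len(resolver(hostname, 443)) > 0
    except OSError:
        # socket.gaierror, raised on failed lookups, is a subclass of OSError.
        return False
```

Calling `can_resolve(DYNAMODB_ENDPOINT)` returns `False` whenever the lookup fails. In that case a client cannot reach the database API even if the database servers themselves are healthy.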
The malfunction originated in AWS’s US-EAST-1 data center cluster in northern Virginia — its largest and oldest web services site, and the same region responsible for major outages in 2020 and 2021.
Amazon declined to comment on why the region remains prone to such failures, though experts said its heavy usage and role as the default setting for many AWS services make it a single point of potential overload.
Wide-Scale Impact: From Banks to Apps and Gaming
The outage caused cascading failures across finance, communication, and entertainment platforms, according to internet monitoring firm Ookla, which reported more than 4 million user complaints.
Popular apps including Snapchat, Reddit, Roblox, Duolingo, Signal, Coinbase, and Robinhood experienced widespread downtime. Even Amazon’s own products — Prime Video, Alexa, and its main shopping website — were hit.
Gaming services such as Fortnite, Clash Royale, and Clash of Clans also reported disruptions, while ride-hailing platform Lyft briefly went offline in the United States.
In Britain, Lloyds Bank, Bank of Scotland, Vodafone, BT, and the HMRC tax portal were among those affected, according to Downdetector UK.
“This outage once again highlights the dependency we have on relatively fragile infrastructures,” said Jake Moore, global cybersecurity advisor at ESET.
Experts Warn on Cloud Dependency
Cybersecurity analysts say the incident underscores the risk of overconcentration in cloud computing, as companies increasingly rely on a few dominant providers: Amazon Web Services, Microsoft Azure, and Google Cloud.
“When people cut costs and skip steps that ensure redundancy, these are the consequences,” said Ken Birman, computer science professor at Cornell University. “AWS gives developers the tools to protect against these outages, but too often companies fail to implement them.”
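One redundancy measure of the kind alluded to here is falling back to a second region when the primary one fails. The sketch below is illustrative only and is not AWS's own tooling: `call_with_region_failover`, the `operation` callable, and the region list are hypothetical. A real deployment would also need its data replicated to the fallback region.

```python
def call_with_region_failover(operation, regions):
    """Try `operation(region)` in each region in order; return the first success.

    `operation` is any callable that takes a region name and raises an
    exception on failure (a hypothetical stand-in for a real client call).
    """
    errors = {}
    for region in regions:
        try:
            return region, operation(region)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {errors}")


# Simulated outage: the primary region is unreachable, the fallback works.
def fake_read(region):
    if region == "us-east-1":
        raise ConnectionError("endpoint unreachable")
    return {"user": "alice"}


served_by, item = call_with_region_failover(fake_read, ["us-east-1", "us-west-2"])
print(served_by, item)
```

In this simulation the request is served from `us-west-2`. An application configured with only `us-east-1` in its list would raise an error instead.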
The disruption illustrated how deeply everyday life now depends on a small set of interconnected digital systems — from mobile banking to ride-hailing and online gaming.
“For major businesses, hours of cloud downtime translate to millions in lost productivity and revenue,” said Ryan Griffin, U.S. cyber practice leader at McGill and Partners.
Despite the widespread impact, Wall Street was largely unfazed. Amazon shares closed up 1.6% at $216.48, as investors viewed the incident as unlikely to have long-term repercussions.
