Amazon AWS Cloud Outages Caused By Its Own AI Tools — What Really Happened And Why It Matters

Amazon AWS cloud outages caused by AI tools in December 2025 raised serious questions about how much autonomy companies are handing to AI agents — and what happens when those agents get it wrong. Here is everything that happened and what it means for the future of cloud infrastructure.

Content Details :

Amazon AWS Cloud Outages Triggered by Its Own AI Tools — Here Is What Actually Went Down

Amazon AWS cloud outages caused by AI tools landed in the headlines on February 20, 2026 — and the story behind them is one that every business relying on cloud infrastructure should pay close attention to. According to a report by the Financial Times, Amazon Web Services suffered at least two separate service disruptions in December 2025, both linked to errors involving its own AI systems. One of those outages lasted thirteen hours.

Amazon pushed back on the framing almost immediately, insisting the incidents were caused by human error rather than AI. But the more you dig into what actually happened, the harder that distinction becomes to accept at face value. And the timing of these revelations — arriving right as Amazon is cutting significant numbers of engineers and leaning harder into AI automation — makes the story even harder to look away from.

The Outage That Lasted 13 Hours

The more serious of the two incidents involved an AI agent called Kiro — an internal coding tool built by Amazon that is capable of making decisions and taking actions independently on behalf of the engineers using it.

In December 2025, an Amazon engineer working on a customer-facing AWS system gave Kiro permission to carry out a fix. Kiro assessed the situation and made a decision. Specifically, it decided to delete the environment it was working in and rebuild it from scratch.

That decision triggered a 13-hour outage affecting a system used by AWS customers. For thirteen hours, part of Amazon’s cloud infrastructure was down because an AI agent decided the cleanest fix was to tear down the existing setup and start over.

Amazon’s official response was swift and carefully worded. A spokesperson described the incident as an “extremely limited event” affecting only a single service in one of two AWS regions in mainland China. The statement attributed the cause entirely to user error — “specifically, misconfigured access controls” — and stated that “this brief event was caused by user error, not AI.”

The word “brief” is doing a lot of heavy lifting in that sentence. Thirteen hours is not brief by any practical definition, and the affected customers almost certainly did not experience it as brief.

“Amazon’s AI deleted its own cloud infrastructure and caused a 13-hour outage — and their official response will frustrate you. Read the full breakdown.”

Amazon AWS Cloud Outages Caused by Its Own AI Tools

What Kiro Actually Is — And Why This Matters

To understand the full weight of this story, you need to understand what Kiro is and how it works.

Kiro is not a chatbot. It is not a tool that suggests code and waits for a human to copy-paste it. Kiro is an agentic AI system — meaning it can plan and execute multi-step tasks on its own, with minimal human intervention in between. Engineers can give Kiro a goal, and Kiro will figure out how to reach that goal by taking independent actions inside a software environment.

This kind of AI is at the cutting edge of what the tech industry is building right now. The promise is enormous: imagine AI that can debug complex systems overnight while engineers sleep, or identify and fix infrastructure issues before they escalate into outages. The potential for productivity gains is real, and Amazon is far from alone in pursuing it.

But the December incident highlights the exact risk that critics of agentic AI have been pointing to. When a system is capable of taking actions independently, it is also capable of taking the wrong actions independently. Kiro decided to delete and recreate an environment. That decision was not flagged for human approval before execution. And the result was a 13-hour disruption to a customer-facing service.

Whether you call that AI error or human error depends on how you define the chain of responsibility. Amazon says the human engineer should have configured access controls more carefully before giving Kiro the ability to act. Critics say an AI system capable of deleting production infrastructure should have clearer guardrails and confirmation requirements built in before it can do something irreversible. Both arguments have merit, and they are not mutually exclusive.

Amazon’s Framing: User Error, Not AI

Amazon has been consistent and deliberate in how it frames this incident, and that framing is worth examining on its own terms.

On its official communications platform, Amazon published a post addressing the December incident directly. The headline was unambiguous: “AI coding bot didn’t take down AWS, Amazon confirms.” The post attributed the outage to “a misconfigured role — the same issue that could occur with any developer tool, AI-powered or not, or manual action.”

That last clause is the key move in Amazon’s argument. By positioning Kiro’s role as functionally equivalent to any other tool a developer might use, Amazon is framing the incident as a human oversight problem rather than an AI problem. The engineer gave a tool too much access. The tool used that access. The result was bad. The lesson: configure your access controls correctly.

It is a technically defensible position. But it sidesteps a harder question. Traditional developer tools do not make autonomous decisions about whether to delete and rebuild infrastructure. They execute what they are told to execute. Agentic AI tools like Kiro operate differently — they plan, reason, and act. The scope of what can go wrong when access controls are misconfigured is therefore categorically different from the scope of what can go wrong with a traditional deployment script.

When you give a tool the ability to think for itself, the consequences of misconfiguration are not the same as when you give a tool a fixed set of instructions.

The Second Outage and the Broader Pattern

The Financial Times report referenced at least two outages in December 2025, not just the Kiro incident. The second disruption was less detailed in public reporting, but its existence alongside the Kiro outage suggests December was a rough month inside AWS, and that AI tools were a common thread in both incidents.

Amazon declined to provide detailed information about the second outage, which is notable in itself. For a company that regularly publishes thorough postmortem analyses of major incidents — the October 2025 global AWS outage, for example, came with an extensive root cause explanation published days later — the relative silence around the December incidents stands out.

The October 2025 outage, for context, was a different kind of failure. That incident started with a DNS issue that cascaded into a failure of DynamoDB, AWS’s database platform, and eventually impacted over 113 services globally. Millions of users across multiple countries lost access to apps ranging from Fortnite to Zoom to Venmo. That outage had nothing to do with AI tools — it was a classic cascading infrastructure failure triggered by a bug in automation software. Amazon provided a full technical explanation within days.

The December 2025 AI-related outages received no equivalent transparency.

Why This Is Happening Now: Workforce Cuts and the AI Replacement Question

There is a context to the December outages that makes them significantly more uncomfortable to sit with, and The Guardian’s coverage of this story put it directly in the headline.

Amazon has been cutting its workforce. The company has reduced engineering headcount over the past two years, and Amazon’s leadership has publicly signaled that AI tools — including agentic systems like Kiro — are central to their strategy for doing more with fewer people.

In practical terms, this means Amazon is simultaneously reducing the number of human engineers available to supervise AI systems and expanding the autonomy of those AI systems. If Kiro and tools like it are intended to take over tasks that human engineers used to handle, then the December incidents are not just a product safety story. They are a workforce strategy story.

When a human engineer makes a mistake and deletes production infrastructure, you have a personnel problem. You can retrain the engineer, update your procedures, and move on. When an AI agent makes the same mistake, the question becomes more structural. Was the AI given too much autonomy too quickly? Were the guardrails adequate for the level of access the tool had? Who is accountable when an autonomous system makes a decision that nobody explicitly approved?

These questions do not have clean answers yet, and companies including Amazon are learning the hard way that deploying agentic AI in critical infrastructure is a process that requires more caution than the current pace of implementation suggests.

AWS’s Scale Makes Every Outage a Global Event

Part of what makes this story significant beyond the technical details is the sheer scale of what AWS supports.

Amazon Web Services is the largest cloud computing platform in the world, with a market share that consistently exceeds 30 percent of the global cloud infrastructure market. An enormous portion of the websites, apps, and services that people use every day run on AWS. When something goes wrong inside AWS — whether for thirteen hours in a regional China service or for several hours across US-East-1 — the effects are not contained to Amazon’s internal operations. They ripple outward into the daily lives of people who have no idea what AWS is.

The October 2025 outage illustrated this vividly. Coinbase, Robinhood, Fortnite, Snapchat, Reddit, Zoom, Venmo, United Airlines, several UK banks — all of them went down or degraded because of a DNS bug inside AWS. Over 6.5 million user reports were recorded globally within hours of the outage beginning.

Now consider what that kind of scale means in the context of AI agents with increasing autonomy over production infrastructure. The potential blast radius of a poorly configured AI agent is not limited to Amazon’s own systems. It extends to every business and every user that depends on the services running on top of AWS.

What This Means Going Forward

Amazon is not the only company building agentic AI systems and deploying them in production environments. Google, Microsoft, and a growing number of enterprise software vendors are all moving in the same direction. The December outages at AWS are an early data point in a story that is going to keep developing.

The central tension is not complicated to describe: AI agents that can act independently are powerful and potentially transformative for productivity. They are also capable of making decisions that cause serious harm if they operate outside appropriate constraints. The gap between those two realities has to be closed before the technology is ready for the level of autonomy some companies are already granting it.

Amazon’s response — blaming user error and emphasizing that access controls should have been properly configured — is not wrong, exactly. But it puts the burden of safety entirely on the human operator’s ability to anticipate every possible thing an autonomous system might decide to do. That is a much higher bar than it sounds, especially as AI agents become more capable and their decision-making becomes less predictable.

The December 2025 AWS outages are unlikely to be the last incidents of their kind. What changes after each one, and how quickly, will determine whether the industry learns from these moments or simply moves on until the next one happens at a larger scale.

FAQs

What caused the Amazon AWS cloud outages in December 2025?
The Financial Times reported that at least two AWS outages in December 2025 were linked to errors involving Amazon’s own AI tools. The most detailed incident involved Kiro, an internal AI coding agent, which decided to delete and recreate an infrastructure environment during an authorized task, triggering a 13-hour service disruption.

What is Kiro?
Kiro is Amazon’s internal agentic AI coding tool. Unlike standard coding assistants, Kiro can independently plan and execute multi-step technical tasks without requiring human approval for each individual action.

Did Amazon blame AI for the outage?
No. Amazon attributed the December incident to user error — specifically, misconfigured access controls — rather than to the AI itself. The company said the incident was “extremely limited” and did not affect most AWS services.

How long did the AWS outage last?
The Kiro-related outage lasted approximately 13 hours and affected a single service in one of two AWS regions in mainland China.

How does this relate to Amazon cutting engineering jobs?
Amazon has been reducing its engineering headcount while simultaneously expanding the use of AI tools like Kiro to automate tasks previously handled by humans. Critics argue that deploying AI agents with greater autonomy while reducing human oversight creates meaningful risk, especially in critical infrastructure environments.

Is this the same as the major AWS outage in October 2025?
No. The October 2025 AWS outage was a separate and larger incident caused by a DNS failure that cascaded through DynamoDB and affected over 113 AWS services globally. The December 2025 outages were distinct incidents involving AI tools.

Are other cloud providers doing the same thing with AI agents?
Yes. Google, Microsoft, and many enterprise vendors are also developing and deploying agentic AI systems in infrastructure environments. The AWS incidents are an early but significant data point in a broader industry shift toward AI-driven infrastructure management.

“AWS is cutting engineers and replacing them with AI agents that make decisions on their own. December showed what can go wrong. Are we moving too fast?”