Lawrence Person's BattleSwarm Blog
Megacorporations are telling businesses that their AI offerings are good enough to run vital company functions. The problem is, those AIs are still screwing up, and frequently in ways humans wouldn’t screw up. That’s what Amazon found out when they tried to eat their own dogfood, putting their AI in charge of Amazon Web Services. It didn’t go well.
Are AI tools reliable enough to be used in commercial settings? If so, should they be given “autonomy” to make decisions? These are the questions being raised after at least two internet outages at Amazon’s cloud division were allegedly caused by blundering AI agents, according to new reporting from the Financial Times.
In one incident in December, engineers at Amazon Web Services allowed its in-house Kiro “agentic” coding tool to make changes that sparked a 13-hour disruption, according to four sources familiar with the matter. The AI, ill-fatedly, had decided to “delete and recreate the environment,” the sources said.
When something is “in the cloud,” that means it’s sitting on someone else’s computer. More specifically, it’s probably running as a containerized instance, or inside a virtual machine under a hypervisor, drawing on shared pools of CPU and storage that can scale up or scale down as demand requires. This allows efficient use of those resources, and it’s made AWS Amazon’s most profitable business. And most of the time AWS works pretty well.
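The scale-up/scale-down behavior described above can be sketched as a simple target-utilization rule, similar in spirit to how common autoscalers decide replica counts. This is a hedged illustration only; the function name, thresholds, and formula are assumptions for clarity, not AWS internals.

```python
# Illustrative autoscaling sketch: pick an instance count so that average
# CPU utilization moves toward a target. Not actual AWS logic.

def desired_instances(current: int, cpu_utilization: float,
                      target: float = 0.6, min_n: int = 1, max_n: int = 20) -> int:
    """Return the instance count that would bring utilization near `target`."""
    if cpu_utilization <= 0:
        return min_n  # idle pool: shrink to the floor
    wanted = round(current * cpu_utilization / target)
    return max(min_n, min(max_n, wanted))

print(desired_instances(4, 0.9))  # demand spike -> scale up to 6
print(desired_instances(4, 0.3))  # demand drop -> scale down to 2
```

The point of the sketch is that capacity follows measured demand automatically, which is what makes the shared pools efficient.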
Amazon employees claimed that this was not the first service disruption involving an AI tool.
“We’ve already seen at least two production outages [in the past few months],” one senior AWS employee told the FT. “The engineers let the AI [agent] resolve an issue without intervention. The outages were small but entirely foreseeable.”
AWS launched its in-house coding assistant, Kiro, in July. The company describes the tool as an “autonomous” agent that can help deliver projects “from concept to production.” Another AI coding assistant developed by Amazon was involved in the earlier outage.
The employees said the AI tools were treated as an extension of an operator and given operator-level permissions. In both of the outages, the engineers didn’t require a second person’s approval before finalizing the changes, going against typical protocol.
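The protocol the engineers skipped, requiring a second person to sign off before a change is finalized, can be sketched as a simple approval gate. This is a hypothetical illustration; the function and role names are assumptions, not Amazon's actual tooling.

```python
# Illustrative two-person rule: a change may only be applied when approved
# by at least one party other than its author. Hypothetical sketch.

def can_apply(change_author: str, approvals: set[str]) -> bool:
    """True if someone other than the author has approved the change."""
    return any(approver != change_author for approver in approvals)

print(can_apply("kiro-agent", {"kiro-agent"}))      # False: self-approval only
print(can_apply("kiro-agent", {"human-engineer"}))  # True: independent sign-off
```

Under a gate like this, an AI agent holding operator-level permissions still could not finalize its own change, which is exactly the safeguard the FT's sources say was bypassed.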
More:
https://www.battleswarmblog.com/?p=70213