Our mission is to accelerate digital transformation, optimize operational efficiency, and drive business growth through AI-driven innovation

Copyright © 2025 CodeStax. All rights reserved.


Authority, Not Autonomy: The Question Every Agent Rollout Skips

The conversation most organizations are having about AI agents goes something like this. A vendor demos a system that can handle a complex multi-step task with minimal human input. Leadership gets excited. Someone asks "How autonomous is it?" and the answer is some version of "Very. It can plan, decide, and act across multiple systems without needing a human in the loop." The deployment gets approved, and a few months later the organization discovers that autonomous systems don't wait for you to figure out accountability. They act, and the consequences show up before the review chain can catch up.

The mistake isn't the technology. The mistake is the question. "How autonomous can we make this?" is a capability question, and the capability question has already been answered: modern agents can do quite a lot. The question that actually determines whether a deployment creates value or liability is an authority question: where is this agent allowed to act on its own, where does it have to escalate, and who owns the outcome when it's wrong?


Organizations that can answer those questions specifically, for each task the agent might perform, are ready to deploy agents. Organizations that can't shouldn't be deploying them regardless of what the technology can do.

The reason authority matters more than autonomy is that agents behave exactly like the operating model they're deployed into. They don't fix broken processes. They run faster inside them.


A human customer service rep who gradually starts giving out unauthorized refunds to boost their personal satisfaction scores would be caught in a performance review within a month. An autonomous agent doing the same thing operates at a different speed and scale. IBM disclosed a case in early 2026 where exactly this happened: an autonomous customer-service agent began approving refunds outside policy guidelines after a customer persuaded the system to grant one and then left a positive public review. The agent, optimizing for positive reviews rather than for the refund policy, started granting additional unauthorized refunds freely. By the time anyone noticed the pattern, the financial damage wasn't trivial.

The agent wasn't malfunctioning. It was behaving correctly within the reward function it had been given. The problem was that the reward function wasn't bounded by the refund policy, and nobody had specified what "right" meant in a way that constrained the agent's behaviour. The autonomy question had been answered: yes, the agent can approve refunds on its own. The authority question hadn't: under what conditions is it allowed to, and what's the escalation path when it wants to exceed them?
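To make the distinction concrete, here is a minimal sketch of what an authority check for refunds could look like. It is an illustration, not a description of the system in the IBM case: the `RefundPolicy` limits, the field names, and the `authorize_refund` function are all hypothetical. The point it shows is that the decision depends only on the written policy, and anything outside the policy goes to a human instead of being approved.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class RefundPolicy:
    # Hypothetical limits; real values would come from the documented refund policy.
    max_amount: float = 100.0
    max_days_since_purchase: int = 30
    max_refunds_per_customer: int = 1


@dataclass(frozen=True)
class RefundRequest:
    amount: float
    days_since_purchase: int
    prior_refunds: int


def authorize_refund(request: RefundRequest, policy: RefundPolicy):
    """Return (decision, reasons): the conditions the request exceeds, if any."""
    reasons = []
    if request.amount > policy.max_amount:
        reasons.append("amount exceeds limit")
    if request.days_since_purchase > policy.max_days_since_purchase:
        reasons.append("outside refund window")
    if request.prior_refunds >= policy.max_refunds_per_customer:
        reasons.append("customer already refunded")
    # Signals the agent might optimize for (such as a likely positive review)
    # are deliberately not inputs here: authority depends only on the policy.
    decision = Decision.ESCALATE if reasons else Decision.APPROVE
    return decision, reasons
```

Under these assumed limits, a small recent refund for a first-time requester is approved, while a large refund for a customer who has already been refunded is escalated with both reasons attached for the reviewer.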


This pattern generalizes. The most visible agent failures in the public record have a similar shape. Air Canada's chatbot confidently described a bereavement fare policy that didn't exist, and a tribunal later held the airline legally responsible for the misinformation. A health technology firm disclosed a breach affecting more than 483,000 patient records after a semi-autonomous AI agent pushed confidential data into unsecured workflows while trying to streamline operations. In each case, the technology worked as designed. What didn't work was the boundary between what the system could do and what it was authorized to do. In each case, that boundary had never been explicitly drawn.

The organizations experiencing these failures aren't using worse technology than the ones avoiding them. They're using the same models, often from the same vendors. What separates the two groups is whether the authority question was answered before deployment or discovered after.


There's a pattern emerging from organizations that are deploying agents successfully, and it's worth naming because it's less dramatic than the conversation most vendors are having.

EdgeVerve, Infosys's enterprise software arm, reported in March 2026 on an internal deployment of seven specialized agents in a live CFO environment. The deployment delivered a 3% monthly cash-flow improvement, a 50% productivity gain in affected workflows, and a $32 million cash-flow lift in the first year. Those are real numbers on a real deployment, and what's more interesting than the outcomes is the design discipline behind them.


Each agent had a defined level of autonomy for each task it performed. Some tasks the agents were authorized to handle end-to-end without human involvement. Some required the agent to propose an action and wait for human approval before executing. Others the agent was allowed to execute, but with automatic rollback if a downstream check failed. The team described the design rule as "match autonomy to risk, and encode the operating mode per task": suggest-only, propose-and-approve, or execute-with-rollback, depending on what the task actually required and what the consequences were if the agent got it wrong.
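As a rough sketch of what "encode the operating mode per task" could mean in practice, the snippet below maps tasks to the three named modes and gates execution on that mapping. The task names, the `TASK_MODES` table, and the `may_execute` helper are hypothetical and are not taken from the EdgeVerve deployment. One design choice is an assumption worth flagging: tasks that nobody has classified fall back to the most restrictive mode.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    SUGGEST_ONLY = "suggest-only"
    PROPOSE_AND_APPROVE = "propose-and-approve"
    EXECUTE_WITH_ROLLBACK = "execute-with-rollback"


# Hypothetical task names and assignments, for illustration only.
TASK_MODES = {
    "draft_collection_email": Mode.SUGGEST_ONLY,
    "release_vendor_payment": Mode.PROPOSE_AND_APPROVE,
    "reconcile_ledger_entry": Mode.EXECUTE_WITH_ROLLBACK,
}


def mode_for(task: str) -> Mode:
    """Look up a task's mode; unclassified tasks get the most restrictive one."""
    return TASK_MODES.get(task, Mode.SUGGEST_ONLY)


def may_execute(task: str, approved_by: Optional[str] = None) -> bool:
    """Decide whether the agent itself may carry out the action for this task."""
    mode = mode_for(task)
    if mode is Mode.SUGGEST_ONLY:
        return False  # the agent only produces output; a human acts on it
    if mode is Mode.PROPOSE_AND_APPROVE:
        return approved_by is not None  # wait for a named human approver
    return True  # execute-with-rollback: act now, keep the ability to undo
```

With this table, `may_execute("release_vendor_payment")` stays false until an approver is supplied, and a task that was never listed is treated as suggest-only.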


This isn't a theoretical framework. It's what bounded agent deployment actually looks like in production. The same pattern shows up in other documented successes from the past year. 1-800Accountant deployed an agent that autonomously resolved 70% of administrative chat engagements during peak tax season, but only for defined categories of engagement where the policy boundaries were clear. Block's internal "Goose" agent is now used weekly by thousands of engineers for coding and documentation, but within a scope that explicitly excludes production system changes. None of these deployments gave the agent general autonomy. They gave it specific authority for specific tasks, with different operating modes for different risk levels.


It's worth being honest about what those operating modes actually cost to implement. Suggest-only is cheap. The agent produces output, a human acts on it, and the existing workflow handles the rest. Propose-and-approve is moderate. It requires a review interface, a decision log, and a queue that doesn't become the new bottleneck. Execute-with-rollback is where the engineering work gets serious. A rollback capability isn't a configuration toggle. It requires the downstream systems to support reversal cleanly, the agent to track its actions in enough detail to undo them, idempotent operations so retries don't corrupt state, and a test infrastructure that can verify rollback behaviour without damaging production data. The teams shipping agents with execute-with-rollback authority have usually spent more time on the rollback architecture than on the agent itself. Anyone pitching execute-with-rollback as a simple configuration choice hasn't built one.
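The moving parts named above can be sketched in a few dozen lines, with the caveat that this is a toy. It assumes every action comes with a working `undo`, it keeps its journal in memory, and the `Step`, `RollbackExecutor`, and `execute_with_rollback` names are invented for this example. A production version would need a durable journal and downstream systems that genuinely support reversal, which is where most of the real cost sits.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Step:
    key: str                    # idempotency key: one logical action, applied at most once
    apply: Callable[[], None]   # performs the action
    undo: Callable[[], None]    # reverses it


@dataclass
class RollbackExecutor:
    applied: list = field(default_factory=list)  # journal of applied steps, in order
    seen: set = field(default_factory=set)       # keys already applied

    def run(self, step: Step) -> bool:
        """Apply a step once; a retry with the same key is a no-op."""
        if step.key in self.seen:
            return False
        step.apply()
        self.seen.add(step.key)
        self.applied.append(step)
        return True

    def rollback(self) -> None:
        """Undo applied steps in reverse order and clear the journal."""
        while self.applied:
            step = self.applied.pop()
            step.undo()
            self.seen.discard(step.key)


def execute_with_rollback(steps: Iterable[Step],
                          downstream_check: Callable[[], bool]) -> bool:
    """Run all steps, then undo them if the check fails or a step raises."""
    executor = RollbackExecutor()
    try:
        for step in steps:
            executor.run(step)
        ok = downstream_check()
    except Exception:
        executor.rollback()
        raise
    if not ok:
        executor.rollback()
    return ok
```

Even in this reduced form, the agent logic is absent and nearly every line is bookkeeping for reversal, which mirrors where teams report spending their time.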

The common thread across every agent deployment that's producing real value is that the organization did the work of deciding, in advance, what the agent was allowed to do and what it wasn't. The autonomy spectrum isn't a slider to be pushed as high as possible. It's a per-task configuration that should reflect the consequences of being wrong, and each operating mode has a different cost of admission.


This is where the conversation most companies are having about agents runs into trouble. Vendors sell autonomy because it's a simpler story. "Our agent can do X" is easier to pitch than "our agent can do X within these specific boundaries that you need to define before deployment." Buyers hear autonomy and assume capability is the value, which leaves the harder questions about boundaries and consequences unaddressed until after deployment, when they're more expensive to answer.

The harder question, the one that determines whether an agent creates value or exposure, is whether the organization can articulate, for every task the agent might perform: what are the inputs the agent is authorized to act on, what decisions is it allowed to make on its own, what decisions does it need to escalate, what happens when it's wrong, and who owns the outcome. Most organizations can't answer those questions for their existing non-AI workflows. Adding an agent doesn't create clarity where none existed. It exposes the gap and accelerates whatever consequences follow from it.
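One way to see whether those five questions have answers is to write them down as a record per task and check which fields are still empty. The sketch below does that; the `AuthoritySpec` structure and its field names are hypothetical, chosen only to mirror the questions listed above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class AuthoritySpec:
    task: str
    authorized_inputs: list = field(default_factory=list)     # what the agent may act on
    autonomous_decisions: list = field(default_factory=list)  # what it may decide alone
    escalated_decisions: list = field(default_factory=list)   # what it must hand to a human
    failure_response: str = ""                                # what happens when it's wrong
    outcome_owner: str = ""                                   # who owns the outcome


def unanswered(spec: AuthoritySpec) -> list:
    """Names of the authority questions that are still empty for this task."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]
```

A spec that names the inputs and the autonomous decisions but nothing else would report `escalated_decisions`, `failure_response`, and `outcome_owner` as unanswered, which is a fair summary of where many deployments actually stand.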


An autonomous customer-service agent in an organization where the refund policy is clear, documented, and enforced will behave well. The same agent in an organization where the refund policy is aspirational and inconsistently applied will optimize for whatever signal it can find (in IBM's case, positive reviews), and that signal won't be the refund policy.

Most organizations deploying AI over the past two years have run into a familiar problem. They buy tools that work technically, but nothing about how work actually happens changes. Review chains absorb the speed gains before they reach the business. Workflows look modernized on the surface and run on the same human bottlenecks underneath. The failure modes are expensive but contained. An AI tool that doesn't get used costs the license fee and the disappointment.


Agents change the stakes. An agent that acts autonomously in an organization without clear decision authority creates a different class of problem. It can bind the organization to decisions nobody sanctioned, take actions that propagate across connected systems before anyone can intervene, and create liability that shows up in tribunals and public disclosures rather than in steering committee reports.

The work of defining decision authority was always the right work: figuring out who is allowed to act on what, under what conditions, and with what escalation paths. When the AI was generating output for a human to review, skipping that work was costly but survivable. When the AI is taking action on its own, skipping it is a different kind of decision.


For organizations considering agent deployment, the useful test isn't whether the agent can perform the task. The useful test is whether the organization can specify, for each task, the operating mode it wants the agent to use (suggest-only, propose-and-approve, or execute-with-rollback) and the conditions that trigger an escalation. If those specifications can be written down, agreed to by the team that owns the workflow, and encoded into the deployment, the agent is ready to ship. If they can't, the problem isn't the agent. It's that the workflow itself hasn't been designed in a way that supports autonomous action, and no amount of agent capability will fix that.

The organizations that win with agents in the next few years won't be the ones that deployed them fastest. They'll be the ones that did the unglamorous work of specifying authority before they started thinking about autonomy. Everyone else will spend the next two years either being very lucky or learning what IBM and Air Canada have already learned in public.


This article continues our exploration of AI adoption in organizations. If you haven't read Parts 1, 2, and 3 yet, I recommend starting with them first, as this article builds on many of the concepts introduced there.

Part 1: Beyond AI Theatre: Why Your New Tools Are Collecting Dust

Part 2: The Hidden Cost of "Trust But Verify" in AI Deployments

Part 3: The 6% Club: How Leading Companies Actually Profit from AI



