The 6% Club: How Leading Companies Actually Profit from AI

Most AI deployments follow a predictable arc. Leadership approves a budget, the team picks a use case, a pilot gets built, and someone presents results to a steering committee. The tool works. The demo goes well. And then, somewhere between six and eighteen months later, the organization is running at roughly the same speed it always did. The AI is technically in production. Nothing about how work actually happens has changed.

This pattern is well documented by now. What's less discussed is that it doesn't describe every organization investing in AI.

McKinsey's 2025 State of AI research found that roughly 6% of organizations qualify as AI high performers, defined as those generating more than 5% of EBIT from AI. That same group is nearly three times more likely than typical organizations to have redesigned their workflows around AI. The gap between the 6% and everyone else isn't access to better models or larger budgets. It's that the high performers stopped treating AI as a tool to add and started treating it as a capability to design around.

This article is about what that redesign actually looks like in practice, and what it costs to do.

Allianz UK launched an internal tool called BRIAN in January 2025. It's not a flashy customer-facing product. It's a system that helps underwriters navigate the company's internal guidance documents, which run to hundreds of pages and have to be consulted whenever an underwriter is assessing a complex risk. Before BRIAN, an underwriter who needed to check a specific clause or guideline would spend hours searching, reading, and cross-referencing. After BRIAN, they ask a question and get an answer.

In the first year of operation, the tool saved approximately 135 working days of information gathering across the underwriting team. That number sounds modest compared to the headlines other AI deployments generate, but it's the kind of number that suggests something real happened. Underwriters didn't get a tool they had to integrate alongside their existing search habits. They got a workflow change. The default for "I need to check this" stopped being "open the PDF and start reading." It became "ask BRIAN."

The underwriter's role shifted accordingly. Less time spent retrieving information, more time spent on the analytical work that retrieval was getting in the way of. The tool didn't replace the underwriter. It removed the part of the underwriter's job that nobody enjoyed and that didn't require their judgment.

The 135 days saved didn't materialize on day one. Underwriters initially treated BRIAN's answers the way they treat any new system — by checking them against the source documents to make sure the tool wasn't hallucinating. The trust built slowly, over months, as the answers held up. The shift in default behaviour came later than the launch announcement suggested.

Wells Fargo did something structurally similar at a much larger scale. The bank built an internal Microsoft Teams app that gives 35,000 bankers across 4,000 branches instant access to guidance on roughly 1,700 internal procedures. Before the app, when a banker hit a procedural question they didn't know the answer to, they'd ask a colleague, escalate to a supervisor, or dig through internal documentation. The lookup might take ten minutes. Across 35,000 bankers, that's a significant operational drag, and it pulled experienced people away from their own work to answer routine questions.

After the app, response times for procedural questions dropped from ten minutes to thirty seconds. The number that matters more than the speed, though, is the adoption rate. Within months of rollout, 75% of procedural searches were going through the app rather than through colleagues or documentation. That's not performative adoption. That's the workflow actually changing.
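To put a rough size on that drag, the figures quoted above can be combined into a back-of-the-envelope estimate. This is only a sketch: the ten-minute and thirty-second lookup times, the 35,000 bankers, and the 75% share come from the article, while `lookups_per_banker_per_week` is a hypothetical input that Wells Fargo has not published here.

```python
SECONDS_BEFORE = 10 * 60   # from the article: about ten minutes per lookup
SECONDS_AFTER = 30         # from the article: about thirty seconds per lookup
BANKERS = 35_000           # from the article
APP_SHARE = 0.75           # from the article: share of searches going through the app


def weekly_hours_saved(lookups_per_banker_per_week: float) -> float:
    """Estimate hours saved per week.

    The lookup frequency is a hypothetical assumption supplied by the caller;
    it is not a figure reported in the article.
    """
    saved_per_lookup = SECONDS_BEFORE - SECONDS_AFTER   # seconds saved per lookup
    lookups_via_app = BANKERS * lookups_per_banker_per_week * APP_SHARE
    return lookups_via_app * saved_per_lookup / 3600     # seconds -> hours


if __name__ == "__main__":
    # Try a few assumed frequencies to see how sensitive the estimate is.
    for assumed in (0.5, 1, 2):
        print(assumed, round(weekly_hours_saved(assumed), 1))
```

The result scales linearly with the assumed lookup frequency, so the estimate is only as good as that one input. That input is also the one an organization can measure for itself.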

Getting to that 75% wasn't immediate either. The first weeks of any rollout at that scale include the predictable mix of bankers who refuse to use the new tool because they prefer their old habits, branches where the rollout coincides with another priority and gets deprioritized, and edge cases where the AI returns the wrong procedural answer and gets blamed more loudly than the human errors it's replacing. The 75% is what's left after that noise settles.

Both deployments share a structural choice that distinguishes them from the typical failure pattern. The AI wasn't positioned as something the team could use if they wanted to. It was positioned as the new default for a specific type of work, and the team's role was redesigned around the assumption that the default would be used.

The reason this kind of redesign is rare isn't that organizations don't understand the principle. It's that getting from a working AI tool to an AI-native workflow requires moving through a layer of operational friction that most leadership underestimates.

Consider what actually happens when an organization decides to make an AI tool the default rather than an option. The product team has to write new business requirement documents (BRDs) that describe the redesigned workflow, not just the tool's features. The change request goes through whatever governance process the organization uses for production system changes, which usually involves multiple sign-offs from teams who weren't part of the original tool decision. User acceptance testing (UAT) environments need to be configured to test workflows that include non-deterministic AI outputs, which is a problem QA teams have not traditionally had to solve. Software development engineers (SDEs) have to figure out how to handle outputs that vary across runs in systems that were architected on the assumption of consistent behaviour. Test cases have to be written for scenarios where "correct" is a range rather than a single value.
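A test where "correct" is a range might look something like the sketch below. It is one possible approach, not a description of how any company named here tests its systems. The step is run several times and the test passes if enough outputs fall inside an accepted band. The names `passes_range_test` and `make_stub_model` are hypothetical, and the stub stands in for a real model call so that the example runs on its own.

```python
import random
from typing import Callable


def passes_range_test(
    generate: Callable[[], float],
    low: float,
    high: float,
    runs: int = 20,
    min_pass_rate: float = 0.9,
) -> bool:
    """Run a non-deterministic step several times.

    Pass if at least `min_pass_rate` of the outputs land in [low, high].
    """
    hits = sum(1 for _ in range(runs) if low <= generate() <= high)
    return hits / runs >= min_pass_rate


def make_stub_model(seed: int, centre: float, spread: float) -> Callable[[], float]:
    """Hypothetical stand-in for an AI scoring call.

    Returns a function whose output varies from run to run, but stays
    within `spread` of `centre`. Seeded so the example is repeatable.
    """
    rng = random.Random(seed)
    return lambda: rng.uniform(centre - spread, centre + spread)


if __name__ == "__main__":
    model = make_stub_model(seed=1, centre=0.70, spread=0.05)
    # The outputs differ on every call, yet the test is stable because
    # it checks a band rather than an exact value.
    print(passes_range_test(model, low=0.60, high=0.80))
```

The band and the pass rate are the things the business has to agree on. In practice, that agreement is often harder to reach than writing the test itself.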

Each of those is solvable. None of them is quick. And underneath all of the technical work sits the harder political work of telling senior people in the affected team that their role is changing: that the part of their job they've spent years getting good at is being absorbed by the system, and that their value going forward will be measured differently. That conversation doesn't happen in a steering committee. It happens in one-on-ones, over months, with people who have legitimate concerns about what the change means for them.

The 6% don't avoid this friction. They budget for it. They treat the workflow redesign as the actual project and the AI tool as one component of that project. The 94% treat the AI tool as the project and discover the workflow redesign work after deployment, when it's harder and more expensive to do.

There's a specific failure mode worth naming because it shows up even in organizations that get the initial redesign right. As AI capabilities improve, the boundaries that were correctly drawn at deployment start to drift.

A workflow that was designed eighteen months ago might have given the AI authority to handle simple cases and escalate complex ones to humans. The model was good enough for the simple cases and not good enough for the complex ones. Eighteen months later, the model has improved. It's now capable of handling cases that were previously escalated. But the workflow hasn't been updated to reflect that. The humans are still reviewing cases the AI could now handle on its own, which means they're spending time on work that's no longer the highest use of their judgment.
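One way to keep that boundary from going stale is to make it an explicit, dated piece of configuration rather than logic buried in the workflow. The sketch below is illustrative only. `AuthorityPolicy`, `route`, the complexity tiers, and the confidence threshold are all hypothetical names and values, not taken from any deployment described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityPolicy:
    """Where the AI may decide alone. All fields are illustrative assumptions."""
    max_complexity: int     # highest complexity tier the AI may handle unreviewed
    min_confidence: float   # minimum model confidence for unreviewed handling
    reviewed_on: str        # ISO date the boundary was last deliberately reviewed


def route(case_complexity: int, model_confidence: float, policy: AuthorityPolicy) -> str:
    """Return "ai" if the case falls inside the AI's authority, else "human"."""
    within_tier = case_complexity <= policy.max_complexity
    confident = model_confidence >= policy.min_confidence
    return "ai" if within_tier and confident else "human"


if __name__ == "__main__":
    # Boundary as drawn at launch: only tier-1 cases go to the AI.
    launch = AuthorityPolicy(max_complexity=1, min_confidence=0.90, reviewed_on="2024-01-15")
    # Boundary after a deliberate review: tier-2 cases are now in scope.
    revised = AuthorityPolicy(max_complexity=2, min_confidence=0.90, reviewed_on="2025-07-15")

    # The same tier-2 case is escalated under the old policy and handled
    # by the AI under the revised one.
    print(route(2, 0.95, launch), route(2, 0.95, revised))
```

Because the policy carries a `reviewed_on` date, a boundary that nobody has revisited in eighteen months becomes visible instead of silent.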

The opposite drift also happens. A workflow that started with conservative human review can quietly become more permissive over time as the team builds confidence in the AI. Outputs that were reviewed line by line at launch get glanced at after six months and accepted at face value after a year. By the time something goes wrong, nobody can clearly explain when the review standard changed or who decided it should.

The 6% treat decision authority as something that needs periodic recalibration, not a one-time configuration. They review the boundaries quarterly. They look at where the AI is now competent where it wasn't before, and where human review has become perfunctory when it shouldn't be. They adjust the workflow as the underlying capability shifts. This isn't glamorous work, but it's the difference between an AI-native workflow that keeps producing value and one that gradually drifts back toward either over-review or under-review.
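A quarterly review like this could start from review logs. The sketch below assumes each human review is logged with a case category, whether the reviewer changed the AI's output, and how long the review took. The record shape, the thresholds, and the flag wording are all hypothetical, and the flags are prompts for a conversation rather than automatic decisions.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, Iterable, List


@dataclass
class ReviewRecord:
    category: str          # case type, e.g. "simple" or "complex"
    overridden: bool       # True if the reviewer changed the AI's output
    review_seconds: float  # time the reviewer spent on the case


def quarterly_flags(
    records: Iterable[ReviewRecord],
    min_sample: int = 30,        # assumed minimum cases before drawing conclusions
    low_override: float = 0.02,  # assumed override rate below which review adds little
    min_seconds: float = 20.0,   # assumed floor below which review looks like a glance
) -> Dict[str, str]:
    """Flag categories where the review boundary may have drifted."""
    by_category: Dict[str, List[ReviewRecord]] = {}
    for record in records:
        by_category.setdefault(record.category, []).append(record)

    flags: Dict[str, str] = {}
    for category, rows in by_category.items():
        if len(rows) < min_sample:
            flags[category] = "insufficient data"
            continue
        override_rate = sum(r.overridden for r in rows) / len(rows)
        typical_seconds = median(r.review_seconds for r in rows)
        if typical_seconds < min_seconds:
            # Very short reviews: overrides are rare possibly because nobody is looking.
            flags[category] = "review may be perfunctory"
        elif override_rate <= low_override:
            # Careful reviews that almost never change anything.
            flags[category] = "candidate for wider AI authority"
        else:
            flags[category] = "keep current boundary"
    return flags
```

The review-time check runs first on purpose. A low override rate only says something about the AI's competence if the reviewers were actually reading the output.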

None of this means the 6% are doing something other organizations couldn't do. They aren't using better models. The technology is broadly available. What separates them is that they started from the workflow rather than the tool. They asked which decisions get made, who currently makes them, what data those decisions require, and where AI could legitimately participate. Then they built around that. And crucially, they spent the months of unglamorous operational work needed to make the new design actually function in production.

Most organizations skip that part. They identify an AI capability, pick a workflow that looks like a good fit, deploy the tool into the existing process, and bolt review layers on top to manage the discomfort of trusting a model. That's how a vendor risk assessment ends up with both an AI summarizer and an analyst still reading the full PDF. It's how a test case generation tool ends up with senior QA engineers writing their own cases first and cherry-picking AI suggestions to make adoption metrics look healthy. The technology runs. The team's day looks roughly the same as it did before.

The difference between the two outcomes isn't model quality, talent, or budget. It's whether the workflow redesign happened before the deployment or got hoped for afterward.

There's also a third group worth naming, which the McKinsey binary doesn't capture. A meaningful number of organizations are currently mid-redesign: they've started the workflow change and then hit a compliance director who won't sign off, run into a senior reviewer who's organizing quiet resistance, or discovered that the change request queue has a six-month backlog. They aren't in the 6% yet and they aren't in the comfortable 94% either. They're stuck in the operational friction described above, and most of them will either push through it over the next year or quietly slide back into the bolted-on pattern.

For organizations currently in the 94%, the practical move isn't to launch another pilot or evaluate another vendor. It's to pick one workflow that already has an AI tool deployed and ask a different set of questions about it. Where in this workflow is the AI authorized to act without human re-verification? If the answer is "nowhere," that workflow is a candidate for redesign, not for adding more tools. What would have to be true for the AI's output to become the default? If those conditions are achievable, they're the actual roadmap. Who in the affected team would lose authority if the redesign happened, and what conversations need to happen with them before any of it can move forward?

That last question is the one most organizations skip, and it's the one that determines whether the redesign actually ships. The companies that get AI to deliver real value aren't deploying more of it than everyone else. They're treating the BRDs, the change requests, the UAT cycles, and the difficult one-on-ones with senior reviewers as the work itself, with the AI tool as the part that's relatively easy.

This article continues our exploration of AI adoption in organizations. If you haven't read Parts 1 and 2 yet, we recommend starting with them first, as this article builds on many of the concepts introduced there.
Part 1: Beyond AI Theatre: Why Your New Tools Are Collecting Dust
Part 2: The Hidden Cost of "Trust But Verify" in AI Deployments
