There’s a question that runs underneath every AI Agent evaluation: what can it do?
Two years ago, that was the right question to ask because Agents were limited and capability was a genuine constraint. The gap between what organizations needed and what the technology could deliver was wide. I felt that gap acutely in early pilots—plenty of ambition, not enough dependable execution.
That gap has since narrowed considerably, and yet most organizations are running their Agents well below what’s technically possible. I see teams lean on answering and routing, but stop short of looking things up, taking actions, or resolving complex, multi-step problems—especially where data, process variance, or risk come into play.
The standard explanation is that AI isn’t good enough yet—models must improve, or vendors must ship more features. But after studying organizations across industries actively expanding their AI automation, I’ve found that this explanation holds up less often than people assume. The blockers tend to be elsewhere.
The teams I’ve observed weren’t primarily constrained by what their AI could do; they were constrained by what their organization was structured to let it do. In other words, the ceiling wasn’t the Agent’s capability—it was organizational readiness, governance, and risk tolerance.
“Readiness” for AI breaks into five distinct types, and most organizations have some but not all of them. Below is how I assess them with product, operations, and engineering leaders.
Content readiness is whether you can explain your product and policies clearly and consistently. Most companies can. In practice, that means up-to-date knowledge bases, unified policy language, and clear versions that Agents can cite and apply.
Scope readiness is whether you’ve defined the edges: when should AI engage, and when should it step aside? Edge cases multiply, intent varies by customer segment, sensitive topics surface mid-conversation, but most teams can work through this with effort. Clear guardrails reduce ambiguity and shrink risk.
Procedural readiness is where things start to get harder. This is about whether you can articulate your processes clearly enough for something other than a human with years of tacit knowledge to follow. The happy path is rarely the problem. It’s the failure paths, decision branches, variations that have never been written down because they’ve always lived in someone’s head.
Data readiness is the first real cliff. Can you reliably identify the right user, account, or object at the moment a decision needs to be made? Is the data trustworthy in real time? Are the APIs stable, accessible, and actually connected? For most organizations, the honest answer is “partially, but we’re not always sure when it breaks.”
Execution readiness is the highest bar. Not just technically (can the Agent make the change?) but organizationally. Who owns it when the wrong refund gets processed? Who detects it? Who recovers? Does someone with authority actually accept the risk?
Most companies have the first two, some have the third, fewer have the fourth and fifth. When I map this with teams, we often discover that their Agent’s ceiling is really a reflection of operational maturity and data plumbing, not model quality.
We studied companies across six industries – energy, healthcare, ecommerce, gaming, financial services, property management – all trying to expand what their Agents could do. The pattern was consistent: teams set out to automate real actions—looking up account status, processing changes, handling transactions. In most cases, the AI could technically do it, but at a certain point (somewhere between guiding a user through a process and looking something up on their behalf) they hit a wall.
One team tried to automate application changes but couldn’t reliably identify which application to modify across their internal systems. Another explored billing automation but couldn’t access live account data due to regulatory constraints. A third needed to verify status across third-party vendor systems their Agent couldn’t reliably reach. I’ve seen similar constraints surface around CRM integration, data governance, and vendor SLAs—none of which are model issues.
In most cases, the team redesigned around what their infrastructure could support. They moved toward guiding—walking users through processes step by step, rather than executing changes on their behalf. It worked, it resolved conversations and delivered real value, just differently than anyone planned. In customer support, this often looks like consultative flows that shorten time-to-resolution even without direct writes.
Most Agent evaluations are built around capability. Can it handle complex queries? Does it support multiple channels? Can it integrate with our systems? These are reasonable things to evaluate for, but they produce a capability score, and that doesn’t tell you whether your organization can actually use what you’re buying.
The teams that got to deeper automation, the ones executing actions early, didn’t have “better AI,” they had more standardized operations. Actions that were already well-defined, consistently applied, and exposed through stable systems with clear rules. Automation wasn’t inventing new behavior, it was triggering actions that were already tightly controlled elsewhere.
Readiness enables capability, not the other way around. Which reframes the evaluation question from “can the AI do this?” to “are we actually ready for it to?”
Something that gets lost in most conversations about AI readiness is that organizations are often further along than they assume, just not for the kind of work they were planning for. A team that set out to automate refunds but can reliably guide users through complex troubleshooting has genuine capability deployed. They’re operating at the level their readiness supports, which is a starting point, not a deficit.
The more useful frame isn’t “are we ready?” – it’s “what are we ready for, and what specifically stands between here and the next level?” The gaps tend to be concrete: a missing API, data that lives in three systems that don’t agree, a process that’s never been documented, or an ownership question nobody has answered. These are solvable problems. They just require a different kind of investment than buying a more capable Agent.
What nobody has worked through seriously yet is how organizations actually build readiness. Does it develop naturally through using AI at shallower levels first? Or is it mostly a function of prior decisions, like system architecture choices made years ago, operational maturity that accumulated over time, engineering investments that have nothing to do with AI? When readiness does increase, what actually changes? Does the support team develop it? Does engineering grant it? Does it require executive sponsorship and investment in infrastructure with no obvious AI label on it?
In my experience, progress comes from a joint effort: product to define scope and guardrails, operations to codify procedures and edge cases, engineering to harden APIs and observability, and leadership to underwrite risk with clear ownership. When those pieces align, agentic AI moves from guided assistance to safe, auditable execution.
Until there are clearer answers, the pattern is likely to continue. Companies will buy capable Agents, plan ambitious rollouts, and find that the harder work is building the organizational infrastructure. The Agents can do the work. The question is what it takes to let them.
Inspired by this post on The Intercom Blog.











Leave a Reply