I think about Agentforce implementation the same way I think about any high-stakes product launch: start with outcomes, instrument relentlessly, and iterate in tight loops. When agentic AI touches core workflows in Salesforce, the winners are the teams that combine rigorous product strategy with thoughtful CRM integration and product-led growth tactics.
Learn the ways in which Pendo helps companies design and iterate on their agentic strategy for Salesforce.
My working playbook begins with clarity. Before a single agent is deployed, I align with stakeholders on the highest-value “jobs” inside Salesforce—reducing case handle time in Service Cloud, accelerating lead qualification in Sales Cloud, or improving data hygiene for revenue operations. That alignment shapes our agentic AI approach and prevents us from shipping clever agents that don’t move the metric that matters.
From there, I treat telemetry as a first-class requirement. I instrument the end-to-end journey with Pendo so we can observe when an agent triggers, when it falls back, when it hands off to a human, and how those moments affect conversion, CSAT, and cycle time. I refer to this observability layer as Agent Analytics, and it’s the backbone of evidence-based iteration.
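To show the kind of rate Agent Analytics can compute, here is a minimal Python sketch. It assumes agent events have been exported with a session ID and an event name. The event names (`agent_triggered`, `agent_fallback`, `human_handoff`) and the `AgentEvent`/`agent_funnel` helpers are hypothetical. They are not Pendo's schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEvent:
    session_id: str
    name: str  # hypothetical event names: "agent_triggered", "agent_fallback", "human_handoff"


def agent_funnel(events: list[AgentEvent]) -> dict[str, float]:
    """Share of agent-triggered sessions that later fell back or were handed to a human."""
    sessions: dict[str, set[str]] = {}
    for event in events:
        sessions.setdefault(event.name, set()).add(event.session_id)

    triggered = sessions.get("agent_triggered", set())
    if not triggered:
        return {"fallback_rate": 0.0, "handoff_rate": 0.0}

    # Only count fallbacks and handoffs in sessions where the agent actually fired.
    fallback = sessions.get("agent_fallback", set()) & triggered
    handoff = sessions.get("human_handoff", set()) & triggered
    return {
        "fallback_rate": len(fallback) / len(triggered),
        "handoff_rate": len(handoff) / len(triggered),
    }


events = [
    AgentEvent("s1", "agent_triggered"), AgentEvent("s1", "human_handoff"),
    AgentEvent("s2", "agent_triggered"), AgentEvent("s2", "agent_fallback"),
    AgentEvent("s3", "agent_triggered"), AgentEvent("s4", "agent_triggered"),
]
print(agent_funnel(events))  # {'fallback_rate': 0.25, 'handoff_rate': 0.25}
```

You would then join these rates to conversion, CSAT, or cycle-time outcomes for the same sessions. Raw event counts alone do not show whether a handoff helped or hurt.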
Guidance is equally critical. I use Pendo’s in-app guides to onboard admins and frontline users directly inside Salesforce, deliver contextual tooltips that explain what the agent will do next, and collect feedback within the flow of work. That combination shortens time-to-value and builds trust, which is essential for any customer support AI strategy and for change management.
Iteration is where the compounding returns show up. I run A/B tests on prompts, decision policies, and handoff rules; evaluate performance on real user cohorts; and promote what works. This is classic product-led growth applied to agentic AI—ship small, measure precisely, and scale winners. Prompt engineering is not a one-time task; it’s a continuous discovery loop.
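A significance check on a binary success metric, such as "case resolved without handoff," turns "promote what works" into a rule instead of a judgment call. The sketch below is a standard pooled two-proportion z-test in plain Python. The counts are hypothetical, and `should_promote` is an illustrative helper, not part of any product.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value) for variant B versus A."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def should_promote(success_a: int, n_a: int, success_b: int, n_b: int, alpha: float = 0.05) -> bool:
    """Promote variant B only if it is better and the difference is significant at alpha."""
    z, p_value = two_proportion_z_test(success_a, n_a, success_b, n_b)
    return z > 0 and p_value < alpha


# Hypothetical: prompt A resolved 420 of 1,000 cases, prompt B resolved 470 of 1,000.
z, p = two_proportion_z_test(420, 1000, 470, 1000)
print(round(z, 2), round(p, 3))
print(should_promote(420, 1000, 470, 1000))  # True
```

Checking results repeatedly while a test runs inflates false positives. Fix the sample size before the test starts and evaluate once.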
I also weave in governance from day one. Privacy-by-design, data governance, and AI risk management aren’t add-ons—they are design constraints that shape what the agent is allowed to see and do. The guardrails live alongside the experience: clear disclosures, reversible actions, and easy ways for users to override or escalate.
Finally, I operationalize the learning loop. Weekly reviews with a product trio (PM, design, engineering) examine Pendo dashboards, qualitative feedback, and Salesforce outcomes. If an agent is underperforming, we adjust prompts, refine retrieval, or simplify the decision tree. If it’s exceeding targets, we expand the use case and systematize the pattern.
When teams ask me for the “right way” to implement Agentforce, my answer is simple: treat your agent like a product. Measure with Pendo, guide inside Salesforce, and iterate until the business outcome moves. That’s how we turn promising agents into durable advantages.
I’ve spent the past few product cycles re-architecting roadmaps around one simple reality: AI is no longer just a feature—it’s a business model. The companies winning market share are those that treat models, data, and workflows as monetizable assets with defensible moats, not science projects.
AI business models are rewriting value creation. Learn how smart teams turn algorithms into profit engines, reshaping entire industries.
From my seat in product leadership, I evaluate AI bets through three lenses: durable value (moat and differentiation), measurable outcomes (clear ROI), and unit economics (gross margins under real-world load). With that frame, here are ten AI business models I see performing now—and how I decide when to invest.
1) API-first Model-as-a-Service. I monetize foundation or specialized models via an API, priced by tokens, requests, or time-in-context. Success hinges on latency, accuracy, and “context window management” that balances quality with cost. This is where “consumption SaaS pricing” shines and where disciplined rate-limiting, observability, and SLAs build trust.
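The sketch below shows two mechanics behind consumption pricing: a token-bucket rate limiter per customer, and a charge computed from input and output model tokens. The bucket's units are called permits here to avoid confusion with model tokens. All rates, capacities, and per-1K prices are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Per-customer rate limiter: refills `rate` permits per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.permits = capacity
        self.clock = clock  # injectable so the limiter can be tested deterministically
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.permits = min(self.capacity, self.permits + (now - self.last) * self.rate)
        self.last = now
        if self.permits >= cost:
            self.permits -= cost
            return True
        return False


def usage_charge(input_tokens: int, output_tokens: int,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Consumption pricing: bill input and output model tokens at separate per-1K rates."""
    return input_tokens / 1000 * price_in_per_1k + output_tokens / 1000 * price_out_per_1k


fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: fake_now[0])
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
fake_now[0] = 1.0
print(bucket.allow())  # True (one permit refilled after one second)
print(usage_charge(12_000, 3_000, 0.5, 1.5))  # 10.5
```

Input and output tokens are priced separately because generation usually costs more than reading context. That same split is why context window management shows up directly in gross margin.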
2) Vertical AI copilots. I package domain-specific expertise (legal, healthcare, finance, field service) into workflow-native assistants that surface next-best actions. Because these copilots live where work happens, I price on outcomes—time saved, revenue recovered, or risk reduced—aligning value with customer metrics and accelerating product adoption.
3) Agentic AI automation. When autonomous agents handle multi-step tasks across tools, I lean toward per-outcome or per-job pricing. Reliability is the moat, so I invest early in eval-driven development, robust guardrails, and human-in-the-loop QA. This model compounds fast once agents can execute end-to-end workflows with transparent audit trails.
4) Copilot add-ons inside existing SaaS. I’ve seen “AI Assist” tiers deliver immediate ARPU lift and retention gains. The playbook: start with high-frequency, high-friction jobs (drafts, summaries, enrichment), then expand to proactive suggestions. This aligns tightly with product strategy and lets me stage value without overhauling the core experience.
5) Insights-as-a-Service via data network effects. I transform exhaust data into benchmarking, predictions, and prescriptive recommendations—while honoring privacy-by-design and data governance. The more customers I onboard, the stronger the patterns, and the higher the switching costs. Pricing ties to seats plus an outcomes or value metric.
6) Retrieval-first pipeline for enterprise knowledge. I land with high-accuracy answers over customer data (search, summarize, cite), then expand into workflow automations. This “retrieval-first pipeline” reduces hallucinations, boosts trust, and creates defensibility through connectors, semantic indexing, and continuous relevance tuning. It is an ideal fit for product managers building LLM features where reliability comes first.
7) Open source monetization. When I bet on openness, I monetize hosting, support, enterprise controls, and compliance features. The advantage is developer love and rapid iteration; the moat is operational excellence at scale, plus integrations customers rely on. This model converts community momentum into predictable revenue.
8) Marketplaces for prompts, skills, and agents. I create a platform for third-party extensions and charge a take rate on usage. The flywheel spins when developers see distribution, customers see breadth, and I enforce strong quality bars. The roadmap focuses on governance, discovery, and safe execution policies.
9) Solutions with forward deployed engineers. For complex rollouts, I pair product with specialized implementation to guarantee outcomes. Revenue blends software plus services, accelerating time-to-value and informing the roadmap with real-world constraints. Over time, learnings fold back into scalable, self-serve capabilities.
10) AI risk, security, and compliance tooling. As AI scales, so does the need for policy enforcement, monitoring, and auditability. I monetize via platform subscriptions that address model provenance, data leakage prevention, red teaming, and reporting. Strong “AI risk management” is now a purchasing requirement, not a nice-to-have.
How do I choose among these models? I start with the customer’s biggest workflow pain, map it to the fastest path to measurable outcomes, and align pricing with value creation. Then I build defensibility through data advantage, distribution, and governance. If a model deepens trust, improves margins, and compounds learning, it earns a place on the roadmap.
When a customer reports a stolen credit card, the frontline play seems straightforward—freeze it. But that’s just the visible tip of a much larger customer support iceberg. Underneath sits the real work: dispute filings, fraud investigations, merchant communications, proactive outreach, and follow-ups that unfold over days across multiple systems. Most AI support tools only touch the surface; they don’t coordinate or close the loop. That gap is exactly where my product instincts kick in—and why this story matters.
I recently listened to a conversation with Jack Taylor (Product Engineer) and Ibrahim Faruqi (AI Engineer) from Gradient Labs, an AI-native startup building agents that automate the full scope of customer support in fintech. Their approach resonated with the challenges I see every day in customer support automation: fragmented workflows, regulatory complexity, and the need for human-in-the-loop moments. Gradient Labs has architected a platform with three coordinating agents—"inbound, back office, and outbound"—all built on a shared foundation of "natural language procedures, modular skills, and configurable guardrails."
What impressed me most was how they "Let non-technical subject matter experts define agent behavior through natural language procedures—no coding required." That’s a powerful way to remove engineering bottlenecks, accelerate iteration, and keep the domain experts (those closest to fraud, disputes, and compliance) directly in control. In my experience, this design choice alone can compress lead times from weeks to hours, and it aligns naturally with continuous discovery and eval-driven development.
At the heart of their platform is orchestration. They "Architected a state machine orchestrator that manages turns, triggers, and skill selection across long-running conversations." That "turn" architecture is built for the messy reality of asynchronous, multi-day support. Skills are modular agent capabilities, scoped deterministically per turn, which keeps the system predictable and auditable. They also confront a nuanced challenge most teams dodge: defining "done" for outbound agents when the customer isn't the one ending the conversation. That’s where deterministic criteria, timers, and clearly scoped outcomes matter as much as the model beneath.
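The sketch below models a card-dispute conversation as a small state machine. It has explicit states, a transition table keyed on triggers, a fixed skill set per state, an `ask_a_human` skill, and a timer-based rule for when an outbound conversation is "done." This is not Gradient Labs' implementation. Every state, trigger, skill name, and the follow-up limit is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    INTAKE = auto()
    INVESTIGATING = auto()
    AWAITING_HUMAN = auto()
    AWAITING_CUSTOMER = auto()
    DONE = auto()


# Deterministic skill scoping: each state exposes a fixed set of skills (hypothetical names).
SKILLS_BY_STATE = {
    State.INTAKE: (),
    State.INVESTIGATING: ("freeze_card", "open_dispute", "fetch_transactions", "ask_a_human"),
    State.AWAITING_HUMAN: (),
    State.AWAITING_CUSTOMER: ("send_follow_up",),
    State.DONE: (),
}

# (current state, trigger) -> next state. Anything not listed is rejected.
TRANSITIONS = {
    (State.INTAKE, "customer_message"): State.INVESTIGATING,
    (State.INVESTIGATING, "missing_api"): State.AWAITING_HUMAN,
    (State.AWAITING_HUMAN, "human_approved"): State.INVESTIGATING,
    (State.INVESTIGATING, "investigation_complete"): State.AWAITING_CUSTOMER,
    (State.AWAITING_CUSTOMER, "customer_message"): State.DONE,
}

MAX_FOLLOW_UPS = 2  # hypothetical "done" rule for outbound conversations


@dataclass
class Conversation:
    state: State = State.INTAKE
    timers_expired: int = 0
    history: list = field(default_factory=list)


def handle_turn(conv: Conversation, trigger: str) -> tuple[str, ...]:
    """Apply one trigger and return the only skills the model may pick from on this turn."""
    if conv.state is State.AWAITING_CUSTOMER and trigger == "timer_expired":
        # The customer may never reply, so a timer decides when the conversation is done:
        # each of the first MAX_FOLLOW_UPS expiries allows one follow-up; the next one closes it.
        conv.timers_expired += 1
        if conv.timers_expired > MAX_FOLLOW_UPS:
            conv.state = State.DONE
    else:
        key = (conv.state, trigger)
        if key not in TRANSITIONS:
            raise ValueError(f"trigger {trigger!r} not allowed in state {conv.state.name}")
        conv.state = TRANSITIONS[key]
    conv.history.append((trigger, conv.state.name))  # audit trail of every turn
    return SKILLS_BY_STATE[conv.state]


conv = Conversation()
for trigger in ["customer_message", "investigation_complete",
                "timer_expired", "timer_expired", "timer_expired"]:
    skills = handle_turn(conv, trigger)
    print(f"{trigger:24} -> {conv.state.name:18} skills={skills}")
```

Rejecting undeclared transitions keeps the flow auditable. The model chooses among the skills a state exposes, but it cannot move the conversation anywhere the table does not allow.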
Compliance is not an afterthought—it’s baked into the core. Gradient Labs "Built guardrails as binary classifiers with eval pipelines, tuning for high recall on critical regulatory checks." In regulated domains, optimizing for recall on high-stakes checks is the right call; you can tolerate a few extra reviews, but you can’t miss a potential fraud signal. More broadly, they frame "Guardrails as classification problems: balancing recall and precision for regulatory compliance." That mindset is exactly how I like to merge AI risk management with product velocity.
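If a guardrail is a binary classifier, its operating point is a threshold you pick from labeled eval data. The sketch below assumes you have classifier scores and human labels for an eval set. It walks thresholds from strictest to loosest and returns the first one that meets a recall target, along with the precision that threshold delivers. The data is made up for illustration.

```python
def threshold_for_recall(scores: list[float], labels: list[int], target_recall: float):
    """Return (threshold, recall, precision) for the strictest threshold meeting target_recall.

    scores: classifier probabilities that a message needs a regulatory check.
    labels: 1 if a human reviewer confirmed it needed the check, else 0.
    """
    positives = sum(labels)
    if positives == 0:
        raise ValueError("the eval set needs at least one positive example")
    # Strictest passing threshold = fewest flagged messages that still meet the recall target.
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        recall = tp / positives
        if recall >= target_recall:
            return t, recall, tp / len(flagged)
    raise ValueError("target_recall must be at most 1.0")


# Hypothetical eval set: guardrail scores with human labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(threshold_for_recall(scores, labels, target_recall=1.0))   # (0.4, 1.0, 0.6666666666666666)
print(threshold_for_recall(scores, labels, target_recall=0.75))  # (0.7, 0.75, 0.75)
```

Raising the recall target from 0.75 to 1.0 lowers precision here from 0.75 to about 0.67. That is the "extra reviews" cost accepted in exchange for not missing a critical check.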
Crucially, they avoid the trap of fully autonomous optimism. "Ask a Human," a tool call that brings humans into the loop for approvals or missing APIs, gives the system a safety valve for novel or high-risk cases. Because it is an explicit tool rather than an ad hoc escape hatch, approvals, policy exceptions, and data gaps are handled without derailing the workflow.
Quality doesn’t happen by accident. They "Designed an auto-eval system that samples conversations for human review to catch edge cases and build labeled datasets," so the same pipeline both flags conversations for manual review and turns those reviews into labeled data. That closed-loop evaluation flow is the backbone of sustainable performance in agentic AI. Combine it with targeted instrumentation (CSAT, first contact resolution, deflection rate, time to resolution, and escalation rate) and you get a real Agent Analytics discipline, not just logs and dashboards.
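The sketch below covers only the sampling step. It assumes each conversation carries a confidence score, a guardrail flag, and an escalation flag; all three field names are hypothetical. Every risky conversation goes to review, plus a small seeded random slice of the rest. Reviewer verdicts on the selected conversations become the labeled dataset.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConversationSummary:
    conversation_id: str
    confidence: float        # hypothetical model self-assessment in [0, 1]
    guardrail_flagged: bool
    escalated: bool


def select_for_review(convs, low_confidence=0.6, baseline_rate=0.05, seed=0):
    """Targeted sampling of risky conversations plus a small random slice of the rest.

    The random slice catches failure modes the targeting rules do not know about yet.
    """
    rng = random.Random(seed)  # seeded so a review batch can be reproduced
    selected = []
    for c in convs:
        if c.guardrail_flagged or c.escalated or c.confidence < low_confidence:
            selected.append((c.conversation_id, "targeted"))
        elif rng.random() < baseline_rate:
            selected.append((c.conversation_id, "random"))
    return selected


convs = [
    ConversationSummary("c1", 0.92, False, False),
    ConversationSummary("c2", 0.41, False, False),
    ConversationSummary("c3", 0.88, True, False),
    ConversationSummary("c4", 0.95, False, True),
]
print(select_for_review(convs, baseline_rate=0.0))
# [('c2', 'targeted'), ('c3', 'targeted'), ('c4', 'targeted')]
```

Keeping the "targeted" and "random" tags separate matters when you estimate error rates. Only the random slice is an unbiased sample of overall quality.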
The "iceberg" metaphor is more than a catchy visual. It’s a blueprint for scoping multi-agent platforms that work across the entire customer journey. With "inbound, back office, and outbound" agents coordinating on complex tasks like fraud disputes, the system can transition cleanly from intake to investigation to resolution—without dropping context or asking customers to repeat themselves. This is what genuine customer support automation looks like when it’s grounded in real operations.
Under the hood, the team leans into robust design choices that matter at scale: the "Complexities of Natural Language Input" are managed with explicit state and skill scoping, "Deterministic Skill Execution" reduces flakiness, and "Customer-Specific Guardrails" ensure compliance remains aligned to each client’s policies. Add their focus on "APIs and Customer Tools Integration" and the result is a platform that can actually take action—not just answer questions.
If you’re building in this space, here’s how I’d apply these lessons. Start by mapping the iceberg: enumerate back-office steps, approvals, and SLAs that follow the initial customer touchpoint. Capture those steps as "natural language procedures" owned by SMEs. Implement a "state machine orchestrator" to manage "turns, triggers, and skill selection" across multi-day workflows. Treat "guardrails as classification problems" and tune for high recall on high-stakes checks. Introduce "Ask a Human" early to handle missing APIs or policy exceptions. Finally, operationalize learning with "auto-eval pipelines" and tight, eval-driven development loops. That’s how multi-agent platforms deliver measurable outcomes in fintech support.
If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. It also references an earlier Incident.io episode and closes with a thoughtful take on the future of multi-agent systems.
In short: this is a shift from simple Q&A bots to agents that can coordinate, comply, and complete. It’s the kind of multi-agent platform work that moves the needle for customer support in fintech—and a compelling template for any product leader scaling agentic AI and AI workflows beyond the tip of the iceberg.
AI isn’t a side quest for product managers anymore—it’s the skill stack that will define how we discover problems, prototype solutions, and ship value in 2026. Over the last few cycles, I’ve watched teams that embrace a deliberate AI strategy outperform on speed, signal, and stakeholder confidence. This roadmap is the approach I use to build capability in a structured, outcome-driven way, so we ship smarter, faster, and more impact-driven products.
"AI for PMs in 2026: why it matters, what to learn, and a 12-month AI roadmap to master product skills and ship smarter, faster, impact-driven products."
Here’s how I frame what to learn and why: focus on enduring capabilities first (problem discovery, experimentation, ethics), then layer on the AI product toolbox (LLM fundamentals for product managers, retrieval-first pipeline patterns, AI workflows), and finally operationalize with outcomes vs output OKRs. The goal isn’t to sprinkle generative AI on everything; it’s to make better decisions, reduce cycle time, and unlock product-led growth in measurable ways.
Months 1–3: Foundations. I build literacy around model behavior and constraints, context window management, and prompting patterns. I pair this with data governance and privacy-by-design basics so we avoid rework later. Practically, I assemble an AI product toolbox (evaluation checklists, prompt libraries, retrieval-first pipeline templates) and apply them to product discovery—summarizing research, clustering feedback, and sharpening value propositions without losing critical nuance.
Months 4–6: Prototyping and evaluation. This is where ideas become testable artifacts. I use generative AI to produce UX mocks, PRDs, and in-app guides rapidly, then validate them with eval-driven development. I run lean experiments (A/B tests with a clear minimum detectable effect), wire up analytics in Amplitude, and track activation and retention signals. The mantra: instrument early, measure causally, and iterate based on evidence.
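Fixing the minimum detectable effect up front tells you how much traffic an experiment needs. The sketch below implements the standard normal-approximation sample-size formula for comparing two conversion rates, using Python's `statistics.NormalDist`. The 20% baseline and 2-point MDE are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm to detect an absolute lift of `mde_abs` on a conversion rate.

    Normal-approximation formula for a two-sided, two-proportion test.
    """
    p1, p2 = baseline, baseline + mde_abs
    if not 0 < p1 < 1 or not 0 < p2 < 1:
        raise ValueError("baseline and baseline + mde_abs must be valid rates")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical: 20% baseline activation, aiming to detect a 2-point absolute lift.
print(sample_size_per_arm(0.20, 0.02))  # about 6,500 users per arm
```

Halving the MDE roughly quadruples the required sample. Small expected lifts therefore need either more traffic or a longer test.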
Months 7–9: Shipping AI-enabled workflows. I partner with product trios to integrate AI into real user journeys; customer support AI strategy, CRM integration, and guided onboarding are common wins. We explore agentic AI for complex multi-step tasks, add safeguards for AI risk management, and pressure-test systems with threat detection and response playbooks. As features reach production, we monitor deployment frequency and tighten feedback loops to protect quality while accelerating learning.
Months 10–12: Scale and governance. I operationalize what works with product roadmapping and sprint planning aligned to outcomes vs output OKRs. We codify playbooks for continuous discovery, define eval gates for new AI features, and unify analytics so teams can compare lift apples-to-apples. Stakeholder management matures into clear narratives: what shipped, what moved, what’s next—so leadership sees compounding value, not just activity.
Throughout the year, I keep the focus on real users and real metrics: fewer hops from insight to iteration, tighter loops between problem and prototype, and crisper communication around trade-offs. The result is a team that can translate AI capabilities into differentiated product experiences—reliably and responsibly. If you follow this path, you’ll enter 2026 with the confidence to lead, the systems to scale, and the evidence to prove it.
"The best AI products improve more through context engineering than prompt tinkering." I’ve seen this play out repeatedly in high-stakes, enterprise use cases: substantive gains come from how we curate, structure, and deliver context to models—not from wordsmithing. When we started treating context as a product surface, performance climbed, hallucinations dropped, and teams shipped with more confidence.
Here are four key decisions we made to improve our AI context.
First, we moved to a retrieval-first pipeline. We unified trusted sources—CRM records, support knowledge bases, product telemetry, and governance-approved docs—behind hybrid retrieval (semantic + keyword) with strong metadata ranking. This let us constrain generations to verifiable facts, apply privacy-by-design rules at the edge, and practice disciplined context window management so every token carried its weight. Freshness policies, source-level confidence scores, and lightweight schemas kept the system precise and auditable.
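The sketch below shows hybrid ranking followed by context window packing. It assumes semantic similarity scores are precomputed by your embedding search. Simple term overlap stands in for a real keyword scorer such as BM25, and tokens are approximated by whitespace-split words. The weights, source names, and `Chunk` fields are hypothetical. Freshness is omitted for brevity, but it could be folded into the per-source weight in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    text: str
    semantic_score: float     # assumed precomputed embedding similarity in [0, 1]
    source_confidence: float  # hypothetical per-source trust weight in [0, 1]


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the chunk (a stand-in for BM25)."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_score(query: str, chunk: Chunk, alpha: float) -> float:
    """Blend semantic and keyword relevance, then weight by how much we trust the source."""
    blended = alpha * chunk.semantic_score + (1 - alpha) * keyword_score(query, chunk.text)
    return blended * chunk.source_confidence


def assemble_context(query: str, chunks: list[Chunk], token_budget: int, alpha: float = 0.6) -> list[Chunk]:
    """Rank by hybrid score, then greedily pack the best chunks into the token budget."""
    ranked = sorted(chunks, key=lambda c: hybrid_score(query, c, alpha), reverse=True)
    packed, used = [], 0
    for chunk in ranked:
        cost = len(chunk.text.split())  # crude token estimate; use the model's tokenizer in practice
        if used + cost <= token_budget:
            packed.append(chunk)
            used += cost
    return packed


chunks = [
    Chunk("kb/refunds", "refund policy for disputed card charges", 0.90, 1.0),
    Chunk("kb/disputes", "card dispute timeline", 0.50, 1.0),
    Chunk("forum/post", "unverified forum post about refunds", 0.95, 0.2),
]
print([c.source for c in assemble_context("card dispute refund", chunks, token_budget=9)])
# ['kb/refunds', 'kb/disputes']
```

The forum post has the highest semantic score but ranks last because its source weight is low. That is the sense in which source-level confidence keeps generations constrained to trusted facts.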
Second, we made eval-driven development non-negotiable. Every change to context assembly goes through offline evals and online A/B testing with clear acceptance thresholds (e.g., task success, groundedness, time-to-first-answer, and deflection rate). We sized tests with minimum detectable effect (MDE) and tied them to outcomes vs output OKRs so we weren’t just shipping more prompts—we were shipping measurable improvements that mattered to customers.
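Acceptance thresholds are easiest to enforce when they live in code. The sketch below is an offline eval gate that compares a candidate context pipeline against the baseline on the metrics named above. The tolerances and metric values are hypothetical, and an online A/B test would still follow a passing offline gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    metric: str
    higher_is_better: bool
    max_regression: float  # largest tolerated move in the wrong direction (absolute units)


# Hypothetical tolerances: no regression on success or groundedness, small slack elsewhere.
GATES = (
    Gate("task_success", True, 0.0),
    Gate("groundedness", True, 0.0),
    Gate("time_to_first_answer_s", False, 0.5),
    Gate("deflection_rate", True, 0.01),
)


def passes_gates(baseline: dict, candidate: dict, gates=GATES) -> tuple[bool, list[str]]:
    """Return (passed, failing metric names) for a candidate versus the baseline."""
    failures = []
    for g in gates:
        delta = candidate[g.metric] - baseline[g.metric]
        regression = -delta if g.higher_is_better else delta
        if regression > g.max_regression:
            failures.append(g.metric)
    return not failures, failures


baseline = {"task_success": 0.80, "groundedness": 0.92, "time_to_first_answer_s": 2.0, "deflection_rate": 0.30}
candidate = {"task_success": 0.83, "groundedness": 0.90, "time_to_first_answer_s": 2.3, "deflection_rate": 0.31}
print(passes_gates(baseline, candidate))  # (False, ['groundedness'])
```

Each gate tolerates a regression only up to its stated limit. A gain on one metric does not buy a regression on another, so a task-success win cannot paper over a groundedness drop.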
Third, we personalized context based on intent and role. We built AI workflows that detect user intent, segment by persona, and dynamically assemble context: recent account activity for customer success, policy-safe excerpts for finance, and fine-grained reasoning chains for product teams. For conversational and voice AI agent experiences, we combined short-term conversation memory with scoped, long-term account memory to preserve relevance without bloating the prompt. This agentic AI pattern ensured faster, safer, and more helpful responses.
Fourth, we operationalized context as a first-class platform capability. We invested in data governance (ownership, lineage, and redaction), instrumentation (Amplitude analytics for usage, retrieval hit rates, and failure modes), and CI/CD guardrails for context updates. Product trios partnered with SRE to monitor drift, while side-by-side comparisons and human-in-the-loop reviews turned frontline feedback into structured improvements. The result: a durable system that improves continuously instead of relying on one-off prompt tweaks.
Context engineering isn’t glamorous, but it compounds. By prioritizing retrieval-first design, rigorous evaluation, intent-aware assembly, and operational excellence, we transformed our AI features into dependable, enterprise-ready capabilities. If you’re a product manager serious about LLM features and a sustainable AI strategy, shift your energy from clever prompts to robust context, and watch adoption and trust follow.
Inspired by this post on Amplitude – Perspectives.
When I think about the roles that truly move the needle on AI strategy and product outcomes, the Staff AI Engineer stands out. This is the person who can translate research into repeatable AI workflows, partner with product to solve real user problems, and operationalize models in a way that scales. It’s where innovation meets accountability—and where product management leadership meets hands-on engineering craft.
Ram Soma is a Staff AI Engineer at Amplitude, leading various AI initiatives across the company. He has a background in data science and machine learning engineering.
What does that look like in practice from my seat? It starts with precise problem framing and measurable success criteria. I align with a Staff AI Engineer on eval-driven development and instrumentation so we can track impact from prototype to production. With Amplitude analytics operating as a unified analytics platform, we can quantify user activation, retention analysis, and feature adoption, then iterate through continuous discovery with tight feedback loops.
Execution quality hinges on robust experimentation. Together, we design A/B testing plans with minimum detectable effect (MDE) targets, isolate confounding variables, and build evaluation harnesses that reflect real-world UX constraints. We also agree on rollout strategies—staged deployments, guardrails, and observability—so we can learn safely while preserving customer trust and performance SLAs.
On the technical approach, I look for pragmatic architectures that balance speed and reliability: a retrieval-first pipeline for grounding, judicious use of LLMs with product managers helping instrument prompts and policies, and agentic AI patterns only when task decomposition truly reduces complexity. Just as important are privacy-by-design and data governance practices from day one, because responsible innovation beats retrofitting controls after the fact.
Finally, the magic happens in empowered product teams and product trios. When product, design, and Staff AI Engineering operate with shared context and clear constraints, we compress decision cycles and ship value faster. That’s how AI initiatives evolve from demos to durable capabilities—and how we enable product-led growth with measurable results that customers feel, not just features they see.
Inspired by this post on Amplitude – Perspectives.
I’ve spent years trying to bottle the judgment of a great product analyst and pour it into our AI workflows. The hardest part isn’t access to data; it’s encoding the nuance of analytical reasoning. That’s why Amplitude’s approach resonated with me—turning expert analysis into a repeatable, stepwise process AI can run with discipline and speed.
Learn how Amplitude turned its data analysis expertise into a structured, iterative process that AI can execute in moments.
In practical terms, I translate that one line into an operating model: define the decision, formalize the metrics, map the data, decompose the questions, iterate on evidence, and converge on a recommendation with clear trade-offs. This is the backbone of agentic AI in product management: giving an LLM not just data, but a procedure that mirrors how our best analysts think.
Here’s the analyst-to-AI loop I use. First, frame the business question in decision language (what will we do differently?). Second, anchor on success metrics and guardrails, including statistical sensitivity and minimum detectable effect (MDE). Third, locate trusted sources—your unified analytics platform, experiment logs, and product instrumentation—so the AI never guesses. Fourth, generate hypotheses and segment the data (cohorts, channels, plans, geos), prioritizing signal over noise. Finally, synthesize findings into options with expected impact, risks, and next steps.
To operationalize this, I build a retrieval-first pipeline that binds Amplitude analytics to structured prompts and function calls. The AI receives exact metric definitions, event taxonomies, and governance rules, then returns a predictable schema—headlines, evidence, segments, caveats, and recommended actions. That combination of clear constraints and consistent output makes eval-driven development possible: I can test prompts and tooling against a gold set of analyses and steadily improve quality.
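A predictable schema is only useful if malformed output gets rejected. The sketch below validates the five fields described above, assuming the model is asked to return JSON with these keys; the key names are hypothetical. `json.JSONDecodeError` is a subclass of `ValueError`, so callers can handle every failure the same way.

```python
import json
from dataclasses import dataclass

# Hypothetical key names for the agreed output schema.
REQUIRED_FIELDS = ("headlines", "evidence", "segments", "caveats", "recommended_actions")


@dataclass
class AnalysisResult:
    headlines: list[str]
    evidence: list[str]
    segments: list[str]
    caveats: list[str]
    recommended_actions: list[str]


def parse_analysis(raw: str) -> AnalysisResult:
    """Reject model output that does not match the agreed schema instead of guessing."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for f in REQUIRED_FIELDS:
        if not isinstance(data[f], list) or not all(isinstance(x, str) for x in data[f]):
            raise ValueError(f"{f} must be a list of strings")
    if not data["headlines"] or not data["evidence"]:
        raise ValueError("an analysis needs at least one headline and one piece of evidence")
    return AnalysisResult(**{f: data[f] for f in REQUIRED_FIELDS})


raw = json.dumps({
    "headlines": ["Example headline summarising the main finding"],
    "evidence": ["Example: cohort comparison by channel"],
    "segments": ["channel", "plan"],
    "caveats": ["Example caveat"],
    "recommended_actions": ["Example next step"],
})
print(parse_analysis(raw).segments)  # ['channel', 'plan']
```

A strict parser like this also makes the gold-set evals cheaper. A rejected response counts as a failure without any manual inspection.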
Consider retention analysis on a new onboarding flow. I’ll ask the system to pull activation rate, time-to-value, and day-7 retention from Amplitude, then compare cohorts by channel and plan. The AI proposes hypotheses (e.g., tooltip engagement correlates with activation), runs segmentation to validate them, and lays out product-led growth levers—like simplifying the first-run checklist or moving guidance in-app. What used to take hours of manual slicing now becomes an iterative loop that lets me spend more time on prioritization and less on tab wrangling.
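The segmentation step in that loop reduces to computing rates per cohort. The sketch below assumes per-user rows have already been exported; the field names `channel`, `plan`, `activated`, and `retained_d7` are hypothetical and are not Amplitude's export format. It uses plain Python rather than a dataframe library.

```python
from collections import defaultdict


def rates_by_segment(users: list[dict], segment_key: str) -> dict:
    """Activation rate and day-7 retention for each value of `segment_key`."""
    totals = defaultdict(lambda: {"users": 0, "activated": 0, "retained_d7": 0})
    for user in users:
        t = totals[user[segment_key]]
        t["users"] += 1
        t["activated"] += int(user["activated"])
        t["retained_d7"] += int(user["retained_d7"])
    return {
        segment: {
            "users": t["users"],
            "activation_rate": t["activated"] / t["users"],
            "d7_retention": t["retained_d7"] / t["users"],
        }
        for segment, t in totals.items()
    }


users = [  # hypothetical rows
    {"channel": "paid", "plan": "free", "activated": True, "retained_d7": False},
    {"channel": "paid", "plan": "pro", "activated": False, "retained_d7": False},
    {"channel": "organic", "plan": "free", "activated": True, "retained_d7": True},
    {"channel": "organic", "plan": "pro", "activated": True, "retained_d7": False},
]
for segment, stats in rates_by_segment(users, "channel").items():
    print(segment, stats)
```

Any gap this surfaces between segments is a hypothesis to validate, not a conclusion. Tiny cohorts like these would never support a decision on their own.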
Of course, speed without rigor is a trap. I guard against metric drift and hallucinations with strong definitions, lineage checks, and human-in-the-loop approvals for consequential decisions. I also log analysis steps and outcomes so we can audit reasoning, catch regressions, and keep AI grounded in our true north metrics—not just what’s easy to compute.
The big unlock isn’t a clever prompt; it’s codifying the analyst’s craft. When we treat analysis as a structured, iterative process, AI can execute it with consistency, and product teams can move faster with more confidence. If you’re building AI workflows for product insight, start by formalizing your analyst loop, connect it to your Amplitude analytics, and evaluate continuously. The result is smarter, faster decisions—and a repeatable path from raw data to action.
Inspired by this post on Amplitude – Best Practices.
Capacity planning has always been a high-stakes exercise in customer service, and when you miss, the signal shows up fast in backlogs and SLAs. I’ve lived that pressure across multiple cycles, and 2026 will reward teams that plan differently.
AI fundamentally changes capacity planning because it changes the work. It resolves the bulk of your volume, speeds up execution, and elevates the complexity and value of what humans handle. The consequence is simple: planning models must evolve.
This is the final installment in my 2026 customer service planning series, and I’m focusing on the tension every leader feels right now—be ambitious about automation, but avoid the trap of understaffing if your assumptions don’t hold.
My goal is to share how AI changes the logic of capacity planning, what I’ve learned implementing these practices with my team and with customers, and the common traps to avoid.
Traditional planning rests on relatively stable assumptions: volume grows predictably, work types stay consistent, handle times don’t swing dramatically, and productivity improves slowly with better tools and training. In an AI-first model, none of that is guaranteed, and the fundamentals flip.
The mix of work changes as AI absorbs a growing share of simpler conversations, leaving humans with deeper, more time-consuming issues that demand human-to-human connection. Demand can actually increase when you remove friction, so AI can both resolve more and attract more volume. Human time splits differently as teammates solve customer problems and also review AI behavior, give feedback, improve content, and support system-level work. Performance becomes dynamic, not fixed—automation rate isn’t a one-time number; it can rise with care and fall with neglect.
If you plan for 2026 using a pre-AI model—assuming similar productivity, similar work mix, and a linear relationship between volume and headcount—you will underestimate what it now takes to run a high-performing support organization.
There are many metrics you can track, but the one to put at the center is automation rate (AI Agent involvement rate × AI Agent resolution rate). This single construct tells me what share of total volume AI actually resolves, how much work remains for humans, how much additional demand humans can absorb, and how ambitious I can be with headcount.
Early in the journey, I prioritize raising involvement—getting the AI involved in more conversations. Once involvement is high, I shift to resolution on the hardest remaining work, where each additional 1% of automation can represent several people’s worth of capacity.
In my 2026 plans, automation rate sits alongside projected inbound volume, average “output” per person for the more complex work that remains, and occupancy—how much time is allocated to customer-facing interactions versus operational and strategic work. Together, those inputs give a realistic picture of how many people you need and where they should spend their time.
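The sketch below shows how those inputs combine, with automation rate defined as involvement × resolution. Every number is hypothetical, and the model deliberately ignores shrinkage, shift coverage, and demand growth.

```python
import math


def automation_rate(involvement: float, resolution: float) -> float:
    """Share of total volume the AI Agent resolves end to end."""
    return involvement * resolution


def required_headcount(volume: float, involvement: float, resolution: float,
                       output_per_inbox_fte: float, inbox_occupancy: float) -> int:
    """People needed for the volume the AI Agent does not resolve.

    output_per_inbox_fte: conversations one person closes per month at 100% inbox time.
    inbox_occupancy: share of time planned for customer-facing work (the rest goes to
    AI and system improvement).
    """
    human_volume = volume * (1 - automation_rate(involvement, resolution))
    return math.ceil(human_volume / (output_per_inbox_fte * inbox_occupancy))


def people_per_automation_point(volume: float, output_per_inbox_fte: float, inbox_occupancy: float) -> float:
    """Capacity freed by one extra percentage point of automation rate."""
    return volume * 0.01 / (output_per_inbox_fte * inbox_occupancy)


# Hypothetical plan: 100,000 conversations/month, 90% involvement, 70% resolution,
# 400 complex conversations per inbox-FTE per month, 70% inbox occupancy.
print(required_headcount(100_000, 0.90, 0.70, 400, 0.70))  # 133
print(round(people_per_automation_point(100_000, 400, 0.70), 1))  # 3.6
```

With these inputs, each additional point of automation frees about 3.6 people, which is the "several people's worth of capacity" effect. Lowering output per person or inbox occupancy raises the headcount line accordingly.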
First, plan boldly on automation, but match it with investment. I do not cap automation assumptions at 40–50% “because AI is new.” Many teams are already modeling 60%, 70%, even 80%+ for 2026—when they invest in AI ownership and content. The investment is non-negotiable: named ownership for AI performance (AI ops, knowledge management, conversation design), clear automation targets by work type (e.g., informational vs. personalized vs. actions vs. deep troubleshooting), realistic expectations for what’s easy to automate and what’s not, and a concrete plan to raise automation over time in monthly or quarterly steps rather than a single jump.
To decide where to invest first, I dig into the data. I start with the biggest volume drivers, separate content-led issues from those dependent on data or complex procedures, assume higher resolution potential for content-led topics once the knowledge base is in shape, and set more modest initial resolution expectations for system-dependent flows. Then I stair-step improvements as the systems, data contracts, and workflows mature.
In short, bold automation goals only work when paired with the team structure, content, and systems required to reach them—and the discipline to iterate.
Second, expect human “output” per person to go down. That’s a mindset shift. Historically, we assumed individual productivity would stay flat or tick up as tools improved. In an AI-first model, humans handle fewer conversations but more complex, cross-functional issues—and create more value despite lower case counts.
I model a lower “cases closed per person” than prior-year baselines, explicitly assume the remaining work is more complex and time-consuming, and redefine productivity to include system-level work like AI Agent improvements, content updates, and policy or workflow change management. I also report “capacity created” from automation alongside human outputs, so leadership sees the full picture.
Third, rethink occupancy: more time off the queues, on higher-value work. Traditional occupancy splits time between inbox and training, meetings, and breaks. Now there’s an expanding “out-of-inbox” portfolio that directly affects AI performance and overall capacity: reviewing AI-handled conversations, improving AI Agent triaging and handovers, contributing to content and procedures, feeding insights to product and engineering, and supporting system changes that reduce future volume.
I set lower inbox occupancy targets than before and make the rationale explicit. People aren’t working less—they’re working differently. In planning, I assume more time spent on improvement and system work, make it visible (for example, X% in inbox and Y% on AI and system improvement), and treat this as critical, not a “nice to have.” If you don’t proactively allocate it, it won’t happen—and your automation and performance targets will suffer.
Fourth, work with the finance team early, and treat your plan as a set of assumptions. Capacity planning with AI is a set of bets across automation rate, human output, demand growth, occupancy, and where surplus capacity (if any) goes. I bring finance in early, show that the plan is dynamic and directly tied to AI performance, and label every lever as an assumption with ranges.
I commit to a quarterly review cadence with finance to compare assumptions versus reality and adjust headcount, targets, and investment as needed. The risks are real: if automation grows slower than expected and you stop backfilling too early, you’ll be understaffed for months. Hiring and onboarding take time, so course-correcting late creates strain. If you do produce surplus capacity, have a clear strategy to reallocate those teammates to higher-value work—improving systems, feeding insights back to product, supporting new channels, and driving proactive CX—rather than defaulting to reductions.
I also set explicit guardrails. If the automation rate falls five or more points below plan for two consecutive months, we pause planned reductions and revisit hiring gates. If it over-performs, we shift people into backlog eradication, content upgrades, or proactive outreach, so we bank compounding value.
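A guardrail like this is simple enough to encode, so it gets checked every month rather than debated. Below is a minimal Python sketch, assuming monthly automation rates are tracked in percentage points against plan. The function name, the return labels, and the use of the same five-point threshold for over-performance are hypothetical choices; the text only specifies the threshold for misses.

```python
from __future__ import annotations


def guardrail_action(planned: list[float], actual: list[float],
                     threshold_points: float = 5.0, months: int = 2) -> str:
    """Compare the most recent `months` of actual vs. planned automation rate (in points)."""
    if len(planned) != len(actual) or len(actual) < months:
        raise ValueError("need matching series covering at least `months` months")
    gaps = [a - p for p, a in zip(planned[-months:], actual[-months:])]
    if all(g <= -threshold_points for g in gaps):
        return "pause_reductions"    # pause planned reductions, revisit hiring gates
    if all(g >= threshold_points for g in gaps):
        return "reallocate_surplus"  # move people to backlog, content, proactive outreach
    return "on_plan"


# Last two months miss plan by 6 points each
print(guardrail_action(planned=[60, 62, 64], actual=[58, 56, 58]))  # prints pause_reductions
```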
To set your team up for success in 2026, anchor your plan on automation rate, be honest that humans will handle fewer but harder conversations, and protect time for system improvements. Partner early and often with finance, avoid shrinking too fast, and design a plan for surplus capacity so you’re never caught flat-footed.
If AI is going to handle the majority of your customer conversations, your plan has to be designed to help it do that well and to keep your team set up for meaningful, sustainable work. A 2026 plan built on adaptable assumptions—not fixed predictions—will hold up as your work, your systems, and your customers’ expectations continue to change.
If you’d like future editions like this, subscribe and stay close—I’ll keep sharing what’s working, what isn’t, and how to tune your customer support AI strategy in real time.
Once I’ve defined the right roles on my team, the next move is to design an operating model that makes progress a habit. My goal is simple: every interaction should strengthen the system so the AI Agent keeps improving over time.
I anchor the team on a mantra that has never failed me: “The first time you answer a question should be the last.” That single statement reframes support as a compounding system rather than a one-off activity.
The ambition is to ensure every resolution makes the next one faster and more accurate, so fewer issues repeat, quality compounds, and support scales naturally. That doesn’t happen by accident—it requires intentional design.
In practice, this comes down to four essentials: clear ownership of performance, guardrails that make iteration fast and safe, feedback loops that turn learning into routine upgrades, and a culture that celebrates the work of improvement—not just the outcomes. Here’s how I put that into play.
First, I start with clear ownership. Ambiguity is one of the most common reasons AI performance plateaus. When no one truly owns how the AI Agent performs, feedback gets lost, issues linger, and improvements stall.
On high-performing teams, I assign a single owner—often an AI ops lead—responsible for making the AI Agent better. They review resolution trends to spot underperformance, make targeted updates to content, configuration, and behavior, coordinate with product and engineering on systemic blockers, and set improvement priorities, targets, and timelines. The title matters less than the mandate; what matters is clear authority to drive change across teams.
Real-world example: At Dotdigital, AI performance plateaued after a strong start, resolving around 2,800 conversations per month for three consecutive months. To push resolution rates up, the team created a dedicated support operations specialist role, filled by an experienced agent with deep product knowledge, whose focus is refining snippets, improving content, and enhancing the AI’s resolution capabilities.
Second, I make iteration fast and safe. As the AI Agent takes on more volume and complexity, change can start to feel risky—so teams hesitate, and performance stalls. Lightweight governance fixes that by making the path from insight to action predictable.
I keep the rules simple and explicit: which changes need review (and which don’t), who the decision-makers are, how we test updates before they go live, where feedback flows so it’s seen and acted on, and when progress gets reviewed on a steady cadence. Governance isn’t bureaucracy—it’s what keeps improvement routine and safe.
Real-world example: Anthropic ran a focused “Fin hackathon” sprint to improve their AI Agent’s resolution rate. The team audited unresolved queries, identified underperforming topics, and created or updated content to close gaps. They converted frequently used macros into AI-usable snippets, monitored Fin’s performance during live support, and continuously refined content based on real interactions. This structured approach enabled rapid improvement while maintaining quality standards.
Third, I build a system that learns by default. AI performance isn’t static, but many organizations treat it like a one-time implementation. The most successful teams operationalize learning: they analyze where the AI Agent struggles and feed those insights directly into structured improvements.
The signals are straightforward: review common handoffs to humans, track unresolved queries by topic or intent, measure resolution rate trends over time, and use those inputs to prioritize fixes and content upgrades. Whether you follow a formal loop like the Fin Flywheel framework or something lighter, the goal is the same—make improvement inevitable.
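As a minimal Python sketch of turning those signals into a priority list, assume each conversation has already been tagged with a topic and an outcome. The `resolved`, `handoff`, and `unresolved` labels and the ranking rule are hypothetical, not taken from the Fin Flywheel framework or any product API.

```python
from __future__ import annotations

from collections import Counter


def resolution_rate(conversations: list[dict]) -> float:
    """Share of conversations the AI Agent resolved; track this per month for the trend."""
    return sum(c["outcome"] == "resolved" for c in conversations) / len(conversations)


def prioritize_topics(conversations: list[dict], top_n: int = 3) -> list[tuple[str, int, float]]:
    """Rank topics by how many conversations ended in a handoff or went unresolved."""
    totals = Counter(c["topic"] for c in conversations)
    misses = Counter(c["topic"] for c in conversations if c["outcome"] != "resolved")
    # Most misses first; break ties by miss rate so small but broken topics still surface
    ranked = sorted(misses, key=lambda t: (misses[t], misses[t] / totals[t]), reverse=True)
    return [(t, misses[t], round(misses[t] / totals[t], 2)) for t in ranked[:top_n]]


sample = [
    {"topic": "billing", "outcome": "handoff"},
    {"topic": "billing", "outcome": "unresolved"},
    {"topic": "billing", "outcome": "resolved"},
    {"topic": "login", "outcome": "resolved"},
    {"topic": "login", "outcome": "handoff"},
    {"topic": "shipping", "outcome": "resolved"},
]
print(resolution_rate(sample))
print(prioritize_topics(sample))
```

Ranking by miss volume first keeps attention on the fixes that move the overall resolution rate most; the miss-rate tiebreak keeps low-volume but badly broken topics from disappearing.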
Fourth, I treat content as competitive infrastructure. Your AI Agent is only as good as what it knows. As George Dilthey, Head of Support at Clay, put it: “That’s when we realized: AI doesn’t just come up with information out of nowhere, you have to feed it. We were spending all our time evaluating tools when we should’ve been focused on content.”
I operationalize knowledge like infrastructure: every topic has a clear owner, content is structured, versioned, and ingestion-ready, new products ship with source-of-truth content by default, and changes ship on a schedule—not when someone finds time. This is the backbone that differentiates teams who scale confidently from those who stall out.
In my organization, we’ve evolved our New Product Introduction (NPI) process by aligning early with R&D on a single, canonical source of truth that becomes the foundation for all downstream content—including what the AI Agent uses to resolve queries. By embedding content creation into launch readiness, not as an afterthought, we’ve consistently hit 50%+ resolution rates on new features from day one.
Finally, I make belief visible. Even the best system will stagnate if people stop believing in it. Belief can fade quietly unless you reinforce it on purpose. I keep it strong by sharing specific wins regularly, highlighting improvements with metrics, and recognizing the people behind the gains—then giving them space to lead. This isn’t just about morale; it keeps everyone aligned on the bigger play.
When you put it all together (clear ownership, safe iteration, a system that learns by default, content as infrastructure, and visible belief), AI performance compounds. As the AI Agent gets better, the entire support model becomes faster, more reliable, and truly scalable. That’s the foundation of a modern, AI-first support organization.
Next, I’ll take this a level deeper and share how capacity planning changes when AI handles the majority of inbound volume and your team shifts into higher-value roles. If scaling with confidence is the goal, this is where the operating model pays off.
Support teams in Spain just got the clearest signal yet that the old way of doing things won’t cut it anymore. As I look at the details, I see more than a regulatory hurdle—I see a blueprint for the modernization many of us have been pushing toward for years.
The signal arrives in the form of one of the most ambitious customer service regulations in Europe—a law designed to strengthen consumer protections and set clear expectations for fair, transparent, and personalized customer service. Among its measures: new protections against spam calls, stronger transparency requirements, safeguards around personalized interactions, and measurable standards for speed, accessibility, and complaint handling within customer support.
It’s a significant shift, especially for large enterprises and essential-service providers. While the initial reaction might be anxiety about audits and penalties, the larger opportunity is hard to ignore: this law compels us to build modern, resilient support operations that scale, perform, and earn trust.
Spain is often an early mover in consumer-protection regulation, and this shift could signal what future standards across the EU might look like. For EMEA leaders, this is a moment to reevaluate operating models, invest in automation thoughtfully, and ensure customer experience improvements directly support regulatory compliance.
Below, I break down what the law requires, what it means in practice, and how AI Agents like Fin can help teams meet regulatory expectations while delivering faster, more personal support at scale.
The law applies in full to providers of regulated services, including water, energy, passenger transport, postal services, pay-audiovisual media, and electronic communications, and also to any company (or group) that meets certain size and turnover thresholds, even if their core business falls outside those sectors.
Large companies (those with more than 250 employees and over €50 million in turnover) also have additional obligations, particularly around multilingual support in regions of Spain with a co-official language.
While the law is still moving through its final approval stages, the direction is clear: a broad set of obligations will apply to reinforce consumer rights, ensuring consumers can reach support quickly, speak to a human when needed, get clear information during outages or service disruptions, and have complaints handled promptly and on time.
1. 95% of support calls must be answered within three minutes
This raises the bar significantly for responsiveness, especially during spikes, outages, billing cycles, or seasonal surges. Most support systems are not built for this level of agility. In my experience, you can’t hire your way to this metric sustainably—you have to design for it.
2. Customers must be able to speak to a human on request
Automation is allowed, but it cannot be the only option. At any point during a call, a customer must be able to transfer to a human if they ask for one. Companies cannot trap customers in automated loops. The practical implication: every workflow needs a reliable, audited escape hatch to a person.
3. Support lines must be free of charge
Premium-rate numbers are prohibited. Customer service cannot generate revenue for the business, nor may it be used to upsell products. This cleanly separates service from sales and reduces consumer friction.
4. Essential services must offer 24/7 support for continuity issues
Electricity, water, gas, telecoms, and transport providers must be reachable at any hour when customers need to report service interruptions. That means coverage, triage, and routing must be always-on.
5. Complaints must be resolved within 15 days – or within five days for undue charges
This halves the previous general complaint window of 30 days and adds a much faster path for billing-error complaints. Companies must maintain records, assign tracking numbers, and ensure timely follow-up. Your case management discipline will make or break this requirement.
6. No spam calls or unwanted commercial pressure
Companies must identify commercial calls with a designated prefix, and customer-service calls with a different one. Telecom operators will be required to block calls that do not use these codes. Additionally, contracts obtained via unsolicited calls will be legally null and void, protecting consumers from being pressured into commitments they never intended to make.
7. Companies must maintain a unified complaint-tracking system
All complaints, claims, and incidents must be recorded in a centralized system to ensure traceability. If your data is fragmented across tools, this is a call to centralize and standardize intake.
8. Companies must pass annual external audits
These audits assess whether customer service processes are meeting the required standards. In practice, that means consistent processes, measurable outcomes, and reliable evidence.
9. Better linguistic and accessibility rights
Large companies operating in regions with co-official languages must be able to provide support in those languages. They must also ensure their customer service is accessible for vulnerable consumers, such as those with disabilities or older adults. Multilingual and accessible by design is the new default.
10. Fairer contract renewals
Companies must provide customers with 15 days’ notice prior to automatic renewal of online subscriptions and make cancellation simple. This is both a compliance and customer trust win.
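Two of these requirements reduce to simple arithmetic that can be monitored continuously. The Python sketch below checks the 95%-within-three-minutes answer rate and computes the latest date for a renewal notice. It assumes wait times are measured from when the call enters the queue, that the 15 days are calendar days, and that the measurement window is whatever period you report on. The final regulation and its guidance may define these differently, so treat the constants as placeholders to confirm against the final text.

```python
from __future__ import annotations

from datetime import date, timedelta

ANSWER_TARGET = timedelta(minutes=3)  # requirement 1: answered within three minutes
REQUIRED_SHARE = 0.95                 # ...for 95% of calls
RENEWAL_NOTICE = timedelta(days=15)   # requirement 10, assumed to be calendar days


def answer_rate(wait_times: list[timedelta]) -> tuple[float, bool]:
    """Return the share of calls answered within the target and whether it meets 95%."""
    if not wait_times:
        raise ValueError("no calls in the measurement window")
    within = sum(1 for w in wait_times if w <= ANSWER_TARGET)
    share = within / len(wait_times)
    return share, share >= REQUIRED_SHARE


def latest_renewal_notice(renewal_date: date) -> date:
    """Last day a renewal notice can go out and still give 15 days' notice."""
    return renewal_date - RENEWAL_NOTICE


waits = [timedelta(seconds=60)] * 19 + [timedelta(minutes=4)]
print(answer_rate(waits))                       # 19 of 20 calls within target
print(latest_renewal_notice(date(2026, 3, 1)))
```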
Most support systems weren’t built for this level of speed or operational rigor. But the steps required to comply are the same ones that make service better for customers—and better for the teams delivering it. That’s why I view AI as an essential capability, not a bolt-on.
With the regulatory expectations clear, the question becomes: what does a modern, compliant support operation look like? For me, it blends human empathy with intelligent automation and delivers auditability without sacrificing experience.
This is where AI plays a meaningful role. Not as a replacement for humans, but as a reliable front line that can handle a wide range of queries, including the most complex ones that require real depth, while keeping queues under control.
Adopting an AI Agent like Fin helps teams build a support model that meets regulatory expectations and improves customer experience across all your channels. Here’s how.
Many organizations will struggle to meet the three-minute standard during normal periods, let alone during spikes or busy seasons, without scaling their teams unsustainably. Fin can help by reducing the number of calls that reach your phone lines, and Fin Voice can help ensure the ones that do are answered quickly.
Reducing avoidable call volume before it reaches the queue
Many of the queries teams receive are predictable: outage updates, billing questions, account changes, and other repeatable issues. Fin can resolve these instantly across several channels, including live chat, SMS, email, and WhatsApp, using the content and processes your team already maintains. I’ve seen this alone cut peak-time pressure dramatically.
Answering the phone immediately
For customers who do call, Fin Voice can pick up straight away. It provides natural, conversational responses based on your existing knowledge and helps your team stay responsive during busy periods.
Making it easier to reach a human during spikes
When queues build up, Fin can capture the reason for the call, gather details, and prioritize the most urgent issues. If you offer callback options, Fin can help schedule them quickly so customers avoid long wait times, which is key for staying compliant during peak periods.
The law requires customers to reach a real person whenever they request one. Fin supports this by keeping the path to a human clear and dependable:
Every interaction includes an option to speak to a person, and that option remains accessible until the issue is resolved.
When that option is chosen, Fin hands over full context so human teams don’t start from scratch.
If you show team availability or wait times, Fin can surface that information for customers.
Escalations can be prioritized to ensure faster pickup.
Alerts can notify on-call staff when urgent issues arise.
On the phone, Fin Voice follows the same principle. Callers can request a transfer at any moment, and Fin routes the call to the right team with context intact.
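To illustrate what a reliable, auditable escape hatch can look like in code, here is a minimal Python sketch. It is not Fin’s implementation or API: the keyword list is a crude, hypothetical stand-in for a real intent classifier, and `Conversation`, `wants_human`, and `handle_turn` are illustrative names. The point is the shape: a human request always wins, the full transcript travels with the handoff, and every handoff leaves an audit entry.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical trigger phrases (English and Spanish); production systems would use intent detection
HUMAN_REQUEST_PHRASES = ("human", "real person", "speak to someone", "representative",
                         "una persona", "hablar con alguien")


@dataclass
class Conversation:
    conversation_id: str
    transcript: list[str] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)
    handed_off: bool = False


def wants_human(message: str) -> bool:
    text = message.lower()
    return any(phrase in text for phrase in HUMAN_REQUEST_PHRASES)


def handle_turn(conv: Conversation, message: str) -> Optional[dict]:
    """Record the turn; if the customer asks for a person, return a handoff payload."""
    conv.transcript.append(message)
    if conv.handed_off or not wants_human(message):
        return None
    conv.handed_off = True
    payload = {
        "conversation_id": conv.conversation_id,
        "reason": "customer_requested_human",
        "transcript": list(conv.transcript),  # full context so the human doesn't start over
    }
    conv.audit_log.append({
        "event": "handoff",
        "at": datetime.now(timezone.utc).isoformat(),
        "reason": payload["reason"],
    })
    return payload


conv = Conversation("c-001")
handle_turn(conv, "Where is my refund?")
print(handle_turn(conv, "Can I speak to a real person?"))
```

Once a conversation is handed off, later turns are still recorded but don’t trigger duplicate handoffs, which keeps the audit trail clean.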
Essential-service providers must be reachable at any hour when customers need to report service interruptions. Fin can help you meet this requirement without building a full overnight staffing model.
Always-on answers and triage
Fin provides first-line support at any hour of the day or night. Fin Voice brings this capability to the phone, giving callers immediate help even when your human team is offline. Fin can also direct customers to the latest updates you’ve published, such as outage information or status pages.
Routing urgent issues to the right people
When an issue requires human judgment, Fin gathers the necessary details and routes it to the appropriate on-call team using your existing after-hours processes. Teams can set up notifications so urgent issues are seen quickly.
Proactively surface what matters most
With AI Insights, Fin can also monitor for emerging patterns in customer conversations through Trending Topics. This means that if there’s a sudden spike in reports about a specific outage or a recurring question about a new process, Fin can flag these trends in real time. Your team is alerted to what’s top-of-mind for customers, so you can prioritize updates, publish targeted FAQs, or escalate critical issues, ensuring your support stays relevant and responsive, even overnight.
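The internals of Trending Topics aren’t described here, so the following is only a generic stand-in: a minimal Python sketch that flags a topic when today’s volume is well above its recent daily average. The ratio and minimum-count thresholds are hypothetical knobs you would tune to your own traffic.

```python
from __future__ import annotations

from statistics import mean


def flag_spikes(daily_counts: dict[str, list[int]], min_ratio: float = 2.0,
                min_count: int = 10) -> list[str]:
    """daily_counts maps topic -> conversations per day, oldest first, today last."""
    flagged = []
    for topic, counts in daily_counts.items():
        *history, today = counts
        baseline = mean(history) if history else 0.0
        # Require both a meaningful volume and a sharp jump relative to the baseline
        if today >= min_count and today >= min_ratio * max(baseline, 1.0):
            flagged.append(topic)
    # Biggest spikes first so the on-call team sees them at the top
    return sorted(flagged, key=lambda t: daily_counts[t][-1], reverse=True)


print(flag_spikes({
    "outage_madrid": [3, 4, 2, 40],
    "billing": [20, 22, 19, 25],
    "login": [1, 0, 1, 6],
}))
```

The minimum-count floor prevents a topic jumping from one conversation to three from paging anyone overnight.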
Complaints and outages often create the biggest spikes in volume, and the new law increases pressure to respond quickly, keep customers informed, and maintain complete records. This is exactly where structured AI intake adds value.
A more structured complaint intake
Fin can recognize when a customer is lodging a complaint, gather required information, and initiate a record in your existing system with a clear ID assigned from the outset.
Clear ownership and deadline alignment
Your team can then use your case-management tools to apply the 15-day resolution timeline (or five days for undue charges). Fin’s structured intake helps ensure that ownership and next steps are visible rather than buried in unstructured notes.
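A minimal Python sketch of structured complaint intake: every record gets a tracking ID at creation and a due date derived from its category. The `CMP-` ID format and the field names are hypothetical. The sketch also assumes the 15- and 5-day windows are calendar days counted from the day the complaint is lodged; if the final rules count business days, swap in a working-day calendar.

```python
import uuid
from dataclasses import dataclass
from datetime import date, timedelta

GENERAL_WINDOW_DAYS = 15      # general complaints
UNDUE_CHARGE_WINDOW_DAYS = 5  # complaints about undue charges


@dataclass(frozen=True)
class ComplaintRecord:
    tracking_id: str
    opened_on: date
    undue_charge: bool
    due_on: date


def open_complaint(opened_on: date, undue_charge: bool) -> ComplaintRecord:
    """Create the record with an ID and a deadline at intake, not after triage."""
    window = UNDUE_CHARGE_WINDOW_DAYS if undue_charge else GENERAL_WINDOW_DAYS
    return ComplaintRecord(
        tracking_id=f"CMP-{uuid.uuid4().hex[:10].upper()}",
        opened_on=opened_on,
        undue_charge=undue_charge,
        due_on=opened_on + timedelta(days=window),
    )


def is_overdue(record: ComplaintRecord, today: date) -> bool:
    return today > record.due_on


billing = open_complaint(date(2026, 1, 10), undue_charge=True)
print(billing.tracking_id, billing.due_on)  # due 2026-01-15
```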
Faster, more consistent outage communications
During service interruptions, Fin can share the latest published information, provide estimated fix times when available, and direct customers to live updates. On the phone, Fin Voice can triage incident-related calls quickly so callers aren’t waiting for a human agent just to receive basic information.
While multilingual support is only mandatory for large companies operating in co-official language regions, it remains essential for meeting consumer expectations. Fin helps by supporting multilingual, natural language interactions across voice and other channels; operating within channels that support accessibility features, like channels compatible with screen readers or commonly used messaging apps; and offering “request a call” paths and collecting the necessary information up front so teams can follow up quickly for customers who prefer phone support.
The law prohibits customer service interactions from generating additional revenue or being used to offer new products. With Guidance, you can set Fin up to stay firmly within these boundaries by shaping how it responds, which topics it should avoid, and what it should prioritize when a customer is seeking help or lodging a complaint.
The law raises expectations around documentation and audit readiness. Fin helps by making customer interactions more structured and consistent:
When a conversation involves a complaint, Fin can ensure the required information is captured and a clear ID is assigned.
That ID can follow the interaction, so it remains easy to trace.
Consistent intake gives you better visibility into the metrics regulators care about, such as response times, time to first human contact, escalation volume, and whether complaints are resolved within the required timelines.
Transcripts, summaries, and metadata can be retained until cases are resolved, supporting audit requirements.
Many organizations maintain internal compliance playbooks that outline processes and owners, and Fin’s structured intake helps keep those practices reliable.
Insights can help you identify trending topics, optimize processes, and measure service quality.
Spain’s new customer service law raises the bar on speed, access, and accountability. It’s natural to worry about how your team will cope, especially if your support operation has grown organically across tools and regions. I’ve seen how quickly burnout and chaos can set in when expectations rise faster than capacity.
The reality is that meeting these expectations through people alone would put unsustainable pressure on already stretched support teams, which is why an AI Agent like Fin can bring welcome relief.
By handling everything from high-volume, repetitive questions to many of the deeper, more involved issues customers raise, Fin keeps queues manageable and prevents the strain from falling entirely on your human team, helping everyone stay above water as expectations rise.
For companies operating across the EU, adapting early to Spain’s stricter expectations can build resilience for whatever comes next—whether that ends up being driven by regulation or customer demand. Now is the time to align compliance, AI strategy, and customer experience into a single, measurable operating model.
I build products on the belief that trust is earned in every design decision and every deployment. Trust has always been a first principle at Intercom, from our early investments in security and privacy to the globally recognized certifications that shape our approach today.
As AI becomes more deeply embedded in customer-facing work, it’s essential that businesses can rely on systems that are safe, reliable, and governed to the highest standards. That’s why we’re proud to share that Intercom is now AIUC-1 certified, becoming one of the first companies to meet the world’s first standard designed specifically for AI Agents. For leaders navigating AI Strategy and AI risk management, this is more than a badge—it’s a measurable leap forward in governance and operational rigor.
AIUC-1 is the first certification tailored to the unique risks and challenges of AI Agents. It complements broader AI governance frameworks like ISO 42001 by focusing on enterprise-specific concerns like security, customer safety, system reliability, data and privacy, society, and accountability. In practice, this alignment helps us translate policy into deployable safeguards across cybersecurity, data governance, and regulatory compliance.
To achieve certification, organizations undergo independent third-party audits and quarterly adversarial testing across more than a thousand enterprise risk scenarios. This continuous technical evaluation ensures that AI systems remain robust against fast-evolving threats and that safeguards keep pace with rapid progress in the field. As a product leader, I welcome this level of scrutiny—it’s how we operationalize threat detection and response and make agentic AI dependable at scale.
AIUC-1 itself evolves every quarter, incorporating new research, threat patterns, and global best practices. The standard is shaped by the AIUC-1 Consortium, launched in November with more than 50 founding members who collectively handle tens of trillions of dollars in payments and serve over a billion people daily. Intercom is proud not only to be certified, but to be recognized as a founding technical contributor helping shape the development of the standard. That continuous, community-driven iteration mirrors how we build—measure, learn, and harden—so our customers benefit from real-world, enterprise-ready AI.
Intercom has decades of combined experience in security, compliance, and trust, and we’ve consistently demonstrated that robust governance and fast innovation can coexist. Achieving AIUC-1 certification reinforces that the same rigor we apply across our platform also extends to Fin, our AI Agent. I’ve seen first-hand how risk and procurement teams evaluate generative AI: they expect clarity, evidence, and controls. This certification delivers independent proof that our approach meets those expectations.
For our customers, this certification provides independent validation that Intercom’s AI systems are safe, resilient, and enterprise-ready. It confirms that our AI is tested regularly, built with strong safeguards, and aligned with the expectations of modern security and risk teams. It also signals our continued leadership in shaping responsible AI practices globally, ensuring our customers benefit from standards built for real-world use. In short, you can move faster with confidence—without compromising on governance.
Intercom has always approached trust as an ongoing commitment. AIUC-1 strengthens the foundation we’ve built across other frameworks and certifications, including SOC 2, ISO 27001, ISO 27701, ISO 27018, HIPAA, HDS, and ISO 42001. Together, these certifications create a comprehensive control fabric across privacy, security, and reliability—critical pillars for any enterprise deploying gen AI into production workflows.
As AI technology accelerates, we will continue to evolve our safeguards, deepen our governance practices, and contribute to the standards that shape responsible AI. Our promise is simple: to build AI that is not only powerful and efficient, but safe, transparent, and deserving of the trust our customers place in us. That’s how we turn innovation into durable value.
You can learn more about our certifications and access our security and compliance documentation through the Intercom Trust Center.
Get started with Fin and see how an AIUC-1 certified, enterprise-ready AI Agent can elevate your customer experience with confidence.
I love real-world AI that ships, scales, and actually solves painful customer problems. This story checks every box. As a product leader who has brought agentic AI to production environments, I was captivated by how a small, focused team at Perk took a no-code voice AI prototype and turned it into a system that reliably makes 10,000+ calls per week to prevent failed hotel payments.
What happens when you combine a real customer problem, a no-code prototype, and a team willing to listen to every single call?
Steven Payne (Product Manager), Gabriel Stock (Senior Engineering Manager), and Philipe Steiff (Senior Software Engineer) from Perk share how they built a voice AI agent that calls hotels to verify virtual credit card payments, preventing travelers from arriving to find their rooms unpaid. This is a textbook example of linking operational pain to a high-leverage AI solution.
What started as a hackathon experiment in Make.com became a production system handling over 10,000 calls per week across multiple languages. Along the way, the team learned hard lessons about prompt engineering for voice (numbers, pronunciation, and a very "Karen-like" first version), how to break a single monolithic prompt into structured conversation stages, and why listening to actual calls beats any amount of theorizing.
From a product management perspective, this approach aligns perfectly with eval-driven development and continuous discovery. Structure the problem, instrument aggressively, ship safely, then listen—deeply—to real interactions. In my own teams, I’ve seen that nothing accelerates iteration on agentic AI like closing the loop between qualitative call reviews and quantitative evals.
They built a working prototype without writing a single line of backend code.
They structured the call into discrete stages (IVR, booking confirmation, payment) to improve reliability.
They created two eval systems: one for call success classification, another for conversational behavior.
They scaled from five calls a day to tens of thousands per week while maintaining quality.
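The staged structure can be sketched as a small state machine in Python. This is an illustration, not Perk’s code: the stage names follow the three stages described above, but the per-stage instructions are hypothetical, and the boolean results stand in for whatever each model-driven stage actually decides. The benefit of the shape is that each stage gets a short, focused prompt and a clear exit condition instead of one monolithic prompt.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    IVR = "ivr"
    BOOKING_CONFIRMATION = "booking_confirmation"
    PAYMENT = "payment"
    DONE = "done"
    FAILED = "failed"


ORDER = [Stage.IVR, Stage.BOOKING_CONFIRMATION, Stage.PAYMENT, Stage.DONE]

# Hypothetical focused instructions; each stage's prompt only covers its own job
STAGE_PROMPTS = {
    Stage.IVR: "Navigate the phone menu until you reach reservations or the front desk.",
    Stage.BOOKING_CONFIRMATION: "Confirm the hotel can find the booking by reference and guest name.",
    Stage.PAYMENT: "Ask the hotel to process the virtual card payment and confirm it went through.",
}


def next_stage(current: Stage, succeeded: bool) -> Stage:
    if current in (Stage.DONE, Stage.FAILED):
        return current
    if not succeeded:
        return Stage.FAILED
    return ORDER[ORDER.index(current) + 1]


def run_call(stage_results: dict[Stage, bool]) -> tuple[Stage, list[Stage]]:
    """Walk the stages in order; stage_results simulates each stage's outcome."""
    current, visited = Stage.IVR, []
    while current not in (Stage.DONE, Stage.FAILED):
        visited.append(current)
        # In a live call, STAGE_PROMPTS[current] would drive the model for this stage
        current = next_stage(current, stage_results.get(current, False))
    return current, visited


print(run_call({Stage.IVR: True, Stage.BOOKING_CONFIRMATION: False}))
```

Because the failing stage is recorded, a failed call tells you exactly where the conversation broke, which is also where an eval or a manual listen should start.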
This is a detailed look at building AI for real-time human interaction—where the stakes are high and the feedback is immediate.
Guests: Steven Payne, Product Manager, Perk; Gabriel Stock, Senior Engineering Manager, Perk; Philipe Steiff, Senior Software Engineer, Perk.
What stood out to me was how Perk's team identified an AI use case by connecting prior experimentation with a real operational problem. Their choice of Make.com for prototyping, and the fact that they shipped to production without touching backend code, underscores how far no-code can take you when paired with crisp problem framing. The evolution from a single prompt to structured conversation stages (IVR handling, booking confirmation, payment request) is exactly how you harden agent behavior for production.
Breaking up the agent's task dramatically improved reliability. They also built two eval systems: classification for success rates and LLM-as-judge for conversational behavior. Even with automation, the team still listens to calls manually—a practice I strongly endorse for uncovering edge cases, trust issues, and UX nuances that dashboards can’t show.
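A minimal Python sketch of how the two eval signals might be combined, assuming a success classifier has already labeled each call and an LLM-as-judge has already produced a behavior score. The labels, the 1-to-5 scale, and the review threshold are hypothetical. Rather than replacing manual listening, the summary produces a queue of calls worth listening to.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean

SUCCESS_LABELS = {"payment_confirmed"}  # hypothetical classifier label for a successful call


@dataclass
class CallEval:
    call_id: str
    outcome: str           # label from the success classifier
    behavior_score: float  # 1-5 rubric score from an LLM-as-judge


def summarize(evals: list[CallEval], review_threshold: float = 3.0) -> dict:
    success_rate = sum(e.outcome in SUCCESS_LABELS for e in evals) / len(evals)
    avg_behavior = mean(e.behavior_score for e in evals)
    # Failed calls and poorly rated calls go to the manual listening queue
    listen_to = [e.call_id for e in evals
                 if e.outcome not in SUCCESS_LABELS or e.behavior_score < review_threshold]
    return {"success_rate": success_rate, "avg_behavior": avg_behavior, "listen_to": listen_to}


print(summarize([
    CallEval("a", "payment_confirmed", 4.5),
    CallEval("b", "payment_confirmed", 2.0),
    CallEval("c", "hotel_refused", 4.0),
    CallEval("d", "payment_confirmed", 5.0),
]))
```

Keeping the two signals separate matters: call "b" succeeded on outcome but scored poorly on behavior, exactly the kind of call a success rate alone would hide.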
The challenge of prompt engineering for voice—numbers, booking references, and text-to-speech markup—was non-trivial. Expanding to German revealed that prompts in native language improve results. And, as often happens with operations-heavy rollouts, this project uncovered other operational problems they didn't know existed—valuable signal for the roadmap.
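Booking references illustrate the voice-specific formatting problem: read as plain text, a reference like “AB1234” may be voiced as a word followed by a large number. A minimal Python sketch of one common workaround is to spell the reference character by character and insert commas between groups so the TTS engine pauses. Whether a comma actually produces a pause depends on the engine, and the grouping size is a hypothetical default. Many engines also accept SSML `<say-as>` markup, but supported `interpret-as` values vary by provider.

```python
def spell_for_tts(reference: str, group_size: int = 3) -> str:
    """Spell a booking reference one character at a time, with a pause between groups."""
    chars = [c.upper() for c in reference if c.isalnum()]  # drop dashes, spaces, slashes
    groups = [chars[i:i + group_size] for i in range(0, len(chars), group_size)]
    return ", ".join(" ".join(group) for group in groups)


print(spell_for_tts("ab-1234"))  # A B 1, 2 3 4
```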
Resources & Links:
Perk
Make.com: no-code automation platform used for the prototype.
Twilio: voice/telephony provider.
Eleven Labs: text-to-speech provider (used in early experiments).
Chapters:
00:00 Introduction to the Team
01:54 Understanding PERK's Mission
02:59 Challenges in Travel Booking
07:27 AI Solutions for Customer Care
09:52 Prototyping with AI and Voice
17:00 Implementing AI in Production
25:51 Learning Through Trial and Error
26:40 Prompting Challenges and Solutions
27:58 Iterating on Prompts and Evaluations
30:08 Scaling and Production Challenges
32:43 Advanced Evaluation Techniques
35:32 Real-World Applications and Success
49:07 Future Directions and Expansion
53:53 Conclusion and Team Reflections
My product takeaways: Start with clear operational pain and measurable outcomes (e.g., payment verification). Use no-code to validate quickly, then progressively harden. Treat voice AI like any production system: break it into deterministic stages, add guardrails, and measure both outcome and behavior. Pair automated evals with hands-on reviews. And when going multilingual, write prompts in the native language—your accuracy will thank you.
If you’re exploring agentic AI for operations, this is the blueprint: tight scoping, Make.com for speed, Twilio for reliability, structured prompts for control, and an eval-driven loop to scale quality with confidence.