Tag: agentic AI

  • Break the Headcount Ceiling: How AI Agents Create Net-New Pipeline at Scale

    I’ve been through enough planning cycles to know the impossible math sales leaders juggle. Every year, we’re asked to deliver more pipeline, and the expectation is that the team will somehow hit the target—whether headcount follows or not. In a good year you close some of the gap, but the underlying constraint remains: your pipeline ceiling is tied to your headcount. The ask gets bigger, but the resources rarely keep pace. There’s never been a convincing answer to “how do I grow pipeline by 30% without 30% more people?”

    For the first time in my 20-year sales career, there’s a real answer, and it comes from how we’re using our Customer Agent—internally nicknamed “Fin”—for inbound sales. What changed my perspective wasn’t faster execution on the same tasks; it was recognizing that an Agent can generate its own pipeline, consistently and at scale.

    Most conversations about AI in sales focus on efficiency—do the same work, just faster. That’s helpful but incomplete. In practice, the Agent is producing net-new, attributable pipeline. It’s not simply an efficiency layer inside the SDR team; it’s a distinct source that deserves its own targets, its own owner, and clear visibility in our pipeline analytics.

    Here’s how we run it. Fin has dedicated performance metrics but is held to the same outcomes as any rep: meetings booked, pipeline created, and revenue generated. On live chat, we track qualified, disqualified, and dropped conversations, then follow those cohorts through to opportunity and close. When you fold the Agent’s numbers into the team’s aggregate, you lose the crucial signal of what the Agent is actually doing. Reframing this with explicit attribution changes the boardroom conversation from “efficiency gains” to “a new, incremental source of pipeline.” Last month was our highest pipeline month from Fin to date—stronger than when live chat was handled by humans alone.
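
    A minimal sketch of that attribution idea, assuming a hypothetical conversation record with `source`, `outcome`, and `pipeline_value` fields (these names are illustrative and do not come from any real system). It keeps the Agent's cohort separate from the human cohort rather than folding both into one aggregate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Conversation:
    source: str            # hypothetical label: "agent" or "human"
    outcome: str           # "qualified", "disqualified", or "dropped"
    pipeline_value: float  # opportunity value created, 0.0 if none


def pipeline_by_source(conversations):
    """Summarize outcomes and pipeline per source, never merged."""
    summary = defaultdict(lambda: {"qualified": 0, "disqualified": 0,
                                   "dropped": 0, "pipeline": 0.0})
    for conv in conversations:
        bucket = summary[conv.source]
        bucket[conv.outcome] += 1
        bucket["pipeline"] += conv.pipeline_value
    return dict(summary)


# Illustrative data only, not real results.
sample = [
    Conversation("agent", "qualified", 12000.0),
    Conversation("agent", "dropped", 0.0),
    Conversation("human", "qualified", 8000.0),
    Conversation("human", "disqualified", 0.0),
]
report = pipeline_by_source(sample)
```

    Reporting `report["agent"]` and `report["human"]` side by side is what keeps the Agent's contribution visible as its own pipeline source.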

    The template for this transformation came from customer service. Before we operationalized AI for sales, I partnered closely with our support organization. They built the organizational architecture we’re applying today: clear ownership of the AI motion, Agents and humans running in parallel, and a continuous optimization loop that treats the Agent as a living system, not a set-and-forget tool. The workflows in support and sales are more similar than people expect—qualify the need, guide to the right solution, and move decisively toward an outcome.

    “The right benchmark is matching a high-performing rep on that channel, consistently and at scale”

    When the Agent reliably meets that benchmark, the gains compound. The team wins back time for work where relationships truly matter—multi-threading across stakeholders, tailoring value narratives, and navigating complex buying processes. That is where human judgment shines.

    The most common question I hear is what this means for SDRs. If the Agent owns the frontline, what are SDRs actually doing? The answer is: higher-leverage work. The Agent handles frontline inbound—engaging instantly, qualifying, routing high-intent prospects to the right team, and keeping lower-intent visitors warm by directing them to self-serve resources or remembering their context until they’re ready for a real conversation. It does this 24/7, across languages, without the capacity constraints that come with a human-only model.

    What changes is where SDRs’ time goes. For us, that’s phone-based qualification, where we still see the strongest conversion. It’s also deeper relationship-building across multiple stakeholders in an account—the kind of multi-threaded engagement that takes time and judgment. Trials are a great example: rather than treating a trial as a conversion mechanism, SDRs can help prospects get real value from it through guided setup and outcome-oriented check-ins.

    “That’s work they rarely have capacity for right now, because too much of their time goes to the frontline. Fin changes that”

    I want to be direct about one thing: replacing your SDR function entirely with AI is a mistake. SDRs are the talent pipeline for closing teams. The reps who become your best AEs are, more often than not, people who came up through an SDR role. That’s where they learn to qualify and build relationships at speed. Eliminating that function to reduce cost creates fragility further up the funnel that can take years to surface.

    Across the market, many sales organizations are still early in this journey. Startups and smaller teams are ahead—they’re building AI-first motions from the ground up and deliberately designing to avoid scaling headcount in the traditional way. Larger, more established sales development functions are mostly still running standard workflows. That makes sense—transforming a mature org is harder than building anew—but complexity isn’t a reason to wait. Momentum is building, and the gap is widening between teams leaning in and those holding back.

    What’s emerging now is dedicated AI ownership within sales. It requires someone with program-level responsibility for how the Agent actually performs, rather than bolting AI tools onto an existing job description. We created that role – it’s called “AI SDR program lead.” This role owns the strategy, implementation, and optimization of Fin within the inbound SDR motion, ensuring it drives pipeline growth and integrates well across our systems and workflows. It’s a new career opportunity that came directly from the AI motion, with one of our existing managers moving into it.

    The long-held assumption that pipeline growth requires proportional headcount growth is no longer a fixed law. AI-generated pipeline is real, measurable, and improvable with the same rigor we apply to any other part of the function. Treating it as its own source—with explicit targets, attribution, and dedicated ownership—is the difference between marginal efficiency gains and truly breaking the link between pipeline growth and headcount.

    The constraint hasn’t disappeared; it has moved. It’s no longer just about how many people you can hire. It’s about how well the Agent understands your product, your customers, and your qualification logic—and how quickly your team can iterate the workflows, knowledge, and guardrails around it. For the first time, the pipeline ceiling can be higher than your headcount allows.

    If you’re standing up this motion now, start with three moves: give the Agent its own KPIs and attribution, put a single owner in charge of performance and iteration, and reorient SDR time toward high-conversion conversations and multi-threaded account development. That’s how you scale pipeline with AI without scaling headcount in lockstep.


    Inspired by this post on The Intercom Blog.


  • Scale Support with Heart: How AI Makes Every Customer Interaction Faster and More Human

    Every day at HighLevel, I talk with support leaders who are balancing two imperatives that can feel at odds: scaling service efficiently while deepening empathy in every interaction. My product lens is simple—use AI to clear the path for humans to do what only humans can do: listen, understand, and solve nuanced problems with care.

    Discover how AI helps support teams deliver faster, more empathetic experiences. Automate the repetitive, so agents can focus on what matters: the customer.

    That principle anchors our customer support AI strategy. We deploy AI workflows that handle the heavy lift—classification, intent detection, summarization, knowledge retrieval, and next-best-action—so agentic AI can triage, resolve routine issues, and hand off the right context when a human touch is needed. The result is a queue that moves faster, with more signal and less noise, and a team freed to bring empathy and judgment to the moments that matter most.

    On the front line, a voice AI agent or chat interface deflects repetitive requests, while conversation design ensures the experience feels respectful, transparent, and helpful. Inside the console, Agent Analytics surface what leaders care about: which topics spike, where customers get stuck, how sentiment and CSAT shift, and which playbooks actually shorten time to resolution. When an agent steps in, AI-assisted replies, real-time summarization, and suggested macros reduce cognitive load—so attention goes to the customer, not the keyboard.

    Shipping these capabilities responsibly requires rigor. My playbook pairs LLMs with a retrieval-first pipeline that grounds responses in audited knowledge, backed by privacy-by-design and data governance. We use eval-driven development to measure safety and quality, and A/B testing to quantify impact before broad rollout. This isn’t just about automation; it’s about trust, reliability, and continuous discovery with real customers.

    Context is king, so CRM integration is non-negotiable. By unifying tickets, purchase history, prior conversations, and lifecycle stage, agents walk in with empathy already loaded. Whether the channel is Intercom, HubSpot, or native chat, a unified analytics platform connects signals across journeys, enabling proactive outreach, smarter product tours, and in-app guides that prevent avoidable tickets in the first place.

    The outcome is a support organization that scales without sacrificing humanity. AI handles the repetitive; people handle the relational. Teams spend less time searching and more time solving. Leaders coach with data instead of guesswork. And customers feel heard—because they are. That’s how we make human support more human, at scale.


    Inspired by this post on Amplitude – Perspectives.


  • What’s New with Amplitude Agents: Faster Releases, Smarter Insights, and Must‑Try Upgrades

    I’ve been deep in the work of turning agentic AI from a promising idea into reliable, measurable outcomes. Today, I want to share a concise, practitioner’s update on what’s new with Amplitude Agents—and, more importantly, how to get real value fast using proven product management techniques.

    We launched AI Agents a few weeks ago. We’ve been shipping pretty fast since then, so we wanted to loop you in on what’s new and what’s worth trying.

    Rapid releases only matter if they translate into user value. My approach is to treat every agent improvement as a learning opportunity: instrument it, set clear success metrics, run controlled experiments, and iterate. This eval-driven development mindset keeps us honest about what’s truly working in the wild.

    If you’re trying Amplitude Agents now, start with a narrowly scoped, high-signal workflow where success is unambiguous—think a single journey with a clear “done” state. Connect the experience to your unified analytics platform so you can see the full picture across events, funnels, and cohorts. In practice, I lean on Amplitude analytics and Agent Analytics to make this visibility effortless.

    Define how you’ll measure impact before you ship. Identify activation and completion events, baseline them, and then A/B test your agentic AI flow against the status quo. Behavioral analytics will show whether users are discovering the agent, sticking with it, and returning for more. When the story in the data is clean, it’s much easier to scale the win.
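
    One way to read an A/B result like this is a two-proportion z-test on completion rates. The sketch below is a generic statistical check, not a feature of any analytics product; the function name and the counts are illustrative assumptions, and it assumes both groups have at least one completion and one non-completion overall.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in completion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both flows perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative counts only: group A = status quo, group B = agent flow.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

    A small p-value suggests the difference is unlikely to be noise, but the baseline events and sample sizes still need to be fixed before the test starts.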

    Hardening matters as much as headlines. As you expand use, apply sensible guardrails—input validation, clear prompts, and transparent handoffs to deterministic flows when confidence is low. Pair this with observability so you can spot anomalies early and recover gracefully. These practices reduce risk while preserving the speed and creativity that make AI workflows powerful.
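
    A minimal sketch of those guardrails, assuming the agent returns a confidence score between 0 and 1. The threshold, the input limit, and all names here are hypothetical and would need tuning against your own evaluations.

```python
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.7   # hypothetical threshold; tune from your own evals
MAX_INPUT_CHARS = 2000   # hypothetical input limit


@dataclass(frozen=True)
class AgentResult:
    answer: str
    confidence: float  # assumed to be a score in [0, 1] from the agent


def route(user_input, agent_result):
    """Decide whether to use the agent's answer or a deterministic flow."""
    text = user_input.strip()
    # Input validation: reject empty or oversized input.
    if not text or len(text) > MAX_INPUT_CHARS:
        return ("deterministic_flow", "invalid_input")
    # Transparent handoff when the agent is not confident enough.
    if agent_result.confidence < CONFIDENCE_FLOOR:
        return ("deterministic_flow", "low_confidence")
    return ("agent", agent_result.answer)
```

    Returning the handoff reason alongside the route makes it straightforward to log and monitor how often each guardrail fires.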

    Once the basics are working, dig into adoption patterns: segment by cohort, study user activation paths, and run retention analysis to find where the agent is truly changing behavior. These insights shape roadmap priorities and help you invest in the moments that drive durable value.

    We’ll keep shipping quickly and sharing practical guidance. If you have feedback, experiments to showcase, or questions about instrumentation, send them our way—I use that signal to refine our next set of improvements and learning agendas. Expect more short, focused updates and deeper dives on evaluation frameworks, prompt strategies, and rollout playbooks.

    In short: keep it scoped, instrument everything, test deliberately, and let the data guide your next move. That’s how Amplitude Agents becomes not just new, but indispensable.


    Inspired by this post on Amplitude – Best Practices.


  • AI Agents That Truly Help Product Teams: A Practical Framework for When—and When Not—to Use Them

    Every week, I field the same question from product leaders and engineers: should we deploy an AI agent here, or are we overfitting the problem to a shiny solution? This post covers when AI agents actually help product teams, plus a simple framework to decide when not to use them.

    When I say “AI agents,” I’m talking about autonomous or semi-autonomous systems that can perceive context, plan steps, and take actions across tools and data sources with minimal supervision—what many now call agentic AI. In product management terms, they’re not just another feature; they’re an operating model shift. Used well, they compound team leverage. Used poorly, they add invisible complexity, new failure modes, and governance headaches.

    To make the call with confidence, I use a straightforward VITAL framework that my team can apply in minutes. It keeps us honest about where AI agents are a force multiplier—and where a simpler automation, rule, or in-product UX is the better choice.

    V is for Volume. Agents shine where there’s sustained, repetitive, high-throughput work: triaging inbound support, cleansing CRM records, orchestrating QA checks, or synthesizing weekly research summaries. If the workflow happens rarely or ad hoc, an agent is often overhead in disguise.

    I is for Instructions. Can I specify success in clear, testable terms? Strong instructions include measurable acceptance criteria and constraints. If I can’t articulate what “good” looks like without hand-waving, the task likely needs product discovery, not autonomy.

    T is for Tolerance. What is the blast radius if the agent makes a wrong call? Low-stakes, reversible actions with tight guardrails are ideal. If the tolerance for error is near zero (e.g., irreversible financial transactions or sensitive regulatory actions), favor human-in-the-loop, stronger approvals, or defer agents entirely.

    A is for Access. The agent needs the right data, tools, and permissions, with privacy-by-design and data governance in place. If telemetry is sparse, integrations are brittle, or you can’t enforce least-privilege access, you’ll fight fragility more than you’ll gain leverage.

    L is for Learning loop. Agents require eval-driven development, Agent Analytics, and continuous feedback to stay accurate as reality shifts. If you can’t measure quality, latency, and cost per outcome—or you lack a retrieval-first pipeline to ground responses—expect drift and stakeholder distrust.

    Now, the counterweight. Don’t use agents when the problem is novel or strategically ambiguous and you still need exploratory research; when outcomes are unmeasurable or subjective without heavy context; when stakes are high and the acceptable error rate is effectively zero; when data is siloed, stale, or legally constrained; when the work is one-off or low-volume; or when your team can’t commit to instrumentation, evaluations, and ongoing maintenance. In these cases, a simpler rules engine, a clearer UX, or a well-defined workflow usually beats agentic complexity.

    Here’s how this plays out in practice. We’ve seen agents materially improve customer support triage (categorization, priority, and next-best-action suggestions), CRM hygiene (deduplication, enrichment, and routing), and release QA (regression check orchestration with human sign-off). Conversely, we avoid agents for nuanced pricing decisions, sensitive risk scoring without robust datasets, or any workflow where “explainability” and auditability trump speed.

    Operationalizing agents is a product problem before it’s an ML problem. Start narrow with a retrieval-first pipeline and rigorous prompt engineering, define success metrics upfront (quality, latency, cost per task), and run head-to-head evaluations against human baselines. Ship behind feature flags, monitor with Agent Analytics, and graduate from assisted to autonomous modes only after you’ve proven stability. Align this with product roadmapping and sprint planning so the work lands as durable capability, not a lab demo.
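
    A head-to-head evaluation on quality, latency, and cost per task can be sketched as follows. The `TaskRun` record, the helper names, and the numbers are illustrative assumptions; real evaluations would use your own acceptance criteria and cost accounting.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TaskRun:
    correct: bool     # did the output meet the acceptance criteria?
    latency_s: float  # seconds to complete the task
    cost_usd: float   # cost attributed to the task


def summarize(runs):
    """Average quality, latency, and cost per task over a set of runs."""
    return {
        "quality": mean(1.0 if r.correct else 0.0 for r in runs),
        "latency_s": mean(r.latency_s for r in runs),
        "cost_per_task": mean(r.cost_usd for r in runs),
    }


def meets_baseline(agent_runs, human_runs):
    """True when the agent matches or beats the human baseline on quality."""
    return summarize(agent_runs)["quality"] >= summarize(human_runs)["quality"]


# Illustrative numbers only, not measured results.
agent_runs = [TaskRun(True, 2.0, 0.25), TaskRun(True, 4.0, 0.75)]
human_runs = [TaskRun(True, 60.0, 5.0), TaskRun(False, 120.0, 7.0)]
agent_summary = summarize(agent_runs)
```

    This sketch gates only on quality; a stricter gate could also require latency and cost to stay within agreed limits before moving from assisted to autonomous mode.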

    Finally, be honest about build vs buy. If the workflow is a point of parity, consider buying and focusing your team on integration quality and governance. If it’s a potential source of competitive differentiation, invest in a modular architecture with clear context window management, strong observability, and a feedback loop tightly coupled to your empowered product teams.

    The bottom line: AI agents unlock leverage when there’s volume, clarity, tolerance, access, and a learning loop. If any of those pillars is missing, pause. Your best next move is likely better instrumentation, sharper problem framing, and continuous discovery—not more autonomy. That discipline is how product teams turn agentic AI from hype into habit.
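
    The VITAL check can be expressed as a short checklist in code. This is a sketch of the decision rule described above (any missing pillar means pause); the class, the function names, and the example scoring are illustrative assumptions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VitalCheck:
    volume: bool         # sustained, repetitive, high-throughput work
    instructions: bool   # success is specified in clear, testable terms
    tolerance: bool      # errors are low-stakes and reversible
    access: bool         # data, tools, and least-privilege permissions exist
    learning_loop: bool  # quality, latency, and cost per outcome are measured


def missing_pillars(check):
    """Return the names of the pillars that are not satisfied."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


def recommend(check):
    """Recommend proceeding only when all five pillars are present."""
    gaps = missing_pillars(check)
    if gaps:
        return "pause: address " + ", ".join(gaps)
    return "proceed with an agent"


# Illustrative scoring only; your own assessment may differ.
support_triage = VitalCheck(True, True, True, True, True)
pricing_decisions = VitalCheck(False, False, False, True, False)
```

    Listing the missing pillars, rather than returning a bare yes or no, points the team at the specific gap to close first.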


    Inspired by this post on Product School.


  • Unleashing Inbound Sales with AI: My Playbook for Launching and Scaling Sales Agents Fast

    Inbound leads shouldn’t wait for a rep’s calendar. When we first launched The Service Agent Blueprint, support leaders finally had a clear AI path. Go-to-market and revenue teams are now facing similar uncertainty, so I’m introducing The Sales Agent Blueprint—a practical map for launching and scaling AI for sales with confidence.

    For most sales teams, inbound motions require a lot of manual work. I’ve watched leads pile up in queues, waiting for availability rather than being prioritized by buyer intent. That delay costs meetings, pipeline, and momentum—and it’s exactly where a modern AI Strategy can transform your go-to-market strategy.

    Agents can run sales conversations end to end – engaging buyers, qualifying leads, and routing high-intent opportunities to the right team to move prospective buyers forward quickly. Humans will still be involved, but will move their focus to the consultative conversations and higher-value work they did not have time to focus on before. In practice, this shift enables cleaner AI workflows, better conversation design, and a healthier balance between sales-led growth and product-led growth.

    The questions many go-to-market and revenue leaders are facing now are: Where do you start? What should success look like? How do you actually test and deploy these solutions? These are the right questions, and the ones I hear most often when teams weigh build vs buy decisions, evaluation frameworks, and CRM integration nuances.

    The Sales Agent Blueprint answers those questions. It’s designed to be a strategic guide for sales, revenue, and AI transformation leaders who want to deploy AI for inbound sales fast, prove value, and build momentum. If you’re aiming for eval-driven development, this will help you define success up front and operationalize it.

    What’s inside is simple by design yet deep enough to take you from zero to value. The Sales Agent Blueprint is structured around two tracks that reflect how high-performing teams adopt agentic AI: first, launch for quick wins; next, scale for durable growth.

    Image: teaser banner for the Sales Agent Blueprint “Scale it” track, marked “Coming Soon.”

    Today, I’m releasing the first part of the Blueprint: “Launch it.” It’s a practical guide for getting your Agent live and seeing real results. You’ll learn how to deploy a Sales Agent that runs inbound sales conversations end to end, engaging buyers, qualifying leads, and routing high-intent opportunities to the right outcome in real time—without disrupting your current CRM integration or pipeline processes.

    By the end of the “Launch it” track, you’ll be ready to execute with clarity. Here’s how I frame the essential steps, based on what consistently works in the field.

    Understand what a Sales Agent is: Discover why they’re different from chatbots and how they work.

    Build a business case: Prove the basic economics of AI, decide whether to buy or build, and get the buy-in and budget you need to move forward.

    Evaluate an Agent: Learn how to define success, choose the right evaluation criteria, and run a focused, high-impact assessment with our five-step framework.

    Deploy with confidence: Build a deployment plan that gets your Agent live quickly to engage buyers at peak intent. Learn what to expect at each stage.

    Image: Sales Agent Blueprint title graphic highlighting step 1, “Launch it” (fin.ai/blueprint/sales).

    Continuously improve performance: After launch, your Agent becomes a system to manage. We’ll show you how to implement a repeatable process to train, test, deploy, and optimize.

    The second track, “Scale it” (coming soon), focuses on the organizational and systems design work that unlocks compounding gains. Launching AI is only the beginning. To unlock its full potential, you need to rewire your inbound sales motion—redesigning the buyer journey, building AI-first systems and ownership models, and rethinking how pipeline is generated and scaled. This is where governance, measurement, and team roles evolve to support sustainable growth.

    I’ll be building this Blueprint in public as I navigate the same challenges—sharing what works, what to avoid, and how to accelerate time-to-value without sacrificing quality or trust. If you’re ready to turn intent into revenue with agentic AI, this is your head start.

    The first part of the Sales Agent Blueprint, “Launch it,” is live now. Explore the guide at fin.ai/blueprint/sales and start your “Launch it” sprint today.


    Inspired by this post on The Intercom Blog.


  • Fin for Sales: Instantly Engage, Qualify, and Close High‑Intent Leads with an AI Customer Agent

    Today, I’m spotlighting Fin for Sales, a new role for Fin Customer Agent that runs your inbound sales motion end-to-end. From my vantage point leading product management and collaborating closely with revenue teams, this is a meaningful evolution in how we capture, qualify, and convert high-intent demand with precision and speed.

    The promise here is simple and powerful: a single Customer Agent with shared context, memory, and business goals that supports the entire journey from first touch to close. Fin for Sales brings Fin to the start of the customer journey so it can engage prospects, guide them through your funnel, and ensure the best opportunities reach your sales team without delay.

    At a high level, here’s what stands out to me in practice. Fin engages every prospect instantly at the moment intent is highest. It runs discovery like your best rep with clear pricing guidance, product education, and objection handling. It qualifies and routes in real time using your playbook and syncs full context to your CRM. And it closes deals while you sleep by booking meetings, starting trials, and steering buyers to the right next step—boosting MQLs, pipeline, and early close/win rates.

    Fin engages every prospect instantly. It starts the right conversation when interest peaks, re-engages before prospects go cold, and works on every channel, in every language, 24/7. In my experience, that immediacy is the difference between a lead that converts and a lead that disappears.

    Image: Fin for Sales chat widget in which the assistant compares Free vs Pro CRM plans, recommends Pro for reporting needs, and offers to book a sales call.

    Fin runs discovery like your best rep. It explains pricing, guides product discovery, handles objections, and personalizes each interaction based on who the prospect is and what they care about. This is where thoughtful conversation design and consistent playbook execution really compound.

    Fin qualifies and routes in real time. Using your playbook, it collects and enriches data about your prospects, sends qualified leads to your sales team or down self-serve paths, while syncing full context to your CRM. Your team never works the wrong lead. That’s operational rigor revenue leaders crave.

    Fin closes deals while you sleep. It can book meetings, start trials, and guide buyers to the right next step. Early customers are already seeing impressive results, increasing MQLs, growing pipeline and seeing close/win rates of nearly 50% in the first month. That’s the kind of lift that reshapes go-to-market strategy and forecasting confidence.

    Image: Fin for Sales syncing a prospect insights panel (contact details plus purchase intent, opportunity, and timeline signals) to Salesforce.

    Why this matters: most online sales experiences still rely on forms, queues, and follow-ups—exactly when prospects want clarity and momentum. Hiring enough reps to cover every time zone, channel, and hour is unrealistic, and even the best teams burn cycles on leads that were never going to convert. I’ve watched high-intent demand slip through the cracks simply because the response wasn’t fast, consistent, or contextual enough.

    Revenue leaders need a system that meets every inbound interaction immediately, without sacrificing quality, and routes only the right opportunities to sales. Incremental automation doesn’t fix the core issue; an agentic approach does. Fin for Sales closes that gap by pairing instant engagement with disciplined qualification and crisp handoffs.

    How it works in the moment: when a prospect is actively exploring your site, any delay—a form, a queue, a “we’ll get back to you”—erodes intent. Fin engages in real time through the Spotlight Messenger, a new interface built specifically for sales conversations. It can proactively start a conversation based on context like the page someone is on or how they’re browsing, and it offers smart suggestions to kick-start engagement.

    Image: Fin for Sales chat widget with an in-chat calendar and time-slot picker for March 2026 and a Confirm booking button.

    Prospects who might have waited—or never reached out—now get answers immediately. Fin also works across channels including messenger and email, so buyers can engage however they prefer. Whether someone is browsing your pricing page at 2am or comparing features during a lunch break, Fin responds instantly and relevantly so no lead is left behind.

    To move prospects toward a decision, Fin guides personalized discovery conversations that clarify needs and accelerate choices. Four pillars make this consistent and trustworthy. Playbook: you brief Fin in natural language on desired outcomes and scenarios; it follows your rules, handles objections with approved guidance, and stays on track. Knowledge: it draws from your product knowledge base to answer pricing, features, and plan fit, and can reuse what you’ve already trained for customer service—no duplicate setup. Enrichment: once Fin learns a user’s email or name, it enriches that data with outside sources to improve qualification, personalization, and routing. Memory: if Fin recognizes a returning visitor, it remembers context so the buyer never starts over.

    As conversations progress, Fin surfaces the opportunities most likely to close. It qualifies like your best SDR—asking about use case, budget, fit, and timing—and applies your existing playbook to identify the strongest opportunities. Details captured in conversation, plus enrichment, produce a complete picture that’s structured and synced into your CRM for immediate sales action. And when a lead isn’t a fit, Fin gracefully disqualifies or redirects to self-serve resources, ensuring your pipeline stays focused.
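
    The qualify-then-route step can be sketched as a small rule function. This is not how Fin is implemented; the `Lead` fields, the thresholds, and the choice to send fitting but not-yet-ready leads to self-serve are hypothetical stand-ins for whatever your own playbook specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lead:
    use_case_fit: bool            # does the use case match the product?
    budget: Optional[float]       # stated budget, None if unknown
    timeline_days: Optional[int]  # days until intended purchase, None if unknown


# Hypothetical playbook thresholds, for illustration only.
MIN_SALES_BUDGET = 10_000.0
MAX_SALES_TIMELINE_DAYS = 90


def route_lead(lead):
    """Return 'disqualify', 'sales', or 'self_serve' for a lead."""
    if not lead.use_case_fit:
        return "disqualify"
    has_budget = lead.budget is not None and lead.budget >= MIN_SALES_BUDGET
    is_timely = (lead.timeline_days is not None
                 and lead.timeline_days <= MAX_SALES_TIMELINE_DAYS)
    if has_budget and is_timely:
        return "sales"
    # A fit, but not ready for a sales conversation yet.
    return "self_serve"
```

    Keeping the rules explicit like this makes the routing logic easy to review and adjust as the playbook changes.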

    When a lead is ready to act, Fin closes. It books meetings via tools like Chili Piper or Calendly, guides qualified buyers into trials or subscriptions, and routes opportunities to your sales team with full context. Crucially, it passes the full conversation history and an AI-generated summary so reps pick up exactly where the buyer left off—no repeated questions, no lost nuance. For self-serve motions, Fin can guide prospects from discovery to trial signup or even paid conversion, automatically assigning the right path.

    Real results underscore the model’s value. Fin is already delivering measurable results for early customers across different company sizes, sales motions, and go-to-market models. Attio, an AI CRM built for scaling go-to-market intelligently, deployed Fin to replace their traditional form-and-wait inbound flow with real-time conversational engagement. In three months, Fin handled over 1,600 conversations with website visitors, qualified more than 50 leads for sales, and routed over 30 applicants into their startup program. One returning prospect engaged with Fin, had their questions answered in real time, and converted to a paying customer at six times Attio’s average contract value.

    Fellow, an AI-powered meeting assistant and management platform, started by deploying Fin overnight, a window where no human was online and prospects waited up to 18 hours for a reply. In January alone, Fin booked 18 meetings the team would never have reached, converting at around 48%. Importantly, the human team maintained its booking rate while Fin added net-new meetings—proof that automation layered on top of strong human coverage can be additive, not cannibalistic.

    Fin for Sales is built on the same AI platform that powers the highest-performing Agent in customer service, which keeps the end-user experience consistent. If a prospect asks a support question mid-sales conversation, Fin can handle it—no handoffs to other vendors, no lost context. It shares knowledge and memory across its platform, always knows whether it’s talking to a prospect or a customer, and moves between roles as needed. Setup follows the same Fin Flywheel: Train, Test, Deploy, Analyze. Describe your sales playbook, qualification criteria, and routing rules in natural language; test in preview; deploy live; and use Analyze to understand performance and iterate quickly.

    Fin for Sales is available today, and there’s more coming. I share the conviction that the future is a single Customer Agent, vertically integrated down to the model layer, orchestrating customer experience across the entire lifecycle. If you want to see it in action, go to fin.ai/sales and talk to Fin—then imagine that instant, high-quality engagement running across your inbound sales engine, every hour of every day.


    Inspired by this post on The Intercom Blog.


  • Why Your Product Needs a Smarter Support Agent: Data-Driven, Agentic AI That Truly Helps

    Why Your Product Needs a Smarter Support Agent: Data-Driven, Agentic AI That Truly Helps

    Your product deserves a support experience that does more than point users to a help article. In my work leading product teams, I’ve seen how an intelligent, in-product assistant can reduce friction, accelerate user activation, and create the kind of product-led growth that traditional support channels struggle to deliver. The bar is higher now: customers expect immediate, context-aware help that feels proactive, measurable, and trustworthy.

    When I evaluate support solutions, I look for three capabilities: an assistant that truly knows the user’s context, can act on their behalf to resolve issues end-to-end, and can prove the impact with rigorous measurement. Anything less is just another interface to your knowledge base. The shift to agentic AI makes this possible—if it’s grounded in behavioral analytics and integrated with your unified analytics platform.

    Learn more about Amplitude AI Assistant. Our in-product support agent knows your users, acts on their behalf, and measures whether it actually helped.

    That promise resonates with how I design AI strategy: start with data fidelity, not dialog. When an assistant is wired into Amplitude analytics and behavioral analytics, it can understand where a user is in the journey, the features they have (or haven’t) adopted, and which nudges or in-app guides historically drive success. This is the foundation for precise, contextual help: surfacing the right product tours at the right moments and removing guesswork.

    Knowing users isn’t enough; the assistant must act. With agentic AI, the assistant can execute safe, auditable steps on a user’s behalf (updating settings, triggering a workflow, or guiding a multi-step configuration) rather than handing a to-do back to the customer. Done well, this reduces time-to-value and support tickets while aligning with a thoughtful customer support AI strategy that respects permissions, privacy-by-design, and clear guardrails.

    Equally important is measurement. I expect every AI touchpoint to demonstrate lift: faster time-to-resolution, higher feature adoption, improved retention, and lower churn. This is where robust A/B testing, Agent Analytics, and retention analysis come in—so we can quantify the assistant’s contribution against meaningful product outcomes, not vanity metrics. If we can’t measure it, we can’t manage it.

    Operationally, I advise teams to pilot with narrowly scoped, high-impact journeys and iterate with tight feedback loops. Instrument the assistant’s actions and outcomes, set minimum detectable effect thresholds for experiments, and continually refine prompts and playbooks. Tie insights back to your unified analytics platform so learnings inform roadmap choices and reinforce a durable product-led growth motion.
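
    To make the minimum detectable effect threshold concrete, here is a minimal sketch of how I might size an experiment before launching it. It uses the standard normal-approximation formula for a two-sided, two-proportion test. The function name, the 20% baseline, and the 2-point absolute lift are illustrative assumptions, not figures from a real experiment.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute lift of `mde`.

    Normal approximation for a two-sided, two-proportion z-test.
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


# Hypothetical pilot: 20% baseline self-resolution rate, 2-point absolute MDE.
users_needed = sample_size_per_arm(baseline=0.20, mde=0.02)
print(f"Users needed per arm: {users_needed}")
```

    The practical takeaway is that halving the MDE roughly quadruples the traffic you need, which is one reason narrowly scoped, high-volume journeys make better pilots.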

    In short, the next generation of in-product support will be built on data-rich context, agentic execution, and rigorous proof of value. That’s the standard I hold my teams to—and the experience users deserve when they ask for help.


    Inspired by this post on Amplitude – Best Practices.


  • AI Now Approves Our Pull Requests—Safely: Inside an Agentic, Auditable Review Engine

    AI Now Approves Our Pull Requests—Safely: Inside an Agentic, Auditable Review Engine

    At Intercom, shipping is our heartbeat. We push code to production hundreds of times a day, and I’ve seen firsthand how that pace sharpens our product instincts and forces clarity in our CI/CD practices.

    Engineers, engineering managers, designers, and PMs all contribute to this, safely. The average time from merging code to it running in production is 12 minutes. For me, that’s not just a vanity metric—it’s a DORA-style signal that our release pipeline and observability are aligned with the velocity our customers expect.

    I’ve long held a belief that might sound counterintuitive: speed is not the enemy of safety. It’s a prerequisite for it. Accumulating code creates risk. Shipping small batches minimizes it. The faster you ship, the smaller each change is, the easier it is to catch problems, and the easier it is to roll back when something goes wrong, because the context is still fresh in your head. That small-batch discipline underpins how I approach AI workflows and risk management across product teams.

    Today, over 93% of our pull requests (PRs) across our two main codebases are Agent-driven. And over 19% are auto-approved with no human reviewer in the loop. When I first saw those numbers at scale, I asked the same question you might be asking: are we trading rigor for speed? The answer lives in the data.

    I want to focus on that second number, and why I think it makes us safer. Most people hear “AI is approving our pull requests” and think that’s reckless. I thought so once, too—until I looked at the outcomes that actually matter.

    Last year, our CTO Darragh Curran set an explicit goal: double the productivity of our entire R&D organization within 12 months. Because the faster we can build and ship, the faster our customers get the capabilities they need. Ambitious? Absolutely. But the operational clarity that comes from such a target is invaluable for product leaders.

    Nine months later, we did it. The results were significant across the board, but here’s the stat that crystallized it for me: downtime from breaking code changes dropped 35%, even as our deployments doubled. Shipping faster made us safer. As we modernize how we build and ship software, we systematically surface bottlenecks and tackle them. One of the biggest we found? PR review.

    Humans simply don’t have the time or mental capacity to properly review the volume of AI-generated code we’re now producing. I’ve watched great engineers get stuck in review queues, or worse, feel pressure to rubber-stamp under time constraints—an anti-pattern I’ve battled in multiple orgs.

    When an AI Agent can produce a working implementation in minutes, waiting hours or days for a human to review it is an impedance mismatch. The production line is moving faster than the quality gate can keep up. When that happens, one of two things follows: either the queue backs up and velocity drops, or, more dangerously, humans start rubber-stamping. Glancing at a diff, skimming the description, clicking approve. Some companies are drifting into this failure mode silently. We chose to confront it head-on and built a rigorous solution.

    PR review, done properly, is complex. A good reviewer evaluates the problem statement, aligns the diff to intent, checks for safety and logical issues, applies deep product context, and scans for performance and anti-patterns. No single human can cover all of that on every PR at high deployment frequency. The truth—borne out by data—is that the human baseline we often assume is stronger than it really is.

    Figure: AI-approved pull requests merge about 5.2x faster than human-reviewed ones, with a median time from creation to merge of 14.6 minutes versus 75.8 minutes.

    So we asked ourselves: what if we could do better?

    Our PR review Agent doesn’t treat code review as a single task. It decomposes it into separate sub-jobs, each handled by an independent sub-Agent. One assesses the quality of the problem description. Another checks whether the diff actually aligns with the stated intent. Another reviews for safety concerns. Another checks for logical correctness. Another reviews against best practices and known anti-patterns. And so on. As a product leader, this is exactly the kind of agentic AI architecture I look for: specialized, auditable steps that strengthen the overall control plane.
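
    As a rough illustration of that decomposition (not Intercom’s actual implementation), the sketch below runs a list of independent checks over a PR and approves only when every one passes. The two checks are trivial stand-ins I made up for the example; in a real system each would be its own model call with its own prompt and context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PullRequest:
    description: str
    diff: str


@dataclass
class Finding:
    check: str
    passed: bool
    note: str


# Hypothetical stand-ins for sub-agents: plain functions with simple rules.
def check_description(pr: PullRequest) -> Finding:
    ok = len(pr.description.split()) >= 5
    return Finding("description", ok, "ok" if ok else "description too thin")


def check_safety(pr: PullRequest) -> Finding:
    risky = "DROP TABLE" in pr.diff.upper()
    return Finding("safety", not risky, "destructive SQL" if risky else "ok")


CHECKS: List[Callable[[PullRequest], Finding]] = [check_description, check_safety]


def review(pr: PullRequest) -> Dict[str, object]:
    """Run every check independently and approve only if all of them pass."""
    findings = [check(pr) for check in CHECKS]
    return {
        "approved": all(f.passed for f in findings),
        "findings": findings,  # kept in full so the decision can be audited
    }


result = review(PullRequest("Fix typo in the billing page footer", "- teh\n+ the"))
print(result["approved"])
```

    Keeping each check separate means a new specialist lens is one more entry in the list, and every approval carries the per-check findings that justified it.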

    The result is that every PR is reviewed as if a dozen of our most tenured and knowledgeable engineers were all looking at it simultaneously, each bringing their own specialist lens. In the past, getting that breadth of review on a single PR was impossible. Now it’s the default. And unlike ad hoc human review, this system is consistent and tireless.

    A human reviewer typically focuses on the actual code changes, the diff. Our Agent goes deeper. It traces execution paths, following the implications of a change through the codebase. This is something humans rarely had time to do, even when they wanted to.

    While testing our new PR review Agent on a set of historical PRs, we found it flagging a one-line text copy change as incorrect. On the surface, it looked completely harmless, just a text update. We assumed it was a mistake, but it wasn’t. Our Agent caught that the new copy contradicted an existing validation mechanism elsewhere in the codebase. No human reviewer would have realistically found this unless they happened to have written that validation code very recently. Our Agent catches this kind of thing consistently, every time, because it’s always tracing execution.

    The review isn’t generic either. It’s grounded in Intercom-specific guidance that our engineers have built and continue to refine, encoding the same context, standards, and product knowledge they’d apply if they were reviewing the PR themselves. When the Agent reviews a PR, engineers flag whether the review comments were helpful or not, and that feedback continuously sharpens the guidance. It’s a flywheel: the more our engineers invest in teaching the system how to think about our codebase, the better every subsequent review gets. This is eval-driven development in action.

    Automated approval is also never forced. Any engineer can request a human review on any change, at any time. The system is a tool, not a mandate. At Intercom, shipping code doesn’t end at merge. The engineer who ships a change is expected to watch it go live, monitor its behaviour in production, and be ready to roll back if something isn’t right. AI approval doesn’t change that. The human who ships the code remains accountable for the outcome.

    Graph showing 19.2% of all PRs fully auto-approved by AI, 60% are evaluated by AI

    The naive take on AI-approved PRs is that it’s just a rubber-stamp LLM call so that humans don’t have to bother. A convenience feature. That misses what’s actually happening. Our Agent is strict. It won’t approve large PRs. If a change is too big, too complex, or too broad in scope, it flags it and requires it to be broken down. That design nudges engineers toward smaller, well-scoped changes—the safest way to ship, review, test, and, if needed, roll back.

    This matters enormously for safety. Small changes are easier to review, easier to test, easier to understand, and, critically, easier to roll back when something goes wrong. This is the same principle that has always underpinned our shipping culture, but now the PR review Agent actively enforces it. As someone who’s owned incident management and SRE partnerships, I can’t overstate how powerful this is.

    Figure: Revert rates by code author type. AI-authored code is reverted about 10x less often than human-authored code: 0.53% vs 5.39% on the backend and 0.22% vs 2.00% on the frontend.

    It’s tempting to look at a goal like “>50% AI-approved PRs” and worry we’re optimizing for a metric rather than an outcome. I see it differently. The real goal is to remove a bottleneck that, if left unchecked, pushes people toward rubber-stamping. By elevating the review bar and keeping batch sizes small, we protect both speed and stability.

    We didn’t assume AI review would be good enough; we actively ran an experiment. Our hypothesis was that AI review could match or outperform human review quality, measured by outcomes: were the changes correct? Did they cause problems in production? How quickly were they reviewed and approved?

    We started with a controlled pilot of over 100 PRs through the AI approval pipeline. The results: zero reverts of AI-approved PRs, and a 6–16x improvement in time-to-approval at the 75th percentile. Since then, the system has scaled significantly. In the first four weeks of broader rollout, 497 PRs went fully autonomous, with Claude writing the code and our AI approval system reviewing, approving, and shipping to production.

    Graph showing AI approval is 5x faster than human review

    Beyond the approval pipeline itself, we also looked more broadly at how AI-authored code performs in production compared to human-authored code. AI-authored backend code had a revert rate of 0.53%, compared to 5.39% for human-authored. On the frontend, it was 0.22% versus 2.00%.

    10X lower revert rate for AI-Authored code
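
    For readers who want to check the “10x” headline against the revert rates quoted above, the arithmetic is short. The snippet only restates the four percentages from the text; the variable names are mine.

```python
# Revert rates (%) as quoted in the text.
revert_rates = {
    "backend": {"ai": 0.53, "human": 5.39},
    "frontend": {"ai": 0.22, "human": 2.00},
}

ratios = {stack: r["human"] / r["ai"] for stack, r in revert_rates.items()}
for stack, ratio in ratios.items():
    print(f"{stack}: human-authored code reverted {ratio:.1f}x as often")
# backend: human-authored code reverted 10.2x as often
# frontend: human-authored code reverted 9.1x as often
```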

    AI-authored code, reviewed and approved through our automated pipeline, is being reverted at a fraction of the rate of human-authored, human-approved code. I don’t expect the pilot’s zero-revert record to hold forever, but the evidence shows the quality bar our Agent holds is at least as high as the one humans were holding, and in many cases higher. And here’s the humbling perspective: the product changes that caused outages in the past? They were all reviewed and approved by humans. Human review is not a guarantee of safety. It never was.

    Everything I’ve described—the sub-Agent architecture, the traceability, the labeling, the data—wasn’t just built for speed. It was built for auditability. Every AI-approved PR is labelled, logged, and queryable. The review comments, the approval decision, the test results, the merge event: all recorded. The evidence an auditor expects to see is the same whether a human or an AI approved the change. The “who” may change, but the “what” doesn’t. That’s how you meet SOC 2, HIPAA, ISO 27001, ISO 42001, and AIUC-1 without compromising agility.

    We engaged our auditors, Schellman, early, before we scaled. We proactively worked with them to confirm that our automated review processes and the evidence they produce meet the requirements of our compliance frameworks, including SOC 2, HIPAA, ISO 27001, ISO 42001, and AIUC-1, among others. We think AI-driven change management can meet and exceed the standards that human-driven processes set, and we want to help prove that. In my experience, when you build for safety, compliance follows—never the other way around.

    You can only go so far with PR review as a safety mechanism, no matter how good the reviewer is, human or AI. Only in production do you discover the unknown unknowns. The majority of Intercom’s largest outages weren’t even caused by changes to product code at all. They were infrastructure issues, unanticipated customer usage patterns, or third-party outages. PR review, whether human or AI, was never going to catch those. That’s why, in parallel, we’re also working on an Agent that proactively diagnoses issues in production. We’ll share more on this soon.

    Speed has always been at the core of how we build at Intercom, not in spite of safety, but because of it. And we’re getting even faster with AI. It’s easy to assume that AI-approved PRs would lead to a drop in quality and safety, but our data proves otherwise. Our heartbeat is just getting stronger. For product leaders, this is the blueprint: pair agentic AI with small batches, robust observability, and clear accountability, and you make shipping both faster and safer.


    Inspired by this post on The Intercom Blog.


  • Inside 27,000 AI Sessions: What Real Users Taught Me About Designing High-Trust Agents

    Inside 27,000 AI Sessions: What Real Users Taught Me About Designing High-Trust Agents

    Over the past quarter, I’ve been obsessed with a simple question: how do real people actually prompt AI agents when the stakes are high and the clock is ticking? To find out, we analyzed 27K sessions with Amplitude’s Global Agent using our Agent Analytics tool. That one-sentence summary belies months of careful instrumentation, qualitative review, and product debates, and it forever changed how I design agent experiences.

    The clearest pattern I saw: users don’t craft “perfect” prompts—they co-create with the agent. Most sessions began with a broad intent, then tightened through rapid, iterative turns. The winning structure emerged as context, command, and constraints. When our agent acknowledged context first, clarified the command, and reflected constraints back, users responded with noticeably more confidence. It reinforced what great prompt engineering already teaches, but grounded in lived behavior across thousands of journeys.
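
    One way to picture the context, command, and constraints structure is as a small data type the agent fills in over successive turns and then reflects back to the user. This is a minimal sketch under my own naming; the class, its fields, and the example values are hypothetical and not part of any Amplitude API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredPrompt:
    context: str
    command: str
    constraints: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Serialize the three parts in a fixed order for the model."""
        lines = [f"Context: {self.context}", f"Command: {self.command}"]
        if self.constraints:
            lines.append("Constraints:")
            lines.extend(f"- {c}" for c in self.constraints)
        return "\n".join(lines)

    def reflect_back(self) -> str:
        """Restate the request so the user can confirm or correct it."""
        limits = "; ".join(self.constraints) or "no stated constraints"
        return f"Given {self.context}, I will {self.command} ({limits})."


prompt = StructuredPrompt(
    context="the checkout funnel for new users",
    command="compare conversion week over week",
    constraints=["last 30 days", "mobile only"],
)
print(prompt.reflect_back())
```

    Each refinement turn then becomes an edit to one field rather than a rewrite of the whole prompt, which matches the iterative tightening we saw in sessions.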

    Trust was the next breakthrough. People wanted transparency on capabilities, a concise first answer, and an easy path to deeper detail and sources. They frequently asked the agent to show its work, summarize trade-offs, or restate assumptions in plain language. Instrumenting observability into the agent’s reasoning artifacts—without overwhelming the user—proved foundational for building credibility session by session.

    On task complexity, users fared best when the agent orchestrated a few small, verifiable steps rather than one heroic leap. Retrieval-first pipeline patterns consistently reduced confusion and rework, especially when paired with strong context window management. The more the agent proactively chunked the problem, validated intermediate outputs, and offered next-best actions, the smoother the journey—and the more reusable the prompts became.

    UX nudges mattered as much as model quality. Inline examples (“Try this”), one-click refinements (“Shorter,” “Add a table,” “Cite sources”), and lightweight guardrails kept momentum high without boxing users in. When the agent made uncertainty explicit and offered safe fallbacks, abandonment dropped and users explored more ambitiously. The experience felt less like “querying a model” and more like collaborating with a capable teammate.

    From a product management lens, these insights shape how I prioritize agentic AI. I’m doubling down on: scaffolded prompts that lead with context and constraints; transparent citations and assumptions; multi-step plans that the user can edit; and evaluation loops that A/B test prompt templates, tool strategies, and response formats. I’m also investing in analytics that connect session patterns to activation, speed-to-value, and retention so we can run eval-driven development, not opinion-driven debates.

    If you’re building agents into a core product workflow, start by designing for iterative co-creation, not one-shot brilliance. Offer progressive disclosure, keep the first answer tight, and make verification effortless. Shape the model with retrieval-first strategies, manage your context window like a scarce resource, and treat observability as a feature, not a debug tool. Most of all, let real usage guide your roadmap—these 27K sessions reminded me that the best agent UX is learned alongside our users, not imagined in isolation.


    Inspired by this post on Amplitude – Perspectives.


  • Cracking the Hardest Percentages: Turn Complex Support into Scalable, Trust-Building Automation

    Cracking the Hardest Percentages: Turn Complex Support into Scalable, Trust-Building Automation

    I’ve learned that the smallest slice of your support queue often dictates the majority of your operating cost, customer memory, and automation ceiling. In product reviews and CX ops deep-dives, I see the same pattern: the “easy” tickets pad your resolution counts, but the complex, multi-step queries quietly own your handle time and your brand trust. If you care about compounding impact, your customer support AI strategy has to target that hardest percentage first.

    Complex queries are a small percentage of your queue, but they consume a disproportionate share of your team’s time.

    Take a typical queue: password resets outnumber refund disputes ten to one, but a reset takes five minutes and a dispute takes thirty. The “rare” query accounts for over a third of total handling time. The same pattern holds for account investigations, subscription changes, and billing disputes.
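
    The arithmetic behind that example is worth spelling out, because it is easy to misjudge. Using only the ratio and handle times from the paragraph above (the variable names are mine):

```python
resets, disputes = 10, 1                 # ten resets for every dispute
reset_minutes, dispute_minutes = 5, 30   # handle time per ticket

reset_time = resets * reset_minutes          # 50 minutes
dispute_time = disputes * dispute_minutes    # 30 minutes
dispute_share = dispute_time / (reset_time + dispute_time)

print(f"Disputes: {disputes / (resets + disputes):.0%} of tickets, "
      f"{dispute_share:.1%} of handling time")
# Disputes: 9% of tickets, 37.5% of handling time
```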

    How you handle complex queries is also what customers actually remember about their support experience. When someone is dealing with a damaged order or a billing dispute, the stakes are higher, and a fast, good resolution is what separates a forgettable interaction from one that builds lasting trust.

    Most AI Agents automate the easy, informational queries well. The question for your automation rate is whether they can handle the hard ones. That’s where agentic AI and robust AI workflows make or break your outcomes.

    We’ve gotten really good at informational queries – the hard part is what comes next. I’ve seen teams invest deeply here, and for good reason: it lifts containment quickly and cheaply. But to break through the plateau, you have to execute actions across systems, not just answer with text.

    We’ve invested deeply in informational Q&A. We built Apex, a specialized customer service model trained on billions of support interactions, as Fin’s core answering engine. Beneath that sits a custom retrieval model, a purpose-built reranker, and a unified RAG pipeline, all trained specifically for customer service. Fin resolves issues at a higher rate than general-purpose frontier models, with fewer hallucinations and at lower cost.

    But informational Q&A only covers queries where text is the answer. Most Agents can handle that. Far fewer let you configure complex, multi-step actions without a forward-deployed engineer setting them up for you, and that is where the gap opens.

    Every query your team handles falls into one of three categories:

    Informational: “Can you ship transatlantic by priority next day?” Answered with text from your knowledge base.

    Personalized: “Where is my order?” Requires data unique to that user.

    Action-led: “My order arrived damaged, I need a refund.” Requires doing something: checking a return window, cross-referencing transaction data, making a judgment call – reading from multiple systems and acting across them.

    These complex queries, the ones that require multi-step processes across systems, aren’t edge cases; they’re the reason your support team exists. This is the gap Fin Procedures was built to close.

    It works in practice, and the trajectory matters for product strategy and ops planning.

    Procedures is live, it’s scaling, and the results are clear. Since launching in managed availability, Procedures has handled over 1.5 million conversations, and volume is doubling month over month across hundreds of apps in fintech, e-commerce, gaming, healthcare, and SaaS.

    When customers hit complex, multi-step queries, the experience is dramatically better when Fin can do the work end-to-end. We tested this with a randomized 5% holdout – conversations where Procedures would normally run, but didn’t. CSAT was 28.93% higher when Procedures ran, a statistically significant result.
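
    For teams that want to run a similar holdout themselves, here is a minimal sketch of the two pieces involved: assigning a small share of conversations to the holdout and computing relative lift. It is an assumption-laden illustration, not a description of how Fin does it. In particular, the hash-based assignment is a deterministic stand-in for random assignment, and the function names and CSAT figures in the usage lines are hypothetical.

```python
import hashlib


def in_holdout(conversation_id: str, holdout_pct: float = 5.0) -> bool:
    """Deterministically place roughly `holdout_pct`% of conversations in holdout."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000   # stable bucket in 0..9999
    return bucket < holdout_pct * 100


def relative_lift(treated: float, holdout: float) -> float:
    """Relative improvement of treated over holdout, as a percentage."""
    return (treated - holdout) / holdout * 100


# Hypothetical scores, chosen only to show the calculation.
print(f"{relative_lift(treated=90.0, holdout=75.0):.1f}% higher CSAT")
```

    Hashing the conversation ID keeps the assignment stable across retries, so a conversation never flips between groups mid-experiment. Whether an observed lift is statistically significant is a separate test that this sketch does not cover.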

    A product, not a services engagement. I’ve sat through too many “automation” projects that were really solutions engineering gigs: workshops, custom scripts, then a queue of change requests when policies shift. It’s fragile and slow.

    The B2B AI industry has a consultingware problem. It’s not databases being forked anymore, it’s prompts. The economics of maintaining bespoke setups per customer don’t work. Either the application falls behind new models, or the vendor changes the model and quality degrades invisibly.

    In my view, an agentic AI platform should be a product your team owns end to end: a natural language editor – literally paste your existing SOPs – branching logic, data connectors, and AI-powered simulations for testing. Your CX ops team configures this, iterates on it, owns it. If you need help, a forward-deployed team can assist, but they’re optional, not a dependency. You always have control.

    And because it’s a unified product, improvement compounds. When the vendor optimizes a prompt, every customer’s Procedures get better. When they upgrade the model, they can A/B test across the entire customer base and know it’s better before rolling out. You can’t do that when every customer has a bespoke prompt. The consulting model isn’t just expensive, it’s structurally unable to compound.

    Today, Fin Procedures is available to every Intercom customer – no waitlist or managed rollout, ready for all 8,000+ customers.

    We’re iterating fast based on real customer feedback. Here’s what’s landed since the last major update, and why it matters for reliability and governance:

    AI-powered Procedure review: Flags broken logic, missing references, and unreachable conditions before you deploy.

    Procedure failure reporting: A new reporting dimension that lets you drill into conversations where Procedures failed, so you can diagnose and fix.

    Version history with rollback: Track every change, compare versions, roll back if needed.

    Data connector health monitoring: See at a glance if your integrations are healthy, degraded, or failing.

    Optional data connector parameters: Fin only asks customers for information when it’s actually needed, instead of prompting for every field.

    Email Simulation support: Test how your Procedures behave across chat and email before going live.

    Agent in the Loop (Beta) unlocks the next tranche of automation. Even with Procedures, two things hold teams back from automating their most complex queries: missing integrations and policies that require a human sign-off on sensitive decisions.

    “Agent in the Loop” is built for both. Need Fin to check your internal admin tools but haven’t built a data connector yet? Put a human checkpoint at that step. Fin handles the conversation, gathers context, and pauses, surfacing a structured summary for a human agent to verify or act, then resumes. You get automation on the 80% that doesn’t need the integration.

    For compliance – identity verification, high-value refunds – Fin does the legwork, a human makes the final call and then hands it back to Fin. This works natively in the Intercom Inbox and via Slack. Some competitors don’t have an inbox-native variant at all, meaning humans need to leave their primary workspace to review AI actions.
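
    Conceptually, a human checkpoint is a pause-and-resume state machine. The sketch below is my own simplified model of that pattern, assuming a hypothetical `ProcedureRun` object; it does not reflect Intercom’s internal design or any real API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    RUNNING = "running"
    AWAITING_HUMAN = "awaiting_human"
    DONE = "done"


@dataclass
class ProcedureRun:
    context: Dict[str, str] = field(default_factory=dict)
    state: State = State.RUNNING
    summary: Optional[str] = None

    def pause_for_human(self) -> str:
        """Stop at the checkpoint and surface a structured summary."""
        self.summary = "; ".join(f"{k}={v}" for k, v in sorted(self.context.items()))
        self.state = State.AWAITING_HUMAN
        return self.summary

    def resume(self, approved: bool) -> str:
        """Continue once a human has approved or rejected the step."""
        if self.state is not State.AWAITING_HUMAN:
            raise RuntimeError("nothing is waiting on a human decision")
        self.state = State.DONE
        return "issue refund" if approved else "explain the decline to the customer"


run = ProcedureRun(context={"order": "A-1001", "amount": "450.00", "reason": "damaged"})
print(run.pause_for_human())
print(run.resume(approved=True))
```

    The guard in `resume` is the important part: the automated side cannot proceed past a sensitive step unless a human decision has actually been requested and recorded.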

    Procedures are also built to let you collaborate with all your teammates – both human agents and AI Agents. Fin can work with them directly inside a Procedure, using APIs and webhooks to loop in another teammate mid-flow, hand off context, and pick back up once they’re done.

    Making it easier, faster. Procedures is already self-serve, but the next step is making Procedure creation, testing, and maintenance significantly more streamlined, with less manual editing and more AI-assisted building and debugging. There’s a lot coming in this space over the next few months, and it aligns with a retrieval-first pipeline and stronger governance at scale.

    The hardest percentages matter the most. The biggest unlock for your automation rate won’t be answering more FAQs, it will be handling the complex, multi-step queries that consume your team’s time and define what customers remember about their experience with you.

    That means working with an Agent that goes beyond answering questions and executes processes. A product your team owns and configures, not a service you buy and hope gets maintained. And a platform where every improvement compounds across every customer. That’s Procedures. Available now, for everyone.


    Inspired by this post on The Intercom Blog.


  • How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

    How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

    I set out to solve a deceptively simple problem: help our teams ask product questions in plain English and get trustworthy, analysis-grade answers—fast. That required more than a powerful model; it demanded agents that genuinely understand the language of product analytics, from behavioral analytics nuances to the messy reality of event taxonomies, funnels, and cohorts. In this post, I share how we engineered agentic AI that speaks our domain fluently and turns questions into decisions.

    The core challenge wasn’t data volume or dashboard sprawl; it was semantics. Different teams said “activation,” “onboarding,” or “first value” and meant overlapping but distinct things. Our PMs, analysts, and engineers navigated a maze of synonyms across Amplitude analytics, Pendo, and our unified analytics platform. Generic LLMs stumbled on these nuances, so we built a shared ontology—driver trees anchored to a clear North Star—with canonical definitions for activation, retention, and conversion, plus consistent event naming and cohort logic.

    We started with a rigorous metric catalog: every KPI linked to its drivers, exact formulas, cohorts, and time windows; every event mapped to a product taxonomy; every dashboard and SQL snippet versioned with ownership and lineage. That catalog became the ground truth for agents. We embedded data governance and privacy-by-design from the start—permissioning for fields and queries, PII redaction, and scoped access that reflected how product teams actually work.
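
    As a minimal sketch of what one catalog entry and a term lookup could look like (not the actual implementation), the Python below models a metric definition with its formula, cohort, time window, owner, and version. Every field name, the alias list, and the example values are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    formula: str
    cohort: str
    window_days: int
    owner: str
    version: int
    drivers: tuple[str, ...] = ()
    aliases: tuple[str, ...] = ()


class MetricCatalog:
    """Maps the terms teams actually use to one canonical definition."""

    def __init__(self) -> None:
        self._by_name: dict[str, MetricDefinition] = {}
        self._alias_to_name: dict[str, str] = {}

    def register(self, metric: MetricDefinition) -> None:
        key = metric.name.lower()
        self._by_name[key] = metric
        # The canonical name is also registered as an alias of itself.
        for alias in (metric.name, *metric.aliases):
            self._alias_to_name[alias.lower()] = key

    def resolve(self, term: str) -> MetricDefinition | None:
        """Return the canonical definition for a term, or None if unknown."""
        key = self._alias_to_name.get(term.strip().lower())
        return self._by_name[key] if key is not None else None


# Hypothetical entry: the formula, window, and aliases are made up.
catalog = MetricCatalog()
catalog.register(MetricDefinition(
    name="activation",
    formula="accounts_completing_setup / new_accounts",
    cohort="new_accounts",
    window_days=7,
    owner="growth-analytics",
    version=3,
    drivers=("setup_completion", "first_report_created"),
    aliases=("first value", "activated"),
))

metric = catalog.resolve("First Value")
print(metric.name, metric.window_days, metric.owner)
```

    Returning None for an unknown term, rather than guessing, gives an agent a clear signal to ask a clarifying question instead of inventing a definition.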

    Next, we built a retrieval-first pipeline to ground the agents in our corpus before generation. We indexed metric definitions, dashboards, experiment readouts, runbooks, and high-signal Slack threads so the agent could cite relevant artifacts, not just predict plausible text. With careful context window management and prompt engineering, the agent retrieves definitions and prior analyses, then plans multi-step actions: run a query, compare cohorts, check the minimum detectable effect (MDE) for an A/B test, and summarize findings with references.
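
    To make the retrieve-then-generate order concrete, here is a small sketch under stated assumptions. It ranks artifacts by plain keyword overlap, which is a stand-in for whatever index a real system would use (likely embeddings), and it caps the context by character count as a crude proxy for context window management. The artifact ids, texts, and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Tiny illustrative stopword list; a real pipeline would use a proper index.
STOPWORDS = {"a", "the", "is", "of", "for", "how", "by", "to", "in", "on"}


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    kind: str
    text: str


def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, corpus, k=2):
    """Return up to k artifacts sharing at least one keyword with the question."""
    query = tokenize(question)
    scored = []
    for artifact in corpus:
        overlap = len(query & tokenize(artifact.text))
        if overlap:
            scored.append((overlap, artifact))
    # Highest overlap first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1].artifact_id))
    return [artifact for _, artifact in scored[:k]]


def build_prompt(question, artifacts, max_chars=1200):
    """Assemble a grounded prompt, dropping sources that exceed the budget."""
    lines, used = [], 0
    for artifact in artifacts:
        entry = f"[{artifact.artifact_id}] ({artifact.kind}) {artifact.text}"
        if used + len(entry) > max_chars:
            break
        lines.append(entry)
        used += len(entry)
    sources = "\n".join(lines)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


corpus = [
    Artifact("metric:activation", "definition",
             "Activation is the share of new accounts that complete setup "
             "within 7 days of signup."),
    Artifact("dash:retention", "dashboard",
             "Weekly retention dashboard, split by plan segment."),
    Artifact("slack:funnel-thread", "thread",
             "Thread on why the onboarding funnel dropped after the pricing "
             "page change."),
]

question = "How is activation defined for new accounts?"
hits = retrieve(question, corpus)
prompt = build_prompt(question, hits)
print(prompt)
```

    Because the prompt is built only from retrieved artifacts and carries their ids, a generated answer can point back to specific sources rather than to plausible-sounding text.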

    Architecturally, we treated this as “Agent Analytics”: an orchestrator that selects tools based on intent—querying Amplitude analytics or Pendo for behavioral paths and funnels, hitting our warehouse for cohort tables, or pulling experiment metadata and anomaly detection alerts. Tool use is permission-aware, auditable, and designed to fail safe. The agent’s outputs include citations back to the exact definitions, dashboards, and SQL used, so reviewers can validate and iterate.
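
    The orchestration idea can be sketched roughly as follows. This is an illustration only: the tools are stubs that call no real Amplitude, Pendo, or warehouse API, intent is matched by simple keywords, and the tool names, roles, and log fields are all assumptions. What it tries to show is the shape of the control flow: select a tool, check permission, log the step, and refuse rather than guess when anything is off.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    keywords: frozenset
    required_role: str
    run: Callable[[str], str]


def stub(name):
    """Build a placeholder tool body; a real tool would query a system."""
    return lambda question: f"{name} handled: {question}"


def select_tool(question, tools):
    """Pick the tool whose keywords best match the question, if any."""
    words = set(question.lower().replace("?", " ").split())
    best, best_hits = None, 0
    for tool in tools:
        hits = len(words & tool.keywords)
        if hits > best_hits:
            best, best_hits = tool, hits
    return best


def dispatch(question, roles, tools, audit_log):
    """Run the selected tool, failing safe and logging every outcome."""
    tool = select_tool(question, tools)
    if tool is None:
        audit_log.append({"question": question, "tool": None, "status": "no_tool"})
        return "No suitable tool found; please clarify the question."
    if tool.required_role not in roles:
        audit_log.append({"question": question, "tool": tool.name, "status": "denied"})
        return f"Access to {tool.name} requires role '{tool.required_role}'."
    try:
        result = tool.run(question)
    except Exception:
        # Fail safe: a tool error produces no answer, never a guess.
        audit_log.append({"question": question, "tool": tool.name, "status": "error"})
        return f"{tool.name} failed; no answer produced."
    audit_log.append({"question": question, "tool": tool.name, "status": "ok"})
    return result


TOOLS = [
    Tool("behavioral_analytics", frozenset({"funnel", "funnels", "path", "paths"}),
         "analyst", stub("behavioral_analytics")),
    Tool("warehouse", frozenset({"cohort", "cohorts", "table"}),
         "analyst", stub("warehouse")),
    Tool("experiments", frozenset({"experiment", "mde", "variant"}),
         "experimenter", stub("experiments")),
]

audit_log = []
question = "Where is the funnel drop-off after the launch?"
allowed = dispatch(question, {"analyst"}, TOOLS, audit_log)
denied = dispatch("What is the MDE for this experiment?", {"analyst"}, TOOLS, audit_log)
print(allowed)
print(denied)
```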

    Quality came from eval-driven development, not intuition. We built a gold set of representative product questions (activation inflections, retention analysis by segment, funnel drop-offs after feature launches) and scored the agent on faithfulness to definitions, numerical accuracy, latency, and actionability. We added regression checks to catch drift after schema changes, and we tuned prompts to reduce overconfident answers and to prompt clarifying questions when context was missing.
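
    A gold-set evaluation of this kind might be structured like the sketch below. It covers only three of the dimensions named above (numerical accuracy, faithfulness, latency), treats faithfulness as "the answer cites the required definition", and uses a canned stub in place of a real agent. The questions, expected values, tolerance, and all names are hypothetical.

```python
import math
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldCase:
    question: str
    expected_value: float
    required_citation: str


@dataclass(frozen=True)
class AgentAnswer:
    value: float
    citations: tuple


def evaluate(agent, gold_set, rel_tol=0.01):
    """Score an agent callable against the gold set."""
    if not gold_set:
        raise ValueError("gold_set must not be empty")
    accurate = faithful = 0
    latencies = []
    for case in gold_set:
        start = time.perf_counter()
        answer = agent(case.question)
        latencies.append(time.perf_counter() - start)
        # Numerical accuracy: within a relative tolerance of the expected value.
        accurate += math.isclose(answer.value, case.expected_value, rel_tol=rel_tol)
        # Faithfulness proxy: the canonical definition is among the citations.
        faithful += case.required_citation in answer.citations
    n = len(gold_set)
    return {
        "numerical_accuracy": accurate / n,
        "faithfulness": faithful / n,
        "max_latency_s": max(latencies),
    }


def regressions(current, baseline, metrics=("numerical_accuracy", "faithfulness")):
    """List the metrics that got worse relative to a stored baseline."""
    return [m for m in metrics if current[m] < baseline[m]]


# Canned answers standing in for a real agent; the first one is off by 0.02.
CANNED = {
    "What was 7-day activation last week?":
        AgentAnswer(0.42, ("metric:activation",)),
    "What was week-4 retention for the Pro plan?":
        AgentAnswer(0.31, ("metric:retention",)),
}


def stub_agent(question):
    return CANNED[question]


gold_set = [
    GoldCase("What was 7-day activation last week?", 0.40, "metric:activation"),
    GoldCase("What was week-4 retention for the Pro plan?", 0.31, "metric:retention"),
]

report = evaluate(stub_agent, gold_set)
baseline = {"numerical_accuracy": 1.0, "faithfulness": 1.0}
print(report["numerical_accuracy"], report["faithfulness"])
print(regressions(report, baseline))
```

    Running the same gold set after every schema or prompt change and comparing against the stored baseline is one simple way to surface drift before users do.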

    Safety and reliability were non-negotiable. We layered AI risk management with role-based access, guardrails that block destructive queries, and risk scoring for unfamiliar joins or sudden spikes in metric deltas. The agent logs every step—what it retrieved, which tools it called, and why—so analysts can replay and refine the chain of thought with transparent provenance.
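
    Two of these guardrails can be illustrated with a short sketch. The read-only check below is deliberately conservative keyword matching, so it may reject some harmless queries; a production system would more likely rely on a SQL parser plus read-only database credentials. The risk weights, the 50% spike threshold, and every name are assumptions for illustration.

```python
import re

# Keywords that indicate a statement could modify data or permissions.
DESTRUCTIVE = re.compile(
    r"\b(drop|delete|truncate|update|insert|alter|grant|revoke|create|merge)\b",
    re.IGNORECASE,
)


def strip_comments(sql):
    """Remove -- line comments and /* */ block comments."""
    sql = re.sub(r"--[^\n]*", " ", sql)
    return re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)


def is_read_only(sql):
    """Allow exactly one SELECT/WITH statement with no destructive keywords."""
    cleaned = strip_comments(sql)
    statements = [s for s in cleaned.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # empty input or stacked statements: fail closed
    statement = statements[0]
    first_word = statement.split(None, 1)[0].lower()
    return first_word in {"select", "with"} and not DESTRUCTIVE.search(statement)


def risk_score(joins, known_joins, metric_delta_pct, spike_threshold_pct=50.0):
    """Add 1 per unfamiliar join and 2 for a large swing in the metric."""
    score = sum(1 for join in joins if join not in known_joins)
    if abs(metric_delta_pct) >= spike_threshold_pct:
        score += 2
    return score


# Joins are modeled as unordered pairs of table names (hypothetical tables).
known_joins = {frozenset({"events", "accounts"})}
query_joins = [frozenset({"events", "accounts"}), frozenset({"events", "billing"})]

print(is_read_only("SELECT * FROM events -- drop later"))
print(is_read_only("SELECT 1; DELETE FROM events"))
print(risk_score(query_joins, known_joins, metric_delta_pct=80.0))
```

    A score above some agreed threshold could route the answer to an analyst for review instead of returning it directly.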

    The payoff: product teams now self-serve nuanced questions in minutes instead of days, and our analysts spend more time on discovery than report wrangling. Retention analysis improved as the agent standardized cohort logic; conversion investigations accelerated thanks to consistent funnel definitions; and cross-functional decisions aligned around the same driver trees and shared language. Most importantly, the agent turned ambiguous asks into structured analyses that stand up to scrutiny.

    For fellow product leaders, my lesson is simple: start with semantics, not models. A crisp ontology, disciplined taxonomy, and clear ownership will outperform a flashy stack riddled with ambiguity. Avoid technology FOMO; favor retrieval-first grounding, small sharp tools, and continuous discovery with your product trios. When your organization speaks a common analytics language, agents can finally think with you, not just for you.

    Next, we’re extending the agent’s planning skills to recommend experiment designs, estimate power and minimum detectable effect (MDE), and propose driver-tree-informed bet sizing. We’re also tightening feedback loops so every accepted answer, edit, or override strengthens the retrieval corpus and evaluations. The vision: a calm, reliable layer that makes rigorous product analytics feel conversational and helps teams move from questions to confident action.
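
    For readers who want to see what an MDE estimate involves, here is a sketch using the common normal approximation for a two-sided test on a conversion rate with equal-sized groups: MDE ≈ (z_(1-α/2) + z_power) × sqrt(2p(1-p)/n). This is a textbook approximation, not necessarily the method the agent will use, and the function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate, n_per_group, alpha=0.05, power=0.8):
    """Absolute MDE for a two-sided test on a proportion, equal group sizes."""
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be strictly between 0 and 1")
    if n_per_group <= 0:
        raise ValueError("n_per_group must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    # Standard error of the difference, using the baseline rate for both arms.
    standard_error = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_group)
    return (z_alpha + z_power) * standard_error


# Hypothetical experiment: 10% baseline conversion, 10,000 users per arm.
mde = minimum_detectable_effect(baseline_rate=0.10, n_per_group=10_000)
print(f"Absolute MDE: {mde:.4f}")
```

    With these inputs the detectable lift is roughly 1.2 percentage points. Because the MDE scales with 1/sqrt(n), quadrupling the sample size only halves it, which is the trade-off an experiment-design recommendation has to weigh.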


    Inspired by this post on Amplitude – Best Practices.

