Tag: product management leadership

  • Designing AI-Powered CX at Scale: Lessons Inspired by Amanda Sime at Amplitude

    Customer experience is where strategy, data, and execution converge—and where AI can deliver compounding value when thoughtfully designed. In my work, I’ve seen how the right CX vision becomes a growth engine when it’s operationalized through clear measures, robust analytics, and disciplined product practices.

    "Amanda Sime is the Customer Experience Strategy Lead at Amplitude. She shapes CX strategy and partners across orgs to design and scale AI-powered solutions." That concise description captures a model I deeply respect: start with a strong CX strategy, then partner across the organization to make AI real in the day-to-day. It’s not just about new technology; it’s about aligning teams, systems, and incentives to deliver consistent customer value.

    Translating that approach into practice requires a rigorous AI Strategy, anchored in measurable outcomes and informed by behavioral analytics. I prioritize journey mapping to expose friction, then connect those insights to AI workflows that enhance customer success and in-product guidance. When cross-functional partners—from solutions engineering to support—operate from a shared driver tree, the roadmap balances speed with sustainability.

    Data is the backbone. A unified analytics platform—often centered on Amplitude analytics—helps teams move beyond vanity metrics to track user activation, feature adoption, and retention analysis with precision. With that foundation, we can test responsibly, iterate quickly, and validate impact with product-led growth motions that scale across segments without sacrificing quality.

    Operational excellence matters just as much as vision. I’ve learned to treat CX programs like enduring products: build reliable feedback loops, connect customer support AI strategy to clear service-level outcomes, and empower product management leadership to make evidence-based tradeoffs. When teams have clarity on the problem space and access to trustworthy insights, they deliver solutions that feel both intelligent and human.

    The real win is cultural: empowering product trios and partner teams to co-own outcomes, not just outputs. That’s how AI moves from a promising experiment to a durable capability—by aligning strategy, analytics, and execution so customers experience value at every touchpoint.


    Inspired by this post on Amplitude – Perspectives.


  • Unlocking AI’s Black Box: How Monitors and Scorecards Elevate CX with Confidence

    I followed the energy at Fin Labs Paris and immediately zeroed in on the announcement of Monitors. In my view, it’s the missing piece that turns Fin’s powerful automation into an observable, trustworthy system—sitting alongside Insights and Recommendations to form a complete observability suite that gives teams confidence in what Fin is doing.

    With Monitors, you define what conversations get reviewed, both Fin and human, and set evaluation criteria using Custom Scorecards. That level of control ensures you’re measuring the metrics that matter most to your business and holding support quality to your bar, not a generic one.

    Used in concert with Insights and Recommendations, you can finally see what’s happening across your support operation, evaluate every conversation against your standards, and take targeted action to continuously move toward perfect customer experiences.

    As Agents become more powerful, transparency and control become critical. I’ve seen this shift firsthand: AI is advancing fast, and the stakes are no longer theoretical—Agents are resolving real customer issues with real consequences at scale.

    Figure: The AI lifecycle loop (Train, Test, Deploy, Analyze), with Analyze highlighted to show where monitoring closes the feedback loop.

    Fin has almost 8,000 customers, averages a 67% resolution rate, and resolves close to 2 million customer queries every single week, including highly complex queries in regulated industries.

    At that scale, observability isn’t a nice-to-have; it’s a necessity. Traditional CSAT and small QA samples weren’t built for Agent-led operations—they miss edge cases, don’t scale, and can’t explain drift. The result is a black box. What teams need most right now is confidence, built on data you can trust and act on.

    At Intercom, this is called the Fin Flywheel: Train, Test, Deploy, Analyze.

    Figure: Intercom's Monitors dashboard with pass-rate charts and review queues, alongside an Edit monitor panel that defines a 'Vulnerable customers' monitor, tests it on sample conversations, and runs continuous checks on Fin conversations.

    Analyze is the step where you find out what’s actually happening, and it’s where improvement begins.

    In my experience, achieving confidence in an AI support operation requires three things: (1) a complete understanding of what Fin, your human team, and your customers are talking about; (2) a way to monitor and score conversations based on the criteria that matter most to your business; and (3) AI-powered recommendations that make it easy to act on what you find. Intercom launched Insights and Recommendations to address the first and third. Now, Monitors completes the system for full observability and opens the black box.

    Monitors: know whether every conversation met your standards. Customer sentiment is important, but it’s different from determining whether a conversation was handled correctly. With Monitors, you can do both—and do it at scale.

    Monitors is a new QA capability that delivers a structured, repeatable way to define which conversations get reviewed and evaluate them against quality criteria you set. It replaces ad-hoc sampling and spreadsheet-driven QA with a system that scales as your volume grows.

    Two components work together: Monitors define what gets reviewed and Custom Scorecards define how each conversation is evaluated. That pairing brings the rigor of Agent Analytics and the discipline of eval-driven development to everyday CX operations.

    Random sampling has always been a blunt tool. When AI is handling thousands of conversations a week, a small, arbitrary slice won’t reliably capture your highest-risk edge cases, your most complex escalations, or where quality is starting to drift. I’ve felt that pain in operations reviews—too many unknowns, not enough signal.

    Figure: Monitors dashboard with review queues and bar-chart analytics, plus a New scorecard panel for assessing human teammates or an AI agent against configurable criteria such as accuracy, process adherence, and efficiency.

    With Monitors, you select and evaluate conversations with intent. You can target specific signals of risk or failure, like “the customer showed signs of financial vulnerability” or “Fin looped around with the same answer without resolving the issue.” Or you can create consistent, repeatable samples to benchmark quality over time. Use the existing library of filters (customer data, channel, Fin-specific metrics) or describe nuanced scenarios in natural language. Most teams will do both: home in on the conversations that matter most and maintain a steady, structured QA sample each week.

    "When I saw Monitors, my first reaction was — this is exactly what we need. The ability to track quality continuously, instead of relying on spot checks, is a big shift for us." Ineke Oates, Head of Support, Agorapulse

    Custom Scorecards make your standards explicit and enforceable. One-size-fits-all rubrics never reflect your brand voice, industry constraints, or customer expectations. With Custom Scorecards, you define what “good” looks like for your business and turn that into a measurable, comparable quality score for every conversation.

    You define the criteria that matter, how each should be measured, and how important each one is. Some criteria can be scored automatically by AI, others reviewed by a human, or both, all within the same scorecard. This means you’re not choosing between scale and judgment; you get both in one system.

    Each conversation is then evaluated against these criteria, and the system calculates an overall quality score based on your configuration. You can weight what matters most, or mark certain criteria as critical, so a single failure can fail the entire evaluation when needed.

    The result is a single, consistent quality score that reflects your standards—not a generic metric, and not a collection of disconnected checks. That’s what makes quality measurable over time and comparable across AI and human support.
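    To make the weighting and "critical criterion" ideas concrete, here is a minimal sketch of how such a score could be computed. It is an illustration under my own assumptions, not Intercom's implementation: the names `Criterion`, `evaluate`, and `PASS_THRESHOLD`, the weighted-average formula, and the 80% pass bar are all hypothetical.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 80.0  # hypothetical pass bar, in percent


@dataclass
class Criterion:
    name: str
    weight: float           # relative importance of this criterion
    critical: bool = False  # if True, failing it fails the whole evaluation


def evaluate(criteria, results):
    """Return (score_percent, passed) for one conversation.

    `results` maps each criterion name to True (met) or False (not met).
    """
    total_weight = sum(c.weight for c in criteria)
    earned = sum(c.weight for c in criteria if results[c.name])
    score = 100.0 * earned / total_weight
    critical_failed = any(c.critical and not results[c.name] for c in criteria)
    passed = (not critical_failed) and score >= PASS_THRESHOLD
    return score, passed


# Hypothetical scorecard: accuracy matters most and is marked critical.
scorecard = [
    Criterion("accuracy", weight=3, critical=True),
    Criterion("tone", weight=1),
    Criterion("efficiency", weight=1),
]

ok = evaluate(scorecard, {"accuracy": True, "tone": True, "efficiency": False})
bad = evaluate(scorecard, {"accuracy": False, "tone": True, "efficiency": True})
print(ok)   # (80.0, True)
print(bad)  # (40.0, False)
```

    The design choice worth noticing is that the critical flag is checked separately from the weighted score, so a conversation can earn a high score and still fail the evaluation.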

    Figure: Monitors review queue listing users, monitor types, color-coded review scores, reviewers, review status, notes, and follow-up actions, with AI auto-review labels.

    There’s an important distinction here: CX Score tells you how customers felt about a conversation. Custom Scorecards tell you whether it met your standards. You need both.

    "We looked at dedicated QA tools, but what's compelling about Monitors is that it lives where our conversations already happen. We don't need another system — we can run QA across Fin and our human team in one place." Jared Ellis, Senior Director, Global Product Support, Culture Amp

    When a conversation meets your criteria for review, Monitors routes it into a Review Queue. Each conversation is assigned to the right reviewer with its scorecard attached and status tracked end to end: Not reviewed, Reviewed, Needs a fix, Fix complete. Reviewers work directly in Intercom, capture what went wrong, and propose concrete fixes—like updating documentation or refining a workflow—so quality loops end in action, not just scores.

    Figure: Fin quality dashboard showing monitor metrics (75.2% average review score, 92.8% reviews passed, 856 reviews, 62 failed) and a time-series chart of criteria trends such as escalation ease, clarification, and efficiency, with date and filter controls.

    Reporting turns QA into a continuous signal rather than a one-off audit. You can track review scores over time across Monitors and Scorecards, and compare them directly to CX Score, resolution rate, and other performance metrics. Patterns that were previously invisible become clear: a topic consistently underperforming, a quality dip correlated with a recent knowledge base change, or a team whose scores are improving week over week. This is observability applied to CX—evidence you can act on.

    Monitors for Fin conversations is live today, and the roadmap goes further. Human agent QA will bring the same structured evaluation to your human team’s conversations, creating one consistent quality system across your entire support operation.

    Real-time alerts will notify you the moment a conversation crosses a threshold you’ve defined—before the issue reaches more customers and risks compounding negative sentiment.

    Knowledge base evaluation will connect AI scoring directly to your content so conversations are assessed against your latest policies and documentation, catching inaccurate or outdated responses and providing clear rationale linked to the relevant source.

    Creating a perfect customer experience with AI requires transparency. You need to understand how the system is performing if you want to maintain and improve quality over time. With Insights, Monitors, and Recommendations, this is now possible: a complete analysis suite that lets you see what’s happening across every conversation, ensure it meets your standards, and pinpoint improvement opportunities when they matter most.

    I’ve long advocated for a retrieval-first, eval-driven approach to AI Strategy because it makes risk visible and manageable. Monitors operationalizes that philosophy for CX leaders: you get continuous signal, shared definitions of quality, and a direct path from flags to fixes. If you’re scaling AI support, this is how you replace uncertainty with control—and turn the black box into a competitive advantage.


    Inspired by this post on The Intercom Blog.


  • Bad Advice from Your AI Clone? Ethics, IP, and How Product Leaders Protect Quality

    What happens when an AI starts giving advice in your voice—advice you’d never actually give? I’ve been thinking a lot about that question, and this conversation hit home for me as a product leader navigating the fast-evolving reality of AI “clones.”

    Listen to this episode on: https://open.spotify.com/episode/7DNDIlIimwbbMOytArewRp?ref=producttalk.org | https://podcasts.apple.com/kh/podcast/bad-advice/id1794203808?i=1000756914818&ref=producttalk.org. Prefer video? Watch on YouTube: https://www.youtube.com/embed/RF4BwaeMMlg?feature=oembed

    The episode examines AI “clones” built from podcast transcripts and public content—where the experimentation feels exciting, where it crosses ethical lines, and what happens when mediocre AI outputs get attributed to real people. The tension is real: when a bot confidently answers in your style but misses the nuance, “it’s not me” becomes more than a disclaimer—it’s a reputational defense.

    We dig into the messy parts: IP ownership of open-sourced transcripts, the role of pirated books in LLM training sets, rising inference costs, and the uncomfortable economic question: if anyone can prompt “act like Teresa,” how do creators make a living? In my own decision-making, I look for clear consent, guardrails that prevent impersonation, and transparent UX that never confuses a synthetic perspective with a human expert.

    This isn’t anti-AI. It’s a nuanced conversation about quality, consent, and remembering there are real humans behind the ideas.

    Here’s how I translate the key takeaways into practice. Using AI for perspective is fine; equating it to the real person isn’t. Free-feeling AI outputs still rely on someone’s work. Expertise is more than past content: it’s context, judgment, and evolution. If someone’s work influences you, find a way to support them. These principles help teams benefit from generative AI without eroding trust or the creator ecosystem.

    “Technically possible” doesn’t mean “ethically okay.” My AI Strategy playbook includes privacy-by-design, clear data governance on training materials, and a bright line between inspiration and impersonation. When we ship AI features, we label synthetic outputs, avoid mimicking living experts without permission, and create paths to compensate or promote the humans whose thinking underpins the experience.

    I’ve also tested the “act like X” pattern to stress-test product quality. Even when outputs sound plausible, they rarely capture the expert’s mental models, trade-offs, or the evolution of their thinking—especially in complex product discovery work. That gap is the difference between average AI text and expert product management leadership.

    If you listen, consider a few reflection prompts: Have you ever used AI to “act like” someone you admire? Could you tell whether the output matched that person’s actual thinking? How do you decide what’s ethically okay when using public content in LLMs? And how can we support creators while still embracing new tools?

    Resources & Links you may find helpful: Follow Teresa Torres: https://ProductTalk.org; Follow Petra Wille: https://Petra-Wille.com; Delphi.ai (AI bot platform discussed): https://www.delphi.ai/?ref=producttalk.org; Lenny’s Podcast: https://www.lennysnewsletter.com/podcast?ref=producttalk.org; ChatGPT: https://chatgpt.com/?ref=producttalk.org; Petra’s Coaching Packages: https://www.petra-wille.com/coaching-packages?ref=producttalk.org; Teresa’s Product Talk: https://www.producttalk.org/; Teresa’s book Continuous Discovery Habits: https://www.producttalk.org/continuous-discovery-habits/; Lenny’s open-sourced podcast transcripts: https://www.dropbox.com/scl/fo/yxi4s2w998p1gvtpu4193/AMdNPR8AOw0lMklwtnC0TrQ?rlkey=j06x0nipoti519e0xgm23zsn9&e=1&st=ahz0fj11&dl=0&ref=producttalk.org

    Have thoughts on this episode or practices that have worked in your org? Share them below—I’m keen to learn how other teams are balancing innovation with integrity.


    Inspired by this post on Product Talk.


  • How I Structure Documentation for AI and Humans: Battle‑Tested, SEO‑Smart Tactics That Scale

    Every week, I coach product and documentation teams on a simple truth I keep pinned above my desk: "AI is reading your documentation! Learn tips from the Amplitude docs team about how to structure your documentation for both human and AI audiences." That line captures the shift we’re all living through—our docs must now serve customers, support engineers, and increasingly, LLMs powering chat, search, and in‑product help.

    My AI strategy for documentation starts with intent. I map the core questions users ask at activation, onboarding, escalation, and renewal, then shape information architecture to reduce ambiguity. This helps humans find answers faster and helps LLMs retrieve the right chunks with higher precision—a win for UX writing, product-led growth, and support deflection.

    Structure beats style when AI is in the loop. I rely on semantic headings (H1–H3), consistent slugs, stable anchors, and one‑topic pages that can stand alone. Short paragraphs, scannable summaries, and canonical references reduce duplication and improve retrieval quality. Treat docs-as-code with CI/CD so changes are reviewed, versioned, and shipped reliably—documentation deserves the same rigor as product releases.

    Chunking matters for LLMs. I design content for context window management: one concept per section, tight procedures with numbered steps, and FAQs that mirror real queries. Glossaries define canonical terms and accepted synonyms so retrieval-first pipelines match user language without fragmenting meaning. Error messages and parameter names appear verbatim to strengthen search and grounding.
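    As a rough sketch of the "one concept per section" idea, the snippet below splits a Markdown page into chunks at H1 to H3 headings and keeps the heading trail with each chunk so a retriever has context. The function name `chunk_by_heading`, the sample page, and the error code in it are hypothetical, and the splitter deliberately ignores edge cases such as `#` lines inside fenced code.

```python
import re

# Matches Markdown ATX headings of level 1-3, e.g. "## CSV export".
HEADING = re.compile(r"^(#{1,3})\s+(.*)$")


def chunk_by_heading(markdown_text):
    """Split a page into one chunk per heading section.

    Each chunk carries its heading trail ("path") plus the section text.
    """
    chunks = []
    path = []  # current heading trail, e.g. ["Export data", "CSV export"]
    body = []  # lines collected for the section being read

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section before changing the trail
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level and deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


# Hypothetical page: the error code appears verbatim so search can match it.
doc = """# Export data
Intro.
## CSV export
1. Click Export.
2. Choose CSV.
## Errors
`ERR_EXPORT_LIMIT` means too many rows.
"""

chunks = chunk_by_heading(doc)
for chunk in chunks:
    print(chunk["path"], "|", chunk["text"].replace("\n", " / "))
```

    Keeping the heading trail with each chunk is one simple way to let a short section stand alone when it is retrieved without the rest of the page.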

    Metadata is a multiplier. I add clear titles, descriptions, last‑updated dates, product area tags, and audience labels (admin, developer, analyst) to boost SEO and machine readability. Stable IDs for components, examples, and API objects improve deep linking and evaluation. Where appropriate, I include structured examples that align with prompt engineering best practices so AI assistants can extract inputs, outputs, and constraints cleanly.

    Quality is measured, not hoped for. I pair content audit checklists with analytics to see what’s searched, where users pogo‑stick, and which articles drive successful task completion. Tools like Amplitude analytics reveal gaps and dead‑ends, while lightweight evals (answer accuracy, grounding rate, latency) ensure LLMs retrieve the right doc chunks at the right time.

    Consistency is a feature. I standardize terminology across UI, API, and docs, and I avoid synonym sprawl that confuses both readers and LLMs. Page intros state the job-to-be-done; conclusions link to adjacent tasks; and deprecation notes are explicit with forward paths. This coherence lowers cognitive load and improves both RAG performance and human trust.

    Governance keeps it scalable. I assign owners per section, define SLAs for updates, and automate checks for broken links, orphaned pages, and outdated screenshots. Redirect rules avoid 404s, and version banners prevent LLMs from mixing deprecated guidance into current answers—small details that cumulatively protect customer experience.
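    The broken-link and orphaned-page checks can be sketched in a few lines once you have a map of which page links to which. This is a simplified illustration, assuming the link map has already been extracted from your docs; the function name `audit_links`, the `root` parameter, and the page ids are hypothetical.

```python
def audit_links(pages, root="index"):
    """Find broken internal links and orphaned pages.

    `pages` maps a page id to the list of page ids it links to.
    A link is broken if its target is not a known page.
    A page is orphaned if no other page links to it (the root is exempt).
    """
    broken = sorted(
        (src, dst)
        for src, links in pages.items()
        for dst in links
        if dst not in pages
    )
    # Ignore self-links so a page cannot "rescue" itself from being orphaned.
    linked = {
        dst
        for src, links in pages.items()
        for dst in links
        if dst != src
    }
    orphans = sorted(p for p in pages if p != root and p not in linked)
    return broken, orphans


# Hypothetical site map.
site = {
    "index": ["setup", "export"],
    "setup": ["export", "legacy-api"],  # "legacy-api" no longer exists
    "export": [],
    "old-faq": ["setup"],               # nothing links here
}

broken, orphans = audit_links(site)
print(broken)   # [('setup', 'legacy-api')]
print(orphans)  # ['old-faq']
```

    A check like this could run in CI on every docs change, failing the build when either list is non-empty.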

    If you’re just getting started, begin with three moves: clarify intents, restructure pages into atomic, linkable units, and add metadata that reflects how customers actually search. From there, tighten your retrieval-first pipeline and run regular evals. The payoff is durable: faster time to value for users, lower support load, and AI assistants that answer accurately, confidently, and consistently.


    Inspired by this post on Amplitude – Perspectives.


  • Docs-as-Code Leadership at Scale: How Jeff Scattini Elevates End-to-End Product Documentation

    Great products aren’t just shipped; they’re understood. In my product management practice, the difference between a good release and a great one often comes down to disciplined documentation that moves at the speed of delivery. That’s why the docs-as-code approach has become a cornerstone of how I build, lead, and measure product experiences across teams.

    As I reflect on leaders who set a high bar in this craft, one description stands out: "With years of experience as Senior Documentation Manager, Jeff leads teams and oversees the end-to-end creation of documentation using docs-as-code methodology." That concise statement captures a model I deeply respect—one that treats documentation as a first-class citizen in the product lifecycle.

    In practice, docs-as-code integrates documentation into CI/CD pipelines, version control, and peer review workflows—exactly how we ship software. This elevates quality, enforces consistency, and accelerates responsiveness to change, all while enabling rigorous content audit and UX writing standards. When documentation evolves with code, it becomes discoverable, testable, and measurable—key traits for scalable product management leadership.

    The downstream impact is tangible. Users ramp faster through onboarding, in-app guides, and product tours because the narrative aligns with the product’s true state at any given commit. Support tickets drop, developers work with greater clarity, and PMs gain the feedback loops needed for continuous discovery. In a product-led growth motion, this clarity compounds—reducing time-to-value and enabling teams to ship confidently.

    Equally important is the leadership pattern behind the methodology: aligning product, engineering, and customer-facing teams around shared truths. I’ve seen empowered product teams operate at their best when documentation is embedded in planning, sprint reviews, and release gates. This creates a single source of truth that scales knowledge, preserves intent, and shortens the path from decision to delivery.

    For me, the standard expressed above isn’t just a role description—it’s a blueprint for operational excellence. When we manage documentation with the same rigor as code, we build trust at every touchpoint and create the conditions for sustained product velocity. That’s the level of clarity and execution I strive to foster across every product line.


    Inspired by this post on Amplitude – Perspectives.


  • Kaizen for the AI Era: Tiny Daily Wins That Build Smarter, Scalable Customer Support

    Every day, I challenge my teams to make one small, meaningful improvement—something so lightweight it’s impossible to ignore and easy to repeat. That tiny daily motion compounds, and over time it reshapes customer experience, operational quality, and team culture.

    That’s the essence of Kaizen, the Japanese philosophy of continuous improvement. Developed in post-war Japan and popularized by companies like Toyota, Kaizen proves that small, steady changes lead to significant long-term results. In product management and customer support, this approach transforms big ambitions into daily behaviors that actually stick.

    Crucially, Kaizen isn’t passive or unstructured. It thrives on three principles I reinforce across my org. First, small changes reduce resistance—when you lower the activation energy, teams move faster. Second, improvement is continuous, not occasional; instead of waiting for quarterly reviews or major releases, you ask: “What can we improve right now?” Third, everyone participates—the people closest to the work are best positioned to improve it. That’s how momentum spreads.

    In practice, the cycle is simple: identify a small problem, test the change, measure the result, refine, and repeat. The point isn’t radical transformation in a single swing; it’s steady progress guided by data and observation—a rhythm that aligns beautifully with eval-driven development and continuous discovery.

    At Intercom, we apply this same philosophy to how we manage our Agent Fin through a process we call the “Fin Flywheel”. Here’s how this works.

    Train: Teach Fin how to handle and resolve the most complex customer queries.

    Test: Run fully simulated customer conversations from start to finish to see exactly how Fin will behave before going live.

    Deploy: Launch Fin across all channels so customers get consistent support wherever they reach out.

    Analyze: Use AI-powered insights to review and improve Fin’s performance so it can deliver better customer experiences.

    This isn’t a one-time setup; it’s a continuous loop where every interaction feeds ongoing improvement. Rather than deploying AI and assuming it will perform as expected, improvement is built into the system itself. The more Fin is used, the better it gets. That’s the hallmark of agentic AI done right—tight feedback loops, purposeful conversation design, and clear Agent Analytics that illuminate what to tune next.

    But continuous improvement doesn’t stop with AI. Within our Human Support operations, I emphasize the same mindset that drives great LLMs for product managers: you instrument the experience, learn from real usage, and close gaps fast. We operate with a simple mindset: the first time that you solve a customer issue should be the last time it happens.

    When a conversation reaches a human, we pause to diagnose and prevent recurrence. Why did this reach me? Why couldn’t Fin resolve it? How can we prevent this from happening again? Those questions anchor a culture of root-cause thinking and accelerate product-led growth by removing friction at the source.

    To make this effortless, we’ve built a lightweight, AI-powered way to log suggestions in the moment—no long explanations or heavy admin required. Ideas are reviewed quickly and implemented by subject matter experts or by the team themselves. This keeps the flywheel spinning: insights flow in, fixes go out, and measurable outcomes improve.

    The result is a frontline that evolves from reactive problem-solvers into a proactive improvement engine. The people closest to customers spot friction, suggest fixes, and see their insights shaped into meaningful change. It’s continuous discovery embedded in everyday work, not a side project.

    Kaizen demonstrates that lasting progress doesn’t come from occasional transformation; it comes from intentional, everyday refinement. The “Fin Flywheel” applies that philosophy to AI. Our Human Support continuous improvement process applies it to human insights. Together, they create a shared system where both people and AI learn continuously from customer interactions.

    When improvement is built into the mechanics of how you work, it stops being a one-off project and becomes an ingrained capability. Over time, those small daily improvements don’t just add up; they compound into a sustainable, data-driven advantage that elevates customer experience and differentiates your customer support AI strategy.


    Inspired by this post on The Intercom Blog.


  • We Built Agent Analytics After Observability Broke—Why Your AI Team Needs It Now

    I remember the exact moment our product crossed the threshold from scripted automation to truly agentic AI. The excitement was real, and so was the pit in my stomach when our dashboards went dark. Our trusted analytics and observability stack, which had served us flawlessly for traditional software, suddenly couldn’t explain what the agent was doing, why it made certain choices, or how to reproduce outcomes across runs.

    "The moment our product became an AI agent, our entire observability stack became irrelevant, not something you want as an analytics company. Here's what we did."

    Why does this happen? Agentic AI doesn’t behave like conventional apps. Instead of deterministic flows and neatly tagged events, we face non-deterministic trajectories, tool-use chains, evolving prompts, context window dynamics, and policy guardrails that influence outcomes in real time. Clicks and pageviews give way to tokens, tool calls, and conversation turns. Without purpose-built observability, you can’t do credible product discovery, measure behavioral analytics, or run eval-driven development with confidence.

    That’s why we built Agent Analytics. We needed a unified lens to trace every step of an AI workflow—from user intent to model prompts, function calls, retrievals, tool outputs, and final responses—while capturing latency, cost, guardrail hits, fallbacks, and outcome tags. We instrumented runs end-to-end, added experiment support for prompt engineering and policy variants, and wired in evaluations so we could turn subjective quality into objective signals the team could act on.

    The impact on product management was immediate. We shortened iteration cycles by making failure states obvious and reproducible, turned ambiguous feedback into structured data, and gave engineers and designers a shared source of truth for conversation design and AI workflows. With visibility into containment, escalation, autonomy ratio, and step-level success, we could ship confidently, roll back safely, and align roadmap bets to measurable outcomes, not anecdotes.

    Building this capability demanded more than logging. We invested in data governance and privacy-by-design to mask sensitive content while preserving semantic context, and we separated human-identifiable data from model telemetry. We treated prompts and policies like code—versioned, diffable, and safely rolled out behind feature flags and CI/CD—so we could experiment without risking regressions in production.

    What should every team measure? Start with outcome quality (task success, resolution, containment), reliability (tool success rate, guardrail triggers, fallbacks), performance (time-to-first-token, total latency, step-level latency), and efficiency (tokens and cost per successful task). Add groundedness checks for retrieval steps, regression evals for core journeys, and post-release anomaly detection to catch drift before users do. These metrics become your operating system for agent performance and your compass for product strategy.
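
    As a rough sketch of how a few of these measures could be computed from per-run summaries: the record fields below (succeeded, escalated, tool_calls, and so on) are hypothetical, the sample data is invented for illustration, and the formulas are one reasonable reading of each metric rather than a standard definition.

```python
from typing import Dict, List

# Hypothetical per-run summaries; in practice these would come from your tracing pipeline.
runs: List[Dict] = [
    {"succeeded": True,  "escalated": False, "tool_calls": 3, "tool_failures": 0, "cost_usd": 0.012, "latency_ms": 1800},
    {"succeeded": True,  "escalated": False, "tool_calls": 2, "tool_failures": 1, "cost_usd": 0.020, "latency_ms": 2600},
    {"succeeded": False, "escalated": True,  "tool_calls": 4, "tool_failures": 2, "cost_usd": 0.030, "latency_ms": 4100},
    {"succeeded": True,  "escalated": False, "tool_calls": 1, "tool_failures": 0, "cost_usd": 0.008, "latency_ms": 900},
]


def summarize(runs: List[Dict]) -> Dict:
    """Aggregate outcome, reliability, performance, and efficiency metrics."""
    if not runs:
        raise ValueError("need at least one run to summarize")

    total = len(runs)
    successes = sum(1 for r in runs if r["succeeded"])
    contained = sum(1 for r in runs if not r["escalated"])
    tool_calls = sum(r["tool_calls"] for r in runs)
    tool_failures = sum(r["tool_failures"] for r in runs)
    total_cost = sum(r["cost_usd"] for r in runs)

    return {
        # Outcome quality
        "task_success_rate": successes / total,
        "containment_rate": contained / total,
        # Reliability
        "tool_success_rate": (tool_calls - tool_failures) / tool_calls if tool_calls else None,
        # Efficiency: total spend (including failed runs) divided by successful tasks
        "cost_per_successful_task": total_cost / successes if successes else None,
        # Performance
        "mean_latency_ms": sum(r["latency_ms"] for r in runs) / total,
    }


summary = summarize(runs)
```

    Dividing total spend by successful tasks, rather than averaging cost over all runs, means failed runs make the efficiency number worse, which is usually the behavior you want.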

    If you’re building or scaling AI agents, you need Agent Analytics before you hit your first incident. It’s the difference between guessing and knowing—between reactive firefighting and proactive iteration. With the right observability, your team can move faster, manage risk intelligently, and translate agent behavior into business outcomes that compound over time.


    Inspired by this post on Amplitude – Best Practices.


  • The CPO Playbook I Wish I’d Had: Ditch Bad Wisdom, Ship Faster, and Lead with Clarity

    The CPO Playbook I Wish I’d Had: Ditch Bad Wisdom, Ship Faster, and Lead with Clarity

    I keep a running list of product wisdom that sounds great on a slide but quietly sabotages execution. Recently, I revisited that list after a deep conversation with a seasoned CPO from a leading security and compliance platform and reflected on how these lessons show up in my own operating rhythm. What follows is my practical playbook for scaling product organizations without losing speed, quality, or the soul of the product.

    Most big-tech veterans struggle when they leap into startups because the safety net of process disappears. At a startup, the buck truly stops with you—there’s no committee to shield a decision and no process to rescue a weak plan. The mindset shift is simple to say and hard to do: own outcomes end to end, reduce your reliance on institutional scaffolding, and make decisions with incomplete information while keeping standards high.

    “Great product leaders stay in the details.” I sample artifacts every week—PRDs, design flows, user research notes, postmortems—and I read customer threads to calibrate my intuition. To maintain shipping velocity as headcount grows, I instrument a few critical indicators (deployment frequency, change failure rate) and favor outcomes over output. Data guides my attention; it never replaces judgment.
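
    For readers who want to see the two indicators in code, here is a minimal sketch assuming a simple deployment log of (date, caused_failure) pairs. The log format, the sample data, and the function names are hypothetical; teams define "failure" differently, so treat the formulas as a starting point.

```python
from datetime import date

# Hypothetical deployment log: (deploy date, whether the change caused a failure
# that needed remediation such as a rollback or hotfix).
deployments = [
    (date(2024, 3, 4), False),
    (date(2024, 3, 6), True),
    (date(2024, 3, 11), False),
    (date(2024, 3, 13), False),
    (date(2024, 3, 20), False),
]


def deployment_frequency_per_week(deployments, start, end):
    """Average number of deployments per week in the inclusive window [start, end]."""
    days = (end - start).days + 1
    if days <= 0:
        raise ValueError("end must not be before start")
    in_window = [d for d, _ in deployments if start <= d <= end]
    return len(in_window) / (days / 7)


def change_failure_rate(deployments):
    """Share of deployments that caused a failure."""
    if not deployments:
        return 0.0
    failures = sum(1 for _, failed in deployments if failed)
    return failures / len(deployments)


freq = deployment_frequency_per_week(deployments, date(2024, 3, 4), date(2024, 3, 24))
cfr = change_failure_rate(deployments)
```

    Reading the two together matters: frequency rising while failure rate holds steady is healthy; frequency rising with failure rate is not.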

    As teams scale, I use a blunt rule to keep speed high: small autonomous teams, small batch sizes, short feedback loops. One clear owner, one prioritized backlog, and weekly demos to customers. We ship thin slices, not big bangs. And “Great CPOs should avoid comfort metrics”—the easy dashboards that rise when nothing meaningful is moving. I push for outcome-centric OKRs tied to customer value, not vanity charts.

    Rigid hierarchies derail quality decision-making. They slow signal, encourage escalation theater, and suppress the truth from the edges. I shorten paths between PMs, engineers, designers, research, and go-to-market leads, and I strip out stage gates that don’t add learning. Above all, I hold myself to “Stop making your team fetch rocks”: no randomized executive requests without context. Instead, I frame clear problem statements, explicit constraints, and observable success criteria.

    Revenue and product can feel at odds, but they don’t have to be. The key to a quality CPO and CRO relationship is a shared operating model: one customer narrative, a joint pipeline of problems worth solving, and a common scorecard. We meet weekly, review the same signals, and align on sequencing: what we solve now for impact, what we stage for scale, and what we sunset to reduce complexity. When trade-offs get tough, we anchor on customer value and long-term defensibility.

    Who ultimately oversees the quality bar? I do—and I do it through clarity, exemplars, and consistent feedback loops, not micromanagement. When I leave feedback, I make it actionable and specific: name the user scenario, note the friction, propose a sharper decision frame, and suggest a smaller, testable slice. I expect narrative memos and crisp acceptance criteria; I offer rapid, detailed responses so momentum never stalls.

    Open office hours are my forcing function for transparency and speed. Anyone can bring a thorny escalation, a design in progress, or a customer insight. Pair that with weekly 1:1s—non-negotiable for developing leaders and unblocking work—and the organization learns to surface issues early, make faster decisions, and self-correct without drama.

    Here’s a glimpse into my working week: Mondays set priorities and confirm the few decisions that matter; midweek is for deep reviews across roadmap, research, and engineering readiness; Thursdays I’m with customers and partners; Fridays I write and synthesize. I leave space for unscripted time with individual contributors—because ICs are the unsung heroes of a company—and I celebrate excellent craft out loud.

    The hardest leadership skill is knowing when to push and when to give space. I push on clarity, sequencing, and quality; I give space on solutions and implementation paths. I reject comfort metrics, reinforce outcomes vs. output, and keep the organization close to customers and details. If you’re stepping from big tech into a startup or scaling your product org through rapid growth, these practices will help you ship faster, decide better, and raise the quality bar without burning out your team.


  • Outcomes vs Outputs: How I Stopped the Feature Factory and Drove Real Product Impact

    Outcomes vs Outputs: How I Stopped the Feature Factory and Drove Real Product Impact

    “Outcomes over outputs” is the right mantra—and one I’ve championed across product teams—but turning it into daily practice is where most teams stumble.

    It’s simple in theory: focus on the impact of what we build, not just shipping features. In reality, it’s rarely black and white because most teams are asked to do both—hit outcomes and deliver specific outputs—at the same time.

    In a benchmark survey, 20% of product teams claim to be outcome-focused, nearly half describe themselves as working in a mix of outcomes and outputs, and about 30% are still primarily working with outputs. I’ve seen versions of this in my own org: we aspire to outcomes, but our rituals, roadmaps, and reporting still reward shipping.

    Here’s how I draw the line clearly, coach my teams to avoid common traps, and negotiate better, more actionable outcomes that unlock genuine product discovery and business results.

    Simple definitions we live by

    An output is something you build or produce—a feature, a project, an initiative. It’s something your team ships.

    An outcome is the impact of that output—a change in customer behavior or a business result.

    Josh Seiden puts it well in his book Outcomes Over Output: “An outcome is a change in human behavior that drives business results.”

    Figure: Outputs vs outcomes in product management. Outputs are what you ship (feature, project, integration); outcomes are what changes (customer behavior and business results). Value happens when features change customer behavior and move business results.

    I distinguish business outcomes from product outcomes. Business outcomes are typically financial metrics that measure the health of the business (e.g. increase revenue or reduce costs) while product outcomes measure a customer behavior in the product or a sentiment about the product.

    Here’s a simple example I’ve used with platform teams. Many B2B companies support a number of integrations. Integrations are outputs. Having integrations alone doesn’t create value. Customers using and finding value in those integrations—that’s an outcome. If those customers retain their subscriptions longer because of the integrations—that’s also an outcome.

    Building something isn’t the same as creating value. That’s the core of this distinction, and it’s what separates empowered product teams from feature factories.

    Why this distinction matters for empowered product teams

    When we task teams with delivering outputs, they’re done when the software ships. When we task teams with delivering outcomes, they aren’t done until the software ships and has the expected impact.

    That small shift changes almost everything about how a team works: what we measure (impact, not just delivery), how we know we’re done (measurable behavior change, not release notes), the autonomy we grant (told what to achieve, not what to build), and the planning artifacts we use (an opportunity solution tree beats a feature roadmap when we’re exploring the best path to an outcome).

    When I assign outcomes, I’m giving the team latitude—and responsibility—to figure out the best path to success. That’s what opens the door for real product discovery and continuous discovery habits.

    Figure: Output-driven vs outcome-driven teams, compared on metrics measured, team autonomy, definition of done, and planning artifacts (feature roadmap vs opportunity solution tree).

    Examples: spotting outputs disguised as outcomes

    Clear-cut example: “Our outcome is to deliver an Android app.” An Android app is something we build and ship. It’s clearly an output.

    To get to an outcome, I ask, “What’s the value of having an Android app?” or “How will we know the Android app is successful?”

    We might answer: “Having an Android app will allow us to engage more users. We’ll know it’s successful when people engage with the app on a regular basis.”

    This answer uncovers the hidden outcome: engage more people. Now we can set the right scope: increase the percentage of engaged users across any platform; increase the percentage of engaged mobile users; or increase the percentage of engaged Android users.

    Any of these outcomes gives us more room to explore than a fixed output. Maybe we don’t need a native app at all. We could deliver the same engagement through a mobile web experience, notifications, or email. And we’re not done when we ship—we’re done when the right people are actually engaged.

    Tricky example 1: measure the value creation moment (hires, not applicants)

    Figure: From output to outcome. "Build an Android app" becomes "increase engaged users" by asking when the app is successful, defining value, and owning results.

    When setting outcomes, it’s tempting to choose the easiest-to-measure metric. But a good outcome measures the customer’s value creation moment.

    I worked at a company that helped new college grads find their first job. When I started working there, the primary outcome was “increase job applications.” This technically is an outcome—it measures a specific behavior in the product.

    But it doesn’t measure the value creation moment. A job seeker doesn’t get value when they apply for a job. They only get value when they get the job. Similarly, employers don’t get value from just any job applicant; they get value when the right job applicant applies.

    Many job boards try to measure qualified applicants—instead of counting any applicant, they compare the credentials of the applicant to the job description and only count qualified applicants. This is better. But it still doesn’t measure the value creation moment. Both the job seeker and the employer get value when an open job is successfully filled. The right metric is hires.

    Yes, “hires” can be hard to instrument because the hire happens off-platform and incentives to report it are misaligned. Measure it anyway, even with proxies. The easy metric isn’t always the right outcome.

    Tricky example 2: measure impact, not user-generated output (the course reviews trap)

    I worked with a team that helped students choose university courses. They set their outcome as: “Increase the number of course reviews on our platform.”

    Figure: Four outcome traps to avoid: measuring at the wrong moment, outputs in disguise, the traction trap, and sentiment alone.

    Sounds like an outcome, right? It’s a metric. You can measure it. It’s an action users take on the site—writing a review. But it’s actually an output in disguise.

    Reviews are valuable when they help a student evaluate a course. They don’t create any value if a student never sees them. More reviews aren’t always better, especially if they’re clustered where nobody looks.

    A better outcome is “Increase the number of course views that include reviews.” Now we’re measuring impact on the decision moment, not just the production of content.
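
    A small sketch shows how the two metrics can diverge. The course names and numbers are invented for illustration, and the function name is hypothetical.

```python
# Hypothetical data: number of reviews and number of page views per course.
review_counts = {"CS101": 40, "HIST210": 0, "BIO150": 0, "MATH220": 2}
course_views = {"CS101": 50, "HIST210": 300, "BIO150": 250, "MATH220": 100}


def views_that_include_reviews(review_counts, course_views):
    """Count course views where the student could actually see at least one review."""
    return sum(
        views
        for course, views in course_views.items()
        if review_counts.get(course, 0) > 0
    )


# Output-style metric: how much content was produced.
total_reviews = sum(review_counts.values())

# Outcome-style metric: how often reviews were present at the decision moment.
views_with_reviews = views_that_include_reviews(review_counts, course_views)
share_of_views_with_reviews = views_with_reviews / sum(course_views.values())
```

    In this made-up data, most reviews sit on a course that few students view. Adding ten more reviews to that course raises the review count but leaves the views-with-reviews number unchanged; a single review on one of the heavily viewed courses moves it a lot.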

    If you can hit your metric without helping customers, you’re tracking an output, not an outcome.

    Tricky example 3: measure success, not just adoption (the traction metric trap)

    “Increase the percentage of users who viewed the performance report.”

    This looks like a good outcome. It measures a specific behavior in the product. It’s within the team’s control. But it’s what I call a traction metric—it measures adoption of a single feature, not value to the customer.

    Figure: Why teams stay stuck on outputs. A trust cycle (manager micromanages, team reports features, manager stays in details) and an accountability trap (safe targets and disguised outputs).

    Two problems arise. First, people can view the report and still not find what they need. Second, we might have perfectly happy customers who don’t need the report at all. Driving usage of an unneeded feature wastes time and erodes trust.

    Measure the value creation moment, not just feature adoption.

    Tricky example 4: pair sentiment with behavior

    I define a product outcome as a metric that measures either (1) a specific behavior in the product or (2) a sentiment about the product. But sentiment metrics, like CSAT or NPS, can be tricky on their own.

    Sentiment metrics are outcomes, but they aren’t directional. They don’t tell us where to explore or set guardrails for what to avoid. So I pair a behavior with a sentiment, for example: “Increase engagement without negatively impacting satisfaction.” I use sentiment as a counterweight.

    Facebook and Instagram illustrate why this matters. Meta is exceptional at driving engagement—but to a fault. Many of us don’t like these addictive products. Pairing engagement with a satisfaction guardrail prevents “engagement at all costs.”
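
    One way to encode a paired outcome like this is as a simple check with a guardrail. This is a minimal sketch: the function name, the parameters, and the idea of a tolerated satisfaction drop are assumptions for illustration, and a real evaluation would also account for statistical noise.

```python
def meets_paired_outcome(
    engagement_before: float,
    engagement_after: float,
    csat_before: float,
    csat_after: float,
    max_csat_drop: float = 0.0,
) -> bool:
    """True only if engagement went up AND satisfaction did not fall past the guardrail."""
    engagement_up = engagement_after > engagement_before
    satisfaction_held = (csat_before - csat_after) <= max_csat_drop
    return engagement_up and satisfaction_held


# Engagement rose and satisfaction held steady: the outcome is met.
healthy = meets_paired_outcome(0.30, 0.36, csat_before=4.4, csat_after=4.4)

# Engagement rose even more, but satisfaction fell: the guardrail blocks it.
at_all_costs = meets_paired_outcome(0.30, 0.40, csat_before=4.4, csat_after=4.0)
```

    The behavior metric gives direction; the sentiment metric only vetoes. That is the counterweight role described above.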

    Why getting this right is hard (and how I counter it)

    Figure: How to make the shift. Five steps to move teams from outputs to outcomes: translate metrics, negotiate with teams, expect iteration, watch for traps, and go deeper.

    The trust cycle. Managers don’t trust that teams can reach outcomes on their own. So managers micromanage the outputs. Teams, in turn, don’t communicate their progress toward outcomes—they communicate their progress on features. This reinforces the manager’s belief that they need to stay involved in the details. It’s a vicious cycle.

    I break it by asking teams to show their work—share assumptions, research, opportunity solution trees, and evidence behind choices—and by giving feedback on the thinking, not just the solutions.

    The accountability trap. When performance reviews are tied to hitting outcomes, teams play it safe. They sandbag their targets. They disguise outputs as outcomes to guarantee “success.”

    I treat outcomes as learning opportunities first. When we start on a new outcome, I set a learning goal—“learn what moves the needle on this metric”—before a performance goal—“increase X by Y%.” This creates space to explore without fear.

    How I get teams started with better outcomes

    Translate business outcomes to product outcomes. Business outcomes like revenue, retention, and market share are lagging indicators—by the time you see them, it’s too late to act. Product outcomes measure behavior changes within the product that lead to those business results. They’re leading indicators within the team’s control.

    Negotiate outcomes with your team. Outcome-setting should be a two-way conversation. Leadership brings the cross-company context. The team brings customer insight and technical realities. Neither side dictates; we co-own the target and the constraints.

    Figure: Feature Factory (measure what you ship) versus Product Team (measure what it changes).

    Expect to iterate on your metrics. Your first outcome metric probably won’t be right. That’s normal. Sonja at tails.com went through four iterations—from 90-day retention to 30-day to 5-day to behavior-based metrics—before landing on something actionable. Thomas at Bluestone Analytics iterated three or four times before finding the right metric. Iteration is the work.

    Watch for common mistakes. Outputs disguised as outcomes. Traction metrics masquerading as product outcomes. Sentiment metrics without direction. Business outcomes assigned directly to product teams without translating to behavior change.

    Use the right artifacts. Replace feature roadmaps with an opportunity solution tree to explore multiple paths, test assumptions, and sequence bets explicitly against a clear outcome.

    Align OKRs with outcomes. If your company uses OKRs, make sure the KRs are true product outcomes (behavior change and value creation), not a list of features to ship.

    The bottom line

    When we shift from an output-first mindset to an outcome-first mindset, it doesn’t mean that outputs stop mattering. Product teams will always ship features, and the ability to do so quickly and with quality still matters. This shift simply ensures those features achieve the intended impact. We aren’t done when we ship—we’re done when what we shipped has the intended impact.

    Measure success by the impact of what you ship and you’ll build a product team that learns, adapts, and creates real value. Measure success by what you ship and you’ll get a feature factory.

    Quick self-check: is your “outcome” really an outcome?

    Ask yourself: 1) Does it measure a behavior change or a sentiment tied to value creation? 2) Could we hit it without helping customers? 3) Is it adoption of a single feature (a traction metric) or a result that customers and the business care about? 4) Do we have a counter-metric to prevent unintended harm? If you stumble on any of these, refine it before you commit.


    Inspired by this post on Product Talk.


  • Staying Sane as a Product Leader: Practical Strategies I’m Using from Teresa Torres & Petra Wille

    Staying Sane as a Product Leader: Practical Strategies I’m Using from Teresa Torres & Petra Wille

    The world can feel like it’s spinning, and as a product leader, I feel that pressure acutely—juggling customer needs, stakeholder expectations, and the relentless news cycle. I recently listened to a powerful conversation with Teresa Torres and Petra Wille about staying grounded when everything feels “bonkers,” and it offered a practical, human way to keep showing up without losing yourself.

    What resonated most was the invitation to live my values through small, consistent actions. Rather than waiting for grand gestures or perfect solutions, I’m leaning into the mindset of “Something is better than nothing.” It’s the same spirit we bring to continuous improvement in product: make a change, evaluate impact, iterate.

    “Create the world you want to live in” has become a daily prompt for me. I’m applying it to how I spend my attention, time, and platform—three scarce resources for any product management leader. I’m not going to do everything perfectly, but I can make better trade-offs this week than I did last week, and I can keep improving.

    Practically, that looks like reconsidering which speaking invites I accept, especially when representation is skewed. If a stage is heavily male, I now ask organizers about their plan for balance before committing. I also question travel expectations for short talks when a high-quality virtual experience is possible—good for sustainability, budgets, and energy. These choices compound, just like product roadmapping and sprint planning decisions.

    Petra’s “under-complexity” lens was a wake-up call. In product, oversimplified narratives—whether a single KPI, a vanity metric, or a forced binary—usually increase fear and bad decisions. The same is true in civic discourse. To counter that, I’m seeking more nuance on purpose: reading multiple sources on the same story, listening for who’s not in the room, and noticing how the same facts can carry different meanings depending on who’s telling it.

    One simple habit helps: I’ll read The New York Times and The Wall Street Journal on a headline, then follow up with Tangle by Isaac Saul, which lays out “what the left says / what the right says / editor’s take,” sometimes including perspectives from affected communities. It’s a lightweight form of personal knowledge management that improves my product judgment and my citizenship.

    Another idea that stuck with me is swapping media proxies for human connection. In product, we don’t ship based on secondhand opinions—we run customer interviews, co-create with users, and build empowered product teams. The same principle applies in community: talk to someone directly affected, ask real questions, and stay curious. When conversations get heated, I try to build bridges, reduce proxies, and look people in the eye.

    I’m also reflecting on platform responsibility. Even a “small” platform can snowball through weak ties inside a company or community. I’m asking: When should I speak up? Where should I draw lines? And when is “staying in your lane” actually a way to avoid necessary leadership? These are the same stakeholder management questions we navigate in product strategy—assess impact, clarify intent, and act with integrity.

    Local grounding matters, too. I’ve found energy and clarity in community-level action: voting, attending public protests when it feels right, mentoring, and supporting nonprofits like World Pulse. I love the framing of “don’t mess with my neighbors”—it keeps me focused on tangible care when the internet starts to feel like reality. I’ve also seen leaders use angel investing in agriculture-related efforts as a counterbalance to “internet reality,” channeling resources into durable, real-world outcomes.

    If you want to experiment this week, pick one small lever you control: where you spend money, time, attention, or your platform. Add nuance by reading at least two different perspectives before reacting. Replace proxies with people by talking to someone with lived experience. Reduce polarization by asking, “what shaped that view?” before judging it. And go local—connect with neighbors or a community group and let small actions compound.

    If you’d like to hear the full conversation that inspired these reflections, you can listen on Spotify or Apple Podcasts. Here are the direct links: Spotify: https://open.spotify.com/episode/1sxEFquu73ZB9fL9gGk6Om and Apple Podcasts: https://podcasts.apple.com/kh/podcast/staying-sane/id1794203808?i=1000755696295

    Resources I’m exploring and recommend: World Pulse (https://www.worldpulse.org/), The New York Times (https://www.nytimes.com/), The Wall Street Journal (https://www.wsj.com/), and Tangle by Isaac Saul (https://www.readtangle.com/ and https://www.readtangle.com/author/isaac-saul/). For builders and writers, I also appreciate Ghost (https://ghost.org/) as an open-source publishing platform. If you work in or with the MENA ecosystem, take a look at MENA Product Summit ’26 (https://www.prdkt.plus/summit26). Colleagues like Jeff Merrell (https://jeffdmerrell.com/) and grassroots efforts such as No Kings Protest (https://www.nokings.org/) offer additional perspectives and ways to get involved.

    If this resonates, share it with a teammate who’s been feeling the weight of the world. I’d love to hear one small, values-aligned action you’re taking this month—what “something” will you try next?


    Inspired by this post on Product Talk.


  • How We Automated 81% of Customer Support with AI—While Uplifting CX, Speed, and ROI

    How We Automated 81% of Customer Support with AI—While Uplifting CX, Speed, and ROI

    Running the Support function at a company that builds a leading agent- and AI-forward customer service platform has been, for me, unique, exciting, and yes, daunting. It’s where product ambition meets operational reality, and where every decision I make is immediately tested by customers who expect excellence.

    It’s unique because we use the same technology as our customers. We live in the product every day, which puts us in a privileged position to be the voice of the customer across the organization. That tight feedback loop has shaped how I prioritize, what I build next, and how I measure success.

    It’s exciting because we get to try all of the new features and capabilities of Fin and the Intercom helpdesk. With a relentless focus on AI innovation, I’ve had access to remarkable tools that help us deliver an incredible customer experience—and I’ve seen firsthand how the right workflows and guardrails turn those tools into outcomes.

    And it’s daunting because expectations for our own Customer Support (CS) team are sky high. If we can’t deliver incredible support using our own technology, we undermine its value proposition. That imperative has kept me honest, focused, and fast.

    In our new research, “The 2026 Customer Service Transformation Report,” we’ve been sharing how forward-looking teams use AI to transform their support models. If you’d like to get straight to the report, download it here.

    When Intercom changed its focus in late 2022 to prioritize the customer service use case, we undertook a critical review of the support experience we were delivering and committed to driving meaningful change under an AI-first framework. That was a turning point: I aligned product strategy and operations around a single north star—automate with quality, and elevate humans to higher-value work.

    Three years on, Fin now resolves over 81% of all our customer support volume, delivering immediate and high-quality resolutions. We have absorbed a 300%+ increase in customer demand since 2022 without proportional headcount growth. Without Fin, we would have needed at least 100 additional CS team members to meet that demand and our improved service levels – a net saving to Intercom of between $7.5M–$9M annually.
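
    The arithmetic behind those figures can be sketched in a few lines. The inputs are the numbers quoted above, treated as point values even though the text gives them as bounds ("over 81%", "300%+", "at least 100"). The per-head figure is only what the stated range implies when divided evenly, and indexing 2022 volume to 100 is an assumption for illustration.

```python
# Figures quoted in the text, treated as point values for illustration.
additional_headcount = 100            # "at least 100 additional CS team members"
saving_low = 7_500_000                # lower end of the stated annual net saving (USD)
saving_high = 9_000_000               # upper end of the stated annual net saving (USD)

# Implied annual saving per avoided hire, if the range is spread evenly.
per_head_low = saving_low / additional_headcount
per_head_high = saving_high / additional_headcount

# Demand: a 300% increase means volume is 4x the baseline.
baseline_volume = 100.0               # 2022 volume indexed to 100 (assumption)
demand_increase = 3.0
fin_resolution_rate = 0.81

current_volume = baseline_volume * (1 + demand_increase)
human_volume = current_volume * (1 - fin_resolution_rate)

print(f"Implied saving per avoided hire: ${per_head_low:,.0f} to ${per_head_high:,.0f}")
print(f"Current volume index: {current_volume:.0f}")
print(f"Volume index left for humans: {human_volume:.0f}")
```

    On these inputs, the volume left for humans is roughly three quarters of the entire 2022 volume, even though total demand has quadrupled. That is the sense in which demand was absorbed without proportional headcount growth.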

    Throughout this work, we drew on research from the 2026 Customer Service Transformation Report and applied the lessons directly to our own org design, knowledge management, and AI workflows. What follows is our story of transformation and how we achieved a mature deployment of Fin.

    The problems we set out to solve

    Back in 2022, our challenges looked familiar to any modern support organization, and I knew we needed a step-change—not incremental tweaks.

    We faced increased support demand from new and existing customers: Intercom was launching major features and changes at speed, driving up overall customer conversation volume and requiring additional headcount for the CS team. I could see we were scaling people faster than processes—unsustainable without automation.

    Our support policy (as defined by our service level objectives) was not based on a high bar: In most cases, we were only committed to “business hours” coverage for the majority of our customers, impacting first response times. Even with SLOs that were not considered best in class, we were struggling to meet our commitments. I wanted 24/7 coverage and faster first responses without sacrificing quality.

    We wanted to do more: As we pivoted our strategy, we wanted to open new routes to our support team, such as providing support to website visitors with technical questions and to trial customers. That meant meeting customers earlier in their journey with accurate, on-brand responses—at scale.

    What we did

    We made a very conscious decision to become our own best reference customer. As Intercom embraced the opportunity that generative AI presented to transform customer service, we intentionally moved to an AI-first strategy for our Customer Support team. I set a simple operating principle: ship value quickly, measure relentlessly, and let evidence guide the next bet.

    We started with the highest-volume, informational queries and saw our resolution rates climb quickly. With that foundation in place, we pushed Fin further, training it on deeper documentation and internal procedures, and eventually giving it the ability to take actions on behalf of customers. As Fin took on more complex work, our results started to compound—and trust in the system grew across the organization.

    Early adoption and building trust. When “AI Assist” features came to the Intercom Inbox, the CS team got early exposure to AI and were empowered to provide feedback directly to our product teams. This built awareness and trust across the team about what we were trying to achieve with AI, and helped shape the product roadmap. We were also the first beta customer for Fin, rolling it out to a subset of customers to watch sentiment and outcomes closely. With no adverse reaction and an initial resolution rate of over 25%, we deployed Fin to most customer segments within weeks. I’ll never forget the first week we put Fin in front of real customers—the silence of issues that never reached humans was the loudest signal of success.

    Knowledge management as a product. We recognized quickly that time spent tuning our help center and knowledge assets for Fin would pay dividends. We transitioned our Help Center Manager into a “Knowledge Manager,” with a dedicated remit to optimize content for Fin. We embedded knowledge creation into our “New Product Introduction” (NPI) process, with a target that Fin resolve at least 50% of customer issues for every new product and feature launch. Over time, we added new sources, including “Developer Documents,” enabling Fin to handle increasingly complex issues. We built a culture of continuous improvement, allocating “out of the inbox” time so every teammate could close content gaps and raise the bar.

    Conversation design end-to-end. To ensure a consistent, high-quality customer experience, we created a new “Conversation Designer” role that owns the journey across automation and human handoffs. Using Intercom’s Workflows, we introduced “skills-based routing” so that when a customer asks for a human, the conversation reaches someone with the right expertise quickly. This is now handled by Fin directly using a feature called “Attributes.” The result: a seamless, on-brand experience regardless of channel or escalation path.
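
    As a rough illustration of skills-based routing, the sketch below picks the available teammate whose skills best match the topics on an escalated conversation. It does not use Intercom’s Workflows or the “Attributes” feature; the `Teammate` type, the `route` function, and the skill names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Teammate:
    name: str
    skills: frozenset[str]
    open_conversations: int  # current workload (hypothetical field)


def route(topics: set[str], team: list[Teammate]) -> Optional[Teammate]:
    """Pick the teammate covering the most requested topics.

    Ties go to the teammate with the lighter workload. Returns None
    when nobody on the team has a matching skill.
    """
    candidates = [t for t in team if t.skills & topics]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda t: (len(t.skills & topics), -t.open_conversations),
    )


# Illustrative team; names and skills are made up.
team = [
    Teammate("Ana", frozenset({"billing", "api"}), open_conversations=4),
    Teammate("Ben", frozenset({"api", "webhooks"}), open_conversations=1),
    Teammate("Cy", frozenset({"billing"}), open_conversations=0),
]

assigned = route({"api", "webhooks"}, team)
print(assigned.name if assigned else "unassigned")
```

    In this toy version the match is a simple overlap count with workload as a tiebreaker. A production setup would likely also weigh availability, language, and customer segment.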


    Organization changes that unlocked leverage. As we scaled Fin, we stood up a dedicated AI Support team under a senior CS leader to continuously optimize automation and define our AI adoption strategy across the journey. We restructured human roles into “Technical Support Specialist” and “Technical Support Engineer” to better align with the complexity of incoming work. We also expanded Support Operations to focus on optimization—using AI to uplevel Enablement, Workforce Management, QA, Process Management, and Data Insights. Just as important, we reset expectations about the balance between time spent supporting customers directly versus improving AI. That mindset shift created compounding returns.

    Pushing Fin further with new capabilities. As capabilities matured, we were early adopters and saw measurable wins:

    Fin Guidance: Multiple Guidance rules provide additional controls and a more personalized, targeted experience for customers.

    Fin Tasks and Procedures: Enables Fin to carry out activities such as updating customers on incident status and deep troubleshooting for technical issues.

    Insights: AI-driven dashboards provide deep insight into Fin’s performance and surface recommendations for further optimization. Insights also provides a Customer Experience (CX) Score for every customer interaction, enabling more targeted improvement efforts and opening up new ways to close the loop with customers who have had a poor experience.

    What we achieved

    What started as a focused effort to improve our customer support experience became the strongest proof point for what’s possible when you fully embrace AI. Fin now resolves over 81% of all our customer support volume and has allowed us to absorb a 300%+ increase in demand without proportional headcount growth. Over 90% of our customers now benefit from improved first response performance, 24/7 coverage, and outbound phone support.
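
    A back-of-the-envelope calculation shows why those two figures matter together. It assumes that a 300% increase means total volume is four times the baseline and that the 81% resolution rate applies evenly across that volume. The baseline of 1,000 conversations is an illustrative number, not a reported figure.

```python
def human_handled(baseline: float, demand_increase: float, resolution_rate: float) -> float:
    """Conversations still reaching humans after demand growth and AI resolution."""
    total = baseline * (1 + demand_increase)
    return total * (1 - resolution_rate)


baseline = 1_000  # hypothetical pre-AI volume, all handled by humans
remaining = human_handled(baseline, demand_increase=3.0, resolution_rate=0.81)

print(f"{remaining:.0f} of {baseline * 4:.0f} conversations reach a human")
print(f"{remaining / baseline:.0%} of the original human workload")
```

    Under those assumptions, roughly 760 of 4,000 conversations reach a human, which is about 76% of the original human workload despite the fourfold volume.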

    What the numbers don’t fully capture is the shift in how our team operates. With volume absorbed by Fin, our CS teammates now deliver consultative support—guiding next best actions, deepening product adoption, and contributing directly to retention and expansion. Customers that receive these engagements adopt Fin at a much deeper level and achieve greater support success. What was once a reactive, volume-driven team is now a function that generates significant revenue.

    What’s next

    Customer expectations are always rising, so we’re building on our progress by embracing the Fin Flywheel—an actionable framework for ongoing improvement and optimization. This keeps us honest about the discipline required to sustain AI performance at scale.

    Train: Teach Fin to resolve even the most complex queries with Procedures, knowledge, and policies.

    Test: Run fully simulated customer conversations from start to finish to see exactly how Fin will behave before going live.

    Deploy: Set Fin live across every channel – voice, email, chat, and social – for consistent support wherever customers reach out.

    Analyze: Use AI-powered Insights to analyze and improve Fin’s performance and deliver better customer experiences.

    We are also investing in our support teammates so they can adjust to the new world of AI—taking on more complex work and being valued for the subject matter expertise, consultative engagement, and empathy they bring to the role. That human layer is where differentiation shines.

    We will continue to develop and share best practices for deploying an Agent, based on our own experience with Fin and the lessons learned from our most forward-looking customers. These are captured and continually evolving in The Agent Blueprint.

    Transformation takes commitment

    The most successful teams aren’t bolting AI onto old processes; they’re rebuilding support around it—investing in knowledge and people alongside technology, and treating AI as a continuous discipline rather than a one-time deployment. That’s the real change required. For support teams willing to make it, there’s a rare opportunity to redefine what customer service can deliver—higher CSAT, faster resolution, and durable ROI.


    Inspired by this post on The Intercom Blog.


  • From Resolutions to Outcomes: How We Price AI Agents Fairly and Amplify Customer Value

    From Resolutions to Outcomes: How We Price AI Agents Fairly and Amplify Customer Value

    I’ve long believed a simple truth about AI in customer support: if AI is going to earn trust, pricing has to be aligned with value. That principle has guided my product decisions and the way I hold our teams accountable for measurable outcomes, not activity.

    That was the argument we made when we shared our perspective on pricing AI Agents in 2023. At the time, the value Fin delivered was clear: you pay when the AI resolves a customer’s problem, and if it doesn’t, you don’t. That’s fair, easy to understand, and grounded in results, not activity. We were the first to introduce this pricing model because we believed that pricing and value should be inherently linked.

    That belief hasn’t changed; it has grown stronger over time. What’s changed is what Fin can do. As we expanded capabilities and pushed deeper into complex workflows, it became clear that measuring value solely by end-to-end resolutions no longer captured the full picture of impact.

    Resolutions were the right place to start. Historically, we measured value based on whether Fin fully resolved a conversation on its own. These are known as resolutions and they gave support teams a clear way to measure ROI, easily comparing the cost of AI versus human support. They also aligned our incentives with our customers, as our revenue was directly tied to Fin’s performance.
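
    To make that ROI comparison concrete, here is a minimal sketch of the arithmetic. Every figure in it (the price per resolution, the cost of a human-handled conversation, the monthly volume, the resolution rate) is a hypothetical placeholder, not actual Fin pricing or customer data.

```python
def monthly_support_cost(
    conversations: int,
    resolution_rate: float,
    price_per_resolution: float,
    human_cost_per_conversation: float,
) -> dict[str, float]:
    """Split volume into AI-resolved and human-handled and price each part.

    Under resolution-based pricing, the AI is charged only for the
    conversations it fully resolves; everything else goes to a human.
    """
    ai_resolved = conversations * resolution_rate
    human_handled = conversations - ai_resolved
    ai_cost = ai_resolved * price_per_resolution
    human_cost = human_handled * human_cost_per_conversation
    return {
        "ai_cost": ai_cost,
        "human_cost": human_cost,
        "total": ai_cost + human_cost,
        "all_human_baseline": conversations * human_cost_per_conversation,
    }


# All inputs below are made-up values for illustration.
costs = monthly_support_cost(
    conversations=10_000,
    resolution_rate=0.5,
    price_per_resolution=1.0,
    human_cost_per_conversation=8.0,
)
savings = costs["all_human_baseline"] - costs["total"]
print(f"total: {costs['total']:,.0f}, savings vs. all-human: {savings:,.0f}")
```

    Because the AI cost scales only with resolved conversations, the comparison stays a simple per-conversation one: each resolution replaces one human-handled conversation at a known price.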

    That clarity worked. Today, more than 7,000 teams use Fin. Our average resolution rate across customers has increased every month and now stands at 67%, even as Fin increasingly handles more complex queries. That progress came from building an Agent that could take on harder problems and still deliver.

    But as Fin got more powerful, “success” stopped being binary. I saw this first-hand in customer design sessions where policy, risk, and compliance needs rightly demanded human-in-the-loop confirmation. We weren’t failing to deliver value; we were delivering it differently.

    Over the last couple of years, we invested heavily to ensure Fin could handle the most complex parts of support. As Fin’s capabilities expanded, customers began pushing it further, deploying it deeper into their workflows to handle their toughest queries.

    In some cases, this required Fin to work in tandem with a human agent because that’s what customer policies and oversight needs dictated. Subscription changes, transaction disputes, billing issues, and other multi-step support scenarios often require Fin to gather context, read and write to external systems, and execute actions before handing off to a human agent for confirmation.

    Fin is still doing what it was configured to do: intentionally handing off after doing more of the heavy lifting, which saves time for support teams and shortens time to serve for their customers. But our pricing metric only recognized value when the conversation ended in a full “AI resolution” (i.e., a human was never involved).

    That’s why we’re evolving Fin’s pricing metric from resolutions to outcomes. This shift reflects how customers now define value: not just in full automation, but in safe, efficient progress toward the right result across complex, multi-step, and policy-constrained workflows.

    An outcome is counted when Fin successfully completes the action it was configured to perform as part of a conversation. Resolutions, where Fin handles the issue end-to-end, are still one type of outcome. Another type is a Procedure in which Fin gathers context, takes action, and hands the conversation off when that is what the customer configured it to do.
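
    The difference between the two metrics can be sketched with a small counting example. The `Conversation` record and its fields are hypothetical and do not reflect Fin’s actual data model or billing logic. The point is only that a configured Procedure ending in a handoff counts as an outcome but not as a resolution.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    completed_configured_action: bool  # Fin finished what it was set up to do
    human_involved: bool               # a human agent took part at any point


def is_resolution(c: Conversation) -> bool:
    """Resolution metric: Fin handled the issue end-to-end with no human."""
    return c.completed_configured_action and not c.human_involved


def is_outcome(c: Conversation) -> bool:
    """Outcome metric: Fin completed its configured action, handoff or not."""
    return c.completed_configured_action


conversations = [
    Conversation(True, False),   # end-to-end resolution
    Conversation(True, True),    # Procedure ran, then handed off for confirmation
    Conversation(False, True),   # Fin could not complete its configured action
]

resolutions = sum(is_resolution(c) for c in conversations)
outcomes = sum(is_outcome(c) for c in conversations)
print(f"resolutions: {resolutions}, outcomes: {outcomes}")
```

    Every resolution is also an outcome in this sketch, which mirrors the definition above: the new metric widens what is recognized without changing how a full resolution is treated.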


    Increasing end-to-end AI resolutions is still a core component of scaling Agents, but resolutions are no longer the only measure of Fin's success and utility, especially as Fin takes on more complex work. Moving to outcomes recognizes that solving a customer problem with full automation isn’t always appropriate. It’s about getting to the right result safely and efficiently.

    As Fin’s capabilities expand, teams should feel empowered to use it in more nuanced, collaborative work. Outcomes support that by allowing customers to design workflows that meet compliance requirements and include a human agent when necessary. From a product management standpoint, this is how we align incentives, keep risk controls intact, and still accelerate time-to-value.

    Fin is becoming even more powerful at handling complex, multi-step support queries. With outcomes, we can support that growth without constantly reinventing how value is measured. And this change gives us a strong pricing foundation that can scale as Fin continues to grow and take on more roles beyond service. This aligns with our vision of Fin becoming a “Customer Agent,” capable of handling the entire customer experience.

    What this means for pricing is intentionally straightforward. An outcome will be counted when Fin successfully completes an action it was configured to perform, as part of a conversation. That keeps the model predictable for finance leaders while staying transparent for operators and product teams managing AI workflows.

    The pricing model stays simple and the definition of value becomes more accurate. In other words, we’re doubling down on fairness, predictability, and competitiveness—core tenets for any consumption SaaS pricing strategy tied to real business impact.

    When we first wrote about outcome-based pricing, we said that trust is the currency of AI. That’s still true. Trust is earned when customers see pricing move in lockstep with utility and risk posture, especially as gen AI and agentic AI take on higher-stakes tasks.

    Pricing has to feel fair, it has to be predictable, and it has to stay competitive. Evolving from resolutions to outcomes isn’t a departure from that belief. It’s the natural maturation of how we measure value as AI moves from simple Q&A into complex procedures and human-in-the-loop collaboration.

    Fin has grown more powerful because customers asked more of it. Outcomes are how we reflect that progress honestly, while staying true to the same principles that guided us from the start. This is product strategy in action: align incentives, measure what matters, and scale what works.

    And as Fin continues to get stronger, we’ll keep holding ourselves to the same standard: price based on the value delivered. That’s how we build durable trust, sustainable ROI, and a better customer experience at scale.


    Inspired by this post on The Intercom Blog.

