Tag: customer support ai strategy

  • Inside Pendo’s Decision: Replacing the Website Chatbot With an AI Agent to Boost ROI

    Traditional website chatbots promised instant answers but rarely delivered the depth, context, and actionability modern buyers expect. After seeing patterns of high drop-off and shallow engagement, I stepped back and reframed the problem: We did not need another scripted bot—we needed an AI Agent capable of understanding intent, personalizing responses, and taking meaningful actions in the flow of discovery.

    That is why Pendo replaced the website chatbot with an AI Agent. From a product management lens, the decision hinged on three criteria: accelerate time-to-value for visitors, reduce operational overhead through automation, and improve the quality of demand captured at the top of the funnel. An agentic AI approach met all three.

    “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.”

    This statement crystallizes the business case. An AI Agent can translate product intent into measurable outcomes by connecting to knowledge sources, analytics, and workflows. Instead of handing off a prospect to a form or a static knowledge article, the agent can surface relevant guidance, qualify interest, book meetings, and even trigger product tours—closing the loop between marketing, product, and customer success.

    We anchored the implementation in data governance and privacy-by-design. That meant carefully curating training corpora, instituting role-based access controls, applying guardrails for sensitive topics, and designing graceful human-in-the-loop fallbacks. The result was not just a smarter front door, but a safer one—critical for regulated buyers and enterprise stakeholders.

    To validate impact, we ran disciplined A/B testing with a clearly defined minimum detectable effect across conversion, engagement depth, and time-to-response. We also monitored secondary signals such as escalation rate to human support, session quality, and downstream product adoption. Early signals showed more qualified conversations, fewer dead ends, and faster paths to value—exactly the outcomes a product-led growth motion requires.
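
    As a rough illustration of how a minimum detectable effect translates into traffic requirements, the sketch below estimates the visitors needed per variant for a conversion-rate test. It uses the standard normal-approximation formula for comparing two proportions. The function name, the 10% baseline, and the 2-point lift are illustrative assumptions, not figures from the experiment described above.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect an absolute change of
    `mde_abs` in a conversion rate with a two-sided test.

    All parameter values passed in below are hypothetical examples.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1) or not (0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the effect must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null

    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / mde_abs ** 2)


if __name__ == "__main__":
    # Hypothetical scenario: 10% baseline conversion, detect a 2-point lift.
    print(sample_size_per_arm(baseline_rate=0.10, mde_abs=0.02))
```

    Halving the minimum detectable effect roughly quadruples the traffic required, so it is worth agreeing on the effect size before the test starts rather than after.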

    The experience uplift did not stop at the website. By aligning the agent with in-app guides and product tours, we created continuity from pre-signup exploration to onboarding and activation. Visitors received consistent, contextual help before and after they became users, which strengthened our product positioning and reduced friction across the journey.

    Operationally, the shift lowered the marginal cost of each high-quality interaction while improving reliability. Agent handoffs to sales or support became intentional rather than reactive, and insights from conversations fed directly into product discovery. That closed feedback loop informed roadmap decisions and sharpened our go-to-market strategy.

    If you are considering a similar move, start with a clear AI Strategy tied to measurable outcomes, a robust governance model, and a pragmatic rollout plan. Focus the agent on high-intent moments first, surround it with analytics and experimentation, and let the data guide expansion. The goal is not to replace humans—it is to elevate them by letting the AI Agent handle the repetitive, high-volume work so your teams can focus on complex, high-value interactions.


    Inspired by this post on Pendo – Perspectives.


  • Unify Your Analytics to Accelerate Growth: Cut Costs, Move Faster, and Decide in Real Time

    I’ve learned that the fastest way to stall growth is to scatter your data across a maze of dashboards and point solutions. My guiding principle is simple: Escape fragmented tools with a unified analytics platform that accelerates growth, reduces costs, and empowers smarter, real-time decision-making. When every team can trust a single source of truth, momentum compounds.

    By “unified analytics,” I mean a single platform that integrates product, marketing, sales, support, and finance data with consistent definitions, shared metrics, and strong governance. The right foundation pairs real-time instrumentation and event streaming with standardized taxonomies and role-based access. This is what transforms raw data into reliable insight that product managers and executives can act on with confidence.

    Growth accelerates when hypotheses move faster from discovery to delivery. A unified analytics platform tightens the experimentation loop, informs product discovery, and aligns product roadmapping and sprint planning with measurable outcomes. It anchors outcomes vs output OKRs in trustworthy metrics, so QBRs and executive reviews focus on impact, not anecdotes. The result is clearer prioritization, sharper bets, and faster compounding wins.

    Costs come down just as decisively. Consolidating analytics reduces redundant SaaS, manual reporting, and bespoke pipelines that are expensive to build and maintain. With one data model, we cut duplication, improve data quality, and negotiate smarter under consumption SaaS pricing. Teams spend less time wrangling CSVs and more time shipping value.

    Real-time decision-making is where unified analytics truly pays off. Proactive alerts and cohort insights surface anomalies before they become churn. LTV, funnel, and retention forecasts inform pricing and packaging moves. Layering gen AI on top of clean, unified data speeds synthesis and narrative insight, while a thoughtful customer support AI strategy connects voice-of-customer signals directly to the roadmap.

    Implementation starts with clarity. Identify the highest-impact decisions you want to improve, map KPIs to events, and instrument end-to-end tracking with quality SLAs. Establish governance early, align stakeholders across data, engineering, RevOps, and finance, and empower product trios to own their metrics. With disciplined stakeholder management and empowered product teams, the platform becomes a force multiplier rather than another tool to maintain.

    The payoff is strategic agility: faster learning cycles, lower operating costs, and confident calls made in the moment, not after the fact. If you’re ready to break free from fractured dashboards and lagging reports, commit to a unified analytics platform and let your data become a competitive advantage.


    Inspired by this post on Amplitude – Best Practices.


  • Why IT Must Lead Your AI Revolution: A Strategic, Cross-Functional Playbook That Wins

    I’ve led and observed AI initiatives across fast-moving product organizations, and one pattern is unmistakable: “The AI revolution needs a departmental leader.” When that leader is unclear, pilots stall, risk mounts, and value gets trapped in proof-of-concept purgatory. When it’s clear, AI moves from demos to durable outcomes.

    In my experience, IT is uniquely positioned to play that leadership role. IT sits at the nexus of data, identity, security, and infrastructure—exactly where scalable AI capabilities live. IT also has the vantage point to connect use cases across teams, manage risk, and operationalize change without derailing core systems.

    Put simply, this is the promise: “Learn the key reasons why IT teams are uniquely positioned to be the strategic leaders of your company’s AI projects.” The reasons are pragmatic—access to systems of record, stewardship of data governance, ownership of integration patterns, and accountability for reliability and compliance—yet the impact is strategic.

    Here’s how I frame the operating model. IT provides strategic leadership and platform stewardship; Product owns the outcomes; Engineering delivers services and integrations; Security and Legal codify guardrails; and Finance supports cost modeling. We establish tight collaboration through product trios (Product, Design, Engineering) that plug into an IT-led AI platform, enabling empowered product teams to ship safely and quickly.

    Governance turns intent into repeatable action. I use outcomes vs output OKRs to force clarity on value, pair them with lightweight QBR cadences for course correction, and require architecture reviews that cover model/data governance, observability, privacy, and vendor risk. This ensures we can scale gen ai without surprise failures or compliance gaps.

    On the delivery side, forward deployed engineers embedded with business units accelerate discovery and reduce translation loss. We leverage gen ai for product prototyping to validate desirability and feasibility early, then harden solutions on our shared AI platform. This keeps experimentation fast while maintaining an enterprise-grade backbone.

    Roadmapping balances ambition with throughput. I tie product roadmapping and sprint planning to value streams, not just features, and I make stakeholder management explicit—especially with customer support, finance, and operations—so we design for adoption. For example, a customer support ai strategy isn’t a chatbot alone; it’s an outcome-driven service redesign, with training, playbooks, and measurable deflection and CSAT targets.

    Success demands the right metrics. Beyond typical velocity measures, I track time-to-first-value, model quality and drift, cost-to-serve, and risk posture. These roll into OKRs that link frontline improvements (e.g., resolution time) to enterprise outcomes (e.g., gross margin, retention), giving executives confidence and teams a clear definition of done.

    If you lead IT, this is your moment to step into strategic ownership and elevate AI from scattered experiments to a coherent platform. If you lead Product, partner with IT to align discovery, outcomes, and guardrails so empowered teams can move fast and responsibly. Together, we can turn AI from a buzzword into a durable advantage.


    Inspired by this post on Pendo – Perspectives.


  • Unveiling The AI Agent Blueprint: Launch Fast, Scale Smarter, Transform Customer Experience

    AI Agents are reshaping how businesses deliver service, earn loyalty, and create measurable value. From my vantage point leading product management, I see this shift accelerating across support and CX as organizations move from experiments to production-grade systems.

    Very soon, I believe AI Agents will handle the majority of customer service – and eventually, every customer interaction. Human teams won’t disappear, but their roles will evolve from answering questions to analyzing performance, improving systems, and designing better customer experiences.

    The pressure to adopt AI is real. So is the opportunity. The leaders who win won’t just add technology; they’ll redesign operations to capture durable value while safeguarding customer trust.

    But for many support leaders, the path forward is still unclear. Where do you start? What should success look like? How do you actually test and deploy these solutions? I hear these questions every week, and I’ve seen promising initiatives stall without a clear roadmap, evaluation framework, or governance model.

    That’s why we created The AI Agent Blueprint: a strategic map for support, CX, and AI transformation leaders. It’s designed to help you launch fast, scale with confidence, and achieve meaningful business transformation with AI.

    The AI Agent Blueprint is structured in two parts:

    1. Launch it. Go from zero to a successful deployment. Get immediate value from an AI Agent.
    2. Scale it. Rewire your organization to sustain and expand impact.

    Part 1 – “Launch it”

    Learn how to unlock immediate efficiency and value from an AI Agent. We cover how to build a business case, evaluate and deploy an AI Agent, and prove its impact, fast.

    Launch it. Scale it. The AI Agent Blueprint lays out a clear framework to deploy and grow automation in customer service. Explore the step-by-step guide at fin.ai/blueprint and turn pilots into production results.

    You’ll learn how to:

    Get clear on what an AI Agent is: Discover why they’re different from chatbots and how they work.
    Build a business case: Prove the basic economics of AI, decide whether to buy or build, and get the buy-in and budget you need to move forward.
    Evaluate an AI Agent: Learn how to define success, choose the right evaluation criteria, and run a focused, high-impact assessment with our four-step framework.
    Deploy with confidence: Build a deployment plan that balances speed with safety. Learn what to expect at each stage.
    Continuously improve performance: After launch, your AI Agent becomes a system to manage. We’ll show you how to implement a repeatable process to train, test, deploy, and optimize.

    Part 2 – “Scale it”

    Launching AI is only the beginning. To unlock its full potential, you need to rewire your systems across three core pillars:
    → Customer experience
    → Organizational and system design
    → Economics

    If you stop at launch, results will plateau. Your team won’t transform how they work. The system won’t evolve – and neither will the value.

    The second part of the Blueprint shows you how to scale AI intentionally and sustainably. That means:

    Designing AI-first customer journeys and building trust with AI.
    Embedding new roles, AI-first systems, governance structures, and ownership models.
    Rethinking how support is measured and funded in an AI-first world by exploring new metrics, ROI models, and reinvestment strategies to elevate your support function from a cost center to a strategic growth lever.

    This is where AI becomes infrastructure and support becomes a lever for growth.

    The AI Agent Blueprint is live now.

    In practice, this is how I recommend teams approach their customer support AI strategy: start with a narrow, high-value use case, define your success metrics and guardrails, and iterate quickly with human-in-the-loop quality reviews. Once you establish confidence, expand coverage, evolve your organizational design and governance, and update your ROI model to reinvest efficiency gains into customer experience. This blueprint distills the lessons I’ve learned guiding gen AI programs from pilot to platform—so you can accelerate time-to-value and de-risk deployment.


    Inspired by this post on The Intercom Blog.


  • Intuition, White-Glove Support, and Relentless Execution: Lessons from Looker to Omni

    I’m constantly drawn to product stories where intuition, customer obsession, and raw effort compound into durable advantage. This conversation with Colin Zima crystallized that arc, from pioneering high-touch support at scale to balancing gut feel with data to ship what matters. The through-line for me: when you operationalize empathy and pair it with disciplined execution, you create momentum that’s almost impossible to copy.

    Colin Zima is the co-founder and CEO of Omni, a business intelligence tool that has raised over $26.9M. Prior to starting Omni, Colin was Chief Analytics Officer and VP of Product at Looker, which was acquired by Google for $2.6B. Colin was an early employee at Looker, and stood up its high-touch customer support arm, which turned into a cornerstone competitive advantage for the company.

    What resonated most with my own practice is how deliberate investment in white-glove customer support can become a product strategy lever, not just a service function. When you’re in a category-creating phase or displacing an entrenched incumbent, those high-touch loops are how you learn the truth fast, reduce onboarding friction, and convert early believers into reference customers. The trick isn’t whether to do it; it’s when, why, and how to sequence it so the economics still make sense as you scale.

    On scaling high-touch support, I look for three signals before pushing the gas: repeatability in the top 5 user pain patterns, a crisp path to tooling and self-service, and tight product feedback loops that turn today’s premium assistance into tomorrow’s default experience. That’s how white-glove support pays for itself: first as acceleration for adoption, then as inputs that harden the core product. I also emphasize role clarity and career ladders so support becomes a talent engine, not a cul-de-sac, which makes hiring for and hiring from customer support a strategic advantage.

    Colin’s intuition-based approach to product echoes a belief I hold closely: data is essential for validation and prioritization, but it rarely originates the leap. Intuition frames the bet; data sizes the risk; customers ground the narrative. I’ve seen the merits (speed, conviction, and differentiated UX) and the downsides when intuition goes unchecked: overfitting to edge cases or mistaking novelty for value. The balance is intellectual honesty: writing down the thesis, the counter-thesis, and the disconfirming evidence you’ll accept before you commit resources.

    I was especially struck by the operational rigor behind hitting goals for 24 quarters in a row. That kind of consistency doesn’t happen by accident; it comes from outcomes over output, sober forecasting, and the cultural discipline to cut or delay work that doesn’t ladder up. I coach teams to make the target visible, tie metrics to customer value, and then prune relentlessly, because the opportunity cost of “almost done” is usually invisible until the quarter slips.

    The founding story of Omni reminds me that category shifts rarely come from a single breakthrough. They’re the product of dozens of earned insights about where the market is going and what’s still too hard for customers today. I pay close attention to how founders maintain intellectual honesty as the narrative tightens, keeping a clear line between what we know from the field and what we’re assuming, and revisiting that line often.

    There’s also practical career wisdom here. When choosing which startup to join, I look for founder clarity on the core problem, the early design partners, and the distribution wedge. On founder-market fit, I care less about domain tenure and more about a pattern of shipping, learning, and adjusting fast. And Colin’s unpopular opinion on how to hire good PMs aligns with my experience: bias toward builders who can synthesize customer reality, technology constraints, and go-to-market timing, then communicate clearly and commit.

    If you’re building in data and analytics, these references are useful context for the ecosystem and buyer expectations: BigQuery: https://cloud.google.com/bigquery, Hotel Tonight: https://www.hoteltonight.com/, Omni: https://omni.co/, Tableau: https://www.tableau.com/.

    For those who want to go deeper with Colin’s thinking and product journey, you can find him here: Twitter: https://twitter.com/drinkzima?lang=en and LinkedIn: https://www.linkedin.com/in/colinzima/.

    My takeaway as a product leader: make white-glove customer support a strategic instrument, not a cost center; let intuition set bold direction while data governs scope; and cultivate the operational cadence that makes hitting your goals a habit, not a headline. That combination is how you compound trust with customers and ship products that stand the test of time.

  • Scaling Enterprise AI That Sells: Battle-Tested Playbooks for PMF, Champions, and Agentic AI

    Enterprise AI is exhilarating and unforgiving. I’ve seen gorgeous demos fall apart under real-world constraints and seemingly modest pilots unlock outsized value at scale. In this reflection, I share the playbooks I rely on to build, scale, and sell generative AI in the enterprise—what actually moves deals, secures product-market fit, and sustains trust with the C-suite and the front line.

    Why is it so difficult to scale AI products for enterprise? Because the bar is higher on every dimension: data governance, security, extensibility, integration depth, reliability, and measurable ROI. An enterprise-grade, full-stack generative AI platform isn’t just a model; it’s the surrounding system—observability, evals, policy, workflow, and human-in-the-loop—that makes outcomes predictable, auditable, and safe. The fastest path to adoption is simple: deliver on-brand, on-policy content and decisions using a customer’s first-party data, and prove that quality holds up under load.

    My north star is dependability over demo magic. The number one challenge is making model output dependable across messy, high-variance enterprise inputs. I build an evaluation harness early, with gold datasets, task-specific metrics, and human adjudication. Every change ships behind guardrails and is measured against cost, latency, and quality SLOs. When governance, change management, and procurement show up (they always do), I treat them like first-class product requirements, not hurdles.
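
    To make the idea of an evaluation harness concrete, here is a minimal sketch, assuming a gold dataset of prompt and expected-answer pairs, a single exact-match metric, and a score threshold that acts as the release gate. Every name in it (`GoldExample`, `evaluate`, `passes_release_gate`, `stub_model`) is hypothetical, and the stub stands in for a real model call. Cost and latency SLOs would need their own measurements alongside this quality score.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class GoldExample:
    """One hand-labeled case from a gold dataset (hypothetical structure)."""
    prompt: str
    expected: str


@dataclass
class EvalReport:
    score: float                                            # mean metric value
    needs_review: List[str] = field(default_factory=list)   # prompts for human adjudication


def exact_match(predicted: str, expected: str) -> float:
    """Task-specific metric: 1.0 when the normalized strings agree, else 0.0."""
    return 1.0 if predicted.strip().lower() == expected.strip().lower() else 0.0


def evaluate(model: Callable[[str], str], gold: List[GoldExample],
             metric: Callable[[str, str], float] = exact_match) -> EvalReport:
    """Run the model over every gold example and collect the misses."""
    if not gold:
        raise ValueError("gold dataset is empty")
    total = 0.0
    misses: List[str] = []
    for example in gold:
        score = metric(model(example.prompt), example.expected)
        total += score
        if score < 1.0:
            misses.append(example.prompt)  # route to a human reviewer
    return EvalReport(score=total / len(gold), needs_review=misses)


def passes_release_gate(report: EvalReport, min_score: float) -> bool:
    """A change ships only if quality stays at or above the agreed threshold."""
    return report.score >= min_score


def stub_model(prompt: str) -> str:
    """Placeholder for a real model call; the answers are invented for the demo."""
    canned = {"refund window?": "30 days", "support hours?": "9 to 5"}
    return canned.get(prompt, "unknown")


if __name__ == "__main__":
    gold = [
        GoldExample("refund window?", "30 days"),
        GoldExample("support hours?", "9 to 5"),
        GoldExample("data residency?", "EU and US"),
    ]
    report = evaluate(stub_model, gold)
    print(report, passes_release_gate(report, min_score=0.9))
```

    Keeping the metric pluggable lets each workflow define its own notion of quality while the gate logic stays the same.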

    Champions are the secret to winning complex accounts. I map the org, find operators who feel the pain daily, and quantify that pain in dollars and hours. Then I define success criteria upfront—time-to-value in under 30 days, measurable uplift (e.g., deflection, conversion, cycle time), and a plan for scale. I deploy forward deployed engineers alongside the business to co-design workflows, refine prompts and evaluators, and document before/after outcomes. Champions don’t just approve pilots; they co-author the business case and defend it.

    To win the enterprise, trust architecture matters as much as model architecture. I lead with clear answers on data residency, encryption, SSO, RBAC, DLP, and retention policies; I address whether customer data trains models, default behaviors, and opt-in controls. I offer flexible deployment (VPC or private networking when needed), transparent pricing, and SLAs with real teeth. I also integrate where work already happens—CRM, help desk, knowledge bases—so value shows up in the flow of work.

    Signs of healthy product-market fit are unmistakable: pull from lookalike buyers, multi-threaded expansions, champions who present results internally without me in the room, and usage that moves from experimentation to business-critical. I watch for weekly active usage above pilot thresholds, POCs converting to multi-year deals, and adjacent teams (Support, Marketing, Legal, RevOps) asking to onboard with minimal push. PMF feels less like persuasion and more like coordination.

    Scaling large language models for specific use cases requires ruthless focus. I constrain scope to tightly defined workflows, pair retrieval with structured knowledge, and mix model strategies (base models, fine-tunes, tools, and function calling) based on cost and latency budgets. I codify policy-as-code and deploy guardrails at the orchestration layer, not just the model layer. Continuous evaluation—both automatic and human—is the heartbeat of quality.
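
    One way to picture guardrails at the orchestration layer is a wrapper that checks declarative policies before and after the model call, so the same rules apply no matter which model sits behind it. The sketch below is an assumption-laden illustration: the `Policy` structure, the two example rules, and the `guarded_call` function are all hypothetical, and real policies would be far richer than regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Policy:
    """A declarative rule: text at `stage` must not match `pattern`."""
    name: str
    pattern: str
    stage: str  # "input" or "output"


# Hypothetical example policies, kept as data so they can be reviewed and versioned.
POLICIES: Sequence[Policy] = (
    Policy("no_legal_advice", r"(?i)\blegal advice\b", "input"),
    Policy("no_card_numbers", r"\b(?:\d[ -]?){13,16}\b", "output"),
)


def violations(text: str, stage: str, policies: Sequence[Policy]) -> List[str]:
    """Return the names of every policy the text breaks at the given stage."""
    return [p.name for p in policies
            if p.stage == stage and re.search(p.pattern, text)]


def guarded_call(model: Callable[[str], str], prompt: str,
                 policies: Sequence[Policy] = POLICIES) -> Dict[str, object]:
    """Check the prompt, call the model, then check the draft before returning it."""
    broken = violations(prompt, "input", policies)
    if broken:
        # Sensitive request: skip the model and hand off to a human.
        return {"status": "escalated", "reasons": broken, "text": None}
    draft = model(prompt)
    broken = violations(draft, "output", policies)
    if broken:
        return {"status": "blocked", "reasons": broken, "text": None}
    return {"status": "ok", "reasons": [], "text": draft}


if __name__ == "__main__":
    echo = lambda prompt: f"Here is a summary of: {prompt}"
    print(guarded_call(echo, "our onboarding checklist"))
    print(guarded_call(echo, "I need legal advice on a contract"))
```

    Because the checks live outside the model, swapping a base model for a fine-tune does not change which requests are escalated or blocked.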

    My advice to AI founders in 2024 is pragmatic. Start with outcomes, not demos. Establish outcomes vs output OKRs that tie directly to revenue, cost, risk, or customer experience. Use gen AI for product prototyping to shorten discovery cycles, but graduate quickly to instrumented workflows in production. Align early with InfoSec and Legal; your speed will be gated by trust, not code. And when in doubt, ship smaller, safer increments faster.

    Healthy co-founder relationships look the same across winning companies: clear domains, fast escalation, and a shared appetite for “disagree and commit.” I keep a decision log, time-box debates, and make moments-of-truth visible to the team and board. You’ll know it’s working when you have more energy after hard conversations than before.

    The future of agentic AI is deeply enterprise: multi-agent workflows that plan, act, and verify with human oversight where it matters. The winners will combine reasoning, tool use, retrieval, and policy with audit trails that satisfy compliance while keeping velocity high. Think of it as re-engineering business processes around AI-native steps, not sprinkling AI on top of legacy workflows.

    Culture turns strategy into reality. I anchor my teams on “connect, challenge, and own.” Connect means obsess over the customer problem and internal alignment. Challenge means we red-team our ideas, run experiments, and measure impact. Own means we accept outcomes, not just output, and we iterate until the business moves. This is how a customer support ai strategy becomes a durable moat, not a slide.

    If you’re a product creator or product management leader, the above playbooks are meant to be lifted and adapted. Start where the pain is loudest, quantify the win, and let champions carry the story. The compound interest of disciplined product discovery, strong governance, and relentless evaluation is a generative AI business that sells itself—and scales.


  • DevTools at Scale: Hard-Won Lessons on PMF, AI, and Culture from Apple, AWS, Microsoft

    Building and scaling DevTools has taught me that world-class culture and relentless product focus are non-negotiable. Drawing on experiences across Amazon, Apple, and Microsoft—and hard-won lessons from startups like Unblocked and Buddybuild—I’m sharing the principles I rely on to ship great developer products at scale.

    Why building for developers is different: developers are discerning, allergic to friction, and quick to churn if the DX isn’t exceptional. That means fast setup, clear docs, ergonomic APIs, sane defaults, and deep integrations with GitHub, GitLab, Bitbucket, Confluence, AWS, and Microsoft Azure.

    I benchmark teams against gold-standard platforms like Stripe, Twilio, and Looker—tools that reward mastery, never bury the lede, and make success observable in minutes, not days.

    From the early days of Buddybuild, the signal was unmistakable: remove toil from CI/CD, shorten feedback loops, and teams will expand usage without a sales nudge. The pattern holds across DevTools: when time-to-value approaches zero, the product sells itself.

    Early signs of product market fit: organic team-to-team adoption, repeatable setup success, contribution from power users, and inbound demand you cannot keep up with. When these show up, “Why great product is everything” stops sounding like a platitude and starts reading like a P&L.

    Monetizing product market fit is straightforward if you align value and pricing units. Seat-based maps to collaboration; usage-based maps to compute, API calls, or storage; hybrid models reduce edge-case friction. Keep the packaging simple and double down on “The power of positioning.”
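
    The difference between the three pricing units is easiest to see as arithmetic. The sketch below computes a monthly charge under each model. All prices, allowances, and function names are made-up illustrations, and amounts are held in integer cents to avoid floating-point rounding.

```python
def seat_based(seats: int, cents_per_seat: int) -> int:
    """Charge scales with the number of collaborators."""
    return seats * cents_per_seat


def usage_based(units: int, cents_per_unit: int, included_units: int = 0) -> int:
    """Charge scales with consumption (API calls, build minutes, storage)."""
    billable = max(0, units - included_units)
    return billable * cents_per_unit


def hybrid(seats: int, cents_per_seat: int, units: int,
           cents_per_unit: int, included_units_per_seat: int) -> int:
    """Seats carry a usage allowance; only the overage is metered."""
    allowance = seats * included_units_per_seat
    return seat_based(seats, cents_per_seat) + usage_based(
        units, cents_per_unit, included_units=allowance
    )


if __name__ == "__main__":
    # Hypothetical team: 10 seats at $20.00, 1,500 API calls at $0.02 each,
    # with 100 calls included per seat under the hybrid plan.
    for label, cents in [
        ("seat-based", seat_based(10, 2000)),
        ("usage-based", usage_based(1500, 2)),
        ("hybrid", hybrid(10, 2000, 1500, 2, 100)),
    ]:
        print(f"{label}: ${cents / 100:.2f}")
```

    In the hybrid case the per-seat allowance absorbs light usage, which is one way such a plan can soften the edge cases that pure metering creates.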

    AI is complicating product market fit. Gen AI accelerates product prototyping, but it also introduces instability: model drift, hallucinations, and evaluation blind spots. Before scaling, I build an evaluation harness, add human-in-the-loop review for risky flows, and define a clear customer support AI strategy.

    Being customer-obsessed is the moat. I embed forward deployed engineers with key customers to translate real workflows into product decisions, close the empathy gap, and validate behavior in production environments.

    On decision-making, I blend product discovery with crisp documents and measurable bets: PRFAQs or design docs to clarify intent, guardrails in analytics, and outcomes vs output OKRs to keep teams aligned to impact.

    Unblocked, a developer tool that lets you talk to your codebase, points toward a future where code search, context, and refactoring converge into conversational workflows. I’m bullish on the pattern, but I stay sober about failure modes and cost-to-serve.

    Here’s my cautious take on AI: latency, privacy, and provenance matter as much as model quality. The best teams treat prompts as product, training data as liability, and evaluation as a first-class release gate.

    Hiring is where many teams stumble. Don’t over-index on competency when hiring. I optimize for learning velocity, ownership, and kindness under pressure. Competency scales output; character scales organizations.

    As a second-time founder and operator, I treat mental health like uptime. I schedule recovery, define non-negotiables, and surround myself with peers who normalize the hard days. Burnout is a systems failure, not an individual weakness.

    I don’t do demos. I prefer self-serve trials with instrumented onboarding, sample projects, and guardrails that let the product do the talking. If a prospect can’t succeed in 15 minutes, we fix the product, not the deck.

    On customer feedback, I separate noise from signal with cohorts and context. I prioritize requests that reduce time-to-value, unblock integrations, or meaningfully expand the surface area of successful use cases. That’s how to deal with customer feedback without losing strategic focus.

    To build and scale DevTools, keep the bar high and the loop tight: ship small, watch usage, learn fast. Invest in platform reliability, rock-solid SDKs and CLIs, and a developer experience that earns trust release after release.

    Resources and touchstones I revisit often:

    Apple’s acquisition of Buddybuild: https://www.cnbc.com/2018/01/02/apple-agrees-to-buy-buddybuild.html

    AWS: https://aws.amazon.com

    Bitbucket: https://bitbucket.org

    Confluence: https://www.atlassian.com/software/confluence

    GitHub: https://github.com

    GitLab: https://gitlab.com

    Looker: https://looker.com

    Microsoft Azure: https://azure.microsoft.com

    Stewart Butterfield: https://www.linkedin.com/in/butterfield/

    Stripe: https://stripe.com

    Twilio: https://twilio.com

    Unblocked: https://getunblocked.com/

    If you’re building for developers, stay ruthless about simplicity, respectful of their time, and obsessed with proof in production. That’s how durable product-market fit is earned—and monetized.


  • Inside Intercom’s Bold Reboot: Lessons in AI Strategy, Ruthless Focus, and Culture

    I’ve been reflecting on a remarkable comeback story that offers sharp lessons for product leaders navigating AI disruption. Eoghan McCabe is the CEO and cofounder at Intercom, an AI customer service platform. Intercom has raised over $240M, and was last valued at $1.3B in 2018. After spending 9 years building the company, Eoghan left Intercom in 2020, but he’s since returned, reshaping Intercom and pioneering its pivot to an AI-first service. That arc—departure, return, and reinvention—captures a founder’s willingness to defy orthodoxy and act from first principles.

    What stood out to me most was the unapologetic embrace of intuition. In high-variance environments like AI and customer support, best practices lag reality. Founder intuition vs. standard practice isn’t a cliché here; it’s a capability. I’ve seen teams overfit to playbooks and underweight the signals that matter—customer truth, product discovery signals, and outcomes vs output OKRs that force clarity on what actually moves the needle.

    McCabe’s reflections since leaving Intercom highlight the value of distance. Stepping away often exposes where complexity crept in and where focus was lost. On return, the immediate moves were decisive: refocus the strategy, simplify priorities, and set a higher bar for cadence and quality. Those changes were anchored by first-principles thinking and a willingness to question everything, including sacred cows.

    The productivity step-change is telling. The 41% increase in Intercom’s productivity under Eoghan wasn’t magic; it was management. In my experience, that kind of shift comes from ruthless prioritization, removing low-leverage work, and consolidating teams around fewer, outcome-aligned bets. Tactically, think tighter operating rhythms, clearer decision rights, and forward deployed engineers who sit closer to customers to collapse feedback loops, which is especially critical in gen AI and customer support AI strategy.

    Strategy-wise, the pivot to AI-first wasn’t about feature-chasing; it was about category leadership. AI and category disruption demand conviction. The reason you can’t make small improvements in big categories is simple: customers reward step changes in outcomes, not incrementalism. In customer service, that means rethinking workflows end-to-end, not just sprinkling gen AI on top of legacy processes.

    Hiring was another area where the guidance was crisp. Tactical advice on hiring top talent included raising the bar on slope (rate of learning) and ownership, biasing for product creators who thrive in ambiguity, and building an executive team that can scale the operating model, not just the org chart. I’ve found this is where product management leadership shows up most clearly—pushing beyond conventional resumes to find people who can compound execution and insight.

    Culture carried equal weight. Crafting a culture of ruthless honesty and transparency isn’t about being abrasive; it’s about creating a system where truth travels fast. In practice, that looks like instrumented business reviews tied to outcomes, written decision memos that capture tradeoffs, and a shared language for escalation. It’s uncomfortable at first, then liberating—because it accelerates learning.

    Brand came in for a reality check, too. The argument that software branding is in crisis resonates in an era where many products sound the same, look the same, and promise the same. The antidote is clarity: a point of view that’s inseparable from the product experience. Intercom’s thinking about brand appears to lean into differentiated behavior (speed, quality, outcomes) rather than slogans. In crowded categories, that’s what earns attention and trust.

    Under the hood, this story is a masterclass in product-market fit. It reaffirms that PMF isn’t a one-time event; it’s a moving target, especially when technology paradigms shift. The companies that navigate the shift are those that re-baseline their bets, measure what matters, and ship faster with higher standards. That’s the compounding loop I try to build: focused strategy, outcome-centric execution, and continuous product discovery.

    If you’re steering an AI transformation, a few prompts I use: Are we solving for an outcome that customers will feel in minutes, not months? Where are we making bold, non-incremental bets? Which processes can we kill to regain tempo? And do our leaders model transparency in a way that accelerates truth-telling across the org?

    For further context and inspiration, here are some of the references mentioned: 37signals: https://37signals.com, Basecamp: https://basecamp.com, Brian Halligan (HubSpot): https://www.linkedin.com/in/brianhalligan, David Heinemeier Hansson (37signals, Basecamp): https://www.linkedin.com/in/david-heinemeier-hansson-374b18221, Intercom: https://www.intercom.com, Jason Fried (37signals, Basecamp): https://www.linkedin.com/in/jason-fried, Salesforce: https://www.salesforce.com, Marc Benioff (Salesforce): https://www.linkedin.com/in/marcbenioff, Zendesk: https://www.zendesk.com.

    If you want to follow Eoghan directly: LinkedIn: https://www.linkedin.com/in/eoghanmccabe/ and Twitter/X: https://x.com/eoghan. I find it valuable to track leaders who are actively rewriting the playbook in real time.


  • Mastering AI Evals: Real-World Discovery Tactics to Ship Quality, Safe, Reliable AI

    I’ve been shipping GenAI features long enough to know that clever prompts and orchestration aren’t enough. What actually matters is evidence: Does the system work, for whom, and under what conditions? That’s where rigorous AI evals come in—the backbone of building reliable, safe, and continuously improving AI products.

    In a recent conversation focused entirely on evaluation, I dug into what “evals” mean in the AI/ML world, why they’re more than just quality assurance, and how to operationalize them end to end. If you want to explore the discussion, listen on Spotify: https://open.spotify.com/episode/7mSiEGSYNO4sXeGAVTJO4V or Apple Podcasts: https://podcasts.apple.com/kh/podcast/ai-evals-discovery/id1794203808?i=1000727980774. There’s also a video version on YouTube: https://www.youtube.com/watch?v=pfSIQMrWhQE.

    Here’s how I frame evals with my teams. First, define the behavior you want to see in terms real users care about. Then codify that intent as tests that run consistently. I distinguish between golden datasets, synthetic data, and real-world traces. Golden datasets capture canonical examples that represent “ground truth.” Synthetic data fills important gaps quickly and safely. Real-world traces keep you honest and reflect evolving usage.

    The most durable loop I’ve found is simple: identify error modes, turn them into evals, and automate. This is where error analysis pays off. Some checks should be purely deterministic—code-based checks that evaluate structured outputs, schemas, or policies. Others benefit from LLM-as-judge when human-like judgment matters, as long as you calibrate and continuously verify those judges with spot checks and inter-rater agreement.
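
    To make the deterministic side concrete, here is a minimal Python sketch of a code-based check on a structured output. The payload fields (`answer`, `sources`, `escalate`) and the banned phrase are hypothetical stand-ins for whatever contract and policy your product actually enforces; this is an illustration, not a prescribed schema.

```python
import json

# Hypothetical contract for a support-reply payload; field names are illustrative.
REQUIRED_FIELDS = {"answer": str, "sources": list, "escalate": bool}
BANNED_PHRASES = ("guaranteed refund",)  # stand-in for a real policy rule


def check_output(raw: str) -> list[str]:
    """Return a list of failure reasons; an empty list means the output passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    failures = []
    # Schema check: every required field is present and has the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            failures.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            failures.append(f"wrong type for field: {field}")

    # Policy check: the answer text must not contain any banned phrase.
    answer = data.get("answer")
    if isinstance(answer, str):
        lowered = answer.lower()
        for phrase in BANNED_PHRASES:
            if phrase in lowered:
                failures.append(f"policy violation: {phrase}")
    return failures
```

    Returning reasons instead of a bare pass/fail keeps the check useful for error analysis, because each failure string can be counted and tracked as its own error mode.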

    Discovery practices should inform every evaluation step. If you’re doing “Story-Based Customer Interviews,” you can derive realistic scenarios, acceptance criteria, and edge cases directly from user narratives. That context sharpens the evals and prevents you from overfitting to toy problems or proxy metrics that don’t reflect user value.

    Evals require ongoing care and feeding. Criteria drift is real—what counted as “good” six weeks ago may not satisfy users after you ship a new capability or your audience evolves. I treat the eval suite like living product infrastructure: versioned, reviewed, and owned. When we change prompts, models, or retrieval strategies, the evals run first, then we examine deltas, regressions, and surprises before anything reaches production.
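
    One way to examine deltas and regressions is to diff per-case results from two runs of the suite. The sketch below assumes each run is stored as a mapping from a case id to a pass/fail boolean; that storage format and the case ids are my assumptions for illustration.

```python
def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]) -> dict[str, list[str]]:
    """Compare per-case pass/fail results from two eval runs keyed by case id."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {
        # Passed before the change, fails after it.
        "regressions": [c for c in shared if baseline[c] and not candidate[c]],
        # Failed before the change, passes after it.
        "fixes": [c for c in shared if not baseline[c] and candidate[c]],
        # Cases that exist in only one of the two runs.
        "new_cases": sorted(candidate.keys() - baseline.keys()),
        "dropped_cases": sorted(baseline.keys() - candidate.keys()),
    }


baseline = {"refund-01": True, "tone-02": False, "pii-03": True}
candidate = {"refund-01": False, "tone-02": True, "pii-03": True, "lang-04": True}
report = compare_runs(baseline, candidate)
print(report)
```

    Looking at regressions and fixes separately matters because an unchanged aggregate pass rate can hide one case breaking while another gets fixed.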

    Guardrails and human oversight work hand-in-hand with evals. Guardrails enforce non-negotiables (safety, privacy, compliance), while evals measure progress against nuanced goals (relevance, helpfulness, tone). In high-stakes workflows, I combine pre-deployment evals, runtime guardrails, and spot human review. The goal isn’t to eliminate humans; it’s to focus their attention where judgment and context matter most.

    Practically, I start with a minimal eval harness that standardizes inputs and outputs—often in JSON (JavaScript Object Notation)—and writes repeatable tests. I maintain a small golden dataset, add targeted synthetic data for coverage, and stream real-world traces into the suite once we have consent and redaction in place. For subjective criteria (e.g., tone, helpfulness), I layer in LLM-as-judge with calibration. For objective checks (e.g., schema validation, policy compliance), code-based checks are my default.
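
    A minimal harness along these lines can be sketched in a few lines of Python. Everything named here is hypothetical: the golden cases, the `must_contain` criterion, and the `echo_system` stand-in that takes the place of a real model call.

```python
import json
from typing import Callable

# A tiny golden dataset in JSON; the cases and field names are illustrative only.
GOLDEN_JSON = """
[
  {"id": "greet-01", "input": "hello", "must_contain": "hello"},
  {"id": "reset-02", "input": "reset password", "must_contain": "settings"}
]
"""


def run_suite(system: Callable[[str], str], cases: list[dict]) -> dict:
    """Run every case through the system and record pass/fail per case id."""
    results = {}
    for case in cases:
        output = system(case["input"])
        # Case-insensitive substring match as the simplest possible criterion.
        results[case["id"]] = case["must_contain"] in output.lower()
    passed = sum(results.values())
    return {"results": results, "pass_rate": passed / len(cases) if cases else 0.0}


def echo_system(prompt: str) -> str:
    """Stand-in for the real model call; it simply echoes the prompt."""
    return f"You said: {prompt}"


report = run_suite(echo_system, json.loads(GOLDEN_JSON))
print(json.dumps(report, indent=2))
```

    Because the system under test is just a callable from string to string, the same suite can run unchanged against a new prompt, model, or retrieval strategy.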

    Tooling evolves quickly, but the principles hold. Whether you’re building on Anthropic’s models or experimenting with V0 or Lovable in your prototyping stack, the eval loop stays the same: define success, test it the same way every time, and close the loop with learning. If you’re a product creator or leading forward deployed engineers, this discipline accelerates gen AI prototyping without sacrificing safety or quality.

    I also tie evals to OKRs built on outcomes rather than output. Instead of “ship three prompts,” we commit to measurable outcomes like resolution rate, time-to-answer, or a target “helpfulness” score. In customer support AI strategy, we monitor real-world traces, CSAT, and handoff quality to ensure the AI augments agents rather than creating silent failure modes. That’s how evals inform product-market fit decisions instead of just feeding dashboards.

    If you want to go deeper, explore these foundational concepts and tools: ML (Machine learning), LLM (Large language model), “AI Evals for Engineers and PMs”: https://maven.com/parlance-labs/evals, “The Product Leadership Wheel – A Framework for Defining and Growing Product Leadership at Scale”: https://www.petra-wille.com/plwheel, “How I Designed & Implemented Evals for Product Talk’s Interview Coach”: https://www.producttalk.org/2025/09/interview-coach-evals/, “Behind the Scenes: Building the Product Talk Interview Coach”: https://www.producttalk.org/2025/08/customer-interview-coach/, V0: https://vercel.com/docs/v0, JSON (JavaScript Object Notation): https://en.wikipedia.org/wiki/JSON, Anthropic: https://www.anthropic.com/, Lovable: https://lovable.dev/, and “Story-Based Customer Interviews”: https://learn.producttalk.org/course/story-based-customer-interviews.

    If this resonates, I’ll be sharing weekly lessons learned from building and evaluating AI features in the wild, plus conversations with cross-functional teams about real-world AI development. Have thoughts or a tactic that’s worked for you? Drop a comment and let’s compare notes.


    Inspired by this post on Product Talk.


  • Reimagining Product Teams with Generative AI: A Bold, Practical Vision for the Next 24 Months

    In this article, I want to talk about where I believe generative AI is going to take the roles on a product team, and the team topologies of product organizations. I’m motivated to write this both because I think a vision of where we should try to go is important, and also because I see…

    That conviction has only grown as I’ve led cross-functional teams through real deployments. The traditional boundaries between product management, design, engineering, and customer success are blurring as generative AI moves from novelty to dependable copilot. What follows is the vision I’m using to guide our roadmap, hiring, and rituals—practical, near-term, and focused on outcomes.

    First, on roles: product managers will spend less time drafting artifacts and more time validating assumptions and sequencing bets. AI will draft PRDs, summarize interviews, propose opportunity trees, and even flag risks. But we will anchor decisions on outcome-based rather than output-based OKRs, using AI to widen the option set, not to outsource accountability.

    Design will accelerate dramatically. With gen AI prototyping tools, designers can turn rough concepts into interactive flows in hours, stress-test copy for clarity, and explore accessibility states before code is written. The craft shifts toward problem framing, system thinking, and quality thresholds, where human judgment remains the differentiator.

    Engineering becomes even more product-facing. Forward deployed engineers will pair with PMs and designers at customer sites (or virtually) to co-create solutions, integrate LLMs, and harden edge cases. Model-aware engineering, evaluation harnesses, and data pipeline stewardship become core competencies, while “prompt engineering” becomes a skill embedded across functions rather than a standalone role.

    On team topology: our default unit stays the autonomous, outcome-owning squad, but we add an enablement layer. An AI platform team supplies shared services—feature stores, evaluation datasets, observability, and safety guardrails—so product teams can move fast without reinventing infrastructure. Guilds or communities of practice steward reusable prompts, patterns, and model cards across squads.

    Discovery evolves too. We’ll pair classic product discovery with AI-accelerated research: large-scale synthesis of qualitative feedback, scenario exploration with synthetic data, and rapid hypothesis testing through simulated cohorts. Human-in-the-loop remains non-negotiable; generative AI helps us see more options, but customers still tell us what’s true.

    Customer support becomes a flywheel. A thoughtful customer support AI strategy turns conversations into structured insights, feeds prioritization, and powers in-product guidance. The same signals that resolve tickets should inform discovery, experimentation, and roadmap trade-offs.

    Governance and safety must be proactive. We’ll define golden datasets, create red-team playbooks, and adopt model-level SLAs alongside product SLAs. Evaluation goes beyond accuracy to include fairness, latency, explainability, and cost, with clear escalation paths when models drift or fail.
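
    As a rough sketch of what a model-level SLA gate could look like in code, the Python below checks a handful of metrics against bounds and reports which ones would trigger escalation. The metric names and thresholds are assumptions chosen for illustration; fairness and explainability need their own measures and are left out here.

```python
# Hypothetical model-level SLA thresholds; real values depend on the product.
SLA = {
    "accuracy": ("min", 0.90),
    "p95_latency_ms": ("max", 1500),
    "cost_per_request_usd": ("max", 0.02),
}


def sla_breaches(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that are missing or outside their SLA bound."""
    breaches = []
    for name, (kind, bound) in SLA.items():
        value = metrics.get(name)
        if value is None:
            # A metric that was never measured is treated as a breach.
            breaches.append(name)
        elif kind == "min" and value < bound:
            breaches.append(name)
        elif kind == "max" and value > bound:
            breaches.append(name)
    return breaches


print(sla_breaches({"accuracy": 0.93, "p95_latency_ms": 1800, "cost_per_request_usd": 0.01}))
```

    Treating a missing metric as a breach is a deliberate choice in this sketch: a gate that silently passes unmeasured dimensions defeats the purpose of having one.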

    Measuring impact changes as well. Beyond feature delivery, we’ll track time-to-learning, reduction in cycle time, precision of targeting, and the quality of decisions AI actually improves. The goal is durable product-market fit, not vanity metrics or demo-driven development.

    Here’s a pragmatic 90-day starter plan: identify two high-signal use cases where latency, cost, and safety are manageable; form a cross-functional pod with a PM, designer, forward deployed engineers, and a data partner; instrument robust evaluation gates; align on outcome-based OKRs; ship, learn, and codify the playbook. In parallel, stand up the minimal AI platform services your squads will reuse.

    This is a leadership challenge as much as a technical one. Product management leadership must set the bar for ethical use, invest in upskilling, and reorganize incentives around outcomes. The teams that win will treat generative AI as a force multiplier for curiosity, learning, and craftsmanship—not a shortcut around them.

    If we do this well, our product teams will be faster, more customer-obsessed, and more resilient. The tools are ready. The real question is whether we are ready to evolve how we work, measure progress, and lead.


    Inspired by this post on SVPG.


  • Why INSPIRED Still Matters in the Generative AI Era: Access, Insights, and Practical Playbooks

    In the Generative AI era, I keep returning to the enduring playbooks that shape great product teams. INSPIRED remains a cornerstone for how I coach on product discovery, product operating models, and product management leadership. I’ve used its principles to align cross-functional squads, empower product creators, and shorten the path to product-market fit across both startups and scaled organizations.

    The book INSPIRED is available in hardcover, digital, and audio versions, but until now, the audio version was only available in an exclusive arrangement with Amazon, on audible.com. The audio versions of our other books have been available from all major audio book providers. The exclusive contract with Amazon has now expired, and…

    Why this matters: when knowledge moves beyond a single platform, more of our teams can absorb it in the flow of work. Distributed PMs, designers, data scientists, and forward deployed engineers can learn on their preferred apps during commutes or deep work breaks. That accessibility compounds learning velocity—especially when we’re iterating weekly on discovery insights, opportunity assessments, and bet selection.

    What’s changed in our craft is the tooling: gen AI now augments how we validate assumptions, run product discovery, and prototype. Pairing the timeless practices in INSPIRED with gen AI prototyping helps my teams get to evidence faster, turning ambiguous narratives into testable artifacts, instrumented experiments, and real customer signals. It also sharpens our product operating model by making continuous discovery the default behavior across the product team.

    Here’s how I operationalize this shift: I anchor a short “learning sprint” around one chapter at a time, then immediately translate insights into a concrete discovery activity (problem framing, assumption mapping, or opportunity sizing). We run a gen AI prototyping spike to visualize flows, draft UX copy, and simulate edge cases, followed by quick customer sessions to validate usefulness and usability. We capture what we learn about product-market fit in a working taxonomy and update our decision logs so learning compounds sprint over sprint.

    This is also a practical boost for enablement: new hires, customer support leaders crafting a customer support AI strategy, and forward deployed engineers can now engage with the same source material on their own schedules. When the whole team shares a common vocabulary, shaped by proven practices and accelerated by gen AI, the quality of debate improves, discovery cycles compress, and execution becomes more predictable.

    If you’ve been meaning to revisit INSPIRED, this is an ideal moment. With access broadening, pick the format that fits your routine and turn insights into action the same day. Use it to pressure-test your product operating model, refine your discovery cadence, and elevate product management leadership across the organization. The combination of timeless principles and modern gen AI tools is exactly what our product teams need right now.


    Inspired by this post on SVPG.


  • Mastering AI Debugging: From Data Leakage to Evals—Practical Tactics I Use in the Wild

    How do you know if your AI product is actually any good? As someone who ships AI features at scale, I ask myself that question daily. Listening to Hamel Husain unpack the craft of error analysis and evaluation reinforced what I’ve learned in the trenches: reliability isn’t an accident—it’s the result of a disciplined, scientific approach to debugging AI products.

    Hamel’s background spans over 25 years across machine learning and data science, including impactful work at Airbnb and GitHub that paved the way for GitHub Copilot. What stood out to me was how methodical his approach is: define the problem crisply, isolate failure modes, measure what matters, and iterate with intention. That’s the same operating rhythm I expect from our teams when we evaluate AI features.

    Here are the core themes I took to heart, preserved in the language discussed: “Why debugging AI starts with thinking like a scientist”; “How data leakage undermines models (and how to spot it)”; “Using synthetic data to stress-test failure modes”; “When to rely on code-based assertions vs. LLM-as-judge evals”; “Why your CI/CD set should always include broken cases”; “How to prioritize failure modes without drowning in them.” Each of these mirrors how I build evaluation pipelines and keep them honest over time.

    On data leakage, I’ve learned to be ruthless. If your splits aren’t rock-solid, your metrics are fantasy. We harden our pipelines with explicit checks for leakage, treat feature provenance like a first-class citizen, and maintain immutable holdout sets. When I hear teams celebrate sudden metric jumps, my first question is: did leakage just sneak in?
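
    The simplest explicit leakage check is an overlap test between the training and holdout sets. The Python sketch below is a minimal version that only catches exact and near-exact duplicates after whitespace and case normalization; it will not detect temporal leakage or leaked features, which need checks specific to the pipeline. The function names are mine, not from any library.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash a normalized form of the text so trivial variants collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_leaks(train: list[str], holdout: list[str]) -> list[str]:
    """Return holdout examples whose normalized text also appears in train."""
    train_prints = {fingerprint(t) for t in train}
    return [h for h in holdout if fingerprint(h) in train_prints]


train = ["How do I reset my password?", "Cancel my plan"]
holdout = ["how do I  RESET my password?", "Where is my invoice?"]
leaks = find_leaks(train, holdout)
print(leaks)
```

    Running a check like this whenever a split is regenerated gives a concrete answer to the question of whether leakage just sneaked in, at least for the duplicate case.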

    I also appreciated the practical contrasts between code-based assertions and LLM-as-judge evals. My rule of thumb: use code-based assertions for deterministic criteria (formatting, schema, presence/absence of required elements) and LLM-as-judge when the outcome is semantic, subjective, or requires pragmatic grading of quality. In production, I rely on both—code for guardrails, LLM judges for nuance—backed by calibration, adjudication, and spot checks to prevent drift.

    Synthetic data is another cornerstone. “Using synthetic data to stress-test failure modes” resonates because real-world logs rarely cover the long tail. We generate targeted scenarios to probe brittleness—adversarial prompts, multilingual edge cases, domain shifts—and keep these in a living eval suite. The goal isn’t just to pass tests; it’s to anticipate what reality will throw at you tomorrow.

    The conversation traces a journey from forecasting guest lifetime value at Airbnb to hands-on consulting with startups like Nurture Boss, an AI-native assistant for apartment complexes. That arc mirrors what I’ve seen: use case clarity, grounded datasets, and tight feedback loops beat model hype every time. The example of text message errors was particularly relatable—production messaging demands precise intent, tone, compliance, and context. If you can’t evaluate those consistently, you can’t scale them safely.

    Prioritization is where many teams drown. I score failure modes by severity (user harm or business impact), frequency (how often it appears), and confidence (how certain we are in the eval). High-severity issues that repeat—even at moderate frequency—get fast-tracked. Everything lives in a persistent log: what failed, why it failed, how we measured it, what we tried, and the before/after metrics. This log becomes the backbone of continuous improvement, not a graveyard of JIRA tickets.
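
    Here is one way the scoring could be sketched in Python. The multiplicative score, the 1 to 5 severity scale, and the fast-track thresholds are all assumptions I picked for illustration; the point is only that severity, frequency, and confidence become explicit, sortable numbers.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int      # 1 (cosmetic) to 5 (user harm or major business impact)
    frequency: float   # share of sampled traces showing the failure, 0.0 to 1.0
    confidence: float  # trust in the eval that detects it, 0.0 to 1.0

    @property
    def score(self) -> float:
        return self.severity * self.frequency * self.confidence

    @property
    def fast_track(self) -> bool:
        # Assumed thresholds: high severity that shows up in at least 5% of traces.
        return self.severity >= 4 and self.frequency >= 0.05


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Fast-tracked modes first, then by descending score."""
    return sorted(modes, key=lambda m: (not m.fast_track, -m.score))


modes = [
    FailureMode("typo in greeting", severity=1, frequency=0.40, confidence=0.9),
    FailureMode("wrong rent amount", severity=5, frequency=0.06, confidence=0.8),
    FailureMode("slow reply", severity=2, frequency=0.30, confidence=0.9),
]
ranked = [m.name for m in prioritize(modes)]
print(ranked)
```

    In this example the high-severity failure jumps the queue even though its raw score is the lowest of the three, which is the fast-track behavior described above.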

    To avoid overfitting to the eval suite, I rotate holdouts, refresh cohorts, and introduce blind sets from time to time. We regularly audit LLM-as-judge consistency and anchor grading with a handful of human-reviewed exemplars. When metrics move, we validate that we improved real outcomes, not just our test set. If you can’t trust your evals, you can’t trust your roadmap.
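
    For the judge-consistency audit, one common statistic is Cohen's kappa, which measures agreement between two raters beyond what chance would produce. The sketch below computes it between human-reviewed labels and LLM-judge labels on the same exemplars; the label values are illustrative, and what counts as an acceptable kappa is a judgment call for your team.

```python
from collections import Counter


def cohens_kappa(human: list[str], judge: list[str]) -> float:
    """Agreement between two label lists, corrected for chance agreement."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(human)
    # Observed agreement: fraction of items where both raters gave the same label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Expected agreement if each rater labeled independently at their own base rates.
    human_counts, judge_counts = Counter(human), Counter(judge)
    expected = sum(
        (human_counts[label] / n) * (judge_counts[label] / n)
        for label in human_counts.keys() | judge_counts.keys()
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


human_labels = ["pass", "pass", "fail", "fail"]
judge_labels = ["pass", "pass", "fail", "pass"]
kappa = cohens_kappa(human_labels, judge_labels)
print(round(kappa, 2))
```

    Raw percent agreement would report 75% here, while kappa comes out lower because a judge that leans toward “pass” agrees with humans partly by chance.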

    Here’s the playbook I use and recommend: define success criteria aligned to user value; construct a minimal, repeatable eval harness; seed it with real-world failures and “always include broken cases” in CI/CD; add code-based assertions for hard constraints; layer LLM-as-judge for quality judgments; generate synthetic edge cases to widen coverage; and report results in language business stakeholders understand. Do this, and you’ll not only ship better AI—you’ll ship with conviction.
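
    The “always include broken cases” step can be sketched as a self-audit of the checker itself: known-bad outputs must keep failing, and known-good outputs must keep passing. The constraints below (a 160-character cap and no unfilled template slots) and the example messages are hypothetical, chosen to echo the text-message scenario.

```python
def passes_hard_constraints(reply: str) -> bool:
    """Hypothetical hard constraints for an outbound text message."""
    # Non-empty, within a 160-character cap, and no unfilled template slots.
    return 0 < len(reply) <= 160 and "{{" not in reply


# Illustrative outputs that should always fail the check.
BROKEN_CASES = [
    "",
    "Hi {{first_name}}, your tour is confirmed.",
    "x" * 161,
]
KNOWN_GOOD_CASES = ["Hi Sam, your tour is confirmed for 3pm Friday."]


def audit_checker() -> list[str]:
    """Return descriptions of cases where the checker gave the wrong verdict."""
    problems = [
        f"broken case slipped through: {case[:30]!r}"
        for case in BROKEN_CASES
        if passes_hard_constraints(case)
    ]
    problems += [
        f"good case rejected: {case[:30]!r}"
        for case in KNOWN_GOOD_CASES
        if not passes_hard_constraints(case)
    ]
    return problems


print(audit_checker())
```

    Wiring `audit_checker()` into CI so that any non-empty result fails the build protects against the quiet failure where someone loosens a check and the suite turns green for the wrong reason.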

    If you want to dive deeper into the specific products and methods referenced, explore these: GitHub Copilot, forecasting Airbnb guest growth, and Nurture Boss. Each illustrates different angles of error analysis, measurement, and iteration in the wild.

    Listen to the full conversation here: Spotify | Apple Podcasts. For further study, I recommend: Hamel’s blog on AI evals and the AI Evals for Engineers and PMs course on Maven.

    Building robust AI isn’t about perfection; it’s about disciplined progress. Think like a scientist, treat failure modes as assets, and let your evals guide the roadmap. That’s how you transform anxiety about AI quality into a durable advantage.


    Inspired by this post on Product Talk.

