Tag: LLMs for product managers

  • Kaizen for the AI Era: Tiny Daily Wins That Build Smarter, Scalable Customer Support

    Every day, I challenge my teams to make one small, meaningful improvement—something so lightweight it’s impossible to ignore and easy to repeat. That tiny daily motion compounds, and over time it reshapes customer experience, operational quality, and team culture.

    That’s the essence of Kaizen, the Japanese philosophy of continuous improvement. Developed in post-war Japan and popularized by companies like Toyota, Kaizen proves that small, steady changes lead to significant long-term results. In product management and customer support, this approach transforms big ambitions into daily behaviors that actually stick.

    Crucially, Kaizen isn’t passive or unstructured. It thrives on three principles I reinforce across my org. First, small changes reduce resistance—when you lower the activation energy, teams move faster. Second, improvement is continuous, not occasional; instead of waiting for quarterly reviews or major releases, you ask: “What can we improve right now?” Third, everyone participates—the people closest to the work are best positioned to improve it. That’s how momentum spreads.

    In practice, the cycle is simple: identify a small problem, test the change, measure the result, refine, and repeat. The point isn’t radical transformation in a single swing; it’s steady progress guided by data and observation—a rhythm that aligns beautifully with eval-driven development and continuous discovery.

    At Intercom, we apply this same philosophy to how we manage our Agent Fin through a process we call the “Fin Flywheel”. Here’s how this works.

    Train: Teach Fin how to handle and resolve the most complex customer queries.

    Test: Run fully simulated customer conversations from start to finish to see exactly how Fin will behave before going live.

    Deploy: Launch Fin across all channels so customers get consistent support wherever they reach out.

    Analyze: Use AI-powered insights to review and improve Fin’s performance so it can deliver better customer experiences.

    This isn’t a one-time setup; it’s a continuous loop where every interaction feeds ongoing improvement. Rather than deploying AI and assuming it will perform as expected, improvement is built into the system itself. The more Fin is used, the better it gets. That’s the hallmark of agentic AI done right—tight feedback loops, purposeful conversation design, and clear Agent Analytics that illuminate what to tune next.

    But continuous improvement doesn’t stop with AI. Within our Human Support operations, I emphasize the same mindset that drives effective work with LLMs in product management: you instrument the experience, learn from real usage, and close gaps fast. We operate on a simple principle: the first time you solve a customer issue should be the last time it happens.

    When a conversation reaches a human, we pause to diagnose and prevent recurrence. Why did this reach me? Why couldn’t Fin resolve it? How can we prevent this from happening again? Those questions anchor a culture of root-cause thinking and accelerate product-led growth by removing friction at the source.

    To make this effortless, we’ve built a lightweight, AI-powered way to log suggestions in the moment—no long explanations or heavy admin required. Ideas are reviewed quickly and implemented by subject matter experts or by the team themselves. This keeps the flywheel spinning: insights flow in, fixes go out, and measurable outcomes improve.

    The result is a frontline that evolves from reactive problem-solvers into a proactive improvement engine. The people closest to customers spot friction, suggest fixes, and see their insights shaped into meaningful change. It’s continuous discovery embedded in everyday work, not a side project.

    Kaizen demonstrates that lasting progress doesn’t come from occasional transformation; it comes from intentional, everyday refinement. The “Fin Flywheel” applies that philosophy to AI. Our Human Support continuous improvement process applies it to human insights. Together, they create a shared system where both people and AI learn continuously from customer interactions.

    When improvement is built into the mechanics of how you work, it stops being a one-off project and becomes an ingrained capability. Over time, those small daily improvements don’t just add up; they compound into a sustainable, data-driven advantage that elevates customer experience and differentiates your customer support AI strategy.


    Inspired by this post on The Intercom Blog.


  • We Built Agent Analytics After Observability Broke—Why Your AI Team Needs It Now

    I remember the exact moment our product crossed the threshold from scripted automation to truly agentic AI. The excitement was real, and so was the pit in my stomach when our dashboards went dark. Our trusted analytics and observability stack, which had served us flawlessly for traditional software, suddenly couldn’t explain what the agent was doing, why it made certain choices, or how to reproduce outcomes across runs.

    "The moment our product became an AI agent, our entire observability stack became irrelevant, not something you want as an analytics company. Here's what we did."

    Why does this happen? Agentic AI doesn’t behave like conventional apps. Instead of deterministic flows and neatly tagged events, we face non-deterministic trajectories, tool-use chains, evolving prompts, context window dynamics, and policy guardrails that influence outcomes in real time. Clicks and pageviews give way to tokens, tool calls, and conversation turns. Without purpose-built observability, you can’t do credible product discovery, measure behavioral analytics, or run eval-driven development with confidence.

    That’s why we built Agent Analytics. We needed a unified lens to trace every step of an AI workflow—from user intent to model prompts, function calls, retrievals, tool outputs, and final responses—while capturing latency, cost, guardrail hits, fallbacks, and outcome tags. We instrumented runs end-to-end, added experiment support for prompt engineering and policy variants, and wired in evaluations so we could turn subjective quality into objective signals the team could act on.
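
    To make the idea of an end-to-end trace concrete, here is a minimal Python sketch of what one run record could look like. The class and field names (`Step`, `RunTrace`, `outcome_tag`) and the sample steps are illustrative assumptions of mine, not the schema of any real Agent Analytics product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    """One step in an agent run (hypothetical schema for illustration)."""
    kind: str                 # e.g. "prompt", "tool_call", "retrieval", "response"
    name: str
    latency_ms: float
    cost_usd: float = 0.0
    guardrail_hit: bool = False
    fallback: bool = False


@dataclass
class RunTrace:
    """Trace of a single agent run, from user intent to outcome tag."""
    run_id: str
    user_intent: str
    steps: List[Step] = field(default_factory=list)
    outcome_tag: Optional[str] = None   # e.g. "resolved", "escalated"

    def add(self, step: Step) -> None:
        self.steps.append(step)

    def total_latency_ms(self) -> float:
        return sum(s.latency_ms for s in self.steps)

    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.steps)

    def guardrail_hits(self) -> int:
        return sum(1 for s in self.steps if s.guardrail_hit)


# Example run with made-up step names and numbers.
trace = RunTrace(run_id="run-001", user_intent="reset my password")
trace.add(Step("retrieval", "help_center_search", latency_ms=120.0))
trace.add(Step("tool_call", "send_reset_email", latency_ms=340.0, cost_usd=0.002))
trace.add(Step("response", "final_answer", latency_ms=800.0, cost_usd=0.01,
               guardrail_hit=True))
trace.outcome_tag = "resolved"
```

    Keeping every step in one record is what makes a failed run reproducible: you can replay the same intent and compare step by step.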

    The impact on product management was immediate. We shortened iteration cycles by making failure states obvious and reproducible, turned ambiguous feedback into structured data, and gave engineers and designers a shared source of truth for conversation design and AI workflows. With visibility into containment, escalation, autonomy ratio, and step-level success, we could ship confidently, rollback safely, and align roadmap bets to measurable outcomes—not anecdotes.

    Building this capability demanded more than logging. We invested in data governance and privacy-by-design to mask sensitive content while preserving semantic context, and we separated human-identifiable data from model telemetry. We treated prompts and policies like code—versioned, diffable, and safely rolled out behind feature flags and CI/CD—so we could experiment without risking regressions in production.

    What should every team measure? Start with outcome quality (task success, resolution, containment), reliability (tool success rate, guardrail triggers, fallbacks), performance (time-to-first-token, total latency, step-level latency), and efficiency (tokens and cost per successful task). Add groundedness checks for retrieval steps, regression evals for core journeys, and post-release anomaly detection to catch drift before users do. These metrics become your operating system for agent performance and your compass for product strategy.
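
    As a rough illustration of how a few of these metrics could be computed, here is a small Python sketch. The run record keys (`success`, `escalated`, `tool_calls`, `tool_failures`, `cost_usd`) and the sample data are assumptions for the example, and cost per successful task is defined here as total spend divided by successful runs, which is one reasonable convention among several.

```python
from typing import Dict, List


def summarize(runs: List[Dict]) -> Dict[str, float]:
    """Aggregate outcome, reliability, and efficiency metrics over agent runs."""
    total = len(runs)
    successes = sum(1 for r in runs if r["success"])
    contained = sum(1 for r in runs if not r["escalated"])
    tool_calls = sum(r["tool_calls"] for r in runs)
    tool_failures = sum(r["tool_failures"] for r in runs)
    total_cost = sum(r["cost_usd"] for r in runs)
    return {
        "task_success_rate": successes / total if total else 0.0,
        "containment_rate": contained / total if total else 0.0,
        "tool_success_rate": (
            (tool_calls - tool_failures) / tool_calls if tool_calls else 0.0
        ),
        # Failed runs still cost money, so their spend is included here.
        "cost_per_successful_task": (
            total_cost / successes if successes else float("inf")
        ),
    }


# Made-up sample data.
runs = [
    {"success": True, "escalated": False, "tool_calls": 2, "tool_failures": 0, "cost_usd": 0.02},
    {"success": True, "escalated": False, "tool_calls": 3, "tool_failures": 1, "cost_usd": 0.03},
    {"success": False, "escalated": True, "tool_calls": 1, "tool_failures": 1, "cost_usd": 0.01},
    {"success": True, "escalated": False, "tool_calls": 2, "tool_failures": 0, "cost_usd": 0.02},
]
metrics = summarize(runs)
```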

    If you’re building or scaling AI agents, you need Agent Analytics before you hit your first incident. It’s the difference between guessing and knowing—between reactive firefighting and proactive iteration. With the right observability, your team can move faster, manage risk intelligently, and translate agent behavior into business outcomes that compound over time.


    Inspired by this post on Amplitude – Best Practices.


  • Inside Medable’s Agent Studio: The Agentic AI Blueprint to Accelerate Safer Clinical Trials

    What if AI could help reduce the 10-plus years it takes to get a new drug to market? That question has shaped much of my own product strategy thinking, and it’s exactly why I was drawn to Medable’s bold move with Agent Studio. It’s a rare look inside an enterprise AI platform built for one of the most regulated industries in the world—and a team that’s still figuring it out in real time.

    In this episode of Just Now Possible, Teresa Torres talks with four members of the Medable team: Luke Bates (Product Leader, Agent Studio), Jen Brown (Product Manager), Matt Schoolfield (Product Designer), and Fiachra Matthews (Principal Architect). Listening through a product management lens, I focused on how their choices reflect a modern agentic AI strategy that balances speed, safety, and scale.

    Medable does something uniquely hard: enabling global clinical trials across 100+ languages and accelerating drug-to-market timelines. That scope demands more than clever prompts—it requires a durable platform approach. Their answer is Agent Studio, a no-code/low-code platform for configuring and deploying agents across the clinical trial lifecycle.

    What impressed me most was how clearly the platform’s primitives map to repeatable value: models, skills, knowledge bases, MCP connectors, versioning, and trigger types. In my experience, platforms win when these building blocks are composable, governed, and observable—exactly the direction Medable is taking.

    You’ll also hear about the two agents they’ve built on top of it: an eTMF agent that automates document classification across 80,000-plus documents per year, and a CRA agent that monitors patient safety and data quality across 13 different clinical systems. For a domain where errors carry real human consequences, this is the right mix of automation and oversight.

    Under the hood, their architecture choices echo what I’ve seen work in other high-stakes environments. They walk through RAG approaches at scale (embeddings vs. markdown hierarchies vs. just-in-time MCP retrieval) and explain why they built custom MCPs with an authentication and credentialing wrapper. They also detail context window management with sub-agents and automatic tool filtering, which is critical for keeping agents focused and reliable as complexity grows.

    Data alignment is often the unsung hero of agent reliability. I appreciated how they described building a unified ontology layer to map terminology across 13 different clinical data systems. Equally important, they show their paper trail: how they document agent intent → specification → test evidence to satisfy regulatory bodies. In a GxP context, this kind of lineage isn’t “nice to have”; it’s the price of admission.

    [Infographic: how Medable's Agent Studio reimagines clinical operations, shrinking drug-to-market timelines from a decade to a year with no-code agents, automated eTMF document classification, unified data monitoring, and human-in-the-loop validation.]

    Strategically, I love that Medable chose a platform approach to agents instead of one-off builds. They outline three deployment models: Medable-built products, services-led custom builds, and self-serve platform access. This mirrors a healthy platform business model: prove value with first-party solutions, extend via services for complex needs, and unlock scale with self-serve, all while keeping governance centralized.

    Reliability is a theme throughout. They describe evaluation design in a GxP-regulated environment: golden datasets, production monitoring, and the challenge of human feedback as ground truth. We also get a concrete picture of what human-in-the-loop really looks like when clinical decisions are on the line: tight feedback cycles, auditable interventions, and clear escalation paths.

    Looking forward, they don’t shy away from ambition. The "full self-driving" vision for clinical trials and what it would take to get there is both provocative and grounded. My read: the path runs through stronger domain ontologies, standardized interfaces (MCP done right), eval-driven development, and relentless simplification of agent skills.

    If you’re a product leader building in regulated spaces, this discussion is a masterclass in balancing innovation with compliance. The takeaways map cleanly to AI strategy: define platform primitives, invest in retrieval-first pipeline patterns, design for context window management, lean into eval-driven development, and operationalize regulatory compliance from day one.

    To dive deeper, listen to the conversation on Spotify or Apple Podcasts, and explore Medable’s broader platform work at medable.com. I left both inspired and practically equipped—an uncommon combo in today’s AI noise.


    Inspired by this post on Product Talk.


  • Agentic Architecture Demystified: How Modern AI Systems Plan, Learn, and Execute at Scale

    In my role leading product teams at HighLevel, I’m often asked to explain what’s really happening behind the scenes of today’s AI products. The short answer is that modern systems are built on agentic architecture: not just a single model, but a coordinated loop of planning, tool use, memory, and evaluation. Once you see that pattern, the design decisions snap into focus and the roadmap becomes far easier to prioritize.

    At its core, agentic AI treats the model as a reasoning engine embedded within an AI workflow. The agent interprets intent, plans steps, calls the right tools and APIs, grounds itself in trusted data, and then evaluates outcomes before deciding to continue or stop. This loop creates reliability, reduces hallucinations, and enables the system to operate in real-world, multi-step scenarios.

    Here’s the practical lifecycle I rely on. A user provides intent (a goal or request). We run a retrieval-first pipeline to ground the model in accurate, current data. Prompt engineering structures the task and primes the agent with constraints and success criteria while keeping the context window under control. The agent generates a plan, executes steps by calling tools or services, evaluates intermediate results, reflects or revises as needed, and only then returns a final answer with clear citations or evidence.
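
    Here is a minimal Python sketch of that lifecycle with every model call replaced by a stub. The function names (`run_agent`, `retrieve`, `plan`, `evaluate`) and the order-lookup scenario are hypothetical; a real system would back them with an LLM, a retriever, and actual tools.

```python
from typing import Callable, Dict, List


def run_agent(
    intent: str,
    retrieve: Callable[[str], List[str]],
    plan: Callable[[str, List[str]], List[Dict[str, str]]],
    tools: Dict[str, Callable[[str], str]],
    evaluate: Callable[[str, List[str]], bool],
    max_steps: int = 5,
) -> Dict[str, object]:
    """Retrieve first, plan, act with tools, and stop once evaluation passes."""
    context = retrieve(intent)                 # ground in trusted data
    steps = plan(intent, context)              # ordered list of tool calls
    results: List[str] = []
    for step in steps[:max_steps]:             # hard cap as a termination condition
        output = tools[step["tool"]](step["input"])
        results.append(output)
        if evaluate(intent, results):          # check the intermediate result
            break
    return {
        "answer": results[-1] if results else None,
        "evidence": context,
        "steps_run": len(results),
    }


# Stubs standing in for the model, retriever, and tools (all made up).
def retrieve(intent: str) -> List[str]:
    return ["doc: order 42 shipped Monday"]


def plan(intent: str, context: List[str]) -> List[Dict[str, str]]:
    return [
        {"tool": "lookup_order", "input": "42"},
        {"tool": "escalate", "input": "42"},
    ]


tools = {
    "lookup_order": lambda order_id: f"Order {order_id} shipped Monday",
    "escalate": lambda order_id: f"Escalated order {order_id}",
}


def evaluate(intent: str, results: List[str]) -> bool:
    return "shipped" in results[-1]


result = run_agent("Where is order 42?", retrieve, plan, tools, evaluate)
```

    In this stubbed run the evaluation passes after the first tool call, so the escalation step in the plan never executes.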

    For more complex work, I orchestrate multiple specialized agents—commonly a planner, a solver, and a critic—coordinated by a lightweight controller. This multi-agent pattern reduces single-agent blind spots, encourages self-checking, and mirrors how empowered product teams collaborate. Whether it’s conversation design for support flows or a voice AI agent driving hands-free tasks, orchestration is the difference between a clever demo and a dependable product.
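
    A planner, solver, and critic under a lightweight controller can be sketched in a few lines of Python. Everything here is illustrative: the `controller` function, the stub agents, and the revision limit are my assumptions, and each stub would be a separate model call in practice.

```python
from typing import Callable, List, Optional


def controller(
    goal: str,
    planner: Callable[[str], List[str]],
    solver: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Optional[str]],
    max_revisions: int = 2,
) -> List[str]:
    """Plan subtasks, solve each one, and revise while the critic objects."""
    answers: List[str] = []
    for task in planner(goal):
        draft = solver(task, None)
        for _ in range(max_revisions):
            feedback = critic(task, draft)     # None means the draft is accepted
            if feedback is None:
                break
            draft = solver(task, feedback)
        answers.append(draft)
    return answers


# Deterministic stubs standing in for three specialized agents.
def planner(goal: str) -> List[str]:
    return ["greet", "confirm slot"]


def solver(task: str, feedback: Optional[str]) -> str:
    return f"{task}: polite" if feedback else f"{task}: draft"


def critic(task: str, draft: str) -> Optional[str]:
    return "be polite" if "draft" in draft else None


answers = controller("book an appointment", planner, solver, critic)
```

    The revision cap matters: without it, a solver and critic that never agree would loop forever.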

    Memory is the second pillar. Short-term working context sits in the prompt, while long-term memory lives in vector stores or databases to track past interactions, preferences, and outcomes. Retrieval augments the model with the right facts at the right time, and tight context window management ensures the agent stays focused on signal, not noise. The result is faster responses, lower costs, and far better accuracy.
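
    The split between short-term and long-term memory can be sketched as follows. To keep the example self-contained, retrieval uses simple word overlap rather than a vector store, which is a deliberate simplification; the `Memory` class and its method names are hypothetical.

```python
from collections import deque
from typing import List


class Memory:
    """Bounded short-term window plus an unbounded long-term store."""

    def __init__(self, short_term_size: int = 3) -> None:
        self.short_term: deque = deque(maxlen=short_term_size)
        self.long_term: List[str] = []

    def remember(self, text: str) -> None:
        self.short_term.append(text)   # oldest entry drops out when full
        self.long_term.append(text)

    def recall(self, query: str, k: int = 2) -> List[str]:
        """Return up to k long-term entries sharing the most words with the query."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(text.lower().split())), text)
            for text in self.long_term
        ]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in scored[:k]]

    def build_context(self, query: str, k: int = 2) -> List[str]:
        """Recalled facts first, then the recent window, without duplicates."""
        recalled = [t for t in self.recall(query, k) if t not in self.short_term]
        return recalled + list(self.short_term)


memory = Memory(short_term_size=2)
for note in [
    "customer prefers email contact",
    "order 42 was refunded",
    "customer asked about pricing",
    "customer wants annual plan",
]:
    memory.remember(note)

context = memory.build_context("refunded order", k=1)
```

    Only the recalled fact and the two most recent notes reach the prompt, which is the whole point of keeping the context window focused on signal.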

    Reliability is earned through eval-driven development and robust AI risk management. I define offline and online evaluations, guardrails, and human-in-the-loop checkpoints before scaling traffic. These evaluations become living, automated tests that protect against regressions as prompts, models, and tools evolve. The payoff is real: fewer escalations, higher trust, and measurable improvements to quality over time.
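
    One way to turn evaluations into an automated regression gate is sketched below. The golden cases, the substring-match check, and the 0.9 threshold are illustrative assumptions; real evals typically use richer graders than a substring test.

```python
from typing import Callable, Dict, List


def run_evals(
    agent: Callable[[str], str],
    golden_cases: List[Dict[str, str]],
    threshold: float = 0.9,
) -> Dict[str, object]:
    """Score an agent against golden cases and report whether it clears the gate."""
    if not golden_cases:
        raise ValueError("golden_cases must not be empty")
    failures = [
        case["input"]
        for case in golden_cases
        if case["expected"].lower() not in agent(case["input"]).lower()
    ]
    pass_rate = 1 - len(failures) / len(golden_cases)
    return {
        "pass_rate": pass_rate,
        "passed_gate": pass_rate >= threshold,
        "failures": failures,
    }


# A stub agent with canned answers, standing in for a real model call.
canned = {
    "How do I reset my password?": "Use the reset link on the login page.",
    "What is your refund window?": "Refunds are available within 30 days.",
    "Do you support SSO?": "I am not sure.",
}


def stub_agent(question: str) -> str:
    return canned.get(question, "")


golden = [
    {"input": "How do I reset my password?", "expected": "reset link"},
    {"input": "What is your refund window?", "expected": "30 days"},
    {"input": "Do you support SSO?", "expected": "SAML"},
]
report = run_evals(stub_agent, golden)
```

    Running a check like this in CI whenever a prompt, model, or tool changes is what makes the evaluations "living" tests rather than a one-time audit.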

    From a product strategy perspective, I resist over-engineering. Start with a simple retrieval-first pipeline and a single agent; prove value; then layer in multi-agent orchestration only where it moves key metrics. Instrument everything—latency, cost, grounding coverage, and outcome quality—and build Agent Analytics dashboards so teams can diagnose issues and iterate with confidence.

    If you’re looking for a practical playbook, here’s mine: clarify the user intent and success criteria; design the tools the agent can call; ground with authoritative data; write prompts that constrain scope and define termination conditions; add reflection and automated evaluations; and ship behind feature flags for safe, staged rollout. Each step compounds reliability without killing velocity.

    The diagram and the video above bring these patterns to life. If you watch closely, you’ll see the same loop—plan, retrieve, act, evaluate—show up in every effective implementation, regardless of domain. That repetition isn’t accidental; it’s the backbone of agentic architecture and a blueprint you can adapt to your own stack.

    Ultimately, what matters is outcomes. When we build around agentic AI, we create systems that are explainable to stakeholders, maintainable by engineers, and genuinely helpful to customers. That’s how we move past hype to durable impact—shipping AI products that plan, learn, and execute at scale.


    Inspired by this post on Product School.


  • Inside Amplitude’s AI Acquisition: Career Lessons Product Managers Can Use to 10x Impact

    I’m often asked how to translate early-stage experience into outsized product impact at scale. In my own practice, I study real career arcs that crystallize the habits of high-leverage product managers—especially those operating at the intersection of analytics and AI strategy.

    Consider this path: Lucas is a Product Manager at Amplitude. Previously, he was employee #1 at Command AI, acquired by Amplitude in October 2024. Lucas studied computer science at Princeton.

    What stands out to me is the compounding effect of being an early builder. When you are employee #1, you live close to the user problem, own outcomes end-to-end, and develop a bias toward focused, continuous discovery. That foundation creates durable instincts around product strategy, sharp prioritization, and empowered product teams—skills that transfer directly to later-stage environments where clarity and speed become competitive advantages.

    Acquisition integration is where those instincts meet enterprise rigor. Folding Command AI into a unified analytics platform like Amplitude requires disciplined product roadmapping and sprint planning, precise stakeholder management, and a strong POV on where AI augments core “Amplitude analytics” versus where it creates net-new value. The north star remains unchanged: deliver measurable customer outcomes that strengthen product-led growth and reduce time-to-value.

    On the AI front, I’ve seen the most successful PMs treat generative AI and LLMs as means, not ends. They anchor use cases to concrete analytics workflows (accelerating insight generation, surfacing anomalies, improving retention analysis, and driving user activation) while validating each step through continuous discovery and rigorous experiment design. This balance of ambition and evidence protects teams from shiny-object drift and keeps investment tethered to business impact.

    Execution-wise, the playbook is straightforward but unforgiving: clarify the problem through customer interviews; define crisp OKRs around outcomes rather than outputs; map the journey end-to-end; ship in thin slices; and iterate with observability baked into every release. Along the way, keep your cross-functional partners close (solutions engineering, customer success, and GTM) so that your learning loops extend beyond the product surface and into real adoption dynamics.

    If you’re building analytics or AI-powered experiences today, borrow these lessons: translate early-stage builder energy into enterprise-scale focus; make AI serve the product, not the other way around; and use Amplitude analytics to close the loop from idea to impact. That is how PMs compound credibility, accelerate careers, and, most importantly, create products customers can’t live without.


    Inspired by this post on Amplitude – Best Practices.


  • Ship MVPs in Days, Not Months: My Proven Prompt Prototyping Playbook for Product Teams

    Most MVPs take too long, cost too much, and still miss the mark. Over the past year, I’ve shifted my team to a prototyping prompts approach that lets us validate problem-solution fit in days, not months. The result is faster learning loops, clearer tradeoffs, and a dramatically higher hit rate on features that actually move the needle.

    When I say prototyping prompts, I mean structured, layered instructions that guide generative AI systems to produce the right artifacts at the right fidelity. Instead of jumping straight to code, we generate concise problem briefs, user stories, interaction flows, low-fidelity UI descriptions, and test plans. Each pass is constrained by acceptance criteria and business outcomes, which keeps the work grounded in value rather than output.

    Here’s the playbook my product trios use to go from idea to a testable MVP in 48–72 hours. First, we anchor on OKRs framed around outcomes rather than outputs, and clarify the customer job-to-be-done using evidence from customer interviews and support data. This is classic continuous discovery, but we compress it by focusing on the single riskiest assumption to de-risk this week.

    Second, we build a prompt scaffold. We specify the role, constraints, target users, success metrics, and the exact output format we expect. We also define evaluation upfront, borrowing from eval-driven development. For example, before any generation, we list the acceptance tests that a good solution must pass, including edge cases and compliance considerations. This discipline keeps hallucinations in check and improves repeatability.
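
    A prompt scaffold of this kind can be captured as a small data structure that renders to text. The `PromptScaffold` class, its field names, and the sample values below are illustrative assumptions, not a template the team necessarily uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PromptScaffold:
    """Structured prompt: role, constraints, users, metrics, format, and tests."""
    role: str
    constraints: List[str]
    target_users: str
    success_metrics: List[str]
    output_format: str
    acceptance_tests: List[str]

    def render(self, task: str) -> str:
        lines = [
            f"Role: {self.role}",
            f"Target users: {self.target_users}",
            f"Task: {task}",
            "Constraints:",
            *[f"- {c}" for c in self.constraints],
            "Success metrics:",
            *[f"- {m}" for m in self.success_metrics],
            "Acceptance tests the output must pass:",
            *[f"- {t}" for t in self.acceptance_tests],
            f"Output format: {self.output_format}",
        ]
        return "\n".join(lines)


scaffold = PromptScaffold(
    role="senior product manager",
    constraints=["No personally identifiable information", "One page maximum"],
    target_users="clinic receptionists booking appointments",
    success_metrics=["Booking completed in under two minutes"],
    output_format="Markdown brief with Problem, Users, Flow, Risks sections",
    acceptance_tests=["Covers the double-booking edge case"],
)
prompt = scaffold.render("Draft a lean product brief for appointment scheduling")
```

    Because the acceptance tests are written into the scaffold before any generation, the same list can be reused afterwards to grade the output.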

    Third, we spin up multiple prototypes in parallel. One prompt generates a lean product brief; another outlines user flows; a third proposes UI states and error handling. If we’re exploring voice, we add prompt engineering for voice to script dialogs and repair strategies. For data-heavy features, we call out retrieval-first pipeline patterns so the model references source-of-truth data rather than guessing.

    Fourth, we validate with real users using the lightest-weight experiment possible. Fake-door tests, concierge workflows, and guided click-throughs let us measure intent before we invest. Where we can, we run quick A/B testing and size the effort using minimum detectable effect (MDE) so we don’t over- or under-sample. The point isn’t perfection; it’s fast, directional signal to inform the next iteration.
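
    For sizing an A/B test from a minimum detectable effect, a common textbook approximation for comparing two conversion rates is n per arm ≈ (z_{1-α/2} + z_{power})² · [p1(1-p1) + p2(1-p2)] / (p2 - p1)². The Python sketch below implements that normal-approximation formula; the function name and the default α = 0.05 and power = 0.8 are conventional choices I am assuming, not figures from the playbook.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(
    baseline: float,
    mde_abs: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Approximate users needed per arm for a two-sided two-proportion test.

    baseline: control conversion rate (e.g. 0.10)
    mde_abs:  minimum detectable effect as an absolute lift (e.g. 0.02)
    """
    if not 0 < baseline < 1 or mde_abs <= 0 or baseline + mde_abs >= 1:
        raise ValueError("baseline and baseline + mde_abs must lie in (0, 1)")
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Example: 10% baseline conversion, looking for a 2-point absolute lift.
n_per_arm = sample_size_per_arm(baseline=0.10, mde_abs=0.02)
```

    Halving the MDE roughly quadruples the required sample, which is why picking a realistic effect size up front keeps a "quick" test quick.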

    Fifth, we instrument and ship behind feature flags. We track activation, task completion, and time-to-value from day one. On the delivery side, we watch DORA metrics and deployment frequency to ensure we’re learning continuously rather than batching big bets. This bridges discovery and delivery so roadmaps reflect real-world feedback, not assumptions.

    One recent example: we needed to evaluate a voice AI agent for appointment scheduling. In 72 hours, prompts produced the problem brief, dialog flows, error recovery strategies, and a sandbox to simulate inbound requests across three user personas. We exposed a thin slice to a pilot cohort, captured call outcomes, and iterated the repair prompts twice before writing any production code. The pilot converted at a higher rate than our control flow and gave us the confidence to invest in full integration.

    This approach only works if we treat governance as a first-class concern. We bake in privacy-by-design, clear data governance boundaries, and AI risk management from the start. Prompts include guardrails on personally identifiable information, explicit constraints on data use, and links to approved sources. We also maintain a prompt repository with versioning and automated evaluations so changes are observable and reversible.
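
    A versioned prompt repository can start very small. The sketch below keeps versions in memory and uses the standard library's `difflib` for diffs; the `PromptRepo` class and its methods are hypothetical, and a real setup would more likely live in Git or a database.

```python
import difflib
from typing import Dict, List, Optional


class PromptRepo:
    """In-memory prompt store with numbered versions, diffs, and rollback."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def commit(self, name: str, text: str) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(text)
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> str:
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]

    def diff(self, name: str, old: int, new: int) -> List[str]:
        return list(
            difflib.unified_diff(
                self.get(name, old).splitlines(),
                self.get(name, new).splitlines(),
                lineterm="",
            )
        )

    def rollback(self, name: str, version: int) -> int:
        """Re-commit an earlier version so history stays append-only."""
        return self.commit(name, self.get(name, version))


repo = PromptRepo()
v1 = repo.commit("scheduler", "You book appointments.")
v2 = repo.commit("scheduler", "You book appointments.\nNever ask for card numbers.")
changes = repo.diff("scheduler", v1, v2)
v3 = repo.rollback("scheduler", v1)
```

    Rolling back by re-committing, rather than deleting, keeps every change observable, which matches the goal of changes being both observable and reversible.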

    Practically, strong prompt scaffolds share three traits. They’re specific about context and constraints, they define success in measurable terms, and they separate concerns by artifact type. I’ll often ask for three variants with different tradeoffs, then run a quick synthesis prompt that highlights points of parity and differentiation. This gives the team structured options rather than a single, brittle path.

    If you’re starting from zero, begin with one high-leverage workflow. Write a crisp outcome statement, draft your acceptance tests, and create a prompt that outputs a one-page brief, three user flows, and the top five risks with mitigations. Validate with five users in 48 hours, then decide: double down, pivot, or park. Rinse and repeat, and your product roadmapping and sprint planning will shift from speculation to evidence.

    The bottom line is simple. Prototyping prompts won’t replace product judgment, but they will accelerate it. By turning ideas into testable artifacts in hours, you minimize waste, maximize learning, and ship better MVPs—fast.


    Inspired by this post on Product School.


  • Prevent Strategy Drift: AI that flags ‘merge conflicts’ in product plans before a quarter derails

    "What if an AI could spot the moment two product teams start pulling in opposite directions — before it derails a quarter?" That question hooked me, because I’ve lived through the costly fallout of subtle misalignments that only surface at the end of a sprint—or worse, during quarterly business reviews.

    I recently dug into an episode of Just Now Possible featuring Matthias and Charlotte Kleverud, co-founders of Momental. Their vision for "GitHub for product management" hits a nerve in the best possible way: find "merge conflicts" in strategy, not code, and do it early enough to save execution time, trust, and outcomes.

    Here’s the core: Momental ingests documents, meeting transcripts, and voice recordings across an organization, then uses AI agents to map them into a structured context layer—a set of interconnected trees covering goals, decisions, learnings, and who's doing what. When it finds a conflict—say, one team betting on retention while another is prioritizing conversion—it surfaces the misalignment for humans to resolve, just like a merge conflict in code. That framing is both familiar (for anyone who’s shipped software) and powerful (for anyone who’s scaled product strategy across multiple teams).
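
    To illustrate the merge-conflict framing, here is a deliberately naive Python sketch that flags two teams betting on different primary metrics for the same product surface. This is my own toy rule, not Momental's method; the `Bet` structure, team names, and surfaces are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Bet:
    """A team's current strategic bet on one product surface."""
    team: str
    surface: str
    primary_metric: str


def find_conflicts(bets: List[Bet]) -> List[Tuple[Bet, Bet]]:
    """Return pairs of bets from different teams that target the same
    surface but optimize for different primary metrics."""
    return [
        (a, b)
        for a, b in combinations(bets, 2)
        if a.team != b.team
        and a.surface == b.surface
        and a.primary_metric != b.primary_metric
    ]


bets = [
    Bet(team="Growth", surface="onboarding", primary_metric="conversion"),
    Bet(team="Lifecycle", surface="onboarding", primary_metric="retention"),
    Bet(team="Platform", surface="billing", primary_metric="retention"),
]
conflicts = find_conflicts(bets)
```

    The hard part in practice is extracting structured bets like these from documents and transcripts; once they exist, surfacing the conflict for humans to resolve is the easy step.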

    Their journey tracks with what many of us have learned the hard way. "Starting in 2022 with DaVinci 002 and learning that the market wasn't ready for AI-assisted product thinking" pushed them toward experiments with agent teams. "The origin story: building a team of AI agents in 2024, only to discover agents hit the same alignment problems as humans" is exactly the kind of meta-lesson I’d expect when you scale autonomy without shared context. The breakthrough was an "OODA-loop-driven document processing agent" that continuously curates a living knowledge graph rather than relying on static prompts or brittle pipelines.

    One model that stood out was "The product chain: signals → learnings → decisions → principles, and how AI maps it." That is the backbone of healthy product thinking. When this chain is explicit and inspectable, you can trace why a team chose Path A over Path B—and detect when new signals should invalidate old decisions. I’ve seen this accelerate continuous discovery and improve executive decision hygiene.
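
    The chain becomes inspectable once each item records what it was derived from. The sketch below is a hypothetical data model (the `Node` class, ids, and texts are mine) showing how a changed signal can be traced to the learnings, decisions, and principles that depend on it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One item in the chain: a signal, learning, decision, or principle."""
    id: str
    kind: str
    text: str
    derived_from: List[str] = field(default_factory=list)


def downstream(nodes: List[Node], source_id: str) -> List[str]:
    """Return ids of every node that directly or indirectly depends on source_id."""
    children: Dict[str, List[str]] = {}
    for node in nodes:
        for parent in node.derived_from:
            children.setdefault(parent, []).append(node.id)
    seen: List[str] = []
    stack = [source_id]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            if child not in seen:
                seen.append(child)
                stack.append(child)
    return seen


nodes = [
    Node("s1", "signal", "Churn interviews cite slow onboarding"),
    Node("s2", "signal", "Support tickets mention pricing confusion"),
    Node("l1", "learning", "Onboarding speed drives early retention", ["s1"]),
    Node("l2", "learning", "Pricing page needs clearer tiers", ["s2"]),
    Node("d1", "decision", "Prioritize guided setup this quarter", ["l1"]),
    Node("p1", "principle", "Time-to-value beats feature breadth", ["d1"]),
]
to_review = downstream(nodes, "s1")
```

    If new evidence contradicts signal `s1`, everything in `to_review` is a candidate for re-examination, while the pricing learning is untouched.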

    I also appreciated the organizational modeling: "Three trees that model an organization: the product tree (OKRs to epics), the wisdom tree (decisions and their reasoning), and the people/time tree." This maps cleanly to how we run quarterly planning at scale—tying outcomes to work, preserving rationale, and grounding ownership and timelines. With that structure, "How conflicts are detected, auto-resolved, or escalated to humans with merge options" becomes a pragmatic workflow, not a theoretical AI demo.

    On the technical front, they’re blunt about limits: "Why traditional chunking and RAG breaks down at scale and what Momental does instead." Anyone who’s tried to stitch strategy from ad hoc notes knows that naive retrieval won’t cut it. You need durable context boundaries, rich metadata, and graph-aware reasoning. Which brings me to one of my non-negotiables: "Why metadata—who said it, when, and in what context—is critical to preventing hallucinations." In my world, we treat provenance like test coverage—you can’t ship without it.

    Process-wise, the product philosophy resonated: "How a document processing agent uses OODA-loop thinking to extract and connect context across documents" reinforces the need for short feedback cycles, explicit hypotheses, and continuous refactoring of knowledge. Pair that with "The self-improving agent: collecting user feedback weekly and rewriting its own prompts" and you’ve got a blueprint for eval-driven development that keeps the system honest over time.

    Their UI choices also mirror a pattern I’ve adopted: "Moving from chat-first to UI-first to proactive agents as an AI product design pattern." Chat can feel magical, but alignment work benefits from concrete artifacts—trees, timelines, driver trees, and opportunity solution trees—so people can reason together. Then, let proactive agents watch for drift and nudge teams before the cost of change spikes.

    Two broader themes are worth calling out. First, "Specialized tools win" when the problem is deep, cross-functional context like product strategy. General-purpose chatbots struggle here; domain-specific models with strong information architecture have the edge. Second, product culture matters: "Discovery Versus Vibe Coding" is not just a catchy contrast—it’s a reminder that disciplined discovery beats intuition theater when stakes are high.

    As for the roadmap, I’m encouraged by their "Design partner strategy and what's next for Momental's public launch." Early design partners are where you validate signal quality, precision of conflict detection, and the ergonomics of human-in-the-loop resolution. I’m especially curious how this intersects with LLMs for product managers, outcomes vs output OKRs, and product roadmapping and sprint planning in large portfolios.

    Finally, a nod to the broader ecosystem. The conversation touched on "Claude Code" and a shift "Beyond documents and vectors" that many of us are living through: a move toward retrieval-first pipelines that respect context windows, stronger governance, and measurable improvements in decision quality. If you care about AI strategy for empowered product teams, this is a space to watch and to pilot.

    Bottom line: If you’ve ever wished you could prevent strategy drift before it shows up in your dashboards, this "GitHub for product management" approach is worth your attention. Make the chain of signals, learnings, decisions, and principles explicit. Keep humans in the loop for the hard calls. And let proactive, agentic AI do what it does best: flag misalignments early, so your teams can move fast together.


    Inspired by this post on Product Talk.


  • Real-Time Answers in Slack and Teams: How Amplitude’s Global Agent Elevates Product Decisions

    I’ve been looking for a pragmatic way to put product analytics where my teams already work—inside Slack and Microsoft Teams. The moment insights are one message away, cycle time shrinks, debates get crisper, and experiments move faster. That’s why I’m bringing Amplitude Global Agent into our daily decision flow to deliver instant, source-backed answers with visual clarity and actionable next steps.

    Connect Amplitude Global Agent to Slack or Microsoft Teams to answer questions with source-backed analytics, charts, and recommended actions like A/B tests.

    What excites me most is the shift from dashboards to dialogue. Instead of digging through reports, I can ask a focused question in Slack—“How did activation change week-over-week for our self-serve cohort?”—and get a chart in-channel, complete with recommendations that point me toward the next best move. This is Agent Analytics done right: faster insight loops, reduced context switching, and more confidence in the decisions we make every day.

    From a product management perspective, this integration strengthens continuous discovery and aligns product trios around the same truth. Engineers, designers, and PMs see the same chart, discuss trade-offs in the same thread, and can agree on an action—often an A/B test—within minutes. It’s a lightweight but powerful way to support product-led growth and keep our roadmap tied to measurable outcomes.

    In practice, the questions I ask the most look like this: “Which onboarding step causes the biggest drop-off this month?”, “Which channels drive the highest L28 activation rate?”, and “Where did retention improve after our pricing change?” In each case, the Agent returns charts we can share instantly with stakeholders, plus recommended actions like A/B test ideas to validate hypotheses quickly. The result is a reliable rhythm: ask, see, align, act.

    Governance matters just as much as speed. We’re configuring strict permissions, role-based access, and purposeful channel placement so analytics land where they should—no broader, no narrower. We’re also leaning into clear query prompts and naming conventions for events and properties to help the Agent retrieve precisely what’s needed, every time. The aim is a high-signal, low-noise system that maintains trust while accelerating decisions.

    To embed this into our operating cadence, I plug the Agent into three moments: daily standups (to scan activation, conversion, and incidents), weekly product reviews (to align on experiment status and next bets), and executive QBR prep (to pull clean, shareable charts fast). Because the insights arrive in Slack or Microsoft Teams, our conversations stay focused and traceable, and decisions get documented in the same place they were discussed.

    We’ll measure impact with simple, telltale indicators: fewer ad-hoc analytics requests, faster time from question to decision, increased A/B test velocity, and clearer links between recommended actions and outcome metrics like activation and retention. My bar is straightforward—if this Agent can help one team make a better decision per day, it will more than pay for itself across the org.

    If you’re considering a similar move, start small: connect one high-signal channel, curate a handful of common queries, and coach your team on good prompts. Within a week, you’ll feel the difference. When analytics become conversational, momentum follows—and your product strategy benefits from sharper, faster, and more transparent decision-making.


    Inspired by this post on Amplitude – Best Practices.


  • From Tickets to Strategy: How AI Is Rewriting Support Careers—and Why Now Is the Moment

    To truly transform with AI, I’ve learned it’s never just about the technology—it’s about redesigning how we work. The teams that win don’t bolt AI on; they re-architect around it. That means rethinking roles, workflows, and governance to build a system that sustains and improves AI performance over time.

    In The 2026 Customer Service Transformation Report, teams at every stage of maturity describe human agents taking on more proactive work—training AI systems, handling the hardest queries, and owning tasks that demand judgment. Job descriptions are shifting, too, with many organizations explicitly adding AI-related responsibilities.

    I’m also seeing a clear rise in dedicated AI specialists. Conversation analysts, knowledge managers, and AI operations leads are fast becoming standard. For support professionals, this opens new, higher-leverage career paths—and creates a talent pipeline that blends service excellence, data fluency, and product thinking.

    Support once centered on queue-level activity—ticket triage, routing, translations, and answering FAQs. Now, as AI handles more frontline interactions, our human roles are moving up the stack toward optimization, oversight, and continuous improvement.

    According to the latest research, 45% of teams report updating job descriptions to include AI-related responsibilities, with 40% saying their human agents are now more focused on training AI systems. Another 27% report that human agents primarily handle the most complex escalations and edge cases, while a quarter say agents are doing more consultative and strategic work.

    Even at the initial deployment stage, 16% of teams report spending less time handling support volume since implementing AI – and among teams who’ve reached maturity, that figure rises to 28%.

    When Intercom’s Research, Analytics & Data Science (RAD) team interviewed 166 of our customers, similar themes emerged. Nearly all participants (≈95%) reported meaningful workflow changes, with manual processes being handled by AI, and humans focusing more on monitoring or fine-tuning AI outputs. Eighty-three percent of participants also reported seeing their team’s roles and responsibilities change to become more strategic and supervisory in nature.

    AI is reshaping support teams: organizations are adding conversation analysts (32%), knowledge managers (30%), AI operations leads (28%), and support automation specialists (24%). Just 8% report no new AI roles.

    It’s not just the work that’s evolving; organizational structures are, too. Some teams are reallocating existing talent into AI-focused roles; others are hiring entirely new skill sets. Many of the most common job titles in this space didn’t exist two years ago.

    Consider a Senior AI Knowledge Manager, Beth-Ann Sher, who transitioned from a help center manager role. Like many careers transformed by AI, her work evolved from administrative to strategic. Instead of focusing solely on customer-facing, self-serve content, her mandate expanded to designing and optimizing knowledge inputs that directly improve AI Agent Fin’s performance—work that materially lifts resolution rates.

    Or look at a Senior Conversation Designer, Fred Walton, hired specifically for an AI-first function. He focuses on frictionless customer journeys with Fin, smoothing handoffs between automation and human support while keeping customer satisfaction front and center—hallmarks of mature AI workflows and conversation design.

    In high-performing organizations, roles like these typically sit within a dedicated AI support team under senior CS leadership. Clear ownership and accountability for AI performance is critical; without it, optimization stalls and trust erodes.

    These shifts aren’t isolated. Take Robb Clarke from RB2B. He went from Head of Technical Operations to Head of AI. With Fin, his focus moved from repetitive support questions to managing knowledge and improving the system behind it—freeing him to be proactive about product improvements and fix issues before they hit customers.

    Or consider Eric Broulette from Bloomerang, a support leader who leaned into AI and became the VP of Support and Education. By deploying Fin, his team found breathing room to invest in what’s next. Agents stepped into new roles, contributed to meaningful projects, and built skills that had previously felt out of reach. As Eric puts it: “Do not wait to embrace AI. It will unlock more career growth for your teams than you can imagine.”

    Bringing AI into support will eventually change every agent’s day-to-day work. For leaders at the start of the journey, that can feel daunting. My perspective: the most successful teams treat this as an operating model shift, not a tooling rollout—anchored in AI Strategy, governance, and continuous improvement.

    Be transparent about what’s changing, why it matters, and how success will be measured. Define how AI performance will be evaluated (resolution rate, containment, CSAT impact), empower agents to train and improve the system, and communicate how responsibilities will evolve. When teams help build the AI, they’re invested in making it great.
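
    To make "how AI performance will be evaluated" concrete, here is a minimal sketch that computes the three metrics from conversation records. The field names (`resolved_by_ai`, `handed_off`, `csat`) and the metric definitions are my own assumptions for illustration, not a reporting schema from Intercom or any other vendor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    resolved_by_ai: bool          # AI closed the issue without a human (assumed field)
    handed_off: bool              # conversation was escalated to a teammate (assumed field)
    csat: Optional[int] = None    # 1-5 survey score, None if the survey was skipped


def support_metrics(conversations: list[Conversation]) -> dict[str, float]:
    """Compute resolution rate, containment, and average CSAT (assumed definitions)."""
    total = len(conversations)
    if total == 0:
        return {"resolution_rate": 0.0, "containment": 0.0, "avg_csat": 0.0}
    resolved = sum(c.resolved_by_ai for c in conversations)
    contained = sum(not c.handed_off for c in conversations)
    scores = [c.csat for c in conversations if c.csat is not None]
    return {
        "resolution_rate": resolved / total,   # share of conversations the AI resolved
        "containment": contained / total,      # share that never reached a human
        "avg_csat": sum(scores) / len(scores) if scores else 0.0,
    }
```

    Agreeing on definitions like these up front matters more than the script itself: a team that debates what "resolved" means after launch will struggle to trust the numbers.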

    Here’s the playbook I rely on with support leaders. First, reset expectations about time allocation: less time in the queue, more time improving the AI system that serves the queue. Second, elevate knowledge management as a core capability. Prioritize content quality and coverage for your AI Agent, and carve out dedicated “out of the inbox” time so every agent contributes. Third, keep outcome metrics, especially resolution rate, front and center. They give the team a north star for experimentation and iteration.

    Scaling AI is as much a people challenge as it is a technology challenge. As automation takes on more work, support roles become more proactive, strategic, and cross-functional—even early in the journey. Responsibilities expand, new roles emerge, and team structures adapt to concentrate on and amplify AI performance. In the process, support careers are transformed.

    If you’re leading this shift, now’s the moment to reimagine your operating model: clarify ownership, invest in knowledge and conversation design, adopt eval-driven development, and build the muscle for continuous improvement. That’s how you move from tickets to strategy—and unlock compounding value for your customers, your business, and your teams.


    Inspired by this post on The Intercom Blog.


  • 12 Game-Changing Updates to Fin Procedures & Simulations for Complex Queries

    Today, I’m excited to share 12 major updates to Fin’s Procedures and Simulations—the foundation that lets Fin handle complex work while keeping teams fully in control of the customer experience.

    In my work building AI workflows with product and support leaders, I’ve seen how the right blend of natural language instructions, deterministic controls, and fully agentic behavior turns Fin into a reliable problem solver. Procedures make this blend possible by enabling Fin to act like a human—yet with the repeatability and governance of software. Simulations then let us test those complex Procedures at scale before they reach customers, so we can deploy with confidence.

    Together, these capabilities make Fin self-manageable, transparent, and ready for genuinely complex work.

    Here’s what’s new at a glance: we’ve made Procedures easier to build and maintain; enhanced deterministic controls for precision and policy compliance; expanded agentic behavior so Fin can adapt in real time; and delivered more powerful Simulations to validate end-to-end workflows before go-live.

    Why did we build this? Many teams see early AI gains in speed, coverage, and cost to serve—but then hit a ceiling. They keep AI confined to simple automation and information retrieval, rather than setting it up to handle the nuanced, multi-step workflows they still trust to humans. We designed Procedures and Simulations to remove that ceiling, so teams can confidently set up, govern, and iterate on complex AI workflows without bottlenecks.

    Follow the AI lifecycle as it cycles from Analyze to Train to Test to Deploy. This streamlined loop spotlights the TRAIN phase, underscoring faster iteration and feedback that power more capable procedures and realistic simulations.

    We also heard that teams needed an easy way to connect data so Fin could reliably check customer status or eligibility and then take action. And they didn’t want to route through engineering every time they needed to create or amend logic for mid-conversation decisions. Procedures combine natural language instructions with intuitive data connector setups. You tell Fin in your own words how you want it to behave, and you’re guided through creating conditional steps so Fin reacts consistently, with the option to add code snippets where absolute precision is required. Once you’ve built one Procedure, we believe you’ll want to build several. Fin therefore reads the conversation continuously to make sure it’s following the most relevant Procedure, and jumps to a better fit if the user’s intent changes.

    I know that taking something like this live the first time can feel like a leap of faith. That’s exactly why we built Simulations—to test Procedures comprehensively, uncover edge cases, and launch with confidence.

    Reaching mature deployment takes a deliberate, ongoing commitment to training workflows, validating them before deployment, measuring performance in production, and refining them over time. At Intercom, we call this the Fin Flywheel: train, test, deploy, analyze. Procedures form the foundation of the train stage, and Simulations make the test stage reliable at scale. Together, they enable Fin to handle complex work, and teams to stay in control of it.

    Procedures: Define exactly how Fin handles complex work. With Procedures, I can set Fin up to resolve complex, time-consuming queries that require multiple steps or business logic. Fin follows standard operating procedures and applies sound judgment—just like a seasoned teammate—so even complicated queries are resolved in controllable, predictable ways.

    A snapshot of the Procedures builder in action, mapping a clear path for handling damaged food orders while letting teams train Fin on examples, target channels, quickly test updates, and publish with Set live.

    Procedures combine three powerful elements. First, natural language instructions. You write a Procedure in plain language, just like documenting a process for a new teammate. You can paste in your existing SOPs, write from scratch, or let AI draft them for you, then iterate yourself.

    What’s new: Draft Procedures with AI. Share an outline of your process and Fin drafts a complete Procedure using your conversation history, knowledge hub content, and relevant data. If additional context is needed, it prompts you with clarifying questions to make sure the Procedure is thorough and tailored to your use case, significantly reducing setup time. For example: if you’re creating a refund workflow, the system can draft conditional paths for eligibility, approval thresholds, and verification steps based on your historical cases and policies.

    What’s new: Break complex workflows into Sub-procedures. Write a process once and reference it across multiple Procedures by breaking it down into reusable steps, called Sub-procedures. This makes workflows easier to read, faster to build, and simpler to maintain as things change.

    Second, deterministic controls. Natural language is flexible, but some steps need to be exact. You can layer in deterministic controls where precision matters, starting with a fully natural language Procedure and introducing structure gradually where it adds value: conditional steps (branching logic) to handle decision points so Fin’s behavior is consistent and predictable; data connectors so Fin can pull information from your tools or take actions automatically; code snippets for when absolute accuracy is essential; and checkpoints to pause for approval or hand off to a teammate.

    Screenshot: a Transaction dispute Procedure with clear IF/ELSE steps, a code step named check_dispute_eligibility, and quick Data Connector actions such as Freeze credit card and Get upcoming invoice, streamlining complex support tasks.
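
    To show what a deterministic code step might look like, here is a minimal sketch in the spirit of the `check_dispute_eligibility` step from the Transaction dispute example. The source doesn't show Fin's actual snippet format, so the signature, the 60-day window, and the minimum amount are all invented for illustration.

```python
from datetime import date


def check_dispute_eligibility(
    transaction_date: date,
    amount: float,
    today: date,
    window_days: int = 60,      # hypothetical policy window
    min_amount: float = 1.00,   # hypothetical minimum disputable amount
) -> tuple[bool, str]:
    """Return (eligible, reason) from fixed rules, so the same input always gives the same answer."""
    age_days = (today - transaction_date).days
    if age_days < 0:
        return False, "transaction date is in the future"
    if age_days > window_days:
        return False, f"outside the {window_days}-day dispute window"
    if amount < min_amount:
        return False, "amount below the minimum"
    return True, "eligible"
```

    The point of a step like this is that the model never has to interpret the policy: the language model handles the conversation, and the rule itself stays exact and auditable.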

    What’s new: Instruct Fin to read specific content from your knowledge hub. You can set clear rules for Fin to reference a specific policy or article from your knowledge hub in defined situations so Fin always surfaces the right context in a conversation.

    What’s new: Explicit Procedure switching under defined conditions. You can set rules that deterministically trigger a switch to a different Procedure, for example, escalating to a complaints Procedure if specific risk signals are detected mid-conversation.

    What’s new: Internal notes for human handoffs. When Fin hands off to a teammate, it can now include internal notes with relevant context so the person picking up the conversation knows exactly what happened and what needs to happen next.

    Third, fully agentic behavior. Because real conversations rarely follow the happy path, Procedures let Fin reason through what’s happening and adapt—jumping to the right step or switching Procedures entirely if a customer changes their mind or the issue shifts.

    Procedures and Simulations in action: Fin rehearses a food order damage scenario, confirming details and progressing through each trigger. Teams validate complex flows end to end as steps turn green and outcomes are tracked.

    What’s new: Automatic Procedure switching. If a customer starts in a billing workflow but then asks about cancelling their subscription, Fin transitions to the relevant Procedure without forcing the customer to restart.

    What’s new: Structured data extraction from uploaded files. Fin can now extract structured data directly from PDFs and images uploaded by customers—like invoices, forms, or receipts—and use that data within the conversation. Customers don’t have to copy and paste or repeat themselves.

    As MONY Group put it:

    “If a customer starts down one path but their issue turns out to be something else entirely, Fin adapts seamlessly – no more getting stuck in loops or forcing customers into the wrong workflow.”

    Simulations help teams rehearse procedures and verify outcomes before going live. Run all tests or launch a new one to ensure Fin handles tricky customer scenarios—from damage confirmation to refunds and missing subscriptions.

    The result is a conversation that feels fluid, but always follows your intended rules.

    Making complexity easier to manage is just as important as unlocking new capabilities. Beyond the core updates, we’ve focused on creation, governance, and scale—while keeping ownership with your team.

    What’s new: Improved instruction authoring. We’ve made it easier to write, edit, and structure Procedures, so building and updating them takes less time and requires less effort.

    What’s new: Reporting on when Procedures trigger, resolve, or hand off. You can now track how Procedures are performing directly within the Procedures UI, seeing exactly when they trigger, when they resolve, and when they hand off to a teammate. This visibility helps you spot issues early and improve over time.

    Customer stories from Raylo and MONY Group show how Fin now resolves payment issues and complex claims in-chat, checks account data via APIs, and lifts CSAT to about 94%, highlighting the impact of Procedures and Simulations.

    Simulations: Test complex workflows at scale before they reach customers. Simulations let you validate how Procedures will perform before anything goes live, and continuously revalidate as things change. Deploying complex AI can feel uncertain; Simulations remove that uncertainty so you can launch with confidence and iterate safely.

    You can simulate full conversations. For any Procedure, choose a user or customer segment and run a complete, multi-turn simulated conversation. You see every step Fin takes, how it applies your rules, reasons through decisions, and where it passes or fails—giving you the observability to debug and fix issues before they ever reach customers.

    What’s new: Upload images for richer testing. Simulations now support image uploads, so you can test workflows that involve receipts, invoices, or forms—the same inputs your customers actually send.

    What’s new: Clearer visibility into Fin’s reasoning. You can now see exactly how Fin is thinking through each step of a Simulation, making it easier to understand behavior, catch unexpected decisions, and refine Procedures with confidence.

    You can also use AI to create, store, and rerun tests. Writing test coverage manually doesn’t scale. Fin’s AI Assistant generates Simulations directly from your Procedures, suggesting realistic edge cases like partial refund disputes, missing invoice uploads, or no subscription found, so you can expand coverage without expanding overhead. All the Simulations you create are stored in a central library. When a product changes, a policy updates, or a Procedure is edited, hit “run all” to instantly check whether anything has regressed. This applies the same rigor to AI automation that engineering teams bring to software testing.
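
    The software-testing analogy can be sketched in a few lines. This is not Fin's API: `Simulation`, `run_all`, and the stand-in `refund_procedure` function are hypothetical names I'm using to show the shape of a stored test library that gets rerun after every change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Simulation:
    name: str
    scenario: dict          # inputs describing the simulated customer
    expected_outcome: str   # e.g. "refund_issued" or "handoff"


def run_all(
    library: list[Simulation],
    procedure: Callable[[dict], str],
) -> dict[str, str]:
    """Rerun every stored simulation and report Pass/Fail per name."""
    results = {}
    for sim in library:
        actual = procedure(sim.scenario)
        results[sim.name] = "Pass" if actual == sim.expected_outcome else "Fail"
    return results


# Stand-in for a Procedure: a plain function, purely for illustration.
def refund_procedure(scenario: dict) -> str:
    if not scenario.get("has_subscription"):
        return "handoff"
    return "refund_issued" if scenario.get("damage_confirmed") else "request_photo"


library = [
    Simulation("Damage confirmed", {"has_subscription": True, "damage_confirmed": True}, "refund_issued"),
    Simulation("No subscriptions", {"has_subscription": False}, "handoff"),
]
```

    Calling `run_all(library, refund_procedure)` returns a Pass for both scenarios. If someone later edits the procedure so that damaged orders require a photo first, the "Damage confirmed" entry flips to Fail, which is exactly the regression signal you want before customers see it.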

    What’s new: AI-suggested Simulations. You can now use AI to generate a full set of Simulations from any Procedure. The AI Assistant suggests realistic variations based on your workflow, so you can build comprehensive test coverage fast.

    Customers are already seeing this in production. “Fin can now handle payment-related queries that were never possible before… The impact on CSAT and overall CX has been pretty shocking – the Payment Information procedure CSAT is sitting at ~94%, and CX score is significantly higher than our average.” – Raylo

    “Procedures have fundamentally changed what we can achieve with Fin. Previously, complex processes like cashback claim investigations could only be handled through a static form on our website… Now, Fin can handle these sophisticated scenarios in real-time within the conversation itself. It checks account information via API calls, makes complex decisions, and guides customers through the entire claims process dynamically.” – MONY Group

    Procedures and Simulations are available now. I’m eager to see how teams use these updates to scale agentic AI, deliver faster resolutions, and raise the bar for customer experience—without sacrificing control, compliance, or quality.


    Inspired by this post on The Intercom Blog.


  • Human-in-the-Loop Mastery: Proven Oversight Tactics That Elevate AI Quality and Trust

    Human-in-the-loop oversight is the fastest and most reliable way I know to elevate AI quality, build user trust, and reduce risk. At HighLevel, my teams treat oversight as a product feature—not an afterthought—because dependable AI experiences come from deliberate design choices across data, models, and people.

    When I say “human-in-the-loop,” I mean a system that blends automation with targeted human judgment at key moments: during data curation, prompt engineering, evaluation, deployment, and post-launch learning. This approach turns “AI workflows” into measurable, repeatable processes and keeps me honest about what’s working, what’s drifting, and where a human safety net must step in.

    Architecturally, I start with a retrieval-first pipeline to ground outputs in trusted knowledge, then wrap it in guardrails. Deterministic preprocessing, careful prompt engineering, and post-processing validators catch obvious failure modes. Confidence thresholds and policy checks route ambiguous or sensitive cases to a human reviewer, while clear, auditable traces show why the system chose automation versus escalation. This balance supports reliability at scale while preserving agility for “agentic AI” patterns when they add value.
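
    As a rough illustration of the routing step, the sketch below escalates on any policy flag or on low confidence, and records a reason for the audit trail. The 0.8 threshold, the `Decision` type, and the flag names are assumptions of mine, not values from a production system.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    route: str    # "automate" or "human_review"
    reason: str   # audit-trail explanation for the choice


def route_response(
    confidence: float,
    policy_flags: list[str],
    threshold: float = 0.8,   # hypothetical cut-off; tune it against your eval data
) -> Decision:
    """Escalate on any policy flag or low confidence; otherwise automate."""
    if policy_flags:
        return Decision("human_review", f"policy flags: {', '.join(policy_flags)}")
    if confidence < threshold:
        return Decision("human_review", f"confidence {confidence:.2f} below {threshold:.2f}")
    return Decision("automate", f"confidence {confidence:.2f} meets {threshold:.2f}")
```

    Policy checks run before the confidence check on purpose: a sensitive case should reach a reviewer even when the model is very sure of itself.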

    Quality is only real if I can measure it, so I build with eval-driven development from day one. I maintain golden datasets, rubric-based scoring guidelines, and an automated evaluation harness that runs on every change to prompts, models, or data. Pre-production gates protect against regressions, while production telemetry surfaces drift by segment and use case. When it’s time to run experiments, I use A/B tests sized with a minimum detectable effect (MDE) to avoid overfitting to noise.
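
    Sizing a test from an MDE is a short calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions. The defaults (5% significance, 80% power) are common conventions rather than anything prescribed here, and `sample_size_per_arm` is a name I made up.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(
    baseline_rate: float,
    mde_abs: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs                      # rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)     # critical value for significance
    z_beta = NormalDist().inv_cdf(power)              # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)
```

    For a 10% baseline and a 2-point absolute MDE, `sample_size_per_arm(0.10, 0.02)` comes out just under 4,000 users per arm. Halving the MDE roughly quadruples that number, which is why I settle the MDE before the experiment starts rather than after.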

    Operationally, I optimize for outcomes, not output. I track task success rate, time-to-resolution, safety violation rate, hallucination rate, and cost-to-serve, then tie these to outcome-based OKRs rather than output-based ones. The signal I want is simple: are we reliably solving the user’s job-to-be-done with lower effort and higher confidence? If not, I tighten prompts, refine retrieval, or expand human review where it pays off most.

    Risk governance is non-negotiable. I design with privacy-by-design and data governance from the start—role-based access, audit trails, PII redaction, and red-team tests for safety. Clear reviewer playbooks and calibration sessions reduce bias and ensure consistent decisions. These practices aren’t bureaucracy; they’re how I operationalize AI risk management while maintaining velocity.

    Teams make or break this model. I empower product trios to own the full lifecycle—discovery, build, and learning—so feedback loops close quickly. In-product feedback widgets, reviewer queues, and incident management playbooks help us respond in hours, not weeks. Over time, human review becomes a targeted scalpel rather than a blanket requirement as the system learns and improves.

    Economics guide the level of oversight. I treat each workflow like a portfolio: where the value of accuracy is high and ambiguity is common, I route more to humans; where tasks are simple, frequent, and well-bounded, I automate aggressively. The goal isn’t zero humans—it’s optimal humans, deployed precisely where their judgment compounds ROI.
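
    One simple way to express that trade-off is an expected-loss rule: send a case to a human when the expected cost of an automated mistake exceeds the cost of a review. The sketch below assumes a reviewer always catches the error, which is optimistic, and all three inputs are estimates you would have to supply yourself.

```python
def should_route_to_human(
    error_probability: float,   # estimated chance the automated answer is wrong
    cost_of_error: float,       # business cost if a wrong answer ships
    cost_of_review: float,      # cost of one human review
) -> bool:
    """Route to a human when the expected loss from automating exceeds the review cost."""
    expected_loss = error_probability * cost_of_error
    return expected_loss > cost_of_review
```

    With a 5% error rate, a $200 cost per error, and a $4 review, the expected loss is $10, so review pays off. With a 1% error rate and a $50 cost per error, it is $0.50, so automation wins.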

    If you’re getting started, begin with one high-impact workflow, establish your golden set and evaluation rubric, and wire in a simple review queue. Prove the lift, then scale. In the short video above, I walk through the patterns I use to design these loops, measure quality with rigor, and ship AI that teams—and customers—can trust.


    Inspired by this post on Product School.


  • Make Your Analytics AI-Ready: De-Risk, Measure, and Scale AI-First Products Fast

    I ask one question before I green‑light any new AI feature: is our analytics truly AI‑ready? If the answer is no, we slow down, because nothing derails an AI roadmap faster than shipping features we can’t measure, iterate, or trust. Over time, I’ve learned that the right analytics foundation is the difference between a flashy demo and a durable, compounding product advantage.

    "Product and engineering teams face new challenges when building AI-first products. A modern digital analytics platform offers solutions." I agree—and I’d add that the real win comes when model metrics and product outcomes live in one coherent system, so we can connect every improvement to customer value.

    Here’s what “AI‑ready” analytics means in practice for me: a unified event taxonomy tied to clear user and account identities; consistent product analytics (activation, funnels, retention analysis, cohorts); ground‑truth labels and feedback signals for model evaluation; and a single source of truth that blends model telemetry with user behavior. When those pieces click, our AI Strategy turns from guessing to “eval‑driven development.”

    Start with data governance and privacy‑by‑design. Define event names, properties, and versioning rules up front. Capture the context that AI needs—inputs, outputs, confidence scores, content types—without storing unnecessary PII. This discipline reduces rework, improves observability, and keeps auditors and customers confident in how we handle data.
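
    Here is a minimal sketch of what that discipline can look like at ingestion time: a versioned schema check followed by redaction. The event name, the required properties, and the `validate_and_redact` helper are hypothetical, and the redaction only masks email addresses. A real pipeline would cover more PII types.

```python
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical governed schema: event name -> (schema version, required properties)
EVENT_SCHEMA = {
    "ai_suggestion_shown": (2, {"input_text", "output_text", "confidence", "content_type"}),
}


def validate_and_redact(name: str, version: int, properties: dict) -> dict:
    """Reject events that break the schema, then mask emails in string values."""
    if name not in EVENT_SCHEMA:
        raise ValueError(f"unknown event: {name}")
    expected_version, required = EVENT_SCHEMA[name]
    if version != expected_version:
        raise ValueError(f"{name}: expected schema v{expected_version}, got v{version}")
    missing = required - set(properties)
    if missing:
        raise ValueError(f"{name}: missing properties {sorted(missing)}")
    return {
        key: EMAIL_PATTERN.sub("[REDACTED_EMAIL]", value) if isinstance(value, str) else value
        for key, value in properties.items()
    }
```

    Failing loudly on unknown events or stale versions is deliberate: a rejected event gets fixed in a day, while a silently malformed one pollutes months of analysis.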

    Next, operationalize eval‑driven development. I run offline evaluations with representative datasets, then shadow mode in production, and finally controlled rollouts with A/B testing and feature flags. We set a minimum detectable effect so experiments are conclusive, and we include AI risk management metrics—like safety violations, fallback rates, and moderation triggers—alongside core product KPIs such as activation, task success, and time‑to‑value.
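
    For the controlled-rollout step, a common technique is deterministic hash bucketing, sketched below. I'm not claiming this is how any particular feature-flag product works. The function name and the flag and user identifiers are assumptions for illustration.

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """Deterministically place a user in or out of a percentage rollout."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000   # stable bucket in 0..9999
    return bucket < rollout_percent * 100   # e.g. 10% -> buckets 0..999
```

    Because the bucket depends only on the flag and the user, the same person sees the same variant on every request, and raising the percentage only adds users without reshuffling those already exposed. Both properties keep experiment data clean.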

    On the product analytics side, I rely on a unified analytics platform (e.g., Amplitude analytics or similar) to track adoption of AI features: who sees the feature, who tries it, who repeats it, and who retains because of it. Cohort analyses help me isolate lift among target segments; CRM integration connects usage to revenue; and pathing highlights where users need guidance. This is the engine of product‑led growth for AI capabilities.
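
    The see, try, repeat funnel is easy to prototype before you build it in a dedicated tool. The sketch below counts users at each stage from raw events. The event names `feature_shown` and `feature_used` are hypothetical, and retention is left out because it needs timestamps and a retention definition that your team has to choose.

```python
from collections import defaultdict


def adoption_funnel(events: list[tuple[str, str]]) -> dict[str, int]:
    """Count users who saw, tried, and repeated an AI feature.

    `events` is a list of (user_id, event_name) pairs.
    """
    shown = set()
    uses = defaultdict(int)
    for user_id, event_name in events:
        if event_name == "feature_shown":
            shown.add(user_id)
        elif event_name == "feature_used":
            uses[user_id] += 1
    tried = {u for u in uses if u in shown}          # used it after being exposed
    repeated = {u for u in tried if uses[u] >= 2}    # came back at least once
    return {"saw": len(shown), "tried": len(tried), "repeated": len(repeated)}
```

    The drop between "tried" and "repeated" is usually the number I watch first: a feature people try once and abandon is a demo, not a capability.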

    Quality and observability complete the loop. I monitor latency, error rates, and cost per successful outcome, but I also watch human‑grounded proxies: thumbs up/down, edits after AI suggestions, and deflection and CSAT for support workflows. These signals feed back into prompt engineering, retrieval quality, and model selection—closing the gap between LLM behavior and customer value.

    None of this works without strong cross‑functional rituals. Product trios align on success metrics before we write a line of code; continuous discovery validates user problems; and we reconcile QBR priorities with OKRs so we invest in durable capabilities, not just quarterly spikes. When analytics and discovery move in lockstep, we ship fewer speculative features and more compounding improvements.

    Finally, choose build versus buy intentionally. I buy a robust, scalable analytics substrate and only build the custom AI evals I need for proprietary use cases. With feature flags in CI/CD and automated schema checks, instrumentation becomes part of deployment frequency—not an afterthought. The result is a reliable runway to scale AI‑first products without losing speed, safety, or clarity.

    If you want a quick readiness check: do you have a clean event schema, identity resolution, and governed properties; a measurable definition of activation for each AI feature; offline and online evals connected to business KPIs; guardrails and human feedback in the loop; and dashboards that team leaders actually use? If not, start there. The payoff is faster iteration, lower risk, and a clearer line from AI investment to customer outcomes.


    Inspired by this post on Amplitude – Perspectives.

