Month: February 2026

  • From Chaos to Clarity with Claude Code: My Hands-On Playbook for Product Leaders

    I’ve been pushing hard to operationalize AI for real product work, and this episode zeroes in on the moment Claude Code stops feeling like a demo and starts behaving like a dependable teammate. If you’ve ever wondered how to go from clever prompts in the browser to durable, repeatable workflows on your machine, this walkthrough is for you.

    Listen on: Spotify | Apple Podcasts.

    My first honest reaction to installing and configuring the desktop agent was the all-too-relatable “this tool thinks everything is a code repo” reality. That framing helped me reset expectations fast: instead of treating it like a magical universal assistant, I began designing guardrails, context, and repeatable routines—exactly how I’d onboard a new team member.

    The shift from Claude-in-the-browser to Claude Code on my machine was the unlock. Locally, it can finally work with my files, folders, and workflows. That meant I could ground it in real artifacts—project docs, meeting notes, product specs, and historical decisions—so responses weren’t just plausible; they were contextual and verifiable.

    On setup, I now treat /init and CLAUDE.md files as my product requirements. I define roles, boundaries, and canonical sources up front, then run in a deliberate “walled garden.” The “treat it like an intern” model works beautifully: scope access intentionally, expand privileges as trust grows, and keep a tight audit trail of what it can touch and why.

    Surprisingly, task management became my ideal on-ramp. It’s easy to validate, the feedback loops are tight, and the ROI is immediate. I export calendar windows rather than granting full calendar access, then let the agent map priorities into Trello, reconcile time blocks, and surface trade-offs. Fast wins build confidence—mine and the agent’s.

    Model switching matters more than I expected. When speed is king and “good enough” will do, Haiku keeps the loop snappy. When stakes are higher—complex synthesis, nuanced product strategy, or gnarly ambiguity—I step up to Claude Opus 4.5. Being intentional about when to optimize for latency versus depth is a quiet superpower.

    Web tasks can still spiral. When that happens, I pause the agent’s autonomy, limit it to fewer steps, and ask, “What are you doing?” Paired with Claude’s web fetch tool, this gets the agent to explain its plan in plain language, without my needing to see its hidden reasoning, so I can spot brittle assumptions, prune distractions, and re-ground the task.

    Content retrieval has become a killer workflow. I point the agent at my archives—blog posts, book drafts, transcripts, notes—and ask, “Where have I talked about this before?” It assembles a map of prior art, connects themes I’d forgotten, and prevents me from reinventing work. Over time, this evolves into a Zettelkasten-style research system that upgrades rigor and accelerates synthesis.

    I’ve also turned Claude Code into a publishing engine. From a single transcript, it drafts titles, descriptions, show notes, and chapters, then routes artifacts to Ghost for formatting. Before anything ships, I run fact-checking workflows that validate claims against transcripts and research sources. The output improves, but more importantly, the scaffolding makes quality repeatable.

    Reusable workflows compound. I rely on slash commands to trigger common jobs, break down larger efforts with sub-agents, and wire in hooks and plugins where external systems are needed. This is agentic AI at its most practical: fewer hero prompts, more reliable processes.

    Audience analytics and content prioritization are helpful with caveats. I let the agent cluster themes and flag gaps, then I pressure-test its suggestions against first-party data and strategic goals. As with any model-driven insight, triangulation beats blind faith.

    Two metaphors guide my day-to-day. First, Claude Code is like a dog—sometimes it returns with the stick, sometimes it gets lost in the woods. Second, the “intern” framing keeps me honest: don’t hand it the whole company on day one. With that mindset, my output jumped—more volume without sacrificing quality—because the workflow scaffolding got better.

    In this episode, I cover what Claude Code is and why it’s useful even if you’re not an engineer, the real difference between the browser experience and running locally, how to shape behavior with /init and CLAUDE.md files, why task management is the perfect proving ground, when to export calendar windows versus connecting directly, and when model switching makes sense: Haiku for speed, Opus for depth.

    I also dig into debugging web tasks by asking “What are you doing?”, content retrieval workflows across personal archives, building reusable slash-command systems with sub-agents, hooks, and plugins, practical publishing stacks from transcripts, fact-checking against transcripts and research sources, and using analytics to prioritize content—with a healthy respect for uncertainty.

    If you’ve been trying to make Claude Code feel less like “throwing a stick into the woods,” this is the candid, tactical tour I wish I’d had on day one. Drop your questions and experiments below—I’m eager to compare notes and refine the playbook together.


    Inspired by this post on Product Talk.


  • Build CX Scores You Can Defend: My 5-step playbook for transparent, trustworthy AI metrics

    “You don’t have to trust the algorithm; you can see exactly why a conversation earned the score it did.”

    We recently shared how we redesigned CX Score to deliver deeper, more actionable insights across every conversation. The most common follow-up from support leaders was simpler and incredibly important: “Can I trust it?” It’s the right question—and it’s the one I use as my own bar for whether a metric is ready for the C‑suite.

    CS teams are the subject matter experts on customer experience. They understand the nuance of what customers feel, the context behind every interaction, and the difference between a technically resolved issue and a genuinely satisfied customer. I’ve learned, conversation by conversation, that any metric we ship has to capture that nuance at scale—or it doesn’t deserve to be used.

    We built CX Score to give support teams a complete view of how their customers feel across every conversation. It surfaces what’s working, what’s not, and why—so leaders can communicate impact clearly and drive change across support, product, and the wider business.

    Figure: An interface card showing “CX Score: 2” for a case where repeated CSV export attempts failed and frustrated the customer. The AI agent explains the issue and requests more details, turning raw signals into actionable support insights.

    Here’s exactly how I approached building a trustworthy metric that support leaders can inspect, explain, and defend.

    1) It’s grounded in how support teams define quality. I started with how experienced support professionals actually evaluate conversations—collecting real examples of strong, mixed, and poor interactions across industries, identifying the specific factors that shape overall experience, and writing plain-English rules for each. The result: CX Score applies the same criteria a trained support professional would use, not generic LLM assumptions.

    2) It’s aligned with human judgment. We created a dataset of thousands of real customer conversations spanning multiple industries, languages, channels, and agent types. Each was manually reviewed by experienced support professionals—with two reviewers per conversation where possible and disagreement resolution to create stable consensus labels. The result: CX Score is trained and tested to behave like an expert reviewer, not a language model making broad guesses.

    Figure: An analytics dashboard with KPI cards and a Sankey performance funnel showing how conversations flow from chat, email, and mobile into AI assistance, then to resolutions and positive, neutral, or negative outcomes, condensing messy support data into a single, defensible CX Score.

    3) It’s engineered by AI specialists. CX Score isn’t a prompt attached to an LLM. It’s a production system built by Intercom’s AI Group: 37 ML scientists and 350 engineers whose full-time focus is AI for customer service. The system includes specialized handling for long transcripts, model configuration tailored for support language and subtle sentiment, prompt engineering designed to default to neutral when evidence is weak, and a multi-stage evaluation pipeline that checks for precision, consistency, and reliability. The result: A metric built by a team that understands LLM behavior in production support environments, where accuracy and consistency matter most.

    4) It’s validated statistically, not qualitatively. Trust requires measurement, not vibes. We tested CX Score across standard ML metrics: precision (when the model flags a negative experience, how often do humans agree?), recall (how many human-identified issues does it catch?), and F1 score (the harmonic mean of the two). We set an explicit bar: F1 above 0.8, representing high agreement with human judgment. We reran these evaluations through every revision, checking for regressions or biases, and I focused especially on negative experiences, because a false negative (a bad experience the model fails to flag) hides a real problem. The result: CX Score meets a measurable standard before it ships. That standard is a statistical requirement, not a gut check.
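
    To make those three metrics concrete, here is a minimal Python sketch of how precision, recall, and F1 can be computed against human labels and compared to the 0.8 bar. The function name, the label lists, and the eight example conversations are made up for illustration; this is not the actual CX Score evaluation pipeline.

```python
def precision_recall_f1(human: list[bool], model: list[bool]) -> tuple[float, float, float]:
    """Score model flags against human labels (True = negative experience)."""
    if len(human) != len(model):
        raise ValueError("label lists must be the same length")
    tp = sum(h and m for h, m in zip(human, model))        # both flag it
    fp = sum((not h) and m for h, m in zip(human, model))  # model-only flag
    fn = sum(h and (not m) for h, m in zip(human, model))  # missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


F1_BAR = 0.8  # the bar stated in the text

# Made-up labels for eight conversations, purely for illustration.
human_labels = [True, True, True, True, False, False, False, False]
model_labels = [True, True, True, False, True, False, False, False]

p, r, f1 = precision_recall_f1(human_labels, model_labels)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} passes={f1 > F1_BAR}")
```

    In this toy sample the model misses one negative conversation and wrongly flags one positive one, so all three metrics land at 0.75 and the revision would not clear the bar.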

    5) It was battle-tested with real customers. Lab accuracy isn’t enough. Customer environments are messy: Varied ticket types, mixed languages, unpredictable edge cases. Before release, we ran a multi-phase field test—shadow-scoring conversations with both old and new models, validating sensible behavior across agent type and conversation length, then rolling out to a controlled customer group who confirmed the scores felt right, reasons were clear, and insights were actionable. The result: CX Score shipped because real teams told us it made sense in practice, not because it passed internal tests.

    Figure: A donut chart of CX categories beside a chat UI showing a CX Score of 3 with a “Negative policy feedback” tag. It maps the drivers behind the score: policy feedback, answer quality, customer effort, and emotion.

    The importance of explainability. One of the most critical choices I made was ensuring CX Score isn’t a black box. Every score comes with clear reasons, concrete excerpts, and a short explanation of what influenced the rating. This turns the metric into something you can inspect, audit, and explain to executives. You don’t have to trust the algorithm. You can see exactly why a conversation earned the score it did.

    A metric that evolves with your business. Customer expectations shift. Products change. AI improves. A trustworthy metric can’t be static. CX Score evolves with the same commitments that shaped its redesign: Evaluate the real signals that shape customer experience, keep the logic simple and interpretable, and ensure leaders can make clear decisions from it. It’s built to be a durable source of truth across every conversation.

    The takeaway. In a world where products look the same and AI can generate any interaction, customer experience is one of the few differentiators that actually matters. Support leaders have built that expertise conversation by conversation. What they’ve lacked is a measurement system that could validate it at scale—one that’s reliable enough to report to the C-suite, explainable enough to defend in strategy meetings, and rigorous enough to drive real decisions. That’s what CX Score is designed to be: A metric that reflects the reality support leaders see every day, backed by the technical rigor to make it credible everywhere else.

    Want to see CX Score in your workspace? Ask your admin to enable it for your team, and start using explainable AI insights to improve customer experience and coach with confidence.


    Inspired by this post on The Intercom Blog.


  • AI Agent Deployment Mastery: My Proven Checklist to Ship Safely, Faster, and at Scale

    Shipping AI agents is not like shipping a typical feature. The system learns, reasons, and takes action in unpredictable environments, and when it’s customer-facing, the stakes are high. Over the past few years, I’ve refined a practical checklist that helps my teams move quickly without breaking trust. It balances speed with safety, and ambition with accountability—exactly what you need to scale agentic AI in production.

    This checklist was forged in real launches—some smooth, some humbling. Early on, I watched an otherwise brilliant agent confidently offer a refund policy we didn’t have. That one incident made it clear: AI agents require a higher bar for guardrails, evals, and observability. Today, I won’t greenlight an AI rollout without these steps being explicit, owned, and testable.

    Start with outcomes, not output. I define the job-to-be-done, the target users, and the measurable business impact using outcome-based (rather than output-based) OKRs and driver trees. Success is not “ship an agent,” it’s “reduce first-response time by 40% with no drop in CSAT,” or “increase qualified demo bookings by 20% at a lower cost per acquisition.” Clear outcomes give the agent a purpose and the team a north star.

    Prepare the knowledge the agent will use. A retrieval-first pipeline beats raw prompting for most enterprise cases. I inventory sources of truth, set access controls, and enforce data governance from day one. That includes PII handling, redaction, retention policies, and privacy-by-design. If the agent can’t reliably retrieve the right fact at the right time, the rest doesn’t matter.

    Choose models and prompts with discipline. I align model selection with context window management, cost, latency, and tool-use requirements. Then I build prompts and tools together, not in isolation, and I keep temperature, stop conditions, and function-calling explicit. Most importantly, I use eval-driven development: golden datasets, task-specific metrics (accuracy, helpfulness, latency, cost), and target thresholds that must be met before widening rollout.
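
    As a rough illustration of eval-driven development, the sketch below scores an agent against a small golden dataset and blocks a wider rollout until accuracy clears a threshold. Everything here is an assumption for the example: the `GoldenCase` type, the exact-match scoring, the `toy_agent` stand-in, the sample questions, and the 0.9 threshold. A real suite would also track helpfulness, latency, and cost.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One hand-verified prompt and its expected answer."""
    prompt: str
    expected: str


def run_eval(agent: Callable[[str], str], cases: list[GoldenCase]) -> float:
    """Return exact-match accuracy of `agent` over the golden cases."""
    if not cases:
        raise ValueError("golden dataset is empty")
    hits = sum(
        agent(c.prompt).strip().lower() == c.expected.strip().lower()
        for c in cases
    )
    return hits / len(cases)


def gate(accuracy: float, threshold: float) -> bool:
    """True when the rollout is allowed to widen."""
    return accuracy >= threshold


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call."""
    answers = {"refund window?": "30 days", "support hours?": "24/7"}
    return answers.get(prompt, "I don't know")


# Invented golden cases, for illustration only.
golden = [
    GoldenCase("refund window?", "30 days"),
    GoldenCase("support hours?", "24/7"),
    GoldenCase("free shipping minimum?", "$50"),
    GoldenCase("trial length?", "14 days"),
]

accuracy = run_eval(toy_agent, golden)
print(f"accuracy={accuracy:.2f} widen_rollout={gate(accuracy, threshold=0.9)}")
```

    The toy agent answers two of four cases, so the gate stays closed. The point of the pattern is that the threshold is written down and checked by code before exposure grows.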

    Manage AI risk upfront. I treat jailbreaks, toxicity, and data leakage as product risks, not just security issues. I implement layered defenses—input/output filtering, policy checks, rate limits, and abuse monitoring—and define escalation paths and human-in-the-loop handoffs for ambiguous cases. Every risky capability needs an owner, a playbook, and a test.

    Build the pipeline that lets you iterate safely. Prompts, tools, policies, and retrieval configs go through the same CI/CD rigor as code. I use feature flags for progressive delivery, canary cohorts to limit blast radius, and clear rollback procedures. Observability isn’t optional; I track latency, token usage, cost, failure modes, and user outcomes. I also watch DORA metrics and deployment frequency to ensure we’re improving the engine, not just the output.
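
    One common way to implement a canary cohort behind a feature flag is deterministic hashing, sketched below. The flag name, version labels, and percentages are hypothetical; most teams would use their existing feature-flag service rather than rolling their own.

```python
import hashlib


def in_canary(user_id: str, flag: str, percent: float) -> bool:
    """Deterministically place a user in the canary cohort for `flag`."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in 0..9999
    return bucket < percent * 100


def agent_version(user_id: str, percent: float) -> str:
    """Route a user to the new agent only if they fall in the canary."""
    if in_canary(user_id, "agent-v2-rollout", percent):
        return "agent-v2"
    return "agent-v1"


# Share of 10,000 synthetic users who would see the new agent at 5%.
users = [f"user-{i}" for i in range(10_000)]
share = sum(agent_version(u, 5) == "agent-v2" for u in users) / len(users)
print(f"canary share at 5%: {share:.3f}")
```

    Because the bucket depends only on the flag and user ID, a user keeps the same experience between requests, raising the percentage only adds users, and setting it to 0 acts as an immediate rollback.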

    Constrain autonomy intentionally. Agent behavior design matters as much as model choice. I set step limits, define tool whitelists, separate read vs write permissions, and specify decision checkpoints. When the agent is uncertain or confidence drops below a threshold, it hands off to a human or a deterministic workflow. Guardrails aren’t barriers; they’re bumpers that keep you on the track.
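
    The guardrails in this step can be expressed as a small policy check that runs before every tool call. The sketch below is illustrative only: the `Policy` and `Step` types, the tool names, and the 0.7 confidence threshold are all assumptions, and the tool call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical autonomy limits for one agent."""
    max_steps: int = 5
    read_tools: frozenset = frozenset({"search_docs", "get_order"})
    write_tools: frozenset = frozenset({"issue_refund"})
    allow_writes: bool = False
    min_confidence: float = 0.7


@dataclass
class Step:
    """One planned tool call with the agent's self-reported confidence."""
    tool: str
    confidence: float


def run(plan: list[Step], policy: Policy) -> str:
    """Walk the plan until it finishes or a guardrail forces a handoff."""
    for i, step in enumerate(plan):
        if i >= policy.max_steps:
            return "handoff: step limit reached"
        if step.confidence < policy.min_confidence:
            return f"handoff: low confidence before {step.tool}"
        if step.tool in policy.write_tools:
            if not policy.allow_writes:
                return f"handoff: write tool {step.tool} needs approval"
        elif step.tool not in policy.read_tools:
            return f"handoff: {step.tool} is not on the allowlist"
        # A real agent would call the tool here; this sketch only checks policy.
    return "completed"


plan = [Step("search_docs", 0.9), Step("issue_refund", 0.95)]
print(run(plan, Policy()))
print(run(plan, Policy(allow_writes=True)))
```

    Keeping read and write tools in separate sets makes the default safe: the agent can look things up freely, while any action that changes state needs an explicit opt-in or a human.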

    Instrument what users experience, not just what models produce. I track activation, task success, self-serve completion rates, and time-to-value. I pair Agent Analytics with journey analytics so I can see where the agent helps or hurts. I also invest in UX trust cues—transparent explanations, undo paths, and in-app guides—so users feel in control. When the agent changes behavior through learning, the interface should make that understandable.

    If you’re shipping a voice AI agent, test in realistic conditions. I set targets for ASR accuracy, barge-in responsiveness, TTS prosody, and end-to-end latency. I predefine safe transfer logic for complex calls and ensure compliance for call recording and data retention. Voice amplifies both the magic and the mistakes; operational excellence is non-negotiable.

    Plan the business rollout like a product, not a press release. I align pricing (often consumption SaaS pricing), packaging, and SLAs with actual unit economics—tokens, inference, and retrieval. I equip solutions engineering with playbooks and reference architectures, wire up CRM integration for attribution, and put feedback loops into Intercom or the support stack so we learn from every interaction.

    Run operations like an SRE team. I define incident severity for AI-specific failures (e.g., harmful output, runaway cost, degraded retrieval), add alerting, and keep runbooks current. I schedule postmortems that feed directly into eval baselines and backlog priorities. Continuous discovery isn’t a ceremony; it’s the safety net that keeps improvements compounding.

    Close the loop on compliance and governance. From day zero, I document data flows, vendor scopes, and audit logs. I verify regulatory compliance and adopt privacy-by-design so I’m not retrofitting later. Transparency, user consent, and opt-outs aren’t just legal checkboxes; they’re trust-building tools that differentiate your product.

    The result of this checklist is speed with confidence. It gives my teams a common language to debate trade-offs, a clear path to production, and the guardrails to scale safely. If you’re preparing to deploy an agent, adapt these steps to your stack and your customers. Your future self—and your users—will thank you.


    Inspired by this post on Product School.


  • Vibe Coding Unleashed: How Parallel Agents Build KPI Driver Trees in Under Two Hours

    I’ve been exploring what I call the next level of vibe coding: orchestrating agentic AI to build complex product artifacts in minutes, not days. The breakthrough comes from ditching linear handoffs and embracing true parallelism—letting specialized agents tackle the work simultaneously while I steer the orchestration. In product management contexts where speed and clarity matter, this shift changes everything.

    Building a KPI Driver Tree in two hours becomes possible when you stop building sequentially and start building with parallel agents.

    For product leaders, a KPI Driver Tree is the fastest way to make strategy legible. It ties high-level outcomes to the levers we can actually pull (features, channels, pricing, onboarding, activation, and retention mechanics) so we can prioritize with confidence. Done well, it anchors OKRs in outcomes rather than outputs, clarifies measurement, and aligns the team around a shared, testable model of growth.

    Here’s how I operationalize it with agentic AI and AI workflows. I spin up a small team of specialized parallel agents: a Metrics Librarian (taxonomy and definitions), a Data Modeler (event and table design), a Research Synthesizer (voice of customer and causal hypotheses), a UX Prototyper (visualizing the tree and flows), and a QA/Evaluator (logic and consistency checks). An Orchestrator coordinates these agents, resolves conflicts, and composes outputs into a single, production-ready artifact—while I set constraints, review deltas, and decide.

    In a typical two-hour sprint, all agents run at once. While the Metrics Librarian finalizes the KPI ontology, the Data Modeler validates instrumentable events and joins, and the UX Prototyper renders an interactive driver tree for a unified analytics platform. Meanwhile, the Synthesizer maps qualitative insights to quantitative levers, and the Evaluator stress-tests assumptions. Because we’re not waiting for sequential handoffs, we converge on a coherent driver tree and its initial measurement plan in one pass.
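
    The fan-out pattern described here can be sketched with Python's `asyncio.gather`, which starts every specialist at once and waits for all of them. The `run_agent` coroutine is a hypothetical placeholder (the sleep stands in for a real model call), and the merge step is reduced to building a dictionary; a real Orchestrator would also resolve conflicts between outputs.

```python
import asyncio


async def run_agent(role: str, brief: str, delay: float) -> tuple[str, str]:
    """Hypothetical agent call; the sleep stands in for model latency."""
    await asyncio.sleep(delay)
    return role, f"{role} output for: {brief}"


async def orchestrate(brief: str) -> dict[str, str]:
    """Send the brief to every specialist concurrently, then merge results."""
    roles = [
        "Metrics Librarian",
        "Data Modeler",
        "Research Synthesizer",
        "UX Prototyper",
        "QA/Evaluator",
    ]
    results = await asyncio.gather(*(run_agent(r, brief, 0.05) for r in roles))
    return dict(results)  # gather preserves the order of `roles`


artifact = asyncio.run(orchestrate("KPI driver tree for activation"))
for role, output in artifact.items():
    print(f"{role}: {output}")
```

    Since the five calls overlap, total wall-clock time is close to the slowest single agent rather than the sum of all five, which is the whole argument for dropping sequential handoffs.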

    The payoff isn’t just speed; it’s higher-quality decisions. Parallel agents reduce context loss, expose trade-offs earlier, and allow me to compare multiple viable paths side-by-side. This accelerates continuous discovery, aligns with product strategy, and gives product managers a clear, living map of how inputs roll up to outcomes. It’s the closest I’ve found to running a product trio at machine speed.

    Guardrails matter. I pair this approach with strong data governance, privacy-by-design, and eval-driven development so every agent’s output is testable and auditable. Clear prompts, scoped corpora, and consistent acceptance criteria keep the Orchestrator honest, while lightweight Agent Analytics helps me see where reasoning falters and where to improve the system.

    If your team is still tackling analytics artifacts sequentially—requirements, then instrumentation, then visualization—consider switching mental models. Treat the driver tree as the backbone, empower parallel agents to co-create around it, and reserve human judgment for the critical calls. This is vibe coding for product management: creative, fast, and grounded in measurable outcomes.


    Inspired by this post on Pendo – Best Practices.


  • Go Deep or Get Left Behind: How AI Deployment Depth Transforms Customer Service

    AI adoption is everywhere. I see more teams every quarter moving from pilots to production—and increasing their budgets accordingly. But the gap between “using AI” and truly transforming with it is widening fast. Launching an AI Agent is easy; building a mature, AI-powered support operation is where the real work—and the real value—lives.

    According to new research, the "2026 Customer Service Transformation Report," the difference comes down to depth of deployment. It’s not enough to dabble. Teams that design their operations around AI are pulling away from those who treat AI like a bolt-on feature.

    This article kicks off part one of my five-part deep dive into the research. I’ll unpack the data, share what I’ve learned leading product and AI strategy, and translate it into practical steps you can apply now. If you’d like to go straight to the source, you can download the report here.

    First, the macro picture: 2,470 global support professionals across industries were surveyed to understand current AI usage, challenges, and the 2026 opportunities. The headline is clear—AI investment is now table stakes. Eighty-two percent of senior leaders say their teams invested in AI in the past year and 87% say they plan to invest in 2026. Those investments are already paying off: Over three-quarters of CS teams (77%) say AI is meeting or exceeding expectations, delivering faster response and resolution times, always-on coverage, cost savings, increased capacity, and multilingual support that scales globally.

    And yet, only 10% of organizations say they have reached a "mature" level of deployment, where AI is fully integrated into operations and working at scale. That’s the tell: most teams are skimming the surface and leaving meaningful performance gains on the table.

    Figure: AI deployment stages in customer service. Most service teams are still early in AI adoption: only 10% report mature deployment, while 26% are scaling, 35% are in initial rollout, 26% remain in exploration, and 3% are unsure.

    When I map the data to what I’ve seen in the field, the maturity difference shows up immediately in outcomes. Teams at mature deployment don’t just automate repetitive tasks; they build AI into critical workflows, give it real responsibility, and iterate continuously. Beyond automating the bulk of their manual work, they’re using AI to proactively engage customers and perform tasks on their behalf.

    The results follow. Of the teams that have reached mature deployment, 43% report higher quality and consistency across support—nearly double the rate of those still in the initial deployment stage. That quality shift is how support evolves from a cost center to a value driver. Great experiences don’t just prevent churn; they create advocacy and become a reason customers choose you. The more you trust your AI Agent with meaningful work, the more it creates the conditions for higher-quality, more consistent support.

    One example I point to often: Lightspeed. They operate a complex product across regions and languages, with tens of thousands of monthly requests. When they adopted Fin in early 2023, they needed a solution that could scale with that complexity—and they treated the transition like a first-class change program.

    They leveraged foundational training and built custom, in-house modules aligned to their processes. They supported their team post-launch and worked closely with leadership to align on the goals and benefits of AI. In a large, distributed org, that executive alignment created ownership and momentum. Their VP of Information Systems, Yamine Gluchow, put it perfectly: "It’s not magic. If you invest in understanding, adoption, and great content, AI performance takes off."

    Figure: How teams use an AI Agent for customer service, comparing mature and initial deployments. Mature rollouts lead on automating manual work (63% vs 52%), proactive engagement (51% vs 41%), and performing customer tasks (45% vs 28%).

    Their outcomes reflect that depth: An 88% involvement rate. 72% of Fin conversations resolved without human intervention. 43,000+ customer requests resolved monthly. Service in 12+ languages across 100+ countries. Stable CSAT—with improvement in some markets.

    What impressed me most was the complexity Fin now resolves. A merchant in France asked about tax invoices—normally a long phone call to check back-end data and explain rules step by step. Instead, Fin handled the conversation in French, provided an accurate end-to-end explanation, and earned positive CSAT. That’s what mature deployment looks like: a system that absorbs complexity and delivers correct, efficient results at scale.

    So how do we build toward that level of maturity? In my experience, this journey requires a mindset shift and operational rigor—not just a bigger AI budget.

    Rethink how you approach support. If you were building from scratch today, you’d design around AI from day one. As Grant Lee, CEO of Gamma, puts it: "If you want to unlock the real value of AI, you have to design for it, not retrofit around it." Treat AI as infrastructure, not a feature. That shift impacts your org design, workflows, and what “good” looks like.

    Figure: Hero graphic for “The 2026 Customer Service Transformation Report,” with the subhead “The AI deployment gap is widening” and a “Get the report” button. The report shows where deployment is stalling, lets you benchmark your team, and offers practical steps to scale automation.

    Secure executive sponsorship early. You won’t scale without C-suite backing. AI reshapes how support works, how teams are structured, how performance is measured, and how cost and value flow. Align your CFO on ROI, your CCO on journey design, and your CEO on customer experience as a strategic advantage. Early wins are great—but the compounding gains only come when leadership backs AI as infrastructure, not a one-off cost save.

    Assign clear ownership for AI performance. One common failure mode: no one owns the AI. Stand up an AI operations lead or support ops specialist to review resolution trends and handoffs, tune content and configuration, coordinate on systemic issues, and drive a prioritized improvement roadmap. Without this role, feedback loops break and performance plateaus.

    Treat content as critical infrastructure. Your AI Agent is only as good as the knowledge it can access. Ensure coverage for the topics it must handle, keep information accurate and current, and structure content so it’s easy for AI to consume. Make maintenance part of BAU, not a quarterly fire drill. A clean, governed, retrieval-first pipeline dramatically increases autonomous resolution.

    Build a continuous improvement system. AI performance isn’t static. Train your AI Agent by expanding its knowledge, refining behavior, and connecting new data sources to handle more scenarios autonomously. Validate changes against real scenarios before they ship. Roll out updates in a controlled way across channels and segments. Use performance data to find patterns—frequent handoffs, low-resolution topics—and decide what to improve next. I often point to the Fin Flywheel (Train → Test → Deploy → Analyze) as a practical example of turning performance data into action.

    The big takeaway from the "2026 Customer Service Transformation Report" is encouraging: investment is widespread, and early returns are real. The bigger opportunity is to turn those early wins into durable transformation. Teams leaning into AI as infrastructure—supported by executive alignment, clear ownership, strong content, and a continuous improvement loop—are already separating from the pack.

    Next up in this series, I’ll dig into how leading teams measure success. Beyond simple cost savings, mature deployments tie AI to clear ROI and strategic impact—shifting more work into value-adding, revenue-generating territory. Follow along here, or subscribe on LinkedIn to get the next installment in your feed.


    Inspired by this post on The Intercom Blog.


  • Amplitude’s AI Visibility Upgrade: Content Generation, Chat Segmentation, Sleeker UI—Why It Matters

    I look for analytics upgrades that meaningfully compress time-to-insight for product teams. The newest expansion of Amplitude AI Visibility stands out because it improves how we explore user behavior, automate insight creation, and translate data into action across product-led growth motions.

    Explore the most recent updates to Amplitude AI Visibility, including content generation, AI chat-driven segmentation, better UI, and improved reliability.

    Here’s how I’m thinking about the impact. Content generation can turn raw events into ready-to-share narratives—experiment summaries for A/B testing, cohort deep-dives for retention analysis, and executive briefs that tie outcomes to roadmap decisions. For leaders and ICs alike, this trims the manual lift in Amplitude analytics while keeping the human in the loop to verify context and nuance.

    AI chat-driven segmentation is another meaningful unlock. Instead of clicking through complex filters, I can describe the cohort I want in natural language and iterate quickly. That speeds up continuous segmentation work (spotting activation bottlenecks, isolating churn precursors, or defining cohorts for product-led growth experiments) and keeps the team focused on hypotheses and decisions, not interface friction. For product managers working with LLMs, the key is pairing this speed with clear guardrails and validation steps.

    The updated UI matters more than aesthetic polish. A clearer, more consistent experience reduces cognitive load, improves adoption across cross-functional partners, and reinforces a unified analytics platform approach. Improved reliability, paired with strong observability, increases trust in the stack—critical when insights drive roadmap priorities and high-visibility launches.

    Operationally, I’d roll this out with a simple playbook: identify 2–3 high-value use cases (e.g., activation funnel analysis, churn cohort exploration, experiment reporting), define success metrics (time-to-insight, stakeholder adoption, decision velocity), and establish basic AI risk management and data governance guardrails (prompt templates, access policies, and review steps). The goal is to turn AI workflows into a durable capability rather than a one-off novelty.

    Bottom line: these enhancements remove friction between questions and answers. If your team relies on Amplitude analytics, the combination of content generation, AI chat-driven segmentation, a cleaner UI, and stronger reliability should accelerate discovery cycles and help you translate insight into action with greater confidence.


    Inspired by this post on Amplitude – Best Practices.


  • Two People, Zero Waste: How Earmark’s Agentic AI Turns Meetings into Finished Work

    Two People, Zero Waste: How Earmark’s Agentic AI Turns Meetings into Finished Work

    I care about meetings only insofar as they create momentum and outcomes. What if your meetings could actually produce the artifacts you need—specs, tickets, slides—before the call even ends?

    I recently listened to an episode of Just Now Possible where Teresa Torres talks with Mark Barbir (CEO) and Sanden Gocka, the co-founders of Earmark, about building a productivity suite that turns unstructured conversations into finished work in real time. As a product leader, this premise hits the sweet spot of agentic AI, real-time AI workflows, and ruthless focus on outcomes over output.

    Listen to this episode on: Spotify | Apple Podcasts

    Unlike generic AI notetakers that produce summaries nobody reads, Earmark runs multiple agents in parallel during your meetings—translating engineering jargon, drafting product specs, even spinning up prototypes in Cursor or V0 while you're still talking. That’s the bar I want from AI in the room: finished work, not notes.

    What impressed me most was the clarity of their pivot. They moved from an Apple Vision Pro presentation coaching tool to a web-based meeting assistant. I’ve made similar calls: when the distribution path and daily workflow are obvious, you follow the user’s gravity. This shift unlocked a broader surface area—PMs, engineers, design partners—and made agentic workflows useful where work actually happens.

    They also turned a technical constraint into a commercial advantage. Their ephemeral (no-storage) architecture became a feature for enterprise sales. I’ve seen this repeatedly in AI risk management: privacy-by-design and clear data governance reduce friction with security reviewers and accelerate procurement. For many enterprises, “we don’t store your data” is the win condition.

    Cost discipline was another standout. They tackled the hard problem of making real-time AI affordable—from $70 per meeting down to under a dollar through prompt caching. That’s not just optimization; it’s product strategy. Choices like model selection, context window management, and retrieval-first pipeline design determine whether a feature can scale to every meeting or remains a demo.
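
    To make the cost mechanics concrete, here is a minimal sketch of why resending a growing transcript gets expensive and how a cached-prefix discount changes the bill. The function name, the per-token price, the call counts, and the discount rate are all hypothetical placeholders of mine, not Earmark’s numbers or any provider’s actual pricing.

```python
def meeting_input_cost(calls, new_tokens_per_call, price_per_million, cached_discount=0.0):
    """Estimate input-token cost when every call resends the transcript so far.

    All parameters are illustrative assumptions:
    - calls: number of model calls made during the meeting
    - new_tokens_per_call: transcript tokens added between calls
    - price_per_million: hypothetical price per one million input tokens
    - cached_discount: fraction taken off tokens that hit the prompt cache
    """
    total = 0.0
    for call in range(1, calls + 1):
        cached_prefix = (call - 1) * new_tokens_per_call  # transcript already sent earlier
        fresh = new_tokens_per_call                        # tokens seen for the first time
        billed_tokens = cached_prefix * (1.0 - cached_discount) + fresh
        total += billed_tokens * price_per_million / 1_000_000
    return total


# Hypothetical meeting: 120 calls, 500 new transcript tokens between calls.
without_cache = meeting_input_cost(120, 500, price_per_million=2.0)
with_cache = meeting_input_cost(120, 500, price_per_million=2.0, cached_discount=0.75)
print(f"no caching: ${without_cache:.2f}, with caching: ${with_cache:.2f}")
```

    Because the resent prefix grows with every call, uncached cost grows roughly with the square of the call count. That is why discounting the prefix matters far more in a long meeting than in a short one.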

    On capability design, the team leaned into templates and simulated stakeholders to ship value fast. Their template-based agents include Engineering Translator, Make Me Look Smart, and Acronym Explainer, and their personas simulate absent team members (security architect, legal, accessibility). This is exactly how I frame early AI workflows: remove friction for the product trio, anticipate blockers, and let the agent do the tedious, error-prone first pass.

    They were refreshingly pragmatic about models. Their explanation of why GPT-4.1 still beats newer models for prose quality in their use case is a reminder that “best” is contextual. When the job-to-be-done is precise prose and production-grade artifacts, consistent quality trumps leaderboard buzz. Of course, they also invest in guardrails to ensure quality and manage hallucinations, another non-negotiable for enterprise adoption.

    Search and analysis across time is where many AI products stumble. They explained the limits of vector search for analysis questions across meetings and how they’re building agentic search with multiple retrieval tools (RAG, BM25, metadata queries, bespoke summaries). I couldn’t agree more: analysis requires reasoning over structure, time, and purpose—not just semantic proximity. Layered retrieval with stateful agents beats a single embedding call.
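
    One common way to layer several retrievers is to fuse their ranked results into a single list. The sketch below uses reciprocal rank fusion, which is my choice for illustration and not something the episode attributes to Earmark. The retriever names and meeting IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list where it appears,
    so items that several retrievers agree on rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three retrieval tools for one question.
vector_hits = ["meeting-3", "meeting-1", "meeting-2"]
bm25_hits = ["meeting-1", "meeting-4", "meeting-3"]
metadata_hits = ["meeting-1", "meeting-2"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits, metadata_hits])
print(fused)
```

    In an agentic setup, the agent would decide which of these tools to call for a given question. The fusion step only handles combining whatever comes back.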

    They also articulated a crisp user thesis: design for product managers as the extreme user to solve for everyone. In my experience, if you satisfy the PM’s bar for clarity, traceability, and actionability, engineers, designers, and go-to-market teams benefit immediately. That’s how you earn daily active use, not once-a-week novelty.

    For builders curious about the stack and comparables, they discuss services and tools like Assembly AI for speech-to-text and the OpenAI API with prompt caching support, and they describe building integrations with Cursor and V0 by Vercel. They also reference Granola as a comparison point and nod to ProductPlan, where both founders previously worked. If you want to try the product, here’s Earmark, a productivity suite where the work completes itself.

    If you're a PM drowning in follow-up work or a builder curious about real-time AI architectures, this conversation offers a detailed look at what it takes to ship an AI product that people can't imagine working without. Personally, I see this as a credible path toward an AI chief of staff—their vision goes beyond automating deliverables to orchestrating judgment, compliance signals, and cross-functional readiness.

    The episode covers the founder backstory, what Earmark does, comparisons to competitors, unique features, templates and personas, technical decisions, early versions and challenges, optimizing transcript summarization, managing multiple tools and costs, challenges with context and reasoning models, innovative search and retrieval techniques, creating actionable artifacts from meetings, ensuring quality and managing hallucinations, and the future vision for an AI chief of staff. It’s a full-spectrum look at building with agentic AI, not just talking about it.



    Inspired by this post on Product Talk.


  • Mastering 30,000-Foot Vision and Ground-Level Execution: Systems That Decide Without You

    Mastering 30,000-Foot Vision and Ground-Level Execution: Systems That Decide Without You

    Executive function, for me, is the art and discipline of building systems that make high-quality decisions without my constant involvement. The real unlock isn’t personal heroics; it’s institutionalizing judgment. When I do my job well, teams move faster, ambiguity shrinks, and the organization compounds learning even when I’m not in the room.

    Operating simultaneously at 30,000 feet and ground level is the defining muscle of executive leadership. I deliberately switch altitudes. At 30,000 feet, I obsess over strategy, architecture, and resourcing. On the ground, I validate core assumptions with firsthand data, listen for weak signals, and spot process cracks before they widen. Altitude changes are not random; they’re triggered by variance from plan, critical customer moments, or leading indicators that deviate from expected ranges.

    The leap from frontline manager to manager of managers is where many rising leaders stall. As a manager of managers, my primary value shifts from personal execution to system design. I move from answering questions to installing mechanisms that ensure questions get answered well by others. This includes clear decision rights, shared metrics, and repeatable, lightweight rituals that scale across teams.

    What is an executive actually accountable for? Outcomes over output, talent density, and the clarity of the operating system. That means defining strategy, aligning resources, creating a cadence of review that exposes truth, and ensuring incentives reward the behaviors we want. My barometer: if I step away, do priorities hold, do metrics behave as expected, and do tradeoffs land where I would have landed?

    Knowing when to dive deep versus when to step back is a craft. I dive deep when risks are existential, when metrics have no credible owner, or when narrative and numbers diverge. I step back when leaders demonstrate consistent judgment, metrics sit inside control limits, and learnings are documented. The principle I return to again and again: context is everything. Senior leaders operate on context, not control.

    To scale judgment, I teach people how I think. I externalize my mental models: how I construct decision trees, how I stress-test assumptions, and how I weigh time horizons. I rely heavily on driver trees for metrics because they force causal clarity. If we can’t map how a top-line goal decomposes into controllable levers, we’re managing by hope, not design.

    Creating a shared language across the business is a force multiplier. I standardize definitions for our core metrics, codify what “good” looks like, and make it easy to repeat the system. We align around outcomes versus output, and we use cadences like MBRs and QBRs to unify narrative and numbers. Shared language makes decisions legible across functions and reduces rework.

    My COO playbook emphasizes owning the full customer experience end to end. When marketing rolls up under a COO in certain stages, the upside is coherence: one narrative from awareness to activation to expansion, one set of metrics, one growth engine. The point isn’t org charts; it’s removing seams customers can feel.

    Demanding and supportive is not a contradiction. I set ambitious, unambiguous bars and back them with coaching, resourcing, and fast feedback. The combination builds trust: expectations are clear, and help is immediate. I expect leaders to bring problems paired with proposed solutions and to escalate early, not perfectly.

    Inside my executive interview process, I’m assessing altitude agility, operating cadence, and taste in metrics. I use structured interviews and live case workshops to see how candidates frame ambiguous problems, build driver trees, and prioritize tradeoffs. The best prompts are simple and revealing: design the operating system for a 3x scale scenario; diagnose a broken funnel with incomplete data; align two teams with conflicting incentives. The most revealing workshop prompts surface thinking speed, humility, and the instinct to make context legible.

    The common thread in failed executive hires is a mismatch between the company’s operating system and the leader’s default mode. Some leaders can’t stop doing the work themselves. Others stay too abstract and never build mechanisms. I look for demonstrated ability to change systems, not just run them—leaders who can both author and evolve the playbook.

    On metrics, I practice the driver tree philosophy. I begin with the North Star, decompose it into controllable levers, instrument each node, and assign single-threaded owners. We design review cadences where deviations trigger targeted diagnostics, not thrash. Each tree has documented assumptions, data sources, and thresholds that prompt action. This is how teams learn to anticipate, not react.
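
    A driver tree is easy to picture as a small data structure. Here is a minimal sketch, assuming each node carries a single owner and an expected range. The class name, metric names, owners, and thresholds are all hypothetical examples, not a real scorecard.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DriverNode:
    """One metric in the tree, with a single-threaded owner and an expected range."""
    name: str
    owner: str
    value: float
    low: float
    high: float
    children: List["DriverNode"] = field(default_factory=list)

    def out_of_range(self) -> bool:
        return not (self.low <= self.value <= self.high)


def deviations(node: DriverNode) -> List[Tuple[str, str]]:
    """Walk the tree and list (metric, owner) pairs that breached their thresholds."""
    found = [(node.name, node.owner)] if node.out_of_range() else []
    for child in node.children:
        found.extend(deviations(child))
    return found


# Hypothetical tree: a North Star decomposed into two controllable levers.
tree = DriverNode(
    "Weekly active teams", "coo", value=950, low=900, high=1100,
    children=[
        DriverNode("Activation rate", "growth-pm", value=0.31, low=0.35, high=0.50),
        DriverNode("Week-4 retention", "lifecycle-pm", value=0.82, low=0.80, high=0.90),
    ],
)
print(deviations(tree))
```

    The review cadence then starts from the list of deviations and their owners, so the diagnostic goes to the lever that moved and not to the whole team.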

    High-functioning executive teams are visibly collaborative. We clarify decision rights, disagree and commit quickly, and conduct post-decision reviews to harvest learnings without blame. My favorite litmus test is simple: can 30 people operate as one team when it matters? When we get this right, information flows, execution accelerates, and customers feel consistency.

    One of the most counterintuitive leadership lessons is working yourself out of a job. If the system cannot run without you, you have a key-man risk, not a leadership strength. I aim to build successors, codify judgment, and design mechanisms that make good decisions the default state. That’s how you create durable, compounding advantage.

    And the review feedback you can’t unhear? Mine was brutally honest: my bar was high, but my mechanisms were implicit. Once I wrote them down—how I decide, what I expect, where I dive deep—the organization moved faster, and I actually became less central. If there’s a throughline to extraordinary leadership, it’s this: make your judgment teachable and your systems inevitable.


  • From Idea to Impact: My PM-Friendly Blueprint to Building Your First AI Agent Fast

    From Idea to Impact: My PM-Friendly Blueprint to Building Your First AI Agent Fast

    AI agents are quickly moving from novelty to necessity, and the fastest way to capture value is to approach them like any other high-stakes product initiative. In this guide, I share how I plan, build, and launch production-grade agents with a product mindset—balancing ambition with risk, speed with governance, and innovation with measurable outcomes.

    I start by getting crisp on the outcome. Who is the primary user, what job are they hiring the agent to do, and how will we know it’s working? I translate this into outcomes vs output OKRs, such as resolution rate, time-to-value, cost-to-serve, or qualified pipeline influenced—anchoring the roadmap before a single line of code or prompt is written.

    Next, I map the agent’s scope and boundaries. I write a simple capability canvas: the tasks the agent must perform, the tools it can use, the data it can access, and the constraints it must respect. Most successful builds follow a retrieval-first pipeline: connect trusted knowledge sources, enrich with metadata, and manage a lean context window to keep responses relevant and cost-efficient. From the start, I bake in privacy-by-design, data governance, and AI risk management so compliance isn’t an afterthought.

    Model selection comes after the workflow is clear. I choose an LLM for the job (latency, cost, multilingual needs, and tool-use fidelity) and pair it with the right connectors and actions—think CRM integration, ticketing, search, or internal APIs. For voice experiences, I define a voice AI agent persona, turn-taking rules, and barge-in behavior. This is where agentic AI patterns shine: structured planning, tool invocation, and verification loops create a resilient, goal-directed system.

    Prompt design is product design. I write system prompts that define role, tone, constraints, data sources, and success criteria. I add few-shot examples that mirror my top use cases and edge cases, then apply prompt engineering best practices to control style, limit speculation, and encourage citations. For voice, I include prompt engineering for voice to optimize brevity, warmth, and disfluency handling without sacrificing accuracy.

    Before launch, I build an eval-driven development workflow. I curate golden datasets from real user intents, add adversarial cases, and automate evals for accuracy, safety, grounding, and tool-use success. I set a minimum detectable effect (MDE) so A/B testing can validate improvements with confidence, and I define go/no-go thresholds to prevent regression. This becomes my continuous discovery loop for the agent.
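
    Setting an MDE is mostly a sample-size question. Here is a minimal sketch using the standard normal-approximation formula for comparing two proportions. The function name and the example baseline are my assumptions for illustration, and a real experiment plan should be checked with your analytics team.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, mde, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift of `mde`.

    Uses the two-sided, two-proportion normal approximation:
    n = (z_alpha + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / mde^2
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


# Hypothetical agent: 60% task success today, and we care about a 5-point lift.
print(sample_size_per_arm(baseline=0.60, mde=0.05))
```

    Halving the MDE roughly quadruples the required sample, which is why I pick the smallest effect that would actually change a decision and no smaller.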

    Instrumentation is non-negotiable. I wire up Agent Analytics to track task success, containment/deflection rate, handoff quality, cost per task, and user satisfaction. I supplement with a unified analytics platform and session replays to observe failure patterns. These signals feed prioritization and help me decide when to expand scope versus harden reliability.

    For delivery, I rely on CI/CD with feature flags to gate risky capabilities, plus canary releases for new tools and prompts. I monitor DORA metrics to maintain deployment frequency without trading off quality. When incidents happen, I treat them like production issues: incident management playbooks, rollbacks, and clear postmortems.
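
    Under the hood, a percentage-based flag or canary usually comes down to stable bucketing. This is a minimal sketch, assuming a hash of the flag name and user ID decides who is in. The function and flag names are hypothetical and do not mirror any specific feature-flag product.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets.

    Hashing flag + user keeps assignment stable across sessions and
    independent between flags, so raising `percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percent


# Hypothetical canary: expose a new tool to 5% of users first.
exposed = [uid for uid in (f"user-{i}" for i in range(1000))
           if in_rollout("new-refund-tool", uid, percent=5)]
print(len(exposed), "of 1000 users see the canary")
```

    Rolling back is then a matter of setting the percentage to zero, with no redeploy needed.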

    Trust is earned through safety and transparency. I enforce least-privilege access, structured logging, and red-teaming for jailbreaks, prompt injection, and data exfiltration. Threat detection and response plus clear user disclosures keep the experience responsible and compliant with regulatory requirements.

    GTM is product-led. I use in-app guides, product tours, and onboarding checklists to drive user activation and early wins. I define success moments, turn them into habit loops, and run retention analysis to find where users stall. This tight loop of messaging, measurement, and iteration accelerates product-market fit.

    Common high-ROI use cases I prioritize include customer support AI strategy (automated resolution and augmented agent assist), sales and success workflows (lead qualification, QBR prep), and internal knowledge copilots (policy, process, engineering runbooks). Each starts narrow, ships fast, and scales with proven evidence from analytics and experiments.

    If you’re skimming, here’s the blueprint: clarify outcomes, design AI workflows with a retrieval-first pipeline, select the right LLM and tools, engineer robust prompts, institutionalize evals and A/B testing, instrument Agent Analytics, ship with CI/CD and feature flags, and iterate with discipline. In the walkthrough video above, I go deeper on templates, prompts, and experiments you can use to build your first agent with confidence.


    Inspired by this post on Product School.


  • Becoming AI Native: A Practical Playbook to Transform Strategy, Teams, Data, and Tech

    Becoming AI Native: A Practical Playbook to Transform Strategy, Teams, Data, and Tech

    AI Native is more than a feature set—it’s an operating system for the entire business. In my role leading product, I’ve seen that companies win when they treat AI as a first-class citizen across strategy, architecture, workflows, and go-to-market. In this narrative, I unpack what “AI Native: What It Means and How to Get There” looks like in practice, sharing the frameworks I use to align vision, technology, and teams around measurable customer outcomes.

    When I say AI Native, I mean a company where core value creation, customer experience, and internal operations are powered by AI end-to-end. It’s not just bolting on a chatbot. It’s rethinking product strategy, data foundations, and execution so we can deliver differentiated experiences faster, at lower cost, and with higher reliability. This shift demands clarity on where AI truly creates leverage—and the courage to say no where it doesn’t.

    The starting point is strategy. I ground teams in outcomes vs output OKRs and a crisp value proposition: Which customer jobs-to-be-done benefit most from generative AI? Where can we unlock 10x improvements in speed, accuracy, or personalization? We prioritize a small number of high-signal use cases, size impact, and design Minimum Viable Experiments (MVEs) to de-risk assumptions before scaling. This is where build vs buy decisions matter—use foundation models and platforms for commodity needs, and invest your scarce engineering time where differentiation lives.

    Next comes architecture and data. AI Native products thrive on a retrieval-first pipeline, strong context window management, and model-agnostic abstraction so we can swap providers as needs evolve. I emphasize privacy-by-design, robust data governance, and observability across prompts, embeddings, latency, and cost. These guardrails let us move quickly without compromising trust, especially in regulated or enterprise settings.

    Execution shifts as well. I organize empowered product teams and product trios around the highest-value workflows, not components. Continuous discovery pairs with CI/CD, feature flags, and telemetry so we can test safely in production. Eval-driven development is non-negotiable: we design offline and online evaluations that mirror real user success criteria—accuracy, helpfulness, safety, and business outcomes—then wire those evals into the build pipeline to prevent regressions.

    On the intelligence layer, we increasingly rely on AI workflows and agentic AI to orchestrate multi-step tasks—retrieval, reasoning, tool use, and verification—with human-in-the-loop where appropriate. Clear system prompts, tool definitions, and fallbacks keep behavior predictable. This is where product craft meets prompt engineering and LLMs for product managers: the best teams codify patterns, share prompts in a living library, and standardize on a lightweight AI product toolbox.

    Risk and reliability are part of the product, not an afterthought. I run AI risk management as a continuous program spanning red teaming, content filters, PII handling, audit trails, and incident response. We tie policies to concrete controls and create simple dashboards leaders can trust. The goal is to ship boldly with safety, maintainability, and scale in mind.

    Becoming AI Native also changes how we grow. We lean into product-led growth with clear in-app guides, product tours, and activation paths that teach users where AI shines. CRM integration ensures sales and success teams have context to coach customers. Pricing experiments—often usage- or value-based—align revenue with the impact customers feel, while retention analysis helps us double down on the use cases that drive compounding value.

    To make this real, I use a 90-day plan. Days 0–30: align on strategy, top use cases, and risk posture; stand up data pipelines and a basic retrieval-first stack; define evaluation metrics. Days 31–60: ship MVEs behind feature flags, run head-to-head evals, and instrument observability; start a cross-functional community of practice. Days 61–90: scale the winning use cases, formalize governance, and publish a roadmap tied to outcomes—not just features—with clear SLAs and success metrics.

    The destination is a durable advantage: faster iteration cycles, smarter experiences, and a product strategy that compounds with every interaction. If you’re ready to make the leap, start small, measure obsessively, and build the muscle to ship, learn, and adapt. That’s the heart of becoming AI Native—and it’s well within reach.


    Inspired by this post on Product School.


  • From Coaching to Co‑Pilots: How AI Elevates Product Owners and Feature Teams

    From Coaching to Co‑Pilots: How AI Elevates Product Owners and Feature Teams

    After two decades of coaching product teams, I’m making a deliberate shift in how I guide leaders and practitioners. The destination hasn’t changed—great products, empowered product teams, and durable outcomes—but the route has. AI is now a practical, compounding advantage, and it demands we evolve our product coaching model.

    In my day-to-day as a VP of Product Management at HighLevel, I’ve watched AI move from novelty to necessity. Large language models, agentic AI, and streamlined AI workflows now accelerate how we discover opportunities, test hypotheses, and communicate decisions. This is not about replacing product judgment; it’s about augmenting it with a disciplined AI Strategy.

    For years, I’ve raised the alarm about the gap between execution and strategy among “product owners and feature team product managers.” The intent was never to pile on more process. It was to strengthen product discovery, sharpen product strategy, and clarify outcomes vs output OKRs so that teams ship what matters. AI finally gives us the leverage to make that shift unavoidable—and repeatable.

    Here’s the new coaching stance: treat AI as a co-pilot, not an answer engine. I coach teams to build an AI product toolbox they can trust—prompt engineering patterns, eval-driven development to measure model quality, and a retrieval-first pipeline for institutional knowledge. When combined with continuous discovery, this creates a tight loop between insight, iteration, and impact.

    Practically, this means elevating core rituals. In product trios, we start discovery with AI-assisted opportunity mapping, then pressure-test problem framing with user evidence. We generate multiple solution sketches with LLMs, annotate assumptions, and use A/B testing with a minimum detectable effect (MDE) to validate the riskiest bets. The result is faster learning without skipping the hard thinking.

    On the governance side, I set clear guardrails: privacy-by-design, data governance, AI risk management, and explicit criteria for acceptable model behavior. We treat prompts and evaluation datasets as versioned assets, and we pair product managers with forward deployed engineers to operationalize insights in production safely.

    Coaching also extends to measurement. We anchor product outcomes in the customer journey and watch leading indicators for activation, adoption, and retention. On the delivery side, we look at deployment frequency and the health of the feedback loop between support signals and roadmap choices—because empowered product teams win when they learn faster than the market shifts.

    The most profound cultural change is mindset. Instead of asking AI for answers, we ask it for alternatives, counterexamples, and structured ways to explain tradeoffs to stakeholders. That makes product positioning clearer, decision narratives stronger, and the path from insight to execution shorter.

    If you’re responsible for developing talent, reframe coaching as enablement plus guardrails. Build the AI muscle into everyday discovery and delivery, not as a side project. When we do this well, we transform good practitioners into strategic operators—people who pair judgment with leverage and consistently ship value.

    The bottom line: AI doesn’t replace the craft; it amplifies it. Our job as leaders is to harness that amplification responsibly and turn it into a durable competitive advantage.


    Inspired by this post on SVPG.


  • How We Built Rock-Solid AI Infrastructure: Lessons From Scaling AI Visibility and Reliability

    How We Built Rock-Solid AI Infrastructure: Lessons From Scaling AI Visibility and Reliability

    Scaling AI Visibility pushed me to rethink what “reliable” really means for AI infrastructure. As my team expanded usage across more datasets, models, and workflows, we uncovered unexpected sources of report failure and built the guardrails, observability, and processes that now anchor our stability strategy.

    In practice, the surprising failure modes were rarely the loud ones. We saw report failure triggered by small schema drift from non-deterministic LLM outputs, silent permission changes in upstream data sources, token-limit truncation that broke downstream parsing, third-party API rate limits that surfaced only under bursty load, and clock skew that confused idempotent writes. Individually these issues looked minor; together they created reliability debt.
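
    Two of these failure modes, schema drift and truncated output, can be caught with a small contract check before anything downstream parses the result. This is a minimal sketch using only the standard library. The field names in the contract are hypothetical and are not the actual report schema.

```python
import json

# Hypothetical contract for one report payload: field name -> required type.
REPORT_CONTRACT = {"title": str, "summary": str, "mentions": int}


def validate_report(raw, contract=REPORT_CONTRACT):
    """Parse LLM output and return (payload_or_None, list_of_errors)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Truncated output (for example from a token limit) typically lands here.
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return None, ["top-level value must be an object"]

    errors = []
    for key, expected in contract.items():
        if key not in payload:
            errors.append(f"missing field: {key}")
        elif not isinstance(payload[key], expected):
            errors.append(f"wrong type for {key}")
    # Renamed or invented fields are the usual signature of schema drift.
    errors.extend(f"unexpected field: {key}" for key in sorted(set(payload) - set(contract)))
    return (payload if not errors else None), errors


drifted = '{"title": "Q1", "summary": "ok", "mention_count": 3}'
print(validate_report(drifted)[1])
```

    Rejecting the payload at this boundary turns a silent downstream failure into a loud, countable one.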

    Our first move was deep observability. We instrumented the end-to-end pipeline with structured logs, distributed tracing, and high-signal metrics mapped to SLOs and error budgets. That visibility let us separate symptom from cause, quantify impact by segment, and prioritize fixes that moved business outcomes, not just vanity thresholds. It also gave product managers and SREs a shared, real-time view to make tradeoffs explicit.
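
    The arithmetic behind an error budget is simple enough to show directly. Here is a minimal sketch, assuming a request-based SLO over a fixed window. The function name and the numbers are hypothetical.

```python
def error_budget_status(slo_target, total_requests, failed_requests):
    """Return (allowed_failures, fraction_of_budget_consumed) for one SLO window."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        consumed = float("inf") if failed_requests else 0.0
        return 0.0, consumed
    return allowed_failures, failed_requests / allowed_failures


# Hypothetical window: 99% of 100,000 report generations should succeed.
allowed, consumed = error_budget_status(0.99, 100_000, 250)
print(f"budget: {allowed:.0f} failures, consumed: {consumed:.0%}")
```

    When the consumed fraction climbs faster than the window elapses, that is the signal to favor reliability work over new features.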

    Next, we hardened the runtime with resilience patterns: circuit breakers on flaky dependencies, timeouts tuned to p95 behavior, retries with jittered backoff, idempotent processing for at-least-once delivery, and backpressure-aware queues. We enforced schema contracts at ingestion with JSON validation and added feature flags to decouple deploys from releases, so we could roll forward or back within minutes when signals degraded.
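
    Of these patterns, retries with jittered backoff are the quickest to illustrate. The sketch below uses the “full jitter” variant, where each wait is a random amount up to an exponentially growing cap. The function name and default values are my assumptions, not the configuration described above.

```python
import random
import time


def retry_with_jitter(fn, attempts=5, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on any exception with exponentially capped, jittered waits.

    `sleep` and `rng` are injectable so the behavior can be tested without waiting.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())  # full jitter: wait a random time in [0, cap)
```

    The randomness is the point: if many workers fail at once, jitter spreads their retries out so they do not hit a recovering dependency in one synchronized burst. This only stays safe when the wrapped call is idempotent, which is why the two patterns travel together.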

    On the product side, we adopted eval-driven development for model and prompt changes, shifting risky modifications behind canaries and staged rollouts. CI/CD gates required evaluation baselines to hold or improve before promotion. We tracked DORA metrics to keep deployment frequency high without sacrificing change failure rate, and we used p95 latency and error-budget burn as the forcing functions for prioritization.
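
    A promotion gate of this kind can be very small. Here is a minimal sketch, assuming higher scores are better and a small tolerance absorbs eval noise. The metric names, scores, and tolerance are hypothetical.

```python
def eval_gate(baseline, candidate, tolerance=0.01):
    """Return the metrics where the candidate regressed beyond the tolerance.

    An empty dict means every baseline metric held or improved, so the
    change may be promoted. A metric missing from the candidate run
    counts as a regression.
    """
    regressions = {}
    for metric, base_score in baseline.items():
        cand_score = candidate.get(metric)
        if cand_score is None or cand_score < base_score - tolerance:
            regressions[metric] = (base_score, cand_score)
    return regressions


# Hypothetical eval results for a prompt change.
baseline = {"accuracy": 0.90, "grounding": 0.85}
candidate = {"accuracy": 0.91, "grounding": 0.80}

failed = eval_gate(baseline, candidate)
print("promote" if not failed else f"blocked: {failed}")
```

    In a pipeline, a non-empty result would fail the build step, which keeps the decision mechanical and auditable.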

    Culture mattered as much as code. We formalized incident management with clear ownership, lightweight runbooks, and blameless reviews that produced crisp, automatable actions. We partnered early with SRE on SLO design, integrated privacy-by-design and PII scanning into the pipeline, and treated AI risk management as an ongoing product constraint rather than a checkbox.

    The net effect: fewer flaky reports, faster recovery when things do break, and far more confidence to ship improvements to AI Visibility at pace. If you’re scaling similar capabilities, start with observability, make resilience patterns non-negotiable, and let SLOs guide your product roadmap. Reliability is not a phase—it’s the product.


    Inspired by this post on Amplitude – Best Practices.

