Category: Product Management

  • Eliminating the Last Bottleneck: Agentic AI in Amplitude That Builds What Matters Faster

    For years, I’ve watched high-performing product teams run into the same wall: the gap between insight and action. Dashboards multiply, yet decisions stall. That final mile—where we interpret trends, prioritize tradeoffs, and ship changes—remains the last bottleneck. It’s not a data problem; it’s a bandwidth and focus problem.

    Amplitude's AI Analytics Platform takes the next step: agents that investigate, monitor, and act so your team can build what actually matters.

    From my seat leading product at HighLevel, I see “agentic AI” as a structural upgrade to the product operating system. Instead of waiting on human cycles to discover anomalies, craft hypotheses, and trigger the next experiment, Agent Analytics can continuously investigate user behavior, monitor mission-critical metrics, and initiate actions—closing the loop from observation to outcome. That shift transforms analytics from a passive reference layer into an active, decision-making teammate.

    Practically, this matters because empowered product teams win on speed and focus, not on the volume of reports. When agents surface the most material opportunities—say, a sudden drop in activation for a high-value cohort or a retention dip tied to a recent release—we compress time-to-insight and, more importantly, time-to-action. The result is fewer context switches, fewer meetings, and more cycles invested in building meaningful value.

    The most compelling use cases are those that compound: continuous discovery that highlights friction in onboarding flows, proactive retention analysis on at-risk segments, automated experiment prioritization aligned to outcome-based (rather than output-based) OKRs, and closed-loop alerts that trigger workflows in your CRM or in-app guides to accelerate product-led growth. With a unified analytics platform feeding these agents, we can move from reactive analytics to anticipatory product strategy.

    Of course, leverage requires guardrails. I anchor adoption in three pillars: clear decision rights for agents (what they can autonomously act on vs. recommend), transparency in reasoning (so PMs can audit how conclusions were reached), and explicit alignment to key outcomes (activation, retention, expansion). Done right, this is not a replacement for product judgment—it’s an amplifier for it.
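
    To make the three pillars concrete, here is a minimal Python sketch of a decision-rights table that also records the agent's reasoning and the outcome it targets. The action names, the three tiers, and the rule that unknown actions fall back to human review are hypothetical choices for illustration, not how Amplitude's agents are actually configured.

```python
from enum import Enum


class Authority(Enum):
    ACT = "act"              # agent may execute on its own
    RECOMMEND = "recommend"  # agent proposes; a PM approves
    FORBIDDEN = "forbidden"  # agent may not take this action at all


# Hypothetical decision-rights table; the action names are illustrative only.
DECISION_RIGHTS = {
    "create_metric_alert": Authority.ACT,
    "launch_experiment": Authority.RECOMMEND,
    "change_pricing": Authority.FORBIDDEN,
}


def route_action(action: str, rationale: str, outcome: str) -> dict:
    """Decide what happens to an agent-proposed action and keep an auditable record."""
    # Actions missing from the table fall back to human review, never autonomy.
    authority = DECISION_RIGHTS.get(action, Authority.RECOMMEND)
    return {
        "action": action,
        "authority": authority.value,
        "execute_now": authority is Authority.ACT,
        "needs_pm_review": authority is Authority.RECOMMEND,
        "rationale": rationale,  # transparency: the reasoning a PM can audit
        "outcome": outcome,      # alignment: the key outcome this action targets
    }


print(route_action("launch_experiment", "activation dipped for a high-value cohort", "activation"))
```

    The default matters most: when an agent invents an action nobody classified, the safe behavior is to ask rather than act.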

    If I were rolling this out today, I’d set up a success dashboard that tracks time-to-insight, time-to-action, the share of initiatives originated by agents, impact on North Star metrics, and the reduction in manual analysis hours. I’d also introduce lightweight prompt templates and playbooks for product managers working with LLMs, so we standardize how we ask better questions and interpret agent outputs.

    The promise here is simple but profound: eliminate the last bottleneck by giving your teams a partner that never sleeps, never tires, and never loses the plot. When agents investigate, monitor, and act, we spend less time arguing about the data and more time building the right things, faster.


    Inspired by this post on Amplitude – Best Practices.


  • Design Smarter with Amplitude + Figma Make: AI-Powered Prototyping, Testing, and Learning

    I rely on Amplitude analytics and Figma Make to turn real user insights into high-fidelity prototypes in hours, not weeks. This pairing compresses our continuous discovery loop and helps my team prioritize what truly moves the needle for customers and the business.

    Design smarter with Amplitude and Figma Make. Use AI and product analytics together to prototype, test, and learn faster.

    Here’s how I put that into practice: I start with product analytics to isolate a measurable opportunity—often around user activation, conversion drop‑offs, or retention analysis. Amplitude cohorts and funnels surface where friction hides; I translate those signals into design prompts and flows in Figma Make, so we can visualize and validate potential solutions before a single line of production code is written.

    Once a promising direction emerges, I convene the product trio (design, engineering, and product) around a clear outcome metric, not an output metric. We build a lightweight driver tree, align on a hypothesis, and define the minimum detectable effect (MDE) up front, then size the A/B test so it has enough statistical power to detect that effect and be decision‑worthy. From there, we create a small set of Figma Make variations that reflect distinct value hypotheses, not cosmetic tweaks.
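
    The link between MDE and statistical power is easiest to see in the standard sample-size formula for comparing two conversion rates. This sketch assumes the normal approximation, a two-sided test, and equal-sized arms; your experimentation platform may compute sample sizes differently.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH arm to detect an absolute lift of `mde_abs`.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1 and mde_abs > 0):
        raise ValueError("baseline and baseline + mde_abs must lie in (0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)          # 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Example: 20% baseline activation, detect a 2-point absolute lift.
print(sample_size_per_arm(0.20, 0.02))
```

    Because the MDE sits squared in the denominator, halving it roughly quadruples the required sample, which is why the trio should agree on it before building variations.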

    On the experimentation front, I gate risky changes behind feature flags and ship via our CI/CD pipeline to limit blast radius and accelerate feedback. I monitor the experiment with a unified analytics platform mindset: the same definitions and segments in Amplitude power both pre‑launch discovery and post‑launch evaluation. That continuity lets us compare prototype expectations against production reality with far fewer translation errors.

    A few principles keep this workflow sharp and responsible: I use privacy-by-design patterns, apply data governance guardrails to keep datasets consent‑aligned, and set AI risk management standards so generated designs respect accessibility and brand constraints. Critically, I avoid vanity metrics—I measure learning speed, decision quality, and downstream impact on activation or retention, which are what sustain product-led growth.

    If you’re looking for a playbook, try this cadence: 1) define the customer outcome and success metric; 2) map a simple driver tree to narrow the solution space; 3) explore multiple flows in Figma Make; 4) validate quickly with concept tests and usability checks; 5) run A/B testing with a clearly defined MDE; 6) ship iteratively behind feature flags; 7) close the loop in Amplitude with cohort‑level retention analysis; 8) refine copy and UX writing to reinforce the core value proposition. Repeat until the signal is undeniable.
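
    Step 7 relies on cohort-level retention. In practice Amplitude's retention analysis does this for you; purely to show the underlying arithmetic, here is a plain-Python sketch that assumes you already have each user's signup week and the set of weeks they were active. That input shape is my assumption, not an Amplitude export format.

```python
from collections import defaultdict


def cohort_retention(users: dict[str, tuple[int, set[int]]],
                     horizon: int) -> dict[int, list[float]]:
    """Return, per signup-week cohort, the share of users active N weeks later.

    `users` maps user_id -> (signup_week, set of weeks the user was active).
    Index 0 of each row is the signup week itself; index N is week signup + N.
    """
    cohorts: dict[int, list[set[int]]] = defaultdict(list)
    for signup_week, active_weeks in users.values():
        cohorts[signup_week].append(active_weeks)

    table = {}
    for week, members in sorted(cohorts.items()):
        table[week] = [
            sum(1 for active in members if week + offset in active) / len(members)
            for offset in range(horizon + 1)
        ]
    return table


users = {
    "u1": (0, {0, 1, 2}),
    "u2": (0, {0}),
    "u3": (1, {1, 2}),
}
print(cohort_retention(users, horizon=1))
# {0: [1.0, 0.5], 1: [1.0, 1.0]}
```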

    Blending Amplitude analytics with Figma Make has become my fastest path from insight to impact. It keeps my team focused on learning that compounds, features that matter, and outcomes customers can feel—so we truly make what matters.


    Inspired by this post on Amplitude – Best Practices.


  • Inside ShowMe’s Playbook: Orchestrating Voice, Video & Multi‑Agent AI Sales Reps that Close

    What happens when you treat an AI agent not as a chatbot, but as a full teammate on your sales team – one that can jump on video calls, demo your product, make phone calls, and follow up over days?

    I recently dug into this question with the team behind ShowMe, an AI-native startup building digital sales reps for inbound teams. Founded in April 2025, ShowMe has engineered a multi‑agent system that combines conversation agents for live voice and video interactions, evaluator agents that score every call for quality and sentiment, and creator agents that ingest customer documentation to build tailored playbooks. A workflow layer orchestrates the entire lead‑to‑close journey across days, not minutes—exactly the kind of agentic AI approach I expect to see become standard in revenue workflows.

    What stood out to me first was the origin story: a glaring conversion gap on a previous website, and the realization that a purpose‑built AI could fill it. The initial MVP was refreshingly pragmatic—start with a voice agent, pair it with product videos, and back it with a simple RAG knowledge base. That retrieval‑first pipeline let the team ship quickly, validate real user behavior, and then scale sophistication where it mattered.

    Then came a pivotal affordance shift: adding a realistic avatar via HeyGen. It wasn’t just eye candy; it changed how prospects engaged. The video-call UX established trust and made the AI’s capabilities legible at a glance. Prospects behaved as if they were with a human rep—interrupting, probing, and asking for demos—because the surface area invited that behavior.

    On the architecture side, the team decomposed a single sales conversation into multiple specialized sub‑agents—greeting, qualifying, pitching—to manage latency, memory constraints, and model limitations. Deterministic workflows handle the happy paths reliably, while a smart orchestrator is emerging to break out of rigid paths when context demands it. Confidence scoring and frustration detection kick in for real‑time human handoff decisions, a must for revenue‑critical moments where a missed nuance can cost pipeline.
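
    ShowMe's actual scoring isn't described in detail, so the following is only a sketch of what a real-time handoff rule can look like. The signal names, the explicit "asked for a human" flag, and both thresholds are hypothetical; in practice they would be tuned against reviewed calls.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    confidence: float      # agent's confidence in its next reply, 0..1
    frustration: float     # score from a frustration detector, 0..1
    asked_for_human: bool  # the prospect explicitly asked for a person


def should_hand_off(signals: TurnSignals,
                    min_confidence: float = 0.6,
                    max_frustration: float = 0.7) -> bool:
    """Return True when a human rep should take over the conversation."""
    if signals.asked_for_human:
        return True  # an explicit request always wins
    return signals.confidence < min_confidence or signals.frustration > max_frustration


print(should_hand_off(TurnSignals(confidence=0.45, frustration=0.2, asked_for_human=False)))
```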

    Training the system to sell like your team is where it gets powerful. ShowMe ingests sales transcripts and training materials to teach company‑specific sales skills, then uses creator agents to assemble tailored playbooks. Conversation agents stay focused on live interactions, while evaluator agents continuously score calls for quality and sentiment. The result: repeatable, compliant, and brand‑consistent selling—without flattening personalization.

    Quality isn’t an afterthought; it’s operationalized. Early deployments run with customer-driven evaluation loops where 100% of conversations are reviewed, tapering to about 5% over time as confidence increases. Feedback becomes automated tests that catch prompt regressions, and production quality is proven with POCs, A/B rollouts, dashboards, and CRM logging. This is eval-driven development applied to go‑to‑market: measurable, auditable, and continuously improving.
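
    The taper from full review to roughly 5% can be expressed as a simple policy. In this sketch, the halving-per-clean-cycle schedule and the hash-based sampling are my assumptions; only the 100% starting point and the ~5% floor come from the description of ShowMe's process.

```python
import hashlib

REVIEW_FLOOR = 0.05  # steady-state share of conversations still reviewed


def review_rate(clean_cycles: int, decay: float = 0.5) -> float:
    """Start by reviewing everything; halve after each clean cycle, never below the floor."""
    return max(REVIEW_FLOOR, decay ** clean_cycles)


def selected_for_review(conversation_id: str, rate: float) -> bool:
    """Deterministic sampling: the same conversation always gets the same decision."""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    # 6 bytes = 48 bits, so the division is exact and the result stays in [0, 1).
    position = int.from_bytes(digest[:6], "big") / 2**48
    return position < rate


print([review_rate(c) for c in range(6)])
# [1.0, 0.5, 0.25, 0.125, 0.0625, 0.05]
```

    Deterministic sampling keeps audits reproducible: rerunning the selection over the same day's conversations picks the same ones.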

    I also appreciate how they treat the agent as a coworker, not a widget. Onboarding happens via Slack, weekly reporting aligns with sales leadership rhythms, and tight CRM integration keeps data flowing both ways. That mindset unlocks adoption because it fits how sales teams actually operate, and it produces Agent Analytics that sales leaders can manage against.

    From a product perspective, several pragmatic details matter. Real‑time voice and avatar demos rely on latency tricks and a library of video clips to keep interactions snappy. The conversation agent evolved from a basic Q&A bot into guided sales discovery, balancing personalization with the ever-present risks of hallucination. Guardrails, human‑in‑the‑loop, and clearly defined handoff rules are non‑negotiables in high‑stakes sales workflows.

    Looking ahead, the roadmap makes sense: move toward self‑serve PLG setup, add smarter orchestration that adapts beyond deterministic flows, and expand into adjacent roles like customer success. For product leaders building with generative AI, the pattern here is instructive: start with inbound value, design AI workflows that align to proven sales motions, and use rigorous evals to earn the right to automate more.

    If you want to go deeper into the build, the live demos, and the full multi‑agent orchestration, listen to the episode on Spotify or Apple Podcasts. For more on the stack, explore ShowMe and the avatar platform HeyGen.


    Inspired by this post on Product Talk.


  • Go From 3 Customer Interviews to a High-Quality Opportunity Solution Tree—In Minutes

    Most product teams—and especially well-run product trios—know they should be interviewing customers. More teams than ever are actually doing it. That’s the good news.

    The bad news? Many teams still struggle with what comes next. Turning raw recordings into a structured opportunity space that truly guides product discovery can feel overwhelming.

    In my experience, interview synthesis is cognitively demanding work. You have to extract the key moments from each conversation, translate those moments into clear opportunities, and then organize those opportunities into a coherent view of your opportunity space. It’s no surprise I hear teams say, "We need to stop interviewing so we can catch up on what we’ve already learned." Too often, they pause—and never start again.

    Recordings pile up. Maybe there are scattered notes. But nothing gets turned into an opportunity solution tree. The team hasn’t synthesized what they’ve learned, so the research isn’t actionable. That’s the gap I want to help close.

    What if you could go from 3 interviews to a draft OST in minutes?

    My AI goals are straightforward: 1) build tools that help you learn discovery and 2) build tools that help you do discovery. The learning tools are coming through on-demand courses. Today, I’m excited to share the first big step on the "do" side.

    That step is an expanded partnership with Vistaly, the opportunity solution tree tool many of you already use, to bring AI-powered discovery tools directly into their platform.

    Great synthesis happens in two steps: first, you synthesize each interview separately; then you synthesize across interviews. Most AI tools skip the first step and jump straight to cross-interview analysis—exactly how teams lose the nuance and context that make research actionable.

    This approach does both. You upload three interviews for the same product outcome. The AI extracts the key moments and opportunities from each one separately. Then it synthesizes across those interviews and generates a first draft of your opportunity solution tree for you. Three interviews in. A draft OST out.
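
    The two-step structure is what matters here, so this sketch keeps extraction abstract. `extract_opportunities` stands in for whatever does per-interview extraction (an LLM call, a human, or both); it is a hypothetical parameter, not a Vistaly API. The output is a ranked opportunity layer listing the interviews that support each item, which the team would then organize into a tree under the outcome.

```python
from collections import defaultdict
from typing import Callable


def synthesize(interviews: dict[str, str],
               extract_opportunities: Callable[[str], list[str]]) -> dict[str, list[str]]:
    # Step 1: synthesize each interview on its own, so its context is not lost.
    per_interview = {iid: extract_opportunities(text) for iid, text in interviews.items()}

    # Step 2: synthesize across interviews, remembering which ones support each opportunity.
    support: dict[str, list[str]] = defaultdict(list)
    for iid, opportunities in per_interview.items():
        for opp in dict.fromkeys(opportunities):  # de-duplicate, keep order
            support[opp].append(iid)

    # Opportunities heard in more interviews come first (ties sorted by name).
    return dict(sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0])))


def toy_extractor(text: str) -> list[str]:
    # Hypothetical stand-in for real extraction: each "need:" line becomes an opportunity.
    return [line.split(":", 1)[1].strip()
            for line in text.splitlines() if line.startswith("need:")]


interviews = {
    "i1": "need: export is slow\nneed: hard to share reports",
    "i2": "need: export is slow",
    "i3": "need: hard to share reports\nneed: export is slow",
}
print(synthesize(interviews, toy_extractor))
# {'export is slow': ['i1', 'i2', 'i3'], 'hard to share reports': ['i1', 'i3']}
```

    Keeping the interview IDs attached to each opportunity is what lets the team trace a draft node back to real customer stories during review.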

    Here’s what this is—and what it isn’t. You’ve probably heard criticism of tools that promise "one-click opportunity solution trees." Those tools ask you to describe your market, click a button, and get a tree. The point of an opportunity solution tree is not to have one—it’s to synthesize what you’re learning from real customers so your team can align on the best path forward. A one-click tree built from made-up data is useless.

    Figure: Vistaly 2.0 landing page with the headline 'Build what matters,' an Enroll in Beta button, and an opportunity solution tree that maps an outcome to opportunities and solutions.

    This approach is fundamentally different. It starts with your real customer interviews. The AI does the heavy lifting of extracting key moments and opportunities from those conversations and organizing them into a draft opportunity solution tree. But it’s a draft—you review it, refine it, and reorganize it. You bring your judgment and context to the work.

    My vision for AI-aided cross-interview synthesis is simple: AI identifies common opportunities across interviews, suggests a tree structure, and facilitates the team’s review. Historically, it’s been hard to give AI access to an opportunity solution tree in a way that preserves structure and context. The integration with Vistaly solves that problem by building this capability directly into the tool where your tree already lives.

    In my own experiments using Claude, the AI surfaced opportunities I missed—and I caught things it missed. The highest-quality synthesis came from combining both perspectives. Research (see here and here) backs this up: Experts working with AI outperform both experts working alone and AI working alone. That’s the model we’re building toward—AI generates the draft, you bring the expertise.

    I have mixed feelings about AI doing discovery work for us because there is real value in doing the synthesis yourself. But I also know that a draft OST you actually refine is better than a perfect process you never get to. This is about raising the floor—helping more teams get to a structured opportunity space, even if they aren’t doing every step manually.

    We’re looking for a small group of alpha partners to help shape this product. To apply, sign up for a free Vistaly account and upload three customer interviews for the same outcome or product space.

    We’ll select alpha partners from the applicants. We want a range of interview styles, experience levels, and product spaces. Selected partners will get access to the AI-powered synthesis tools and will work closely with the team to shape the product. Even if you aren’t selected for the alpha, your application puts you at the front of the line when we enter beta.

    A few things to know as you apply: Your three interviews should be for the same outcome, goal, or product space, so the tool can generate a meaningful OST. You don’t need to be a Vistaly user today—the account is free. You don’t need to be an expert interviewer either; we’re looking for a range of experience levels, though we’re particularly interested in story-based customer interviews.

    This is just the beginning. The vision is a full AI-powered discovery suite inside Vistaly—from interview analysis to complete interview snapshots to opportunity solution trees and beyond. We’ll learn alongside our alpha partners and share what we discover as we go.

    If you’ve been looking to bridge the gap between your customer interviews and your opportunity space, this is your chance to help shape how that works. Apply for the alpha today.


    Inspired by this post on Product Talk.


  • LLMs vs AI Agents: Hard‑Won Lessons Product Teams Need to Nail for Real‑World Impact

    When people ask me about "LLM vs AI Agents: What Product Teams Must Get Right," I start with a simple truth: an LLM is a powerful prediction engine, while an AI agent is a productized workflow that plans, takes actions with tools, remembers, and closes the loop on an outcome. That difference sounds academic until you’re on the hook for reliability, cost, and customer trust.

    In my role, I’ve shipped LLM copilots that delight users and piloted agents that automate complex workflows. The pattern that never fails is this: start assistive, then graduate to autonomy. Copilots accelerate people; agents own outcomes. When we respect that gradient, adoption climbs, incidents fall, and we earn the right to expand scope.

    The first decision point is use-case fit. If the task benefits from human judgment, high-context nuance, or brand voice, I frame it as a copilot with strong guardrails and crisp UX. If the task is well-bounded, tool-heavy, and verifiable, I consider an agent, but only after we can measure end‑to‑end task success with eval-driven development.

    Architecture matters. I reach for a retrieval-first pipeline to keep responses grounded in authoritative data, then add tool use for actions (search, write, schedule, transact) with deterministic scaffolding to prevent thrashing. Good prompt engineering is table stakes, but context window management and a clean memory strategy (short‑term scratchpad, long‑term facts, and policy) separate demos from durable systems.
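
    A toy retrieval-first pipeline shows the shape: retrieve relevant sources, always include durable long-term facts, and stop adding context before the budget overflows. This sketch scores documents by keyword overlap and counts words as a stand-in for tokens; both are simplifying assumptions, and a production system would use embeddings and a real tokenizer. The short-term scratchpad is omitted.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    """Rank document IDs by keyword overlap with the query (toy scorer)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(body)), doc_id) for doc_id, body in docs.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:k]]


def build_prompt(query: str, docs: dict[str, str], facts: list[str],
                 budget_words: int = 200) -> str:
    """Assemble a grounded prompt without exceeding a word budget (a proxy for tokens)."""
    header = "Answer only from the sources below; reply 'not found' if they do not cover it."
    question = f"Question: {query}"
    sections = [header] + [f"Fact: {fact}" for fact in facts]  # long-term facts always go in
    used = sum(len(s.split()) for s in sections) + len(question.split())
    for doc_id in retrieve(query, docs):
        chunk = f"[{doc_id}] {docs[doc_id]}"
        cost = len(chunk.split())
        if used + cost > budget_words:
            break  # stop before the context budget overflows
        sections.append(chunk)
        used += cost
    sections.append(question)
    return "\n".join(sections)


docs = {"refunds": "Refunds are issued within 14 days", "shipping": "Shipping takes 3 days"}
print(build_prompt("How many days for refunds", docs, ["Customer plan: Pro"]))
```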

    Agents amplify both value and risk. I build safety in layers: role and scope definition, tool whitelists, unit limits, human‑in‑the‑loop checkpoints at irreversible steps, and privacy-by-design data governance. We log every decision token-for-token because auditability isn’t optional once agents touch customers, money, or data.
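
    Several of these layers can live in one small gate that every tool call passes through. In this sketch I read "unit limits" as a per-task cap on tool calls, which is an assumption; the tool names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    allowed_tools: set[str]  # allowlist: anything else is refused
    irreversible: set[str]   # steps that need a human checkpoint
    max_calls: int           # per-task usage limit
    calls: int = 0
    audit_log: list[dict] = field(default_factory=list)

    def check(self, tool: str, human_approved: bool = False) -> str:
        if tool not in self.allowed_tools:
            decision = "blocked: tool not allowed"
        elif self.calls >= self.max_calls:
            decision = "blocked: usage limit reached"
        elif tool in self.irreversible and not human_approved:
            decision = "paused: human approval required"
        else:
            self.calls += 1
            decision = "allowed"
        # Every decision is recorded, including refusals, for later audit.
        self.audit_log.append({"tool": tool, "decision": decision})
        return decision


gate = ToolGate(allowed_tools={"search", "send_refund"}, irreversible={"send_refund"}, max_calls=2)
print(gate.check("send_refund"))
```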

    Measurement is non‑negotiable. For LLM features, I track time‑to‑first‑token, response latency, groundedness, and user satisfaction. For agents, I add Agent Analytics: task success rate, number of steps per task, tool error rate, loop detection, guardrail triggers, escalation to human, cost per successful task, and containment rate. If we can’t see it, we can’t ship it.
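
    Most of the agent metrics above fall out of a simple per-task log. The record fields below are hypothetical, and I define containment as tasks that succeeded without escalating to a human; teams define it differently, so treat that as an assumption. Loop detection and guardrail triggers would need extra fields.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    succeeded: bool
    steps: int
    tool_calls: int
    tool_errors: int
    escalated: bool  # handed to a human
    cost_usd: float


def agent_metrics(runs: list[TaskRun]) -> dict[str, float]:
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    successes = sum(r.succeeded for r in runs)
    tool_calls = sum(r.tool_calls for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "task_success_rate": successes / n,
        "avg_steps_per_task": sum(r.steps for r in runs) / n,
        "tool_error_rate": sum(r.tool_errors for r in runs) / tool_calls if tool_calls else 0.0,
        "escalation_rate": sum(r.escalated for r in runs) / n,
        # Containment (assumed definition): succeeded without a human stepping in.
        "containment_rate": sum(r.succeeded and not r.escalated for r in runs) / n,
        # Failed runs still cost money, so total cost is spread over successes only.
        "cost_per_successful_task": total_cost / successes if successes else float("inf"),
    }
```

    Charging failed runs to the successful ones is deliberate: it makes a cheap but unreliable agent look as expensive as it really is.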

    My delivery playbook mirrors modern software ops. We use feature flags, gated betas, and canary rollouts; we version prompts like code; we set incident management paths for model outages and tool drift; and we rehearse fallbacks so the experience degrades gracefully, not catastrophically. Dull operations build dazzling products.
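
    Versioning prompts like code pairs naturally with a percentage-based canary. This sketch hashes the user ID so each user sees a stable prompt version and includes a kill switch for graceful fallback. The flag name, prompt texts, and version labels are hypothetical, and a real setup would use your feature-flag service rather than a hand-rolled hash.

```python
import hashlib

# Prompts are versioned and reviewed like code (texts are placeholders).
PROMPTS = {
    "v1": "Summarize the ticket in three bullet points.",
    "v2": "Summarize the ticket in three bullet points and cite the source message.",
}


def bucket(user_id: str, flag: str) -> float:
    """Map a user to a stable number in [0, 1) for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48  # 48 bits: exact, always < 1


def prompt_version(user_id: str, canary_percent: float, kill_switch: bool = False) -> str:
    if kill_switch:
        return "v1"  # degrade gracefully to the known-good version
    in_canary = bucket(user_id, "summary-prompt-v2") < canary_percent / 100
    return "v2" if in_canary else "v1"


print(PROMPTS[prompt_version("user-123", canary_percent=10)])
```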

    On roadmapping, I thin‑slice value. We introduce a minimal viable copilot that handles a single, frequent job-to-be-done with high success. Only after continuous discovery confirms product‑market fit do we grant more autonomy, one capability at a time. Outcome-based OKRs (rather than output-based ones) keep us honest: if the customer’s job gets done faster, cheaper, and with fewer errors, we scale; if not, we fix fundamentals before adding scope.

    Build vs buy is rarely binary. I tend to buy the undifferentiated heavy lifting—observability, prompt versioning, red‑teaming, and policy enforcement—while building the proprietary workflows, data modeling, and UX that encode our defensible advantage. The litmus test: if it’s part of our unique value proposition, we own it; if not, we integrate the best‑in‑class and move.

    Go‑to‑market must be as rigorous as the tech. We position clearly (assistant vs agent), price to value with transparent, consumption-based SaaS pricing, and communicate our risk posture in plain language. Customers don’t buy models; they buy confidence that a job gets done reliably within their constraints.

    Common failure modes repeat: shipping autonomy before instrumentation, treating prompts as magic instead of software, skipping data governance, and ignoring the human experience. The antidote is a disciplined AI strategy rooted in empowered product teams, tight feedback loops, and relentless evaluation.

    If you take nothing else: choose the right paradigm for the job (copilot first, agent when proven), ground with a retrieval-first pipeline, instrument with eval-driven development and Agent Analytics, and operationalize like a mission‑critical system. Do that, and you’ll turn LLM capabilities into durable product outcomes.


    Inspired by this post on Product School.


  • Can AI Agents Master Enterprise Analytics? My Proven Task Framework and Amplitude Insights

    Every week, product and data leaders ask me the same question: can AI agents truly shoulder enterprise analytics without sacrificing trust, governance, or speed? I’ve spent the past year putting agentic AI through its paces in real product workflows, and I’ve distilled what works into a practical, task-driven evaluation approach you can adopt immediately.

    Learn how to evaluate AI analytics agents with a task-based framework, and see how Amplitude’s Global Agent scores against it.

    When I say “enterprise analytics,” I’m talking about far more than chatty dashboards. The bar includes consistent metric definitions, privacy-by-design, RBAC and data governance, audit trails, low-latency decision support, and repeatable outcomes across retention analysis, funnels, cohorts, A/B testing, instrumentation planning, and anomaly detection—ideally within a unified analytics platform.

    My task-based framework evaluates eight capability pillars I expect from an enterprise-ready Agent Analytics solution: task coverage and depth across common product analytics workflows; data fidelity and governance (lineage, access controls, PII handling); instruction-following and reasoning transparency; evaluation rigor and reliability (repeatability, error modes, regressions); security and compliance posture; latency and cost efficiency; integration into existing product strategy workflows (e.g., CRM integration, CI/CD-linked instrumentation, experiment platforms); and human-in-the-loop controls for approvals and guardrails.

    Operationally, I define canonical tasks that reflect day-to-day product management: codify a North Star metric; perform retention analysis by cohort; generate and explain a funnel with drop-off drivers; recommend an event taxonomy and tracking plan; analyze an A/B test with minimum detectable effect (MDE) considerations; and propose a driver tree that maps inputs to outcomes. Each task comes with ground-truth datasets, acceptance criteria, and edge cases to stress the agent—an eval-driven development practice I’ve found indispensable.
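
    A minimal eval harness for tasks like these fits in a few dozen lines. Here the agent is any function from question to number, each case carries a ground-truth value and a tolerance as its acceptance criterion, and a regression check compares per-task pass rates with a stored baseline. The case format and function names are my assumptions, not part of any specific product.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    task: str               # e.g. "retention_by_cohort"
    question: str
    expected: float         # ground truth from a golden dataset
    tolerance: float = 0.0  # acceptance criterion


def run_evals(agent: Callable[[str], float], cases: list[EvalCase]) -> dict[str, float]:
    """Return the pass rate per task."""
    outcomes: dict[str, list[bool]] = defaultdict(list)
    for case in cases:
        try:
            passed = abs(agent(case.question) - case.expected) <= case.tolerance
        except Exception:
            passed = False  # errors (including non-numeric answers) count as failures, not skips
        outcomes[case.task].append(passed)
    return {task: sum(results) / len(results) for task, results in outcomes.items()}


def regressions(current: dict[str, float], baseline: dict[str, float]) -> list[str]:
    """Tasks whose pass rate fell below the stored baseline."""
    return sorted(t for t, score in baseline.items() if current.get(t, 0.0) < score)
```

    Run it on every prompt or model change and fail the build when `regressions` returns anything; that is the edge-case stress test turned into a gate.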

    I then score maturity across four levels. L0: a pure chat UI that summarizes existing charts. L1: a retrieval-first pipeline that grounds responses in your analytics catalog and metric store. L2: a tool-using agent that is schema-aware, can write safe SQL, and reconciles results to canonical definitions. L3: a governance-aware autonomous workflow that executes analytics tasks end-to-end with approvals, audit logs, feature flags, and rollback plans. Most teams discover they’re between L1 and L2; reaching L3 requires serious investment in data governance and eval automation.

    Risk management is non-negotiable. I require strict data governance and privacy-by-design controls, including scoped credentials, PII redaction, policy-aware retrieval, and comprehensive observability (query traces, prompt/response logs, lineage). Feature flags and approval gates prevent unintended metric redefinitions. Red-teaming tasks expose prompt injection, schema drift, and hallucination failure modes before they hit production stakeholders.
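
    PII redaction is one of the easier controls to prototype. The regex sketch below masks obvious emails and phone numbers before text reaches a model or a log. The patterns are my assumptions: they will miss many PII forms (names, addresses) and can over-match long digit runs such as dates, so treat this as a starting point rather than a compliance control.

```python
import re

# Deliberately simple patterns (assumptions): obvious emails and phone-like digit runs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace matched PII with a typed placeholder before logging or prompting."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Reach jane.doe@example.com or +1 (555) 123-4567 today"))
# Reach [EMAIL] or [PHONE] today
```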

    Where do agents shine today? Rapid exploration, SQL generation from schema context, summarizing experimentation results, and turning natural-language questions into actionable charts. Where do they struggle? Ambiguous metric semantics, under-specified experiment designs, and edge-case-heavy analyses where ground truth depends on organizational nuance. The cure is disciplined product management: codify definitions, maintain a living analytics taxonomy, and continuously harden your eval suite.

    In the context of product analytics stacks, Amplitude analytics is a common anchor for product teams, and many are evaluating “Amplitude’s Global Agent” to accelerate insight generation. In my framework, I look for how well it grounds to canonical metrics, handles retention and funnel tasks, explains trade-offs, and respects governance boundaries—before I consider expanded autonomy. I share the full task matrix and scoring rubric so you can replicate the assessment in your environment.

    If you’re getting started, pick your top ten high-frequency analytics tasks and define crisp success metrics for each (accuracy, explainability, latency, and reusability). Build a small eval harness with golden datasets, assertions, and regression tests. Favor a retrieval-first pipeline tied to your taxonomy and metric store, add human-in-the-loop approvals for sensitive actions, then pilot with a cross-functional tiger team. Measure time-to-insight, analyst hours saved, and stakeholder trust—then iterate.

    Enterprise analytics isn’t a single feature; it’s a system of definitions, workflows, and governance. With a task-based, eval-driven approach, agentic AI can become a reliable partner—not just a novel interface. If you’re evaluating options, apply this framework first, then expand scope as reliability and trust climb.


    Inspired by this post on Amplitude – Best Practices.


  • What I Learned Scaling Analytics: Candid Lessons on Product Strategy and Product-Market Fit

    I write from a place many product leaders know well: the moment when the data you need to make decisions simply doesn’t exist, and you have to build the capability from the ground up. That firsthand experience with gaps in analytics shaped how I think about product strategy, product discovery, and the relentless pursuit of product-market fit.

    In my work, I lean on continuous discovery to surface the most meaningful problems, then translate those insights into outcome-based (not output-based) OKRs that keep teams focused on impact. When we anchor roadmaps to real user behavior and business results, we avoid vanity metrics and create a durable plan that compounds learning over time.

    Execution matters just as much as insight. I rely on rigorous A/B testing, clear minimum detectable effect (MDE) thresholds, and retention analysis to separate signal from noise. This discipline ensures that every iteration—whether it’s a small UX nudge or a bold bet—moves us closer to measurable value for customers and the business.
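
    For the analysis side of a simple conversion experiment, a two-proportion z-test is the textbook starting point. This sketch assumes a fixed-horizon test with independent users and large samples; it is not how any particular platform computes significance.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> dict[str, float]:
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null of no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"lift": p_b - p_a, "z": z, "p_value": p_value}


print(two_proportion_z_test(1000, 10000, 1100, 10000))
```

    A small p-value only says the difference is unlikely to be noise; whether the observed lift clears the MDE you planned for is a separate, business-level check.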

    None of this works without empowered product teams. I build around product trios that partner tightly across design, engineering, and product, and I foster a product-led growth mindset so we earn activation, engagement, and expansion through the experience itself. The goal is to create a system where learning is fast, ownership is clear, and the user’s job-to-be-done stays front and center.

    On the tooling side, I favor a unified analytics platform so insights are consistent from discovery to deployment. Whether I’m instrumenting funnels with Amplitude analytics or stitching together qualitative and quantitative inputs, the principle is the same: give teams trustworthy, real-time visibility so they can make better decisions, faster.

    If you’re looking to operationalize these practices, you’ll find practical playbooks, decision frameworks, and real-world examples here—built for leaders who want clarity, speed, and confidence in how they discover, ship, and scale products.


    Inspired by this post on Amplitude – Best Practices.


  • Multi‑Agent Systems Demystified: Why One AI Isn’t Enough—and How I Ship Faster With Many

    In my day-to-day building AI products, I’ve learned a simple truth: a single model can be brilliant, but a coordinated team of specialized agents is what consistently ships outcomes customers trust. That’s the promise of multi-agent systems—multiple AIs with distinct roles collaborating inside robust AI workflows to deliver accuracy, speed, and resilience you can’t get from a lone model.

    Think of a multi-agent system as a well-run product team for machines: a planner decomposes the job, specialists execute focused tasks, a reviewer checks quality, and an orchestrator keeps everyone aligned. This agentic AI approach mirrors how high-performing teams work: divide complex problems, play to strengths, and create tight feedback loops.

    When does one AI stop being enough? Whenever tasks require tool use, domain retrieval, multi-step reasoning, or policy adherence under real-world constraints. In those moments, specialized agents shine—one for search using a retrieval-first pipeline, another for reasoning, another for action execution, and a final one for validation. The result is better accuracy with manageable latency and cost.

    The core architecture I rely on starts with a planner that breaks a goal into steps, followed by execution agents equipped with tools and grounded context. I pair this with context window management to keep prompts lean and relevant, and I insert a verifier (or critic) to catch logic slips and policy violations before results reach customers. A lightweight orchestrator coordinates handoffs and retries to keep the whole flow resilient.
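
    The planner, executor, verifier, and orchestrator roles can be wired together in a few lines. In this sketch each role is just a function you pass in (in practice each would wrap its own model and tools), and the retry count is an assumption. A step that never passes verification stops the run instead of reaching a customer.

```python
from typing import Callable

Planner = Callable[[str], list[str]]
Executor = Callable[[str], str]
Verifier = Callable[[str, str], bool]


def orchestrate(goal: str, plan: Planner, execute: Executor,
                verify: Verifier, max_retries: int = 2) -> list[str]:
    """Run each planned step, retrying until the verifier accepts its output."""
    results = []
    for step in plan(goal):
        for _ in range(max_retries + 1):
            output = execute(step)
            if verify(step, output):
                results.append(output)
                break
        else:
            # No attempt passed verification: fail loudly rather than ship bad output.
            raise RuntimeError(f"step {step!r} failed verification "
                               f"after {max_retries + 1} attempts")
    return results


# Toy stand-ins: the "plan" splits on commas, the executor uppercases, the verifier checks it.
print(orchestrate("draft,review", lambda g: g.split(","), str.upper, lambda s, o: o == s.upper()))
```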

    To make this production-grade, I treat observability as non-negotiable. Agent Analytics helps me see which agents are adding value versus adding latency, where failures cluster, and how prompts drift over time. From there, eval-driven development gives me measurable confidence: I codify representative tasks, run offline and shadow evaluations, and only promote changes that move accuracy and safety in the right direction.

    Governance is equally critical. I apply privacy-by-design from the start, restrict data movement with strong data governance, and enforce policy constraints inside the workflow rather than after the fact. This includes red-teaming failure modes, rate-limiting tools, and capturing immutable traces for audits and post-incident reviews, habits borrowed from SRE culture that map well to AI systems.

    On the practical side, prompt engineering remains foundational, but it’s the system design that converts clever prompts into reliable outcomes. Tool access, retrieval quality, memory strategy, and error handling matter more than wordsmithing alone. I’ve found that small prompt improvements are amplified when the surrounding workflow is sound—and are overwhelmed when it isn’t.

    If you’re just starting, begin with a narrow use case and a minimal set of agents—planner, executor, and verifier—then expand. Use continuous discovery with real users to learn where the workflow fails in the wild, and iterate with tight release cycles. Treat every agent like a microservice with clear contracts, test coverage, and metrics, and you’ll unlock compounding gains without losing control.

    The payoff is tangible: faster shipping cycles, fewer regressions, and outcomes customers can actually rely on. When stakes are high and ambiguity is real, one AI is often a talented soloist—but a disciplined ensemble of agents is how I deliver dependable, scalable value at product velocity.


    Inspired by this post on Product School.


  • Why “Figma Is Not the Source of Truth”: My Playbook for Design Leadership That Scales


    I keep a simple mantra front and center: Figma is not the source of truth. The customer is. In practice, that means the only thing that truly counts is what we ship, how it performs, and whether users come back for more. Mockups are hypotheses; production usage is evidence. When my teams adopt this lens, velocity improves, judgment sharpens, and quality rises where it matters most.

    So what does design actually do in a software company? At its best, design builds leverage for the whole system—engineering, product, and marketing—by clarifying problems, raising the quality bar, and making complex decisions legible. The standard I hold is ancient and still essential: products must be useful, usable, and desirable — and above all, used. When we calibrate around “used,” debates about pixels give way to outcomes, and cross-functional partners feel the difference.

    I often trace the roots of our craft back well beyond the digital era. The lineage from industrial design to software is real; constraints, ergonomics, affordances, and systems thinking didn’t start with screens. If you’ve ever sorted features into basic expectations, performance attributes, and delighters with the Kano Model, you’ve touched this lineage. The translation to software is simple: design the full journey, not just the interface. Prioritize what improves time-to-value, reduces cognitive load, and earns habitual use.

    One lesson I’ve learned the hard way: design leaders who stop designing stop leading. I still sketch flows, write UX copy, and prototype when it unblocks the team or sets a decisive quality bar. The altitude changes constantly: one hour I’m in a strategic roadmap review, the next I’m in a critique or poking at a prototype. Great design leaders move up and down in altitude to connect vision to details without becoming a bottleneck.

    Over time, I’ve come to rely on four pillars every design manager must master: craft (raising taste and execution), product strategy (clarifying choices and trade-offs), people leadership (coaching, feedback, and hiring), and systems (processes, rituals, and design ops that scale). Neglect any one of these and either quality, speed, or team health will eventually falter.

    Perfectionism is a double-edged sword. Over-indexing on quality can paralyze decision-making, but lowering the bar indiscriminately is worse. I’ve seen moments where relaxing standards to “go faster” actually cost the business—rework piled up, trust eroded, and customer value stalled. The answer is principled delegation: I define what “must be true” at each milestone, delegate ownership with clear guardrails, and reserve my veto power for moments where product integrity is genuinely at risk.

    Measuring success as a design leader starts with outcome-based OKRs rather than output-based ones. I care about activation, retention, time-to-first-value, NPS verbatims tied to key journeys, and the operational metrics that earn the right to build the next thing. Design output is visible; design outcomes are durable. When trade-offs are needed, I optimize for the smallest shippable surface that still proves the core value proposition, then expand with data.

    Scaling judgment is the multiplier. I build it through pattern matching—studying enduring product systems from companies like Airbnb, Amazon, Apple, Asana, Notion, Stripe, Nest, and others—to distinguish where polish compels usage versus where it’s ornamental. Strong opinions matter, but so does being easy to convince with new evidence. I encourage designers to articulate the pattern they’re invoking, why it fits the job-to-be-done, and how we’ll know it worked.

    Operating cadence matters. My week is anchored around recruiting, crits, and staff meetings that actually make decisions. In critiques, I use the Do/Try/Consider framework to give actionable direction without micromanaging. For one-on-ones, the question isn’t “Should one-on-ones exist?” but “What are they for right now?”: coaching, performance, or clearing execution blockers. If a meeting doesn’t increase clarity or commitment, it gets redesigned or removed.

    Execution-wise, I’ve taken inspiration from Rippling’s operating system, especially its emphasis on speed, precise ownership, and hard commitments. The lesson is timeless: go fast on the right things, make clear promises, and instrument your work so you can see reality quickly. When speed is paired with crisp decision rights and observable outcomes, momentum compounds instead of fraying trust.

    Hiring your first design leader? Look for someone who can set standards, scale judgment, and ship. They should be able to zoom from company narrative to interaction copy in a single afternoon, coach product trios, and build rituals that make taste and trade-offs explicit. Above all, they should have a point of view on where quality moves the business and where speed is the quality.

    Here’s how my team’s approach differs from many: Figma is not the source of truth. We design in Figma, but we learn from production. We pair designers with engineering early, prototype in code when it reduces risk, and wire telemetry into every critical path. Product trios use discovery to validate “useful, usable, desirable — and used,” then commit to outcomes with clear, testable definitions of success. The result is faster iteration, fewer surprises, and experiences customers actually adopt.

    If you want to deepen your own pattern library, study products and practices from leaders like Airbnb (https://www.airbnb.com/), Amazon (https://www.amazon.com/), Apple (https://www.apple.com/), Asana (https://www.asana.com/), CrossFit (https://www.crossfit.com/), Figma (https://www.figma.com/), Honeywell (https://www.honeywell.com/), Nest (https://store.google.com/category/google_nest), Notion (https://www.notion.so/), Retool (https://retool.com/), Rippling (https://www.rippling.com/), and Stripe (https://www.stripe.com/). Pay attention to how they balance versatility with clarity, defaults with flexibility, and speed with trust.

    The throughline is simple and demanding: design for reality, not for the board. Keep your standards where they create business value, scale judgment with explicit patterns, and instrument everything so learning never stops. When teams embrace that, the work gets better, customers feel it, and the roadmap starts to pull you forward.


  • Context Engineering Playbook: 5 Proven Ways to Slash Context Rot and Scale Smarter AI


    I've been getting a lot of questions about why I'm diving so deep into Claude Code, so I want to take a step back and provide some context.

    Last March, when I started building my first AI product—the Interview Coach—I felt like I had to figure it all out on my own. I had never built an AI product before, and I didn't have a team I could lean on. It was equal parts energizing and intimidating.

    I had a blast digging in, experimenting, and learning what I needed to learn to ship that first AI product. But I also started to wonder, "How are product teams going to learn this stuff?"

    As an industry, we are being asked to leverage a new technology that is foreign to us. We are all experimenting and learning what's just now possible. It's moving so fast, it's exhausting just following the news, let alone trying to learn and develop new skills.

    My mission has always been to help teams make better product decisions. That still drives me today.

    After releasing the Interview Coach, I asked myself two questions: "How am I going to rapidly develop my skill set?" and "How can I help others do the same?" I landed on a three-part plan: First, I'm going to collect and share stories about how other teams are learning and building AI products—that's why I launched Just Now Possible. Second, I'm going to push the boundaries on how I can use AI in my day-to-day life, and I'm going to write about it. Third, I'm going to keep building AI products—and I'm going to write about that, too.

    The Claude Code series was born out of number two. It’s had an interesting side effect: it’s also helping me build better AI products.

    The more I push the boundaries of what's possible with Claude Code, the more I understand how to build more robust AI products. That’s reinforced my belief that product teams need to get hands-on with this stuff in their day-to-day lives. It’s how we’re going to develop the skillsets we need to build tomorrow’s products.

    In my context rot article—where we learned how to manage the context window in Claude Code—I showed just how much day-to-day practice compounds. Today, I want to show how learning about context window management in our day-to-day lives directly maps to managing the context window in the AI products we might build. My hope is to make it crystal clear how experience in one area develops expertise in the other. Let’s dive in.

    Infographic: “What is Context Engineering?” A shared context window fed by five strategies: compact prompts, external memory, curated turns, repeated information, and sub-agents.

    A quick refresher on context window management. In the context rot article, we learned: "what the context window is and what goes into it"; "how to offload conversational context to the file system"; "about the /compact and /clear tools"; "to repeat critical information as the context window fills up to overcome tokens 'lost in the middle' or at the beginning of the input"; and "how to use agents to get access to more context windows."

    It turns out these exact same skills are being used by developers to manage the context window in production products. If you haven't read the context rot article, start there: "Context Rot: Why AI Gets Worse the Longer You Talk (And How to Fix It)."

    What is Context Engineering? Context engineering is the work that we do to manage the context window in the AI products and services that we build. It's how we give the large language model the context it needs to do the job well. It's also how we manage and mitigate context rot in our products and services, so that we can get the highest performance from the underlying model.

    Today, we are going to look at five different strategies that product teams are currently using in their context engineering efforts. You are going to see that each of these strategies ties back to a strategy you might already be using in your day-to-day AI usage (especially if you followed the advice in the context rot article).

    Here's how product teams are putting this into practice right now: designing compact system prompts by breaking big tasks into smaller tasks; building external memory/state structures to keep the context window clean; curating what goes into each turn; repeating critical information as context grows; and using sub-agents to spread work across additional context windows.
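
    Several of these strategies come together at one decision point: what goes into each model call. The sketch below makes two simplifying assumptions. A word count stands in for a real tokenizer, and a keyword-matched dict stands in for external memory. The function `build_context` and its parameters are hypothetical. It keeps the system prompt compact, pulls in only relevant memory, curates the newest turns that fit the budget, and repeats critical information near the end of the input. Sub-agents are out of scope here.

```python
def approx_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer (assumption): one token per word
    return len(text.split())

def build_context(system_prompt: str, memory: dict[str, str], query: str,
                  turns: list[str], critical: str, budget: int) -> list[str]:
    """Assemble one model call's input within a token budget."""
    # External memory: pull only notes whose topic keyword appears in the query
    relevant = [note for topic, note in memory.items() if topic in query.lower()]
    fixed = [system_prompt, *relevant, critical, query]
    remaining = budget - sum(approx_tokens(part) for part in fixed)
    # Curate turns: keep the newest ones that still fit; older turns drop first
    kept: list[str] = []
    for turn in reversed(turns):
        cost = approx_tokens(turn)
        if cost > remaining:
            break
        kept.append(turn)
        remaining -= cost
    kept.reverse()
    # Repeat critical info near the end, where it is less likely to be lost in the middle
    return [system_prompt, *relevant, *kept, critical, query]
```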

    Along the way, I’ll share practical guardrails and instrumentation ideas so you can track quality with eval-driven development, reduce context rot, and scale performance predictably.

    Why this matters for product trios: these strategies clarify the handoffs between prompt engineering, external memory design, and orchestration, which strengthens collaboration across PM, design, and engineering. Whether you’re exploring gen AI prototypes, hardening a retrieval-first pipeline, or evolving toward agentic AI, context engineering is the backbone of reliable, high-performing experiences.

    If you build or lead LLM initiatives as a product manager, consider this your field guide. In upcoming posts, I’ll break down each strategy with concrete examples and templates you can adapt to your stack, so your team can move from experiments to durable, scalable AI workflows with confidence.


    Inspired by this post on Product Talk.


  • Reinventing Product Management Workflow: The AI Upgrade I Use to Ship Faster, Smarter


    The most valuable upgrade I’ve made to my product management workflow isn’t a new framework or a shiny dashboard—it’s an AI-first operating model that compresses discovery-to-delivery cycles while increasing confidence in every decision. I built this approach to reduce context switching, remove toil, and keep the team relentlessly focused on outcomes over output. The result is a faster, clearer, and more reliable path from insight to shipped value.

    Here’s how I run an AI-powered product workflow end to end: continuous discovery, opportunity sizing, solution shaping, planning, execution, and iteration—each step instrumented with automation, retrieval, and evaluation so we learn faster without compromising rigor.

    Intake and triage start with a retrieval-first pipeline that unifies customer feedback, support tickets, sales notes, research transcripts, and usage analytics. I use embeddings to cluster themes, de-duplicate signals, and surface the most representative examples. This gives me an instant, always-fresh view of customer jobs, pains, and opportunities without manually combing through noise.
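
    The clustering and de-duplication step fits in a few lines. This is a toy version: bag-of-words vectors stand in for a real embedding model. The greedy threshold clustering (`cluster_feedback`, `threshold=0.5`) is an assumed, simplified technique and not necessarily what a production pipeline would use.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model (assumption): bag-of-words term counts
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_feedback(items: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy clustering: each item joins the first cluster whose seed is similar enough."""
    seen: set[str] = set()
    clusters: list[tuple[Counter, list[str]]] = []
    for item in items:
        key = " ".join(item.lower().split())
        if key in seen:  # de-duplicate exact repeats, ignoring case and whitespace
            continue
        seen.add(key)
        vec = embed(item)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(item)
                break
        else:
            clusters.append((vec, [item]))
    # Largest themes first; the first member of each cluster is its representative example
    return sorted((members for _, members in clusters), key=len, reverse=True)
```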

    For discovery, I rely on LLMs to accelerate the hard parts without replacing judgment. I generate interview guides, summarize transcripts, extract entities, and tag moments of friction. Prompt engineering and context window management ensure the model sees the right evidence at the right time. All sensitive data stays under privacy-by-design and data governance controls.

    Opportunity sizing is where I connect insights to business impact. I map problems to a driver tree, quantify potential lift, and align to outcome-based (not output-based) OKRs. When relevant, I apply the Kano Model to balance basic, performance, and excitement attributes. To maintain rigor, I use eval-driven development on my prompts and heuristics so prioritization is repeatable, not anecdotal.
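
    A driver tree turns "quantify potential lift" into arithmetic. The sketch assumes a purely multiplicative tree and uses made-up numbers; the driver names and values are hypothetical.

```python
from math import prod

def metric(drivers: dict[str, float]) -> float:
    # Assumed multiplicative driver tree: the top-line metric is the product of its drivers
    return prod(drivers.values())

def lift(drivers: dict[str, float], driver: str, new_value: float) -> float:
    """Change in the top-line metric if one driver moves to new_value, all else held equal."""
    return metric({**drivers, driver: new_value}) - metric(drivers)

# Hypothetical monthly revenue tree with made-up numbers
drivers = {
    "visitors": 10_000,
    "signup_rate": 0.10,
    "activation_rate": 0.40,
    "paid_conversion": 0.25,
    "arpu": 50.0,
}

if __name__ == "__main__":
    print(metric(drivers))                         # baseline revenue
    print(lift(drivers, "activation_rate", 0.44))  # lift from 40% to 44% activation
```

    In a purely multiplicative tree, a 10% relative gain in any single driver raises the output by 10%. The real prioritization question is therefore which driver your team can move most cheaply and reliably.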

    Solution shaping is a collaborative exercise with product trios. I draft problem narratives and PRDs, generate acceptance criteria, and create first-pass UX flows. For speed, I use gen AI for product prototyping to explore alternatives quickly, then gate final choices through usability feedback and feasibility checks. Where uncertainty is high, I define a minimum detectable effect (MDE) and design A/B testing plans upfront.
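
    Defining the MDE up front matters because it determines how much traffic a test needs. Below is a sketch of the standard normal-approximation sample-size formula for comparing two conversion rates. It assumes a two-sided test, equal-sized arms, and an absolute (not relative) MDE; `sample_size_per_arm` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per arm to detect an absolute lift of `mde` over `baseline` conversion.

    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / mde^2
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)
```

    With a 10% baseline and a 2-point absolute MDE at the default settings, this comes to a little over 3,800 users per arm. Halving the MDE roughly quadruples the requirement.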

    Planning ties strategy to execution through product roadmapping and sprint planning. I break work into sequenced bets, enable feature flags for controlled exposure, and wire quality signals into CI/CD. DORA metrics—like deployment frequency and change failure rate—help me keep the system honest. Observability ensures we see the “why” behind behavior, not just the “what.”
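
    Once deployments are logged, the two DORA metrics mentioned here are simple ratios. This sketch assumes a hypothetical `Deployment` record with a flag for whether the change caused a failure (rollback, hotfix, or incident). Real pipelines would derive that flag from incident and rollback data.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Deployment:
    day: date
    caused_failure: bool  # needed a rollback, hotfix, or incident response

def _in_window(deploys: list[Deployment], start: date, end: date) -> list[Deployment]:
    return [d for d in deploys if start <= d.day <= end]

def deployment_frequency(deploys: list[Deployment], start: date, end: date) -> float:
    """Average deployments per day over the inclusive window [start, end]."""
    days = (end - start).days + 1
    return len(_in_window(deploys, start, end)) / days

def change_failure_rate(deploys: list[Deployment], start: date, end: date) -> float:
    """Share of deployments in the window that caused a failure (0.0 if none shipped)."""
    window = _in_window(deploys, start, end)
    return sum(d.caused_failure for d in window) / len(window) if window else 0.0
```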

    Execution is instrumented with in-app guides, Intercom messaging, and Pendo to shape onboarding and activation. I connect Amplitude analytics to measure habit formation, retention analysis, and feature adoption. When experiments run, I monitor leading indicators in near real time while protecting against peeking and p-hacking. The point isn’t to prove we’re right; it’s to learn fast enough to get it right.
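
    Analytics tools compute retention for you, but the underlying metric is easy to express in code. The hypothetical `week_n_retention` helper below assumes each user is keyed to a signup week and that activity is logged as (user, week) pairs. It illustrates the metric only; it is not Amplitude's API or exact definition.

```python
from collections import defaultdict

def week_n_retention(signups: dict[str, int], activity: list[tuple[str, int]],
                     n: int) -> dict[int, float]:
    """For each signup-week cohort, the share of users active exactly n weeks after signing up."""
    cohorts: dict[int, set[str]] = defaultdict(set)
    for user, week in signups.items():
        cohorts[week].add(user)
    retained: dict[int, set[str]] = defaultdict(set)
    for user, week in activity:
        start = signups.get(user)
        if start is not None and week - start == n:
            retained[start].add(user)
    return {week: len(retained[week]) / len(users) for week, users in sorted(cohorts.items())}
```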

    Iteration closes the loop. I use a unified analytics platform to compare expected vs actual outcomes, harvest qualitative feedback, and push new evidence back into discovery. The system improves with each cycle because the retrieval-first pipeline and eval harness both get smarter as data grows.

    Governance is non-negotiable. AI risk management, cybersecurity, and regulatory compliance sit alongside model evaluations to prevent drift, leakage, or bias. I document decisions, model versions, and test artifacts so we can audit how we got to a call—especially when trade-offs are nuanced.

    If you’re standing up this AI workflow from scratch, I recommend a 30/60/90 rollout. In the first 30 days, audit your data sources and build a retrieval-first pipeline. In days 31–60, pilot two high-leverage workflows—continuous discovery and PRD drafting—backed by eval-driven development. By days 61–90, scale to prioritization and experiment design, then thread the outputs into your planning and CI/CD rhythms.

    Common pitfalls I watch for: over-automation that blurs context, lack of evaluation frameworks, ungoverned data that undermines trust, and vanity metrics that celebrate activity over outcomes. The antidote is simple but disciplined—clear decision criteria, measurable hypotheses, and automated evaluations that run as guardrails, not bottlenecks.

    This AI upgrade doesn’t replace the craft of product management; it amplifies it. By combining judgment, clear strategy, and reliable automation, we ship value faster, reduce risk, and make better calls under uncertainty. The payoff is durable: compounding learning velocity and a team that spends more time solving the right problems—and less time wrestling the process.


    Inspired by this post on Product School.


  • From Chaos to Clarity with Claude Code: My Hands-On Playbook for Product Leaders


    I’ve been pushing hard to operationalize AI for real product work, and this episode zeroes in on the moment Claude Code stops feeling like a demo and starts behaving like a dependable teammate. If you’ve ever wondered how to go from clever prompts in the browser to durable, repeatable workflows on your machine, this walkthrough is for you.

    Listen on: Spotify | Apple Podcasts.

    My first honest reaction to installing and configuring the desktop agent was the all-too-relatable “this tool thinks everything is a code repo” reality. That framing helped me reset expectations fast: instead of treating it like a magical universal assistant, I began designing guardrails, context, and repeatable routines—exactly how I’d onboard a new team member.

    The shift from Claude-in-the-browser to Claude Code on my machine was the unlock. Locally, it can finally work with my files, folders, and workflows. That meant I could ground it in real artifacts—project docs, meeting notes, product specs, and historical decisions—so responses weren’t just plausible; they were contextual and verifiable.

    On setup, I now treat /init and CLAUDE.md files as my product requirements. I define roles, boundaries, and canonical sources up front, then run in a deliberate “walled garden.” The “treat it like an intern” model works beautifully: scope access intentionally, expand privileges as trust grows, and keep a tight audit trail of what it can touch and why.

    Surprisingly, task management became my ideal on-ramp. It’s easy to validate, the feedback loops are tight, and the ROI is immediate. I export calendar windows rather than granting full calendar access, then let the agent map priorities into Trello, reconcile time blocks, and surface trade-offs. Fast wins build confidence—mine and the agent’s.

    Model switching matters more than I expected. When speed is king and “good enough” will do, Haiku keeps the loop snappy. When stakes are higher—complex synthesis, nuanced product strategy, or gnarly ambiguity—I step up to Claude Opus 4.5. Being intentional about when to optimize for latency versus depth is a quiet superpower.
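
    Writing the speed-versus-depth choice down as an explicit routing rule helps it get applied consistently. This is a hypothetical sketch: the `Task` fields are assumptions, and the strings "haiku" and "opus" are placeholder labels rather than exact API model identifiers.

```python
from dataclasses import dataclass

# Placeholder labels (assumption), not exact API model identifiers
FAST_MODEL = "haiku"
DEEP_MODEL = "opus"

@dataclass(frozen=True)
class Task:
    description: str
    high_stakes: bool = False      # e.g. product strategy or customer-facing output
    needs_synthesis: bool = False  # e.g. combining many sources or resolving ambiguity

def choose_model(task: Task) -> str:
    """Default to the fast model; step up only when stakes or complexity call for depth."""
    return DEEP_MODEL if task.high_stakes or task.needs_synthesis else FAST_MODEL
```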

    Web tasks can still spiral. When that happens, I pause its autonomy, toggle to fewer steps, and ask, “What are you doing?” Combined with Claude’s Web fetch tool, that question gets the agent to summarize its plan in plain language without exposing hidden reasoning. That lets me spot brittle assumptions, prune distractions, and re-ground the task.

    Content retrieval has become a killer workflow. I point the agent at my archives—blog posts, book drafts, transcripts, notes—and ask, “Where have I talked about this before?” It assembles a map of prior art, connects themes I’d forgotten, and prevents me from reinventing work. Over time, this evolves into a Zettelkasten-style research system that upgrades rigor and accelerates synthesis.

    I’ve also turned Claude Code into a publishing engine. From a single transcript, it drafts titles, descriptions, show notes, and chapters, then routes artifacts to Ghost for formatting. Before anything ships, I run fact-checking workflows that validate claims against transcripts and research sources. The output improves, but more importantly, the scaffolding makes quality repeatable.
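
    A first-pass fact check can be automated for the simplest case: confirming that every direct quote in a draft actually appears in the transcript. This sketch assumes straight double quotes in the draft and ignores case and punctuation when matching. `unverified_quotes` is a hypothetical helper, and paraphrased claims would still need a human or model-based review.

```python
import re

def normalize(text: str) -> str:
    # Lowercase, drop punctuation, and collapse whitespace so formatting differences don't cause false alarms
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))

def unverified_quotes(draft: str, transcript: str) -> list[str]:
    """Return quoted passages in the draft that don't appear (after normalization) in the transcript."""
    source = f" {normalize(transcript)} "
    quotes = re.findall(r'"([^"]+)"', draft)
    # Pad with spaces so a quote must match whole words, not fragments of longer words
    return [q for q in quotes if f" {normalize(q)} " not in source]
```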

    Reusable workflows compound. I rely on slash commands to trigger common jobs, break down larger efforts with sub-agents, and wire in hooks and plugins where external systems are needed. This is agentic AI at its most practical: fewer hero prompts, more reliable processes.

    Audience analytics and content prioritization are helpful with caveats. I let the agent cluster themes and flag gaps, then I pressure-test its suggestions against first-party data and strategic goals. As with any model-driven insight, triangulation beats blind faith.

    Two metaphors guide my day-to-day. First, Claude Code is like a dog—sometimes it returns with the stick, sometimes it gets lost in the woods. Second, the “intern” framing keeps me honest: don’t hand it the whole company on day one. With that mindset, my output jumped—more volume without sacrificing quality—because the workflow scaffolding got better.

    In this episode, I cover what Claude Code is and why it’s useful even if you’re not an engineer, the real difference between the browser experience and running locally, how to shape behavior with /init and CLAUDE.md files, why task management is the perfect proving ground, when to export calendar windows versus connecting directly, and when model-switching makes sense: Haiku for speed, Opus for depth.

    I also dig into debugging web tasks by asking “What are you doing?”, content retrieval workflows across personal archives, building reusable slash-command systems with sub-agents, hooks, and plugins, practical publishing stacks from transcripts, fact-checking against transcripts and research sources, and using analytics to prioritize content—with a healthy respect for uncertainty.

    If you’ve been trying to make Claude Code feel less like “throwing a stick into the woods,” this is the candid, tactical tour I wish I’d had on day one. Drop your questions and experiments below—I’m eager to compare notes and refine the playbook together.


    Inspired by this post on Product Talk.

