Tag: product discovery

  • Master Build-to-Learn: The Essential FAQ to Supercharge Product Discovery in the AI Era

    In the age of AI, I’ve come to believe we’re all builders, yet not all building is the same. There is a meaningful difference between building to learn (known as product discovery) and building to earn (known as product delivery). When we confuse the two, we waste precious time, budget, and team energy on output over outcomes. My goal in this FAQ-style reflection is to clarify when and how to choose each mode so we can make smarter, faster, more confident product decisions.

    Why does this distinction matter so much right now? Because as the cost of product delivery continues to drop, the scarce resource shifts from shipping capacity to clarity of problem, solution, and value. Cloud infrastructure, CI/CD, feature flags, and even gen AI code assistance have made it cheaper to launch. That’s great—but if we don’t learn the right things before we scale, we’ll efficiently deliver the wrong product. Discovery is how we de-risk that.

    What do I mean by build to learn? I use discovery to quickly validate problems, test value, and shape solutions before committing delivery teams to scale. In practice, that means continuous discovery with customer interviews, rapid prototyping, and lightweight experiments that put us in front of real users fast. I rely on product trios and empowered product teams to co-own outcomes, not just output, and I anchor decisions with outcomes vs output OKRs so we stay focused on measurable impact.

    How do I structure discovery sprints? I start with an opportunity solution tree to map customer pain points and candidate solutions, then select the smallest test that can invalidate a risky assumption. When signals are ambiguous, I refine the questions and instrument better learning loops rather than pushing harder on delivery. For experiments, I keep a bias to speed: clickable prototypes, concierge tests, or gen AI for product prototyping often reveal more in days than a coded MVP does in weeks. When experiments go live, I use a clear minimum detectable effect (MDE) and resist reading noise as signal.

    Where does AI change the calculus? LLMs for product managers are turbocharging discovery by accelerating research synthesis, persona drafts, and early concept validation. I pair that with eval-driven development to set crisp acceptance criteria for AI behaviors before any production integration. Prompt engineering and conversation design are part of the toolkit, but the same rule applies: prototype to learn, not to impress. AI can make bad ideas cheaper to build—so disciplined discovery matters more than ever.

    So when do I switch to build to earn? Once I have evidence of value and feasibility, I shift into product delivery to scale with quality, security, and reliability. This is where I bring in product roadmapping and sprint planning, DORA metrics to monitor deployment frequency and lead time, and strong SRE and observability practices to safeguard the user experience. The handoff isn’t a wall; discovery continues inside delivery to refine scope, reduce risk, and maintain momentum.
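
    As an illustration of the two DORA metrics named above, here is a minimal sketch that computes deployment frequency and lead time from a deploy log. The `deploys` records and both function names are hypothetical; a real pipeline would pull these timestamps from its CI/CD system.

```python
from datetime import datetime
from statistics import median

# Hypothetical deploy log: (commit_time, production_deploy_time) pairs.
deploys = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 15)),
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 3, 10)),
    (datetime(2024, 1, 4, 8), datetime(2024, 1, 4, 20)),
    (datetime(2024, 1, 5, 9), datetime(2024, 1, 8, 9)),
]


def deployment_frequency(records, window_days):
    """Average number of production deploys per day over the window."""
    return len(records) / window_days


def median_lead_time_hours(records):
    """Median hours from commit to production deploy."""
    return median(
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in records
    )


print(deployment_frequency(deploys, window_days=7))
print(median_lead_time_hours(deploys))
```

    The median is used rather than the mean so that one slow release does not dominate the lead-time figure.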

    What pitfalls do I watch for? The biggest is treating delivery as discovery—shipping features to “see what happens” without a clear learning thesis. Another is tech-first decisions driven by technology FOMO instead of product strategy and customer value. I also see teams set output-based commitments that crowd out learning; outcomes vs output OKRs keep us honest. And when considering build vs buy, I evaluate whether the capability differentiates us; if not, I’ll buy to preserve discovery capacity on what truly matters.

    My operating conviction is simple: invest early and deliberately in build to learn so build to earn becomes high-confidence, high-velocity, and high-impact. In practical terms, that means smaller bets, faster feedback, clearer outcomes, and tighter collaboration across product, design, and engineering. If we get discovery right, delivery feels inevitable—and customers feel understood.


    Inspired by this post on SVPG.


  • AI Agents That Truly Help Product Teams: A Practical Framework for When—and When Not—to Use Them

    Every week, I field the same question from product leaders and engineers: should we deploy an AI agent here, or are we overfitting the problem to a shiny solution? Learn when AI Agents actually help product teams—plus a simple framework to decide when not to use them.

    When I say “AI agents,” I’m talking about autonomous or semi-autonomous systems that can perceive context, plan steps, and take actions across tools and data sources with minimal supervision—what many now call agentic AI. In product management terms, they’re not just another feature; they’re an operating model shift. Used well, they compound team leverage. Used poorly, they add invisible complexity, new failure modes, and governance headaches.

    To make the call with confidence, I use a straightforward VITAL framework that my team can apply in minutes. It keeps us honest about where AI agents are a force multiplier—and where a simpler automation, rule, or in-product UX is the better choice.

    V is for Volume. Agents shine where there’s sustained, repetitive, high-throughput work: triaging inbound support, cleansing CRM records, orchestrating QA checks, or synthesizing weekly research summaries. If the workflow happens rarely or ad hoc, an agent is often overhead in disguise.

    I is for Instructions. Can I specify success in clear, testable terms? Strong instructions include measurable acceptance criteria and constraints. If I can’t articulate what “good” looks like without hand-waving, the task likely needs product discovery, not autonomy.

    T is for Tolerance. What is the blast radius if the agent makes a wrong call? Low-stakes, reversible actions with tight guardrails are ideal. If the tolerance for error is near zero (e.g., irreversible financial transactions or sensitive regulatory actions), favor human-in-the-loop, stronger approvals, or defer agents entirely.

    A is for Access. The agent needs the right data, tools, and permissions, with privacy-by-design and data governance in place. If telemetry is sparse, integrations are brittle, or you can’t enforce least-privilege access, you’ll fight fragility more than you’ll gain leverage.

    L is for Learning loop. Agents require eval-driven development, Agent Analytics, and continuous feedback to stay accurate as reality shifts. If you can’t measure quality, latency, and cost per outcome—or you lack a retrieval-first pipeline to ground responses—expect drift and stakeholder distrust.
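
    The five VITAL checks can be sketched as a simple checklist in code. This is an illustrative encoding only: the class name, field names, and the wording of the recommendation strings are assumptions, and each yes/no answer still requires human judgment.

```python
from dataclasses import dataclass, fields


@dataclass
class VitalAssessment:
    volume: bool         # sustained, repetitive, high-throughput work
    instructions: bool   # success can be stated in clear, testable terms
    tolerance: bool      # wrong calls are low-stakes and reversible
    access: bool         # data, tools, and least-privilege permissions exist
    learning_loop: bool  # quality, latency, and cost per outcome are measurable


def missing_pillars(assessment):
    """Return the names of the pillars that were answered 'no'."""
    return [f.name for f in fields(assessment) if not getattr(assessment, f.name)]


def recommend(assessment):
    """Recommend an agent only when every pillar is present."""
    gaps = missing_pillars(assessment)
    if not gaps:
        return "agent candidate"
    return "pause: address " + ", ".join(gaps)


print(recommend(VitalAssessment(True, False, True, True, False)))
```

    Listing the missing pillars by name, rather than returning a bare yes/no, keeps the follow-up conversation focused on what has to change first.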

    Now, the counterweight. Don’t use agents when the problem is novel or strategically ambiguous and you still need exploratory research; when outcomes are unmeasurable or subjective without heavy context; when stakes are high and the acceptable error rate is effectively zero; when data is siloed, stale, or legally constrained; when the work is one-off or low-volume; or when your team can’t commit to instrumentation, evaluations, and ongoing maintenance. In these cases, a simpler rules engine, a clearer UX, or a well-defined workflow usually beats agentic complexity.

    Here’s how this plays out in practice. We’ve seen agents materially improve customer support triage (categorization, priority, and next-best-action suggestions), CRM hygiene (deduplication, enrichment, and routing), and release QA (regression check orchestration with human sign-off). Conversely, we avoid agents for nuanced pricing decisions, sensitive risk scoring without robust datasets, or any workflow where “explainability” and auditability trump speed.

    Operationalizing agents is a product problem before it’s an ML problem. Start narrow with a retrieval-first pipeline and rigorous prompt engineering, define success metrics upfront (quality, latency, cost per task), and run head-to-head evaluations against human baselines. Ship behind feature flags, monitor with Agent Analytics, and graduate from assisted to autonomous modes only after you’ve proven stability. Align this with product roadmapping and sprint planning so the work lands as durable capability, not a lab demo.
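
    One way to make the assisted-to-autonomous graduation explicit is a small gate that compares agent metrics against the human baseline. The sketch below is hypothetical: the `RunMetrics` fields mirror the three metrics named above, and the four-week stability window is an assumed placeholder, not a recommended value.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    quality: float        # share of tasks judged acceptable, 0..1
    latency_s: float      # median seconds per task
    cost_per_task: float  # currency units per completed task


def ready_for_autonomy(agent, human, stable_weeks, min_stable_weeks=4):
    """Gate: the agent must match or beat the human baseline on every
    metric and hold that for a minimum number of consecutive weeks."""
    beats_baseline = (
        agent.quality >= human.quality
        and agent.latency_s <= human.latency_s
        and agent.cost_per_task <= human.cost_per_task
    )
    return beats_baseline and stable_weeks >= min_stable_weeks


human = RunMetrics(quality=0.92, latency_s=300.0, cost_per_task=4.00)
agent = RunMetrics(quality=0.94, latency_s=12.0, cost_per_task=0.35)
print(ready_for_autonomy(agent, human, stable_weeks=2))
```

    Requiring every metric to clear the baseline is a deliberately strict choice; a team could instead weight the metrics, but that trade-off should be agreed before the evaluation runs.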

    Finally, be honest about build vs buy. If the workflow is a point of parity, consider buying and focusing your team on integration quality and governance. If it’s a potential source of competitive differentiation, invest in a modular architecture with clear context window management, strong observability, and a feedback loop tightly coupled to your empowered product teams.

    The bottom line: AI agents unlock leverage when there’s volume, clarity, tolerance, access, and a learning loop. If any of those pillars is missing, pause. Your best next move is likely better instrumentation, sharper problem framing, and continuous discovery—not more autonomy. That discipline is how product teams turn agentic AI from hype into habit.


    Inspired by this post on Product School.


  • AI Experimentation Mastery: How I Test Faster, Tame Variability, and Ship with Confidence

    I’ve learned that the fastest path to durable AI impact is a disciplined experimentation engine: one that moves quickly, reduces ambiguity, and earns trust with evidence. My goal isn’t just to ship models—it’s to ship measurable outcomes with repeatable rigor.

    This post covers AI experimentation for product teams: how to test AI features, choose the right metrics, handle variability, and make data-driven decisions.

    I start every AI initiative by framing a clear decision: what must be true for this feature to be worth building, and how will we know quickly? From there, I map driver trees that connect user value to measurable signals, so every test clarifies both impact and risk, not just accuracy.

    Success criteria come next. I translate aspirations into testable thresholds, define leading and lagging indicators, and size tests with minimum detectable effect (MDE) so we don’t confuse noise for signal. This keeps us honest about sample sizes, power, and the real cost of waiting for certainty.
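
    To show how an MDE translates into sample size, here is a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion z-test. The function name, the 5% significance level, and the 80% power default are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde, alpha=0.05, power=0.80):
    """Users needed per arm to detect an absolute lift of `mde` over
    `baseline` with a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


# Detecting a 2-point lift on a 10% baseline versus a 1-point lift.
print(sample_size_per_arm(0.10, 0.02))
print(sample_size_per_arm(0.10, 0.01))
```

    Halving the MDE roughly quadruples the required sample, which is the real cost of waiting for more certainty.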

    Before I touch production traffic, I run eval-driven development. I curate golden datasets that reflect real user complexity, codify rubrics for correctness, safety, tone, and latency, and automate scoring so improvements are reproducible—not anecdotal. This gives the team a stable baseline to iterate prompts, tools, and policies with confidence.
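
    A minimal sketch of automated scoring against a golden dataset follows. Everything in it is hypothetical: the two cases, the rubric keys (`must_include`, `max_words`), and `fake_model`, which stands in for whatever system is under test. Real rubrics for safety, tone, and latency would need richer checks.

```python
# Hypothetical golden set: each case pairs an input with rubric checks.
golden_set = [
    {"input": "reset password", "must_include": ["reset link"], "max_words": 40},
    {"input": "cancel plan", "must_include": ["cancel", "billing"], "max_words": 40},
]


def fake_model(prompt):
    """Stand-in for the system under test; swap in a real call."""
    canned = {
        "reset password": "We sent a reset link to your email.",
        "cancel plan": "You can cancel anytime from settings.",
    }
    return canned[prompt]


def score_case(case, output):
    """A case passes when all required terms appear and the length cap holds."""
    text = output.lower()
    has_terms = all(term in text for term in case["must_include"])
    within_length = len(output.split()) <= case["max_words"]
    return has_terms and within_length


def pass_rate(cases, model):
    """Fraction of golden cases the model passes."""
    results = [score_case(case, model(case["input"])) for case in cases]
    return sum(results) / len(results)


print(pass_rate(golden_set, fake_model))
```

    Because the scoring is a pure function of the golden set and the model output, the same number can be recomputed after every prompt or policy change.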

    Model behavior is inherently stochastic, so I deliberately control variability. I document temperature, top-p, and seed strategies; I compare deterministic settings for regression checks versus sampled settings for user-facing creativity; and I test sensitivity across content lengths and edge cases. This reduces flakiness and prevents surprise regressions during CI/CD.
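
    The contrast between deterministic and sampled settings can be illustrated with a stub. In this sketch `fake_generate` is a hypothetical stand-in for a model call: it treats temperature 0 as greedy decoding and otherwise makes a seeded random choice. It does not reflect any real provider's API.

```python
import random
from collections import Counter

CANDIDATES = ["Plan A", "Plan B", "Plan C"]  # hypothetical model outputs


def fake_generate(prompt, temperature, seed):
    """Stub sampler: temperature 0 is greedy, otherwise a seeded random pick."""
    if temperature == 0:
        return CANDIDATES[0]
    return random.Random(seed).choice(CANDIDATES)


def agreement_rate(prompt, temperature, seeds):
    """Share of runs that return the most common output."""
    outputs = [fake_generate(prompt, temperature, seed) for seed in seeds]
    return Counter(outputs).most_common(1)[0][1] / len(outputs)


# Regression checks want full agreement; sampled runs show the spread.
print(agreement_rate("summarize ticket", temperature=0, seeds=range(20)))
print(agreement_rate("summarize ticket", temperature=0.8, seeds=range(20)))
```

    Recording the seed alongside each run is what makes a surprising sampled output reproducible later.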

    When it’s time to learn from real users, I favor A/B testing with thoughtful guardrails. I run holdouts, cap exposure with feature flags, and protect core experience metrics like retention and time-to-value. For ranking and retrieval changes, I’ll use interleaving or switchback tests to isolate effects from seasonality and traffic mix.
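
    Capping exposure is often done by hashing a stable user identifier into buckets. The sketch below is one common approach under assumed names (`bucket`, `assign`) and an assumed 10% cap; it is not tied to any particular feature-flag product.

```python
import hashlib

BUCKETS = 10_000


def bucket(user_id, experiment):
    """Deterministically map a user to a bucket for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % BUCKETS


def assign(user_id, experiment, exposure=0.10):
    """Return 'treatment', 'control', or 'unexposed' with capped exposure.
    Exposed traffic is split evenly between treatment and control."""
    position = bucket(user_id, experiment) / BUCKETS
    if position >= exposure:
        return "unexposed"
    return "treatment" if position < exposure / 2 else "control"


print(assign("user-42", "ai-summary-v1"))
```

    Including the experiment name in the hash keeps assignments independent across experiments, so the same users are not always the ones exposed.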

    To handle LLM variability online, I aggregate outcomes over multiple prompts per cohort, use stratified bucketing to balance power users and new accounts, and track confidence intervals over time instead of snapshot p-values. This approach turns noisy model outputs into stable product signals.
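
    For the confidence-interval part of this approach, a minimal sketch is shown below. It uses the normal approximation for the difference between two conversion rates; the function name and the example counts are hypothetical, and it does not cover stratification.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Normal-approximation CI for the difference in conversion rates (B - A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# The same 2-point observed lift at two sample sizes.
print(diff_confidence_interval(10, 100, 12, 100))
print(diff_confidence_interval(400, 4000, 480, 4000))
```

    With 100 users per arm the interval spans zero; with 4,000 per arm it does not, even though the observed lift is identical. Watching the interval narrow over time is more informative than a single p-value.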

    Instrumentation fuels everything. I rely on behavioral analytics to trace user intent, effort, and satisfaction across flows, and I wire up Amplitude analytics for event schemas, funnel drop-offs, and cohort comparisons. Clear event taxonomies and naming discipline make it trivial to separate model quality from UX friction.

    Risk is part of the work, so I bake in AI risk management early. I include toxicity and PII checks in my offline evals, monitor safety metrics in every A/B, and set rollback criteria tied to user harm and system costs. Privacy-by-design, audit logs, and runtime safeguards aren’t afterthoughts—they’re acceptance criteria.

    The operating cadence matters as much as the math. I run continuous discovery with customer interviews to keep the test queue grounded in real jobs-to-be-done, and I align product trios on hypotheses, success metrics, and stop-loss rules before launch. Weekly readouts keep decisions crisp, and post-ship learning cycles feed the next iteration.

    Finally, I invest in upskilling the team. We run internal workshops on LLMs for product managers, standardize experiment templates, and maintain a living playbook so new experiments start at 80% instead of 0%. The result: faster learning loops, safer bets, and more confident shipping.


    Inspired by this post on Product School.


  • Pretotyping vs. Prototyping: How I Validate Ideas Fast and Build Products Customers Love

    I learned early in my career that beautiful prototypes don’t save you when you’re solving the wrong problem. What does save you is separating market risk from solution risk and choosing the fastest, lowest-cost way to get evidence. That’s why I rely on pretotyping to test demand in days and prototyping to refine usability and feasibility once I see a strong signal. The result: faster cycles, fewer wasted sprints, and products customers genuinely want.

    Pretotyping vs. prototyping explained: differences, benefits, examples, and when to use each approach to validate ideas before you build.

    Here’s how I define the two in practice. Pretotyping answers, “Should we build this at all?” Its goal is to validate real user intent and behavior with the lightest-weight artifact possible—often before any code. Think painted-door (fake door) experiments, Wizard-of-Oz flows powered by humans behind the scenes, concierge tests, landing-page smoke tests with waitlists or preorders, and simple A/B testing to gauge click-through intent. It optimizes for time-to-signal and cost-to-learn.

    Prototyping answers, “Can we build this well?” and “How should it work?” Once demand is evidenced, I prototype to de-risk solution details: usability, architecture, performance, and integration. This might include interactive UI models, high-fidelity flows, technical spikes, or service stubs. Here, I optimize for learning about user experience and technical feasibility without fully committing to production.

    When should you use each? If your biggest unknown is market risk—whether customers care at all—start with pretotyping. If your biggest unknown is solution risk—how to deliver an experience that’s usable, reliable, and scalable—move to prototyping. In other words, validate the “right thing” before you perfect the “thing right.”

    My decision rule is simple: identify the dominant risk, then pick the smallest experiment that can credibly invalidate it. For market risk, I look for evidence of behavior, not opinions: clicks on a painted door, signups on a landing page, willingness to pay (deposits, preorders), or sustained repeat usage in a Wizard-of-Oz flow. For solution risk, I look for task completion, time-on-task, error rates, and qualitative friction from usability sessions with a realistic prototype.

    Concrete examples from recent work help illustrate the difference. When exploring a new analytics insight, I shipped a fake door inside our product nav; a simple tooltip explained the concept and captured interest. Click-through rate, conversion to a short explainer, and waitlist signups told me whether the value proposition resonated before building anything. For a complex AI-assisted workflow, I ran a Wizard-of-Oz experiment: users experienced the end-to-end flow while our team manually handled the “AI” behind the curtain. That gave us real engagement data and edge cases to inform the prototype and later the MVP.

    Metrics matter. I set a clear hypothesis with a guardrail on sample size and a minimum detectable effect I’d consider actionable. For pretotyping, I focus on time-to-first-signal, intent conversion (CTR to interest, interest to signup), cost-per-qualified-lead, and evidence of willingness to pay. For prototyping, I prioritize task success rates, usability severity findings, and qualitative insights that materially change the design or technical approach. Above all, I avoid vanity metrics and anchor decisions to outcomes, not output.

    My repeatable playbook looks like this: (1) Frame the problem and value proposition in one crisp sentence. (2) Choose the leanest pretotyping method that can reveal real behavior. (3) Define success metrics and a decision rule before you run the test. (4) Launch quickly, instrument well, and let the data run long enough to be credible. (5) If demand is strong, promote to a prototype to refine UX and de-risk technicals; if not, iterate the proposition or stop. This keeps product discovery continuous and ensures roadmapping and sprint planning are guided by evidence.
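
    Step 3 of the playbook, fixing the decision rule before the test runs, can be sketched as a small function for a painted-door test. The function name and every threshold below are illustrative assumptions; the point is only that the thresholds are written down before any data arrives.

```python
def painted_door_decision(visitors, clicks, signups,
                          min_visitors=500, min_ctr=0.05, min_signup_rate=0.20):
    """Apply a decision rule that was fixed before the test ran.
    All default thresholds are illustrative assumptions."""
    if visitors < min_visitors:
        return "keep running"
    ctr = clicks / visitors
    signup_rate = signups / clicks if clicks else 0.0
    if ctr >= min_ctr and signup_rate >= min_signup_rate:
        return "promote to prototype"
    return "iterate or stop"


print(painted_door_decision(visitors=1200, clicks=90, signups=27))
```

    Checking the sample-size guardrail first prevents an early spike in clicks from being read as demand.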

    There are ethical guardrails I never skip. Painted doors must set correct expectations once clicked, and waitlists or learn-more pages must be honest and respectful. For Wizard-of-Oz and concierge tests, I’m explicit about data handling and provide timely follow-up. Trust compounds when experiments are transparent and user time is valued.

    Tooling can accelerate the cycle without diluting rigor. I often use lightweight design systems and no-code automations to stitch together realistic flows, and I’ll leverage gen AI for product prototyping to generate copy, microinteractions, or data scaffolding. But the principle remains: don’t over-invest until evidence earns the investment. Empowered product teams thrive when they optimize for learning velocity, not feature velocity.

    If you’ve ever felt the tension between shipping fast and shipping right, this approach resolves it. Pretotype to prove the market; prototype to perfect the solution. Do that consistently and you’ll spend more time delivering outcomes customers value—and far less time debating outputs.


    Inspired by this post on Product School.


  • The AI PM One-Pager: Radical prototyping requirements for speed, clarity, and truth

    I move fastest in Generative AI when I strip work down to its essential signals. At HighLevel, I rely on a single-page format, “Prototyping Requirements: The One-Pager for AI PMs,” to turn ideas into testable artifacts within hours, not weeks. This approach reinforces AI Strategy, minimizes coordination overhead, and keeps Product Management focused on learning over ceremony.

    “Prototyping requirements go rogue: one page, zero bureaucracy, built for AI. Shape concepts fast, prompt tools directly, and get to the truth sooner.”

    In practice, my one-pager captures only what’s required to run an immediate experiment: the user problem, the target behavior change, success signals, core constraints, intended AI workflows, and the smallest realistic path to an evaluable demo. I also include example prompts, guardrails, and evaluation criteria so the team can apply prompt engineering and LLMs for product managers without guessing.
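
    The sections listed above could be captured as a structured template so that gaps are visible before an experiment starts. This is a hypothetical encoding: the class and field names are assumptions derived from the list in the previous paragraph, not the actual one-pager format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class OnePager:
    user_problem: str = ""
    target_behavior_change: str = ""
    success_signals: list = field(default_factory=list)
    core_constraints: list = field(default_factory=list)
    ai_workflows: list = field(default_factory=list)
    smallest_demo_path: str = ""
    example_prompts: list = field(default_factory=list)
    guardrails: list = field(default_factory=list)
    evaluation_criteria: list = field(default_factory=list)


def missing_sections(page):
    """Names of sections that are still empty."""
    return [f.name for f in fields(page) if not getattr(page, f.name)]


draft = OnePager(
    user_problem="Support agents spend too long summarizing tickets.",
    success_signals=["median handle time drops"],
)
print(missing_sections(draft))
```

    Keeping the template to these nine fields is what holds it to a single page; anything that does not fit is a candidate for a later document.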

    This is eval-driven development in action. I document a minimal hypothesis, concrete inputs/outputs, and a quick plan for metrics, including qualitative signals from product discovery and continuous discovery. By prompting tools directly, we expose assumptions early, shorten feedback loops, and build an AI product toolbox that compounds learning sprint after sprint.

    I run this with a product trio to ensure we balance feasibility, usability, and value. We align on risks, dependencies, and what “good” looks like, then we integrate the learnings into product roadmapping and sprint planning. The result: fewer meetings, tighter collaboration, and empowered product teams delivering sharper outcomes with less friction.

    If you want speed and clarity without sacrificing rigor, adopt the one-pager. It centers the conversation on evidence, accelerates AI workflows from prompt to prototype, and makes it obvious what to try next—and what to stop doing. Most importantly, it keeps the team focused on truth over theater, which is how great AI products actually ship.


    Inspired by this post on Product School.


  • My Essential AI Toolbox for Product Managers: Tested Picks, Prompts, Workflows + Checklists

    I created this practical guide to help product managers cut through the hype and apply AI where it genuinely moves the needle—faster discovery, clearer strategy, sharper execution, and measurable outcomes.

    A practical guide to AI tools for product managers: tested picks, what each tool is best for, copy-paste prompts, workflows, and screenshot checklists.

    Leading product management at HighLevel, I’ve pressure-tested dozens of gen AI solutions across product discovery, roadmap planning, delivery, and go-to-market. In this guide, I map an AI product toolbox to core PM jobs-to-be-done so you can move from experimentation to repeatable impact with confidence.

    Expect clear recommendations on where each tool excels—LLMs for product managers, research synthesis for customer interviews, behavioral analytics for opportunity sizing, and lightweight automation for in-app guides and product tours. I connect these tools to proven practices like continuous discovery, outcomes vs output OKRs, and product roadmapping and sprint planning so you can operationalize AI inside your existing workflows.

    I also share the evaluation criteria I use before rollout—AI Strategy alignment, data governance and privacy-by-design, AI risk management, observability, and total cost of ownership. This eval-driven development approach helps teams avoid technology FOMO while creating defensible, trustworthy workflows that scale.

    To accelerate adoption, I’ve included copy-paste prompts (including prompt engineering patterns for both chat and voice), retrieval-first pipeline blueprints to ground your models in product docs and decision logs, and conversation design tips for support and success use cases. You’ll see step-by-step AI workflows that tie directly to journey mapping, opportunity solution trees, and Kano Model trade-offs.

    Every workflow comes with screenshot checklists you can use for onboarding or stakeholder management, making it easy to align ICs and leaders on the same operating picture. Whether you’re optimizing A/B testing, retention analysis, or QBRs vs OKRs, these checklists turn good intentions into repeatable rituals.

    Use this guide as your field companion to ship faster with higher confidence—reducing cycle time, improving signal in discovery, and building momentum for product-led growth. If you’re ready to translate generative AI into reliable PM leverage, start with the workflows, adapt the prompts, and make them your own.


    Inspired by this post on Product School.


  • Build to Learn vs. Build to Earn: My Proven Playbook for Outcomes Over Output in the AI Era

    Product teams rarely fail because they don’t ship enough features; they fail because they don’t learn fast enough. That’s the core tension I manage every day: when to build to learn and when to build to earn. Navigating that balance is how we protect focus, accelerate time-to-value, and ultimately deliver durable business impact.

    Over the years, I’ve seen at least two major ways to develop product: build to learn and build to earn. The first is discovery-led and evidence-seeking; the second is delivery-led and value-capturing. Both are essential. The real craft is knowing which mode to be in, when to switch, and how to keep stakeholders aligned around outcomes instead of output.

    The project model remains the default in many organizations—even in the age of AI—and it’s all about output. Stakeholders or executives assemble a prioritized roadmap of features and projects, and teams ship against it. This can create momentum, but without clear outcome metrics and customer validation, it’s easy to drift into a feature factory that looks productive while missing the mark on user value and business results.

    When I build to learn, I emphasize continuous discovery. That means using customer interviews to surface unmet needs, running lightweight prototypes to test desirability and usability, and deploying A/B testing to quantify impact. I map assumptions, risks, and opportunities with an opportunity solution tree, and I timebox experiments so we learn fast and cheap. The standard is evidence, not opinions—especially my own. The goal is simple: reduce uncertainty before we scale.

    When I build to earn, the objective shifts to capturing value with confidence. Here I align teams to outcomes vs output OKRs, commit to clear acceptance criteria, and ensure product roadmapping and sprint planning reflect the highest-leverage bets we validated in discovery. Delivery excellence matters: crisp definition, reliable release trains, observability, and a strong feedback loop to confirm we’re moving activation, conversion, or retention in the intended direction.

    Deciding when to transition from learning to earning is all about thresholds of evidence. I look for leading indicators that our solution reliably solves the target problem, shows a measurable lift in key behaviors, and can be delivered with acceptable risk. If we can’t articulate the expected outcome and how we’ll measure it, we’re not ready to scale. If we can, we invest, monitor impact, and keep guardrails in place to avoid scope drift.

    The operating model that makes this sustainable is simple and disciplined. I rely on empowered product teams organized as product trios (product, design, engineering) to run dual tracks of discovery and delivery. We socialize learning with stakeholders early and often to strengthen trust and stakeholder management. We elevate strategy by linking every roadmap item to a problem statement, a testable hypothesis, and a quantified outcome—no orphan features, no vanity launches.

    In the AI era, speed can tempt us back into shipping-by-idea. I use gen AI for product prototyping and insight synthesis, and I lean on LLMs for product managers to accelerate discovery work—without treating AI as a shortcut to validation. Our AI Strategy clarifies where AI augments discovery, where it powers the product, and how we evaluate risk, so we move faster without compromising rigor or ethics.

    My rule of thumb: spend just enough time building to learn to achieve conviction, then shift decisively to building to earn—while preserving a small discovery cadence to keep learning alive. This rhythm protects focus, compounds insight, and makes growth more predictable. It’s how we avoid the output trap, deliver meaningful outcomes, and create products that customers love and the business celebrates.


    Inspired by this post on SVPG.


  • Product Work Is Relationship Work: How I Align Stakeholders Faster and Cut Team Politics

    Lately, I keep hearing a familiar question: with AI making it so easy to generate ideas and build products, do we still need product managers? My answer is unequivocal—yes. Tools accelerate delivery, but they don’t build trust, reconcile competing incentives, or create the shared understanding teams need to ship outcomes. Product work is relationship work.

    I recently listened to “Product Work Is Relationship Work – All Things Product with Teresa & Petra,” and it echoed what I see every day in high-performing product organizations. If you prefer to watch, here’s the episode on YouTube: https://www.youtube.com/embed/d-0f8uAfc8w?feature=oembed

    While AI can help build things faster, it can’t replace the relationship work required to align stakeholders, navigate competing priorities, and create shared understanding across teams. That’s the hard, human part of product management—and it’s not going away.

    In my experience, product teams stall when collaboration becomes transactional. We jump to negotiation (“What can you commit by Friday?”) before establishing context (“What problem are we solving and why now?”). When I slow down to get curious—about constraints, incentives, and assumptions—momentum actually increases because we’re rowing in the same direction.

    Stakeholder alignment often breaks down when we conflate advocacy with exploration. We argue our viewpoint as if it were the only lens that matters, rather than making space to surface how others see the system. I’ve found the distinction between “dialogue vs. discussion,” rooted in work by Chris Argyris and elaborated in The Fifth Discipline by Peter Senge, to be a powerful reset. Dialogue builds shared understanding; discussion decides. You need both, in the right order.

    Language matters in the room. The improv principle “Yes, and” is deceptively simple but transformative. When a designer, engineer, or executive feels heard (“Yes”) and we build on their idea (“and”), we create psychological safety without sacrificing critical thinking. I use “Yes, and” to explore perspectives before we converge on decisions—especially with product trios and senior stakeholders.

    Here are the moves I rely on to keep collaboration relational and outcomes-focused. First, we align on outcomes before solutions. I explicitly separate outcomes vs output OKRs so we’re clear on what success looks like, independent of the features we ship. That clarity reduces rework and speeds up decision-making later.

    Second, we operationalize curiosity with continuous discovery. I schedule recurring, lightweight touchpoints with customers and internal stakeholders so insights compound. When learning is continuous, debates quiet down—evidence does the heavy lifting.

    Third, we invest in relationship rituals. Regular 1:1s with key partners, stakeholder maps that capture motivations, and pre-reads that frame trade-offs all prevent misalignment from surfacing in the last mile. These small habits pay huge dividends in trust and speed.

    Fourth, I’m explicit about mode-switching in meetings: are we advocating a position or exploring perspectives? Calling the mode out loud prevents people from mistaking questions for opposition and keeps the conversation productive.

    Fifth, we use “Yes, and” to move from possibility to practicality. We explore generously, then converge rigorously—ranking options by impact, effort, and risk so decisions are transparent and fair.

    If stakeholder alignment, team dynamics, or product “politics” slow your team down, this conversation offers a practical reframe. You’ll move faster when you build the relational tissue first—because alignment is an accelerant, not a tax.

    Resources & Links:

    Follow Teresa Torres: https://ProductTalk.org

    Follow Petra Wille: https://Petra-Wille.com

    Mentioned in this episode:

    Petra’s Coaching Packages

    Work by Chris Argyris on organizational learning and dialogue vs. discussion

    The Fifth Discipline: The Art and Practice of the Learning Organization by Peter Senge

    Improv principle “Yes, and”: Saying “Yes, and” — A principle for improv, business & life and Yes, and …

    Have thoughts on this episode or examples from your team? Leave a comment below—I’d love to learn what’s working (and what’s not) in your stakeholder landscape.


    Inspired by this post on Product Talk.


  • Commercial vs. Internal Products: Hard Truths, High Leverage, and How I Make the Call

    Commercial vs. Internal Products: Hard Truths, High Leverage, and How I Make the Call

    Internal Products Are Hard; Commercial Products Are Harder. That line captures years of hard-won lessons from leading both internal platforms and market-facing SaaS at HighLevel. I’ve seen how the two demand different muscles—even when the tech stack, talent, and timelines look the same on paper.

    When I talk about internal products, I mean services and solutions that our own employees use to take care of customers—customer-enabling tools and services, agent consoles, fulfillment and billing workflows, operations dashboards, and the underlying platforms that keep them fast, compliant, and resilient. These tools don’t generate revenue directly, but they quietly determine customer experience, gross margin, and how quickly we can ship, resolve issues, and scale.

    Commercial products, by contrast, add a second layer of challenge. Beyond discovery, usability, and reliability, we must conquer positioning, pricing and packaging, competitive differentiation, sales enablement, procurement hurdles, and an ongoing customer success motion. The surface area for failure is bigger, and the time-to-signal on product-market fit is slower and noisier.

    Here’s how I decide where to invest. First, I anchor on outcomes, not output. If the business priority is net revenue retention, faster onboarding, or reduced cost-to-serve, internal products often provide the highest-leverage path. If the priority is new revenue, new market entry, or a must-have differentiator, we lean commercial. I make the trade explicit in outcomes vs output OKRs so we can defend the decision when pressure mounts.

    Second, I run a clear build vs buy calculus. For internal needs, the default is buy if a mature, configurable solution exists that meets our security, data governance, and integration requirements. I only build when the workflow is core to our differentiation, the TCO of customization is lower than vendor sprawl, or we can capture unique proprietary advantage. For commercial products, I avoid embedding third-party IP in a way that caps differentiation or compresses margins as we scale.

    Third, I insist on continuous discovery. Internal audiences are not a captive market—they’re discerning experts with real jobs to do. I treat them like customers, with structured customer interviews, journey mapping, and opportunity solution trees. I rely on empowered product teams and product trios to validate problems and reduce solution risk before we commit engineering time.

    Fourth, I frame commercial vs internal work with capacity guardrails. In most planning cycles, I reserve explicit allocation for platform scalability and internal tooling, separate from feature bets. Without this, internal products become backlog filler, which guarantees we’ll pay the interest later in churn, SLA breaches, and slower delivery.

    Execution differs too. For internal products, change management is the make-or-break. I plan enablement as a first-class deliverable: clear rollouts, in-app guides, training, and feedback loops with frontline champions. I track adoption, time-to-resolution, error rate, and satisfaction for internal users with the same rigor we apply to external users.

    For commercial products, I design the discovery-to-GTM handshake early. Pricing and packaging must reflect value drivers discovered in research, not what’s easiest to meter. Sales and solutions engineering need crisp narratives, objection handling, and proof points. Customer success needs activation plans and health signals tied directly to leading indicators of retention.

    Across both, I instrument the product and process. I lean on feature flags and progressive delivery to manage risk, and I protect SLOs with error budgets so teams balance reliability with iteration speed. CI/CD isn’t a badge—it’s how we earn the right to ship continuously without eroding trust.
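
    To make the error budget idea concrete, here is a minimal sketch of the arithmetic. It assumes a time-based availability SLO measured over a rolling 30-day window; the function names and the example numbers are illustrative, not taken from any particular tool.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Minutes of allowed unavailability in the window for a given SLO target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)


def budget_remaining(slo_target, bad_minutes, window_days=30):
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return 1 - bad_minutes / budget


# Hypothetical example: a 99.9% SLO with 21.6 bad minutes so far this window.
print(round(error_budget_minutes(0.999), 1))      # budget in minutes
print(round(budget_remaining(0.999, 21.6), 2))    # share of budget left
```

    When the remaining share approaches zero, that is the signal to slow risky rollouts and spend the next cycle on reliability.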

    Common pitfalls recur. Teams skip UX for employee tools because “they have to use it”—which backfires as shadow workflows and rework. Leaders underfund internal platforms, then wonder why velocity stalls. On the commercial side, teams over-index on features and under-invest in positioning and onboarding, leading to poor activation and elongated sales cycles.

    What’s the payoff? When we treat internal products as products, we unlock scale: shorter handling times, fewer escalations, clearer accountability, and higher customer satisfaction. When we approach commercial products with the same discovery rigor plus smart GTM, we compress time-to-value and amplify differentiation. The craft is knowing which lever to pull when—and having the discipline to measure what matters.

    My rule of thumb is simple. If the goal is operational excellence that compounds across the entire customer journey, invest in internal products with the same intensity you reserve for revenue-generating features. If the goal is market expansion or category leadership, invest in commercial products with a tight discovery-to-GTM loop. In either case, clarity of outcomes, disciplined discovery, and empowered teams win the day.


    Inspired by this post on SVPG.


  • Beat AI FOMO: A Product Leader’s Playbook to Choose Tools, Stay Focused, and Learn Deeply

    Beat AI FOMO: A Product Leader’s Playbook to Choose Tools, Stay Focused, and Learn Deeply

    Lately, it feels like every morning brings a new AI launch, a dazzling demo, or a must-try tool. I love the pace of innovation, but the constant stream can trigger counterproductive FOMO if I’m not intentional. As a product leader, I’ve learned to turn that anxiety into a disciplined learning system—one that keeps me curious without letting novelty hijack my focus.

    That’s exactly why this conversation with Petra Wille and Teresa Torres resonated with me. They explore how to stay experimental in the AI era without chasing every shiny object. Their perspective aligns closely with my own operating cadence: start with real problems, go deep on a small set of tools, and create explicit boundaries between work, learning, and play.

    Listen to this episode on: Spotify | Apple Podcasts

    Here’s the mindset I apply. I don’t start with tools—I start with problems. When I encounter concrete friction in a workflow or see a credible opportunity to improve an outcome, that’s my trigger to explore a new capability. This mirrors the continuous discovery habit of prioritizing opportunities over solutions, and it’s how I avoid performing “innovation theater.”

    To keep exploration healthy, I time-box my learning. I block recurring windows specifically for experiments, reading, and hands-on trials so they don’t overrun my core product work. During these blocks, I’ll set a clear question, run a tight test, and capture what I learned. No rabbit holes, no endless tinkering.

    I also separate “interesting” from “actionable.” Plenty of inputs are worth awareness, but very few deserve immediate action. I bookmark the rest for later. This simple filter reduces cognitive load and keeps my backlog—from ideas to proofs of concept—well-governed.

    Social media can amplify technology hype cycles, so I establish boundaries. I batch consumption, mute low-signal channels, and prioritize practitioner communities over performative threads. The goal isn’t to be first; it’s to be right for my customers, my team, and our strategy.

    When choosing what to try next, I use a practical rubric. Does the tool target a real friction I’ve seen in discovery or delivery? Can it plug cleanly into our AI workflows without unsustainable glue work? Do we have a safe, compliant way to test it? Is there a plausible path from trial to compounding value? If the answer isn’t a confident yes to most of these, I wait.

    Depth beats breadth. I’d rather take one promising tool into a real use case, instrument it, and measure outcomes than skim ten trending demos. That tighter loop produces sharper intuition, clearer product bets, and better partner decisions. A quick opportunity solution tree helps me connect user pain to outcomes before I let any solution onto the field.

    In the episode, Petra Wille and Teresa Torres talk candidly about managing FOMO, deciding which tools to explore, and designing intentional learning systems. They discuss why starting with a problem is more valuable than starting with a tool, how social media amplifies technology FOMO, and why going deeper with fewer tools can lead to better learning. If you’ve ever felt like you’re falling behind because you haven’t tried the latest AI tool yet, this conversation will help you rethink how you approach learning and experimentation.

    If you’re curious about what came up, here are some of the tools and communities mentioned: Claude Code, OpenClaw (formerly Clawdbot, Moltbot), NotebookLM, Product Talk, ElevenLabs, Lenny’s Newsletter Community, and even a nod to Bridgerton for a touch of levity.

    My takeaway is simple but powerful: curiosity doesn’t require constant experimentation. The best product managers cultivate a balanced system—grounded in product discovery, energized by focused experiments, and protected by clear boundaries—so we can learn faster while staying pointed at outcomes that matter.

    Discussion Question: How do you decide which new tools or technologies are worth exploring—and which ones you can safely ignore?

    Resources & Links: Follow Teresa Torres: https://ProductTalk.org | Follow Petra Wille: https://Petra-Wille.com

    Have thoughts on this episode? Leave a comment below.


    Inspired by this post on Product Talk.


  • Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    I’ve learned the hard way that sample size calculators can be both empowering and deceptive. They feel wonderfully precise, but they’re only as trustworthy as the assumptions you feed them. When I lead A/B testing at scale, I treat the calculator as a planning tool, not a verdict—then I systematically validate the assumptions behind it so our decisions stay rigorous and our roadmap stays credible.

    At a minimum, most calculators assume you know your baseline rate, your “minimum detectable effect (MDE),” your desired statistical power, and your significance level. They also quietly assume independent observations, clean randomization, stable traffic quality, and a fixed test horizon with no peeking. If any of those break, the “right” sample size can be wildly wrong—and the test conclusions can nudge teams toward the wrong product or go-to-market bet.
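
    To show what a calculator does with those four inputs, here is a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion test with equal allocation. The function name and the example rates are my own assumptions; real calculators may use slightly different variance formulas and will give slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided, two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # rate the variant must reach
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # unpooled variance of the difference
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Hypothetical inputs: 10% baseline conversion, 10% vs 5% relative lift.
print(sample_size_per_arm(0.10, 0.10))
print(sample_size_per_arm(0.10, 0.05))
```

    Halving the MDE roughly quadruples the required sample, which is why a wrong baseline or an optimistic MDE distorts the plan so badly. Note that the formula itself bakes in independent observations and a single look at the end.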

    Baseline and variance come first for me. I estimate the baseline conversion (and volatility) from recent behavior using behavioral analytics, sanity-check it across key segments, and look for seasonality. Tools like Amplitude analytics help me spot anomalies, bots, or instrumentation drift. If baseline is unstable or highly skewed, I either stabilize it with longer lookbacks or narrow the target segment to reduce noise.

    Setting the “minimum detectable effect (MDE)” is where product strategy meets statistics. I work backward from an outcome that actually matters: the revenue, retention, or activation uplift that justifies the opportunity cost of building and running the experiment. If that effect size is implausible given historic lift and variance, I rethink the scope or stack changes into a sequenced set of learning experiments rather than overpromising a single moonshot.

    For power and alpha, I default to 80–90% power and a 5% significance level unless the downside risk of a false positive is unusually high, in which case I tighten alpha. I choose one-tailed tests only when we would not act on a negative result and we’ve explicitly pre-registered that decision; otherwise, two-tailed is safer for real-world ambiguity.
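
    The cost of each of those choices can be estimated before committing. In the normal-approximation formula, sample size scales with the square of the sum of the two critical values, so a rough sketch (the function name is illustrative) is:

```python
from statistics import NormalDist


def size_multiplier(alpha, power, two_sided=True):
    """Factor (z_alpha + z_beta)^2 that the required sample size is proportional to."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


reference = size_multiplier(0.05, 0.80)
scenarios = [
    ("90% power", size_multiplier(0.05, 0.90)),
    ("alpha 1%", size_multiplier(0.01, 0.80)),
    ("one-tailed", size_multiplier(0.05, 0.80, two_sided=False)),
]
for label, value in scenarios:
    print(f"{label}: {value / reference:.2f}x the 80% / 5% two-tailed sample")
```

    Under this approximation, moving from 80% to 90% power costs about a third more traffic, tightening alpha to 1% costs about half again as much, and a one-tailed test saves roughly a fifth. That saving is the temptation the pre-registration rule is there to resist.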

    Randomization and independence are where many tests quietly fail. I randomize at the user level (not session or pageview), guard against cross-device contamination, and ensure consistent exposure via feature flags. If there’s shared context—say, team-based usage or geographic clustering—I account for it via cluster randomization or acknowledge the inflated variance it can introduce.
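
    One common way to get stable user-level assignment is to hash the user ID together with the experiment name. This is a minimal sketch under that assumption; the function name, experiment key, and ID format are hypothetical, and a feature flag platform will typically do the equivalent for you.

```python
import hashlib
from collections import Counter


def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant for one experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always lands in the same arm, on any device or session,
# as long as the same user_id is available.
print(assign_variant("user-42", "checkout-cta"))

# Salting with the experiment name keeps assignments independent across tests.
split = Counter(assign_variant(f"user-{i}", "checkout-cta") for i in range(10_000))
print(split)
```

    The sketch only works when a durable user ID exists. Logged-out traffic keyed on cookies or device IDs is exactly where cross-device contamination creeps in.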

    Traffic allocation integrity is non-negotiable. I monitor for sample ratio mismatch by comparing observed group splits to the intended allocation and immediately pause if they drift. When SRM appears, the root cause is often instrumentation gaps, eligibility filters applied asymmetrically, or caching layers. Fixing that early preserves trust in every test that follows.
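
    A sample ratio mismatch check is a chi-square goodness-of-fit test on the group counts. Here is a minimal two-arm sketch using only the standard library; the counts are made up, and the 0.001 alert threshold is an assumption (a strict cutoff is common for SRM, but pick your own).

```python
import math


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """P-value of a chi-square test (1 degree of freedom) for a two-arm split."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # Survival function of chi-square with 1 df, written with erfc.
    return math.erfc(math.sqrt(chi2 / 2))


SRM_THRESHOLD = 0.001  # assumed alert threshold

for control, treatment in [(50_000, 50_100), (50_000, 51_200)]:
    p = srm_p_value(control, treatment)
    status = "PAUSE: possible SRM" if p < SRM_THRESHOLD else "ok"
    print(control, treatment, f"p={p:.4g}", status)
```

    A split that looks close to 50/50 by eye can still fail this check at large volumes, which is why I automate it rather than eyeball it.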

    Fixed-horizon math assumes no peeking. If stakeholders need continuous reads, I use sequential testing methods with alpha spending or always-valid approaches designed for ongoing monitoring. If we commit to a fixed horizon, we stay disciplined: no early looks, no midstream metric swaps, no retrofitted hypotheses.
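
    Alpha spending decides how much of the total false-positive budget each interim look is allowed to use. The sketch below computes cumulative alpha spent under two well-known Lan-DeMets spending functions (O'Brien-Fleming-type and Pocock-type). It stops short of deriving the actual stopping boundaries, which need numerical integration from a dedicated group-sequential library; the four-look schedule is a hypothetical example.

```python
import math
from statistics import NormalDist


def obrien_fleming_spend(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t (0 < t <= 1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / math.sqrt(t)))


def pocock_spend(t, alpha=0.05):
    """Pocock-type spending: distributes alpha more evenly across looks."""
    return alpha * math.log(1 + (math.e - 1) * t)


looks = [0.25, 0.50, 0.75, 1.00]  # hypothetical: four equally spaced looks
for t in looks:
    print(f"t={t:.2f}  OBF-type={obrien_fleming_spend(t):.5f}  "
          f"Pocock-type={pocock_spend(t):.5f}")
```

    The O'Brien-Fleming-type function spends almost nothing early, so an early stop requires overwhelming evidence, while the final look stays close to the fixed-horizon threshold. Both functions reach the full alpha at t = 1.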

    Multiple comparisons can quietly inflate false positives. I predeclare one primary metric to decide, define guardrail metrics to protect experience and revenue, and apply appropriate corrections (for example, controlling the false discovery rate) when testing many variants or slicing results by numerous segments.
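
    For false discovery rate control, the Benjamini-Hochberg step-up procedure is the usual starting point. A minimal sketch, with made-up p-values standing in for eight segment cuts:

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below the cutoff.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


segment_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
naive_wins = sum(p < 0.05 for p in segment_p_values)
bh_wins = sum(benjamini_hochberg(segment_p_values))
print(naive_wins, bh_wins)
```

    In this example, five segments clear a naive 0.05 threshold but only two survive the correction. The other three are the "wins" I would not want on a roadmap slide.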

    Duration and seasonality matter more than most roadmaps admit. I run through full business cycles (at least one complete week to cover day-of-week patterns, longer for B2B buying rhythms), plan for novelty effects, and watch for behavior settling after initial exposure. If the intervention changes long-run behavior, I extend the measurement window or add a post-test holdout to capture durable impact.
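
    Turning a required sample into a calendar plan is simple arithmetic, but rounding up to whole weeks is the step that gets skipped. A minimal sketch, where the function name and the traffic figures are hypothetical:

```python
import math


def planned_duration_days(users_per_arm, arms, new_eligible_users_per_day, min_weeks=1):
    """Days to run, rounded up to whole weeks so every weekday is covered equally."""
    total_needed = users_per_arm * arms
    raw_days = math.ceil(total_needed / new_eligible_users_per_day)
    weeks = max(min_weeks, math.ceil(raw_days / 7))
    return weeks * 7


# Hypothetical plan: 14,749 users per arm, two arms, 3,000 new eligible users a day.
print(planned_duration_days(14_749, 2, 3_000))
```

    The daily figure should count new unique users entering the experiment, not total visits, because returning users do not add to the sample.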

    Not all metrics are binomial. For revenue, time-on-task, or heavy-tailed distributions, I confirm variance assumptions, use robust estimators or bootstrapping, and consider variance reduction methods like CUPED to improve power without overextending duration. The calculator’s simplicity should not mask the data’s complexity.
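
    CUPED uses each user's pre-experiment behavior as a covariate to strip out variance that has nothing to do with the treatment. The sketch below shows the core adjustment on synthetic data; the data-generating numbers are invented for illustration, and in a real test theta would be estimated once on the pooled data from all arms and then applied to each arm.

```python
import random
from statistics import mean, pvariance


def cuped_adjust(metric, pre_metric):
    """Return CUPED-adjusted values: y - theta * (x - mean(x))."""
    x_bar = mean(pre_metric)
    y_bar = mean(metric)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pre_metric, metric))
    var_x = sum((x - x_bar) ** 2 for x in pre_metric)
    if var_x == 0:
        raise ValueError("pre-period metric has no variance; CUPED cannot help")
    theta = cov_xy / var_x
    return [y - theta * (x - x_bar) for x, y in zip(pre_metric, metric)]


# Synthetic users whose in-test spend correlates with pre-period spend.
rng = random.Random(7)
pre = [rng.gauss(100, 20) for _ in range(2000)]
post = [0.8 * x + rng.gauss(0, 5) for x in pre]
adjusted = cuped_adjust(post, pre)

print(round(mean(post), 2), round(mean(adjusted), 2))          # mean is preserved
print(round(pvariance(post), 1), round(pvariance(adjusted), 1))  # variance shrinks
```

    The adjustment leaves the mean untouched and shrinks variance in proportion to the squared correlation between the pre-period and in-test metric, so the gain is small when past behavior predicts little.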

    Finally, I connect experimentation to product outcomes. I map hypotheses to a driver tree, ensure each test ladders to activation, retention, or monetization, and document assumptions up front so we learn even when results are null. The result is a culture that respects the math and moves faster precisely because we trust our reads.

    Here’s the practical checklist I use before pressing “Start”: validate baseline and variance from recent behavior; set an MDE tied to meaningful business impact; choose power and alpha explicitly; confirm user-level randomization and stable exposure; watch for sample ratio mismatch; align on fixed-horizon vs sequential testing; predeclare a single primary metric and guardrails; run long enough to cover seasonality; use robust methods for non-binomial metrics; and write a brief pre-read so the whole team commits to the plan.

    When we honor these assumptions, sample size calculators become sharp instruments rather than blunt ones. You’ll ship fewer misleading wins, avoid costly false negatives, and build a repeatable experimentation engine that compounds learning—and results—over time.


    Inspired by this post on Amplitude – Perspectives.


  • Inside Banani: How a Canvas-First AI Designer Elevates UX and Accelerates Product Teams

    Inside Banani: How a Canvas-First AI Designer Elevates UX and Accelerates Product Teams

    I believe the future of product design isn’t about replacing designers—it’s about giving every team access to one. That’s why Banani grabbed my attention. It’s an AI product designer that doesn’t just generate code—it generates design. For solo founders, stretched design teams, and early-stage startups, that shift matters: it raises the design floor without lowering the creative ceiling.

    I spent time with Vlad Solomakha (CEO & Co-founder), Vova Kovalchuk (CTO & Co-founder), and Vlad Ostapovats (Founding Growth) to unpack how they took Banani from a Figma plugin proof-of-concept to a canvas-first AI design tool generating hundreds of thousands of designs per week. Vlad brings a decade of design experience and a precise north star: AI should produce beautiful, tasteful design rather than average, undifferentiated UI.

    The architectural choices stood out. They engineered their agent to handle parallel screen edits, manage per-screen context across canvases with hundreds of frames, and make surgical edits without regenerating entire screens. This is the kind of agentic AI work that product leaders have been waiting for: concrete advances in context window management, tool orchestration, and prompt engineering that translate into higher throughput without sacrificing quality.

    Equally important is how they addressed the "gulf of specification"—the mismatch between how designers think visually and how agents understand text. Banani’s canvas-first approach acknowledges that design is spatial, hierarchical, and iterative. Rather than forcing a chat-first UX, they center the canvas and let the agent do production work while keeping the designer firmly in control. In practice, this narrows intent ambiguity, speeds up iteration, and preserves taste.

    The team made another pivotal bet: Banani doesn’t compile running applications, only HTML/CSS mockups, and that choice shapes everything else. By decoupling the design artifact from runnable code, they optimize for velocity, taste, and exploration. In my experience, this separation is the right product strategy for early discovery and for prototyping with generative AI: move fast on aesthetics and flows, then converge on implementation once you’ve validated the direction.

    I also appreciated their pragmatic evaluation approach. Instead of traditional evals, they spin up 10 screens from one prompt to compare models. It’s hands-on, outcome-based, and aligned with eval-driven development in real product environments. They’re relentlessly discerning about when to work around model limitations versus when to wait for the models to improve—an essential discipline when building at the edge of what’s possible.

    Under the hood, context engineering and specialized agent tools do the heavy lifting. Per-screen history with shared project context enables precise, reversible changes across large canvases. The result: fewer destructive regenerations, more reliable design intent preservation, and a workflow that feels like collaborating with a strong mid-level designer who’s exceptionally fast and consistent.

    If you want a quick tour, I recommend jumping to a few highlights: 20:13 Product Tour Canvas First AI, 33:40 Gulf of Specification, 42:54 Agent Architecture Under Hood, 48:48 State History Context Tricks, and 56:04 Navigating Busy Canvases. Each segment reveals a different layer of the system design and product thinking behind Banani’s canvas-first UX.

    For product leaders, this is a compelling blueprint for raising the design floor while protecting the last mile of craft. It fits empowered product teams, continuous discovery, and product managers who want leverage from LLMs without losing judgment. If you’re exploring agentic AI in design, this is a thoughtful, execution-focused model worth studying and trialing on your next product tour or redesign.

    Resources worth exploring: Banani and tldraw. To hear the full conversation, you can listen on Spotify or Apple Podcasts. Then, pressure-test the approach inside your own product development lifecycle and see how a canvas-first AI designer reshapes your team’s velocity and quality bar.


    Inspired by this post on Product Talk.

