Category: Product Management

  • Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    I’ve learned the hard way that sample size calculators can be both empowering and deceptive. They feel wonderfully precise, but they’re only as trustworthy as the assumptions you feed them. When I lead A/B testing at scale, I treat the calculator as a planning tool, not a verdict—then I systematically validate the assumptions behind it so our decisions stay rigorous and our roadmap stays credible.

    At a minimum, most calculators assume you know your baseline rate, your “minimum detectable effect (MDE),” your desired statistical power, and your significance level. They also quietly assume independent observations, clean randomization, stable traffic quality, and a fixed test horizon with no peeking. If any of those break, the “right” sample size can be wildly wrong—and the test conclusions can nudge teams toward the wrong product or go-to-market bet.

    Baseline and variance come first for me. I estimate the baseline conversion (and volatility) from recent behavior using behavioral analytics, sanity-check it across key segments, and look for seasonality. Tools like Amplitude analytics help me spot anomalies, bots, or instrumentation drift. If baseline is unstable or highly skewed, I either stabilize it with longer lookbacks or narrow the target segment to reduce noise.

    Setting the “minimum detectable effect (MDE)” is where product strategy meets statistics. I work backward from an outcome that actually matters: the revenue, retention, or activation uplift that justifies the opportunity cost of building and running the experiment. If that effect size is implausible given historic lift and variance, I rethink the scope or stack changes into a sequenced set of learning experiments rather than overpromising a single moonshot.

    For power and alpha, I default to 80–90% power and a 5% significance level unless the downside risk of a false positive is unusually high, in which case I tighten alpha. I choose one-tailed tests only when we would not act on a negative result and we’ve explicitly pre-registered that decision; otherwise, two-tailed is safer for real-world ambiguity.

    Randomization and independence are where many tests quietly fail. I randomize at the user level (not session or pageview), guard against cross-device contamination, and ensure consistent exposure via feature flags. If there’s shared context—say, team-based usage or geographic clustering—I account for it via cluster randomization or acknowledge the inflated variance it can introduce.

    Traffic allocation integrity is non-negotiable. I monitor for sample ratio mismatch by comparing observed group splits to the intended allocation and immediately pause if they drift. When SRM appears, the root cause is often instrumentation gaps, eligibility filters applied asymmetrically, or caching layers. Fixing that early preserves trust in every test that follows.

    Fixed-horizon math assumes no peeking. If stakeholders need continuous reads, I use sequential testing methods with alpha spending or always-valid approaches designed for ongoing monitoring. If we commit to a fixed horizon, we stay disciplined: no early looks, no midstream metric swaps, no retrofitted hypotheses.

    Multiple comparisons can quietly inflate false positives. I predeclare one primary metric to decide, define guardrail metrics to protect experience and revenue, and apply appropriate corrections (for example, controlling the false discovery rate) when testing many variants or slicing results by numerous segments.

    Duration and seasonality matter more than most roadmaps admit. I run through full business cycles (at least one complete week for daily patterns, longer for B2B buying rhythms), plan for novelty effects, and watch for behavior settling after initial exposure. If the intervention changes long-run behavior, I extend the measurement window or add a post-test holdout to capture durable impact.

    Not all metrics are binomial. For revenue, time-on-task, or heavy-tailed distributions, I confirm variance assumptions, use robust estimators or bootstrapping, and consider variance reduction methods like CUPED to improve power without overextending duration. The calculator’s simplicity should not mask the data’s complexity.

    Finally, I connect experimentation to product outcomes. I map hypotheses to a driver tree, ensure each test ladders to activation, retention, or monetization, and document assumptions up front so we learn even when results are null. The result is a culture that respects the math and moves faster precisely because we trust our reads.

    Here’s the practical checklist I use before pressing “Start”: validate baseline and variance from recent behavior; set an MDE tied to meaningful business impact; choose power and alpha explicitly; confirm user-level randomization and stable exposure; watch for sample ratio mismatch; align on fixed-horizon vs sequential testing; predeclare a single primary metric and guardrails; run long enough to cover seasonality; use robust methods for non-binomial metrics; and write a brief pre-read so the whole team commits to the plan.

    When we honor these assumptions, sample size calculators become sharp instruments rather than blunt ones. You’ll ship fewer misleading wins, avoid costly false negatives, and build a repeatable experimentation engine that compounds learning—and results—over time.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Join Me in April: Build a Continuous Interviewing Habit and Unlock Real Customer Insights

    Join Me in April: Build a Continuous Interviewing Habit and Unlock Real Customer Insights

    “Continuous Discovery Habits” turns five this year, and I’m celebrating by reading it with our community—together, in practice, not just in theory. Each month, I’m publishing an in-depth reading guide with the chapters we’ll cover, a preview of the most important concepts, short videos you can share with your teams, individual and team discussion questions, practical exercises to apply what you read, and additional resources to go deeper.

    We’ll keep the conversation active in the comments each month and meet live once a quarter to compare notes, share what’s working, and troubleshoot what’s not. If you’re joining late, no problem—start with the current month or go back to January. You can also find all of the book club articles here.

    If you want to participate, grab a copy of the book (or dust off your old one), share the “Spread the Love” videos with colleagues, block time for the team exercises, and register for the community sessions. Let’s dive in together.

    This Month’s Reading

    Chapter: Chapter 5: Continuous Interviewing. Estimated reading time: ~37 minutes.

    This chapter grounds us in why interviewing on a regular cadence is critical to the success of any product trio; how cognitive biases affect what we learn from direct questions; the difference between research questions and interview questions; how to use story-based interviewing to uncover actual customer behavior (not ideal behavior); the interview snapshot, a one-page tool for synthesizing what you learned from a single interview; how to automate the recruiting process so interviewing becomes easier than not interviewing; and why product trios should interview customers together.

    Need a copy? Grab the book.

    Share the Love with Friends and Colleagues

    We learn best in community. To help your team rally around these practices, share these concise primers and invite them to join the book club discussion with you.

    What are customer interviews? – Build a competitive advantage that compounds over time.

    What should we ask in customer interviews? – Mitigating cognitive biases.

    Research questions vs. interview questions – And why the difference matters.

    Getting reliable feedback from customer interviews – Ask the right questions.

    Who should conduct customer interviews? – My answer might surprise you.

    How do you find customers to interview? – Automate the recruiting process.

    The Interview Snapshot – How to synthesize a single customer interview.

    Reflect and Discuss What You Read

    Reflection cements learning. This month, I’m challenging you—as I challenge my own teams—to build a weekly habit of interviewing customers and to shift from direct questions (which trigger bias) to collecting specific stories about past behavior. For many teams, this is a big mindset change: from infrequent “big research projects” to lightweight, continuous conversations that fuel daily decision-making.

    Individual Reflection: Think about your last customer interview or conversation. Did you rely on direct questions, or did you excavate a specific story about what happened? How might the answers have changed if you had used the other approach?

    Consider your own behavior—buying jeans, going to the gym, choosing what to watch on Netflix. Where do your ideal intentions differ from what you actually do? How might that same gap show up in your customers’ answers to direct questions?

    Scan your calendar from the past month. How many customer interviews did you conduct? If it’s fewer than four, what got in the way? What needs to change to make weekly interviewing sustainable?

    Team Discussion: As a team, discuss your current interview cadence. If you’re not interviewing at least weekly, name the biggest obstacle—recruiting, time, or synthesis—and commit to reducing one barrier this month.

    Try this together: Ask a teammate, “How does a product idea go from concept to launch at our company?” Have them write it down. Then ask for the last specific feature or improvement that launched and capture the story. Compare the two. What’s different? What does this reveal about the gap between ideal process and actual process?

    If you already interview regularly, ask: Who participates? Is it just one person (like the designer or product manager), or does the whole trio join? What value might you be missing by not having all three perspectives in the room?

    Put It Into Practice

    Understanding the “why” is easy; building the habit is the work. The following exercises are how my teams operationalize continuous interviewing week over week.

    Exercise: Conduct a Story-Based Interview (Time: 20–30 minutes. Do this with your product trio.) Schedule a conversation with a current customer. Instead of drafting a long script, identify a handful of research questions (what you need to learn) and translate them into one story-based interview question (what you’ll ask).

    For example, research questions might include: What challenges do customers face when onboarding? Where do they get stuck? What are we asking them to do that they don’t understand? How can we make it easier for them to get to the activation moment? The corresponding interview question could be: Tell me about the first time you used our product.

    During the interview, excavate the story with temporal prompts like “What happened first?”, “What happened next?”, and “What happened before that?” If the participant drifts into generalities (“I usually…” or “In general…”), gently bring them back to the specific instance.

    After the interview, debrief as a trio. What did each of you hear? Which opportunities surfaced? What surprised you? If you want personalized, detailed feedback on your technique, consider the Interview Coach available through the Story-Based Customer Interviews course.

    Exercise: Create Your First Interview Snapshot (Time: 30 minutes. Do this with your product trio immediately after the interview.) Using the interview snapshot template, capture a photo of the participant (or a visual that represents their story), quick facts about their context, a memorable quote you’ll still recall months from now, the opportunities (needs, pain points, desires) you heard, notable insights that aren’t yet opportunities, and an experience map that illustrates the story. Over time, aim to complete each snapshot in 15–20 minutes.

    Go Deeper: Additional Reading

    If you prefer audio, I’ve included an audio summary for paid subscribers that covers this month’s chapter plus the resources below.

    Related In-Depth Guides: Customer Interviews: How to Recruit, What to Ask, and How to Synthesize What You Learn.

    The Value of Continuous Interviewing: Why Product Trios Should Interview Customers Together – How interviewing together ensures research is timely, actionable, and believable.

    How to Find Customers to Talk To: Customer Recruiting: Get Easy Access to Customers Week Over Week – Practical strategies for automating your recruiting process. Ask Teresa: How Do You Select Customers for Customer Interviews? – Who to interview and how to recruit them. Tools of the Trade: Finding People to Interview Before You Have Customers – Recruiting strategies for early-stage products.

    What to Ask in Your Interviews: Why You Are Asking the Wrong Customer Interview Questions – Understanding the gap between ideal behavior and actual behavior. Story-Based Customer Interviews Uncover Much-Needed Context – Why collecting specific stories is more reliable than asking direct questions. Ask Teresa: What Are the Best Customer Interview Questions? – Common questions and how to improve them. Ask About the Past Rather than the Future – Why memories about recent instances are more reliable than speculation.

    How to Take Notes and Synthesize What You Are Learning: How to Take Notes During Customer Research Interviews – Practical tips for capturing what you hear. The Interview Snapshot: How to Synthesize and Share What You Learned from a Single Customer Interview – A comprehensive guide to creating and using interview snapshots. Customer Interview Analysis: How AI Helps and Hurts – Learn how to use AI effectively.

    Videos: All Things Product Podcast: Customer Interview Analysis – Petra and I discuss using AI to analyze customer interviews, the risks and benefits, and why your interviewing skills matter more than any AI tool.

    Other Resources from Around the Web: The Top 5 Mistakes Product Teams Make With Customer Interviews by Pragmatic Live. Continuous interviewing with Kristian Collin Berge (CEO & Co-founder at UX Signals) by Afonso Franco. How to Make Time for Customer Interviews & Validation by Rich Mironov. Brave UX: An interview with Teresa Torres by Brendan Jarvis.

    Related Courses: Customer Recruiting for Continuous Discovery – Get easy access to customers week over week. Story-Based Customer Interviews – Collect reliable feedback from every customer conversation.

    Our Live Discussion Schedule

    Our live discussion sessions are for paid subscribers. Sessions are not recorded. Invitations will go out to members two weeks before each event—add these to your calendar now: Tuesday, June 16, 2026: 9am–10am PDT. Thursday, September 17, 2026: 9am–10am PDT. Wednesday, December 16, 2026: 9am–10am PST.

    Audio Summary

    This summary was produced by NotebookLM. The sources supplied were the book chapters as well as all of the additional reading.

    This article is part of the CDH Book Club celebrating the five-year anniversary of Continuous Discovery Habits.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Inside Amplitude’s ML Playbook: Practical Strategies for Smarter A/B Tests and Growth

    Inside Amplitude’s ML Playbook: Practical Strategies for Smarter A/B Tests and Growth

    I’m continually asked how machine learning can make product analytics more actionable. Drawing from Amplitude analytics in real-world settings, I’ve distilled what matters most for product teams that want faster, smarter decisions without sacrificing rigor.

    When I design experiments, I start with minimum detectable effect (MDE) to size samples correctly and avoid costly, inconclusive tests. I pair that with disciplined A/B testing hygiene—clear hypotheses, thoughtful stop rules, and guardrails for key metrics—so results translate into credible product strategy choices instead of noisy dashboards.

    For growth and retention, I map behavioral analytics to activation and long-term value. Driver trees help me connect feature adoption to revenue or retention, and anomaly detection keeps me from overreacting to outliers when seasonality or data quality shift.

    I segment cohorts by user intent and lifecycle stage, measure user activation with crisp event definitions, and monitor leading indicators across a unified analytics platform. This keeps cross-functional conversations grounded, accelerates product-led growth, and reduces the risk of optimizing for vanity metrics.

    Operationally, that means building self-serve views that flag MDE-ready experiments, surface retention analysis by cohort, and trigger anomaly detection alerts only when the signal outpaces noise. The payoff is fewer meetings debating data quality and more time shipping value.

    If you’re leveling up your analytics stack, start by tightening experimentation basics, instrumenting activation and retention with behavioral analytics, and wiring in anomaly detection as a safety net. You won’t just move faster—you’ll learn faster, and with the confidence to bet big when the data earns your trust.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Inside Banani: How a Canvas-First AI Designer Elevates UX and Accelerates Product Teams

    Inside Banani: How a Canvas-First AI Designer Elevates UX and Accelerates Product Teams

    I believe the future of product design isn’t about replacing designers—it’s about giving every team access to one. That’s why Banani grabbed my attention. It’s an AI product designer that doesn’t just generate code—it generates design. For solo founders, stretched design teams, and early-stage startups, that shift matters: it raises the design floor without lowering the creative ceiling.

    I spent time with Vlad Solomakha (CEO & Co-founder), Vova Kovalchuk (CTO & Co-founder), and Vlad Ostapovats (Founding Growth) to unpack how they took Banani from a Figma plugin proof-of-concept to a canvas-first AI design tool generating hundreds of thousands of designs per week. Vlad brings a decade of design experience and a precise north star: AI should produce beautiful, tasteful design rather than average, undifferentiated UI.

    The architectural choices stood out. They engineered their agent to handle parallel screen edits, manage per-screen context across canvases with hundreds of frames, and make surgical edits without regenerating entire screens. This is the kind of agentic AI work that product leaders have been waiting for: concrete advances in context window management, tool orchestration, and prompt engineering that translate into higher throughput without sacrificing quality.

    Equally important is how they addressed the "gulf of specification"—the mismatch between how designers think visually and how agents understand text. Banani’s canvas-first approach acknowledges that design is spatial, hierarchical, and iterative. Rather than forcing a chat-first UX, they center the canvas and let the agent do production work while keeping the designer firmly in control. In practice, this narrows intent ambiguity, speeds up iteration, and preserves taste.

    The team made another pivotal bet: Why Banani doesn’t compile running applications — just HTML/CSS mockups — and how that shapes everything. By decoupling the design artifact from runnable code, they optimize for velocity, taste, and exploration. In my experience, this separation is the right product strategy for early discovery and gen ai for product prototyping—move fast on aesthetics and flows, then converge on implementation once you’ve validated the direction.

    I also appreciated their pragmatic evaluation approach. Instead of traditional evals, they spin up 10 screens from one prompt to compare models. It’s hands-on, outcome-based, and aligned with eval-driven development in real product environments. They’re relentlessly discerning about when to work around model limitations versus when to wait for the models to improve—an essential discipline when building at the edge of what’s possible.

    Under the hood, context engineering and specialized agent tools do the heavy lifting. Per-screen history with shared project context enables precise, reversible changes across large canvases. The result: fewer destructive regenerations, more reliable design intent preservation, and a workflow that feels like collaborating with a strong mid-level designer who’s exceptionally fast and consistent.

    If you want a quick tour, I recommend jumping to a few highlights: 20:13 Product Tour Canvas First AI, 33:40 Gulf of Specification, 42:54 Agent Architecture Under Hood, 48:48 State History Context Tricks, and 56:04 Navigating Busy Canvases. Each segment reveals a different layer of the system design and product thinking behind Banani’s canvas-first UX.

    For product leaders, this is a compelling blueprint for raising the design floor while protecting the last mile of craft. It aligns with empowered product teams, continuous discovery, and LLMs for product managers who need leverage without losing judgment. If you’re exploring agentic AI in design, this is a thoughtful, execution-focused model worth studying and trialing on your next product tour or redesign.

    Resources worth exploring: Banani and TL Draw. To hear the full conversation, you can listen on Spotify or Apple Podcasts. Then, pressure-test the approach inside your own product development lifecycle and see how a canvas-first AI designer reshapes your team’s velocity and quality bar.


    Inspired by this post on Product Talk.


    Book a consult png image
  • 5 powerful ways I use Pendo MCP to bring product analytics into ChatGPT, Claude, and Cursor

    5 powerful ways I use Pendo MCP to bring product analytics into ChatGPT, Claude, and Cursor

    I’ve wanted my product analytics to follow me into every conversation, doc, and code review. Now they do—and it changes how quickly I can move from question to insight to decision.

    Pendo is now available as an MCP (Model Context Protocol) server, easily accessible in Claude, ChatGPT, and Cursor.

    Practically, this means my core product analytics, segments, and qualitative feedback can be surfaced right where I plan sprints, refine opportunity solution trees, and write specs. Fewer context switches, tighter feedback loops, and faster product decisions.

    Here are five ways I put Pendo MCP to work across my day-to-day workflows—grounded in product management leadership habits and built for speed and clarity.

    1) Daily triage and decision support: In ChatGPT or Claude, I quickly query product analytics to spot anomalies, usage spikes, or drop-offs by segment. Prompts like “Highlight top features by week-over-week growth and flag statistically notable anomalies” help me focus standups on what matters, tightening the loop between observability and action.

    2) Continuous discovery prep: Before customer interviews, I pull recent NPS verbatims, feature adoption by persona, and journey mapping signals. In seconds, I have a concise brief that blends behavioral analytics with customer interviews, so I can ask sharper questions and validate assumptions faster—without leaving my AI workspace.

    3) Evidence-based prioritization: When shaping the roadmap, I bring in retention analysis, user activation metrics, and cohort views to weigh impact vs. effort. Using Pendo MCP inside Claude or ChatGPT, I translate insights into driver trees and a clear product strategy narrative that aligns stakeholders around outcomes, not output.

    4) Product-led growth and onboarding: I review onboarding funnels, identify friction in first-run experiences, and draft in-app guides and tooltip copy that meets users at the exact drop-off points. With Pendo MCP, the context for product tours and in-app guides is right where I’m writing, so iteration cycles stay tight and data-informed.

    5) Customer success and QBR prep: For account health and QBRs vs OKRs alignment, I generate succinct summaries of feature adoption, sentiment, and value realization—ready to paste into email, decks, or a CRM integration. This keeps sales-led and product-led growth motions unified, with a single source of truth visible in ChatGPT, Claude, or when I’m coding in Cursor.

    The net effect: higher-quality decisions, faster. By bringing product analytics into my AI workflows, I reduce context switching, improve context window management, and keep my team anchored to real user behavior. Wherever I’m working—ideating in Claude, drafting in ChatGPT, or reviewing code in Cursor—my Pendo context is right there with me.

    If you’re leading empowered product teams, this is a pragmatic way to operationalize continuous discovery, speed up alignment, and turn insights into outcomes. It’s a simple shift with outsized leverage.


    Inspired by this post on Pendo – Best Practices.


    Book a consult png image
  • Unlock Confident Decisions with Bayesian Statistics: Smarter A/B Tests from Small Samples

    Unlock Confident Decisions with Bayesian Statistics: Smarter A/B Tests from Small Samples

    Shipping great products is a game of making high‑quality decisions under uncertainty. In my role leading product management, I’ve seen teams stall when classic methods demand huge sample sizes before we can say anything useful. Bayesian statistics has become my go‑to approach for turning sparse data into clear, decision‑ready insights—especially when traffic is limited or experimentation windows are tight.

    Understand Bayesian statistics vs. frequentist methods and learn how Bayesian approaches improve experiment insights with small sample sizes.

    Here’s why I rely on it in A/B testing: frequentist methods focus on p‑values and long‑run error rates, which are tough to translate into action. With a Bayesian lens, I can express outcomes as intuitive probabilities—“Variant B has a 92% chance to outperform A”—and use credible intervals to communicate likely ranges of impact. That clarity reduces decision friction and helps the team move faster with confidence.

    Bayesian methods shine when sample sizes are small and the minimum detectable effect (MDE) of a frequentist test would be impractically large. I incorporate prior knowledge—historical conversion trends, seasonality, and learnings from related experiments—to stabilize noisy early data. Done thoughtfully, priors improve estimate quality without overfitting; I always run sensitivity checks to ensure the posterior is driven by the data we’re observing, not wishful thinking.

    In practice, my workflow is straightforward. I set a prior from historical performance in Amplitude analytics, run the experiment, and update the posterior daily. I track the probability of superiority, expected lift, and a credible interval that the CRO role can rally around. When the probability of a meaningful win crosses a pre‑agreed threshold, we ship. When it doesn’t, we bank the learning and move on—no prolonged debates about p‑values that few stakeholders truly understand.

    This approach also strengthens product discovery. By using behavioral analytics and retention analysis as informative priors, I can evaluate early signals from narrower cohorts—new geographies, niche segments, or enterprise accounts—where traffic is scarce. The result is faster iteration in product‑led growth environments, even when a full‑funnel test would take weeks to reach frequentist significance.

    Operationally, I treat Bayesian experimentation as part of a unified analytics platform strategy. The same posterior machinery that powers A/B testing can support anomaly detection during releases, quantify risk in phased rollouts, and estimate lift from in‑app guides or product tours. Because results are framed in plain language probabilities, cross‑functional teams make better, faster decisions aligned to outcomes rather than outputs.

    A few guardrails keep me honest. I preregister decision rules (stop/go thresholds, guardrail metrics), run prior sensitivity analyses, and document assumptions alongside results. That discipline prevents overconfidence, improves reproducibility, and builds trust with leadership.

    If your experiments are bottlenecked by low traffic or you’re tired of waiting weeks for a binary “significant/not significant,” consider a Bayesian upgrade. You’ll get earlier readouts, clearer stakeholder communication, and a repeatable path to compounding learning—without sacrificing rigor.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Product Management Isn’t Dead: Why ‘Product Builders’ Will Win in the AI Era—and How to Upskill Now

    Product Management Isn’t Dead: Why ‘Product Builders’ Will Win in the AI Era—and How to Upskill Now

    “Is product management dead?” I hear this question at almost every conference hallway chat. After listening to the latest Product Builders – All Things Product Podcast with Teresa Torres & Petra Wille, I’m more convinced than ever: product management isn’t dead—it’s evolving fast, and the leaders will be those who embrace the shift.

    Listen to this episode on: Spotify | Apple Podcasts

    The core take resonated deeply with my day-to-day at HighLevel: product management isn’t dying—“the traditional product trio (PM, design, engineering) is collapsing into something new.” The center of gravity is shifting from swim lanes to outcomes, from rigid handoffs to fluid collaboration, and from role definitions to capabilities that actually ship value.

    AI is raising the baseline across the board. That “80/20 shift: AI handles patterns, humans handle hard problems” is real on my teams. With LLMs like “GPT 5.2” and “Opus 4.5,” coding agents such as “Claude Code” and “Codex,” and tools like “Replit” and “Lovable,” we’re compressing cycle time on the repeatable 80%. The bottleneck is no longer typing code or drafting copy—it’s selecting the right problems, crafting sharp product strategy, and making confident trade-offs.

    This is why the future belongs to “product builders” — people with a shared foundation across disciplines and deep expertise in one area. I look for teams that can shape, prototype, validate, and iterate in tight loops, blending continuous discovery with empowered product teams. The baseline expands, the craft deepens.

    Functional expertise still matters—more than ever—because the hard parts are getting harder. We need leaders who can weigh platform scalability against time-to-value, protect privacy-by-design, apply AI risk management, and navigate data governance while sustaining product-market fit. When AI accelerates execution, judgment becomes the differentiator.

    For leaders, this creates a clear mandate: “What product leaders must do to create safe AI infrastructure.” In practice, that means building guardrails early—security reviews tailored to AI workflows, QA harnesses that include eval-driven development, model performance observability, and human-in-the-loop review systems. You can’t bolt this on later without paying a tax in velocity and trust.

    Hiring signals are already shifting. “How job descriptions and hiring expectations are already shifting” shows up in my reqs: we emphasize cross-functional range, fluency with AI workflows, prompt engineering literacy, and the ability to frame measurable outcomes. We still want craft depth—design systems, systems thinking in engineering, rigorous discovery—but we prize people who move seamlessly from discovery to delivery.

    In the episode, I appreciated the crisp framing of why product management isn’t dying—but changing. The rise of the “product builder” foundation reframes team topology and unlocks smaller, more cross-functional squads. AI changes the baseline skill set across product teams, and ignoring it is a career risk. If you’re not learning AI tools, you’re falling behind.

    My key takeaways were straightforward and actionable. Smaller, more cross-functional teams are likely. Deep expertise still matters—especially for complex trade-offs. Leaders need guardrails: security, QA, and review systems built for an AI-driven workflow. And if you work in product, design, or engineering, this episode is your signal to start upskilling now.

    “The risk of ignoring AI in your craft” is not hypothetical. I encourage PMs to carve out weekly lab time for hands-on experiments with LLMs for product managers, build lightweight prototypes with Replit or Lovable, and pressure-test opportunity solution trees with data-informed discovery. Pair with your engineers on agentic AI use cases, and integrate model evals into your CI/CD pipelines.

    “Mentioned in the episode” were several resources worth exploring: “Product at Heart” (June, Hamburg), “Replit,” “Lovable,” “Every,” “Petra’s Coaching Packages,” and “coding agents (Claude Code, Codex) and LLMs (GPT 5.2, Opus 4.5).” These are great jumping-off points for your own product builder toolkit.

    My recommendation: queue up the episode on your commute, then pick one workflow to augment with AI before the week ends. Replace a handoff with a shared canvas. Automate a repetitive analysis. Ship a scrappy prototype. Momentum compounds.

    Have thoughts on this episode? Leave a comment below. I’d love to hear how your teams are evolving your product trios, what AI workflows are sticking, and where governance has been most challenging.


    Inspired by this post on Product Talk.


    Book a consult png image
  • How We Built PR Review Bots In‑House for a Fraction of the Cost—and How You Can Too

    How We Built PR Review Bots In‑House for a Fraction of the Cost—and How You Can Too

    PR review bots are all the rage, but they cost a premium. We built our own for cheap that work just as well, if not better. Here's how.

    As a VP of Product Management, I care deeply about the velocity and quality of our software delivery. The decision to build our own pull request (PR) review agents came from a simple calculus: we needed tighter control over developer experience, CI/CD integration, and cost—without sacrificing accuracy or reliability. The result was a pragmatic system that accelerates reviews, improves code quality, and pays for itself through faster feedback loops.

    Before we wrote a line of code, we defined success. Our objectives were to shorten review cycles, reduce back-and-forth on style and test coverage, and surface risks earlier—measured against DORA metrics like lead time and deployment frequency. That focus aligned the team, guided our build vs buy decision, and anchored scope to the highest-impact use cases.

    We started rules-first, AI-optional. The initial release enforced guardrails that are universally valuable: linting and formatting checks, required test coverage thresholds, commit message standards, ownership validation (CODEOWNERS), and basic security scans. These automated gates eliminated predictable review friction, freeing engineers to focus on logic and architecture rather than style debates.

    Then we layered intelligence where it mattered. We added lightweight, explainable checks for common code smells and dependency risks, plus optional natural-language summaries that turn large diffs into concise context. Where appropriate, we introduced agentic AI workflows to triage PRs by risk, draft review comments, and suggest missing tests—always keeping humans in the loop. This hybrid approach kept costs low and outcomes high.

    Integration with our CI/CD pipeline was non-negotiable. We wired GitHub/GitLab webhooks to a stateless service that queued work, executed checks in containerized workers, and posted results back as status checks and review comments. Caching, parallelization, and smart diff-scoping ensured we only computed what changed, keeping the experience snappy even on large repos.

    Adoption hinged on developer experience. We made the bot’s feedback fast, specific, and actionable, with clear remediation steps and links to documentation. Feature flags allowed teams to opt into new checks gradually. ChatOps commands enabled quick overrides for emergencies, while policy-as-code kept rules visible, versioned, and auditable.

    We treated this like any product: eval-driven development for accuracy, ongoing telemetry for false-positive rates, and explicit SLAs for response times. We instrumented outcomes end-to-end—tracking PR cycle time, comment-to-merge ratios, and rework—so we could prove the ROI and tune the system without guesswork.

    The outcome: a reliable PR review companion that runs on a shoestring budget, integrates cleanly with our workflows, and measurably improves engineering throughput. If you’re weighing build vs buy, start small with rules that deliver immediate value, then layer intelligence where it earns its keep. With a clear product strategy, you can stand up capable PR review bots quickly—and scale them as your needs grow.

    If you’re ready to try this yourself, begin with your top three friction points in code reviews, wire them into your CI/CD checks, and pilot with a single team. Iterate weekly, measure relentlessly, and let your developers be your strongest signal. You’ll be surprised how far a pragmatic, product-led approach can take you.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Unlocking Impact: What Amplitude’s MCP server and experimentation platform teach product leaders

    Unlocking Impact: What Amplitude’s MCP server and experimentation platform teach product leaders

    In my role leading product management at HighLevel, I study the architectures and operating models behind high-velocity learning. I often reference "Amplitude's MCP server and its experimentation platform" as a benchmark for how to operationalize scale, reliability, and speed of insight across complex product ecosystems. That lens informs how I design processes, data flows, and decision loops that turn ambiguity into measurable outcomes.

    Experimentation is the heartbeat of eval-driven development. In practice, that means running disciplined A/B testing, deploying targeted feature flags to de-risk rollouts, and sizing experiments with a clear minimum detectable effect (MDE) so we avoid vanity wins. When teams internalize these habits, we shift from opinion-led debates to evidence-led decisions—and that’s where product-led growth compounds.

    I'm an AI enthusiast, so I think a lot about how experimentation accelerates AI roadmaps. The same rigor that validates UI changes should govern prompts, retrieval strategies, and policy settings for LLM-backed features. By treating AI behaviors as first-class experiment surfaces—and tying them to user activation, retention analysis, and value proposition metrics—we move faster without compromising safety, privacy-by-design, or customer trust.

    Making this work in production demands clean instrumentation and a unified analytics platform. I look for stacks that combine Amplitude analytics with robust observability and CI/CD to ensure we can ship, measure, and iterate continuously. When platform scalability and data governance are baked in from the start, product trios can focus on product discovery rather than firefighting pipelines or reconciling metrics.

    My playbook is straightforward: define decision-worthy questions, map them to crisp success metrics, run right-sized experiments with feature flags, and use consistent analytics to close the loop. Do this well, and you create a durable advantage—faster learning cycles, sharper product positioning, and a culture that lives by outcomes over output. That’s the real lesson I take from platforms that execute experimentation at scale: process and technology are table stakes; what wins is the discipline to learn relentlessly.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • From Engineer to CEO: Hard-Won Lessons on GTM, Cloud-First Bets, and Must-Do Focus

    From Engineer to CEO: Hard-Won Lessons on GTM, Cloud-First Bets, and Must-Do Focus

    Making the leap from engineer to CEO demands an almost entirely new skillset. I’ve felt that jolt firsthand: the tools that serve you as an IC or even a product leader—system design, crisp PRDs, elegant roadmaps—only get you about 20% of the way. The rest is learning to orchestrate go-to-market strategy, finance, hiring, culture, and product positioning with just enough depth to make sound, fast decisions while empowering true experts to execute.

    My operating heuristic is the 80% rule. As CEO or GM, I don’t need to be the best marketer, seller, or finance leader; I need to understand 80% of each function well enough to set a compelling product strategy, ask the right questions, and catch the second-order effects. That breadth unlocks speed, quality of judgment, and the conviction to say no when the organization is tempted by what it can do rather than what it must do.

    The clearest illustration comes from the journey that turned Apache Kafka—originally built at LinkedIn—into Confluent, a publicly traded enterprise software company. The technical insight was powerful, but the real lift came from translating that insight into a repeatable go-to-market engine. That required building new muscles: founder-led GTM, enterprise sales orchestration, and open source monetization without alienating the community that fueled adoption.

    Early on, the product was “embarrassing” by enterprise standards—thin features, sharp edges, and a long tail of operational gaps. Shipping anyway was the point. A thin vertical slice into the market created learning loops with real customers, not hypotheticals. That uncomfortable speed became a superpower, especially when the company decided to push toward a cloud-first business in the face of widespread opposition.

    The messaging challenge was just as hard as the technical one. Most marketing fails because it starts with what we built, not what customers must achieve. A simple product marketing pyramid—vision at the top, category framing and points of parity in the middle, crisp value props and proof at the base—helped explain Kafka to the world in customer language. When the narrative snaps into place, adoption accelerates. In Kafka’s case, one well-timed blog post clarified the “why now” and unlocked a step-change in community and enterprise pull.

    There’s a pivotal distinction leaders underestimate: the gap between what a company can do and what it must do. I use a must-do filter before every planning cycle: What moves are non-discretionary for durable product-market fit? For Kafka and Confluent, that meant ruthless prioritization on managed cloud services, reliability, and platform scalability—even when it jeopardized short-term revenue or required retooling how engineering, sales, and support worked.

    Fundraising strategy mirrored this clarity. Planning to raise before building the full product wasn’t about hype; it was about matching capital to the physics of the problem. If your category requires enterprise credibility, global infrastructure, and 24/7 SRE, you finance those table stakes early. That’s first principles decision making: instrument the constraints, then design the sequence that gets you to scale with the fewest irreversible mistakes.

    In the early years, every product decision felt like a trade between polish and learning. The team essentially bludgeoned its way into a cloud-first posture—less because the initial product was ready, and more because the market’s must-do was obvious. That’s the essence of founder-led GTM: get into the field, close lighthouse customers, and use their arcs to shape the roadmap. It’s also where open source monetization matures from downloads into durable, enterprise value.

    As the organization scales, excellence often erodes—the Chipotle problem. Process hardens; quality blurs; the magic decays. The antidotes are simple but hard: a few non-negotiable product quality bars, a short set of product-market fit metrics that everyone can recite, and empowered product teams who own outcomes over output. This is where organizational development matters as much as code: design clear interfaces between product, sales, and success, and you’ll keep velocity without losing standards.

    Contrary to popular lore, founder optimism is overrated. Constructive realism wins. I try to model “probabilistic optimism”: assume we will win, but instrument the journey like an SRE runs an incident. Set leading indicators, rehearse failure modes, and make pre-commitments to the must-do path so you’re not swayed by the latest anecdote. It keeps the team out of a failure mindset while making room for rigorous course correction.

    Giving up the right things at the right time is a CEO superpower. As complexity grows, I hand off decisions that benefit from specialization and keep only those tied to company narrative, must-do prioritization, and talent bar. CEO time management becomes a portfolio problem: ensure each week contains deep product time, frontline customer exposure, and one compounding systems fix (hiring loop, pricing rubric, or GTM enablement) that pays back for quarters.

    If you’re moving from IC or PM into a GM/CEO role, here’s a practical playbook: build your product marketing pyramid; write the one-page must-do memo for the next six quarters; ship a narrow, managed cloud slice early; pick three product-market fit metrics (usage, time-to-value, retention) and publish them company-wide; and architect an enablement engine that turns field learnings into roadmap changes within one quarter. That’s how you transform technical advantage into a category-defining business.

    The Kafka-to-Confluent arc reminds me that technology can open a door—but clarity of narrative, sequencing, and must-do focus determines whether you walk through it. When in doubt, bias toward shipping, talking to customers, and tightening the loop between what you learn and what you build. That’s the work of product management leadership at scale.


    Book a consult png image
  • Inside My Product Marketing Playbook: Amplitude Analytics Tactics That Drive PLG Wins

    Inside My Product Marketing Playbook: Amplitude Analytics Tactics That Drive PLG Wins

    I’ve curated a focused set of product marketing insights that zero in on what actually moves the needle—turning data into decisions. You’ll find a special emphasis on Amplitude Analytics, because its behavioral analytics foundation makes it easier to translate product usage into clear messaging, sharper positioning, and measurable growth.

    In my day-to-day as a product leader, I’m constantly bridging the gap between product discovery and go-to-market strategy. The best outcomes come when we connect quantitative signals to narrative: using behavioral analytics to inform the value proposition, refining product positioning with cohort trends, and driving product-led growth with activation and retention insights.

    Here’s how I put this into practice. I start with user activation and retention analysis to identify the few behaviors that predict long-term value. Then I run tightly scoped A/B testing to validate messaging and in-product prompts that nudge those behaviors. When the numbers move, I translate wins into a consistent story—one that sales, success, and marketing can all rally around.

    One pattern keeps repeating: clarity beats complexity. Instead of piling on more features, I focus on the minimum, verifiable set of behaviors that correlate with outcomes. That discipline makes it easier to craft a crisp value proposition, streamline go-to-market strategy, and accelerate feedback loops between product, design, and marketing.

    As you explore this collection, expect practical playbooks over platitudes. You’ll see how to apply Amplitude Analytics to uncover hidden friction, validate hypotheses faster, and operationalize product-led growth motions that compound over time. My goal is to help you move from interesting dashboards to decisive actions that strengthen your roadmap and your revenue.

    If you care about building empowered product teams that learn continuously, you’ll feel at home here. Dive in, borrow what works, and adapt the rest to your context—then measure it, iterate, and share the wins with your team.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • How I Safely Deploy Amplitude AI in Healthcare: Governed Analytics, PHI-Safe Workflows, Real ROI

    How I Safely Deploy Amplitude AI in Healthcare: Governed Analytics, PHI-Safe Workflows, Real ROI

    Healthcare leaders ask me the same question every week: how do we unlock AI-driven insights without risking patient trust or regulatory missteps? My approach is pragmatic and proven—connect business goals to measurable behavioral analytics, wrap everything in clear governance, and keep protected health information (PHI) out of the analytics layer by default. In other words, we earn the right to scale by making safety, compliance, and transparency visible in every step of the workflow with Amplitude AI.

    At the core, I anchor our rollout on "governed analytics"—curated events, certified metrics, and role-based access that make audits straightforward and decision-making fast. When product, data, security, and compliance share a single source of truth in Amplitude analytics, we reduce rework, eliminate ambiguous definitions, and ship improvements with confidence. This is where AI Strategy meets operational excellence: a unified analytics platform that balances velocity with verification.

    From there, I establish "PHI-safe workflows" by drawing a hard boundary around what data enters analytics. Behavioral signals flow in; identifiers stay in clinical systems. I lean on privacy-by-design, data minimization, and clear data governance so we can demonstrate regulatory compliance before a single end user is exposed to a new AI-powered experience. That alignment builds trust with legal and security, shortens review cycles, and operationalizes AI risk management without slowing innovation.

    Insights must be "trusted insights"—reliable enough to drive care pathways, staffing decisions, and patient communications. I emphasize repeatable instrumentation, observability of data quality, and transparent lineage so teams can trace outcomes back to inputs. In practice, that means we agree on event contracts, enforce change control, and verify that behavioral analytics reflect real-world adoption and efficacy across patient and provider journeys.

    To move decisively from legal review to production, I run a two-speed rollout. First, we validate in a sandbox with synthetic or de-identified data to pressure-test prompts, dashboards, and alerting. Then we graduate to controlled pilots with strict guardrails, documented data flows, and pre-agreed risk mitigations. By the time we scale, stakeholders have evidence, not just assurances—accelerating approvals and reducing last-minute scope churn.

    One pattern I rely on is connecting AI outcomes to product metrics that matter: activation, time-to-first-value, task completion rates, and variance in outcomes across segments. With Amplitude analytics, we can spot drop-offs, attribute improvements to specific design or model changes, and quantify impact in language that resonates with executives and clinicians alike. That rigor is what transforms AI from a promising prototype into a dependable operating capability.

    Success looks like faster time-to-insight, fewer compliance iterations, and audit-ready documentation built into normal workflows. It also looks like teams who are confident enough in their data to run A/B testing and continuous discovery—because they know their dashboards reflect reality. When governance, safety, and clarity are designed in, product-led growth becomes compatible with healthcare’s unique regulatory and ethical obligations.

    "See how to adopt AI in healthcare safely with Amplitude, using governed analytics, PHI-safe workflows, and trusted insights that help teams move from legal review to real usage." That’s the journey I guide teams through—measurable, compliant, and humane—so we can deliver AI that clinicians trust, patients respect, and leaders can scale.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image