After years of running experiments at scale, I’ve learned that the quickest way to stall product momentum is to rely on static A/B test calculators that promise certainty from a single sample size number. Real-world data rarely behaves like those calculators assume, and that gap quietly erodes decision quality, speed, and stakeholder trust.
Read about the issues with current A/B test calculators and why experimenters need to see a range of MDEs (minimum detectable effects) over time, not a static sample size.
Most calculators hard-code fragile assumptions: a constant baseline conversion rate, balanced traffic allocation, independent and identically distributed sessions, no seasonality, no peeking, no novelty effects, and a fixed-horizon stop. They often use normal approximations that break at low counts and ignore the realities of traffic ramping, SRM (sample ratio mismatch), and mid-test product updates. The result is a deceptively precise sample size that fits the math, not the environment.
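Of the realities listed above, SRM is the cheapest to check. Here is a minimal sketch, assuming a two-arm test with a known intended split. The function names and the 0.001 alert threshold are my own illustrative choices (a strict cutoff is a common convention because the check runs repeatedly), not something a particular calculator prescribes.

```python
from math import erfc, sqrt


def srm_p_value(control_n, treatment_n, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for the observed split."""
    total = control_n + treatment_n
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


def has_srm(control_n, treatment_n, expected_control_share=0.5, threshold=0.001):
    """Flag a sample ratio mismatch when the split is very unlikely under the plan."""
    return srm_p_value(control_n, treatment_n, expected_control_share) < threshold
```

A call such as `has_srm(5000, 5400)` compares the observed counts against the intended 50/50 split. If it flags a mismatch, the assignment pipeline needs investigating before any lift estimate is trusted.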
In practice, product teams peek, traffic fluctuates by day of week, acquisition mixes shift, and funnel variance changes as users move from click to activation to retention. These conditions make “the” required sample size a moving target, not a constant. Treating a static figure as a guarantee leads to underpowered tests, false confidence, and rushed stops that inflate false positives.
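To make the peeking problem concrete, here is a small A/A simulation sketch: both arms share the same true rate, so every "significant" result is a false positive. It assumes a pooled two-proportion z-test and independent users, and the function name and defaults are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(n_looks, users_per_look, rate=0.10, alpha=0.05,
                        n_sims=1000, seed=1):
    """Share of A/A tests declared significant when stopping at the first
    significant interim look. Both arms have the same true conversion rate."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _sim in range(n_sims):
        conv_a = conv_b = n = 0
        for _look in range(n_looks):
            # Accrue another batch of users in each arm, then test cumulatively.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            pooled = (conv_a + conv_b) / (2 * n)
            se = sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(conv_a - conv_b) / n / se > z_crit:
                false_positives += 1
                break  # the team "calls it" at the first significant look
    return false_positives / n_sims
```

Comparing `false_positive_rate(1, 1000)` with `false_positive_rate(5, 200)` holds total traffic fixed and changes only the number of looks. The classical result for five evenly spaced looks at a nominal 5% level is a false positive rate of roughly 14%, and a simulation like this should land in that neighborhood.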
The alternative is to manage Minimum Detectable Effect (MDE) dynamically. Instead of anchoring on a single number, I plan with a range of MDEs over time: power curves that show what lift we can detect, at a fixed significance level and power, after 3, 7, 14, and 28 days as traffic accrues. This reframes the question from “How big should my sample be?” to “What effect sizes can we detect at each decision point given our forecasted traffic and variance?”
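As a rough sketch of what an MDE-over-time view computes, the snippet below uses the textbook two-proportion approximation, MDE ≈ (z_{1-α/2} + z_{power}) · sqrt(2p(1-p)/n), expressed relative to the baseline rate p. It assumes equal allocation and a stable baseline, which is exactly the kind of simplification to revisit with real data. The function names and the default α = 0.05 and 80% power are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(baseline_rate, n_per_arm, alpha=0.05, power=0.80):
    """Approximate relative lift detectable with n_per_arm users in each arm."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    # Standard error of the difference, using the baseline variance for both arms.
    se = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * se / baseline_rate


def mde_over_time(baseline_rate, daily_users_per_arm, checkpoints=(3, 7, 14, 28)):
    """Map each checkpoint day to the relative MDE at the traffic accrued by then.

    daily_users_per_arm is a forecast: one entry per day, users per arm.
    """
    curve = {}
    cumulative = 0
    for day, users in enumerate(daily_users_per_arm, start=1):
        cumulative += users
        if day in checkpoints:
            curve[day] = relative_mde(baseline_rate, cumulative)
    return curve
```

For example, `mde_over_time(0.10, [1000] * 28)` returns one relative MDE per checkpoint for a flat forecast of 1,000 users per arm per day. Swapping in a forecast with lighter weekends shows how the curve flattens when traffic dips.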
At HighLevel, this approach changed our experimentation culture. For example, an onboarding flow test initially “required” three weeks according to a static calculator. Our MDE-over-time view showed we could detect a meaningful 4–6% lift within a week under expected weekday traffic, but only 8–10% on weekends due to volatility. We set a sequential schedule for interim checks, aligned stakeholders on stopping rules, and made a confident call in nine days—saving a sprint and avoiding a premature rollback.
Implementing dynamic MDEs is straightforward: forecast traffic by day, estimate variance from historical data, and simulate power curves across relevant effect sizes. Layer in sequential testing or Bayesian monitoring so interim looks don't turn into p-hacking, track guardrail metrics (e.g., latency, error rates) alongside an SRM check, and publish an MDE band that updates as data arrives. This transforms your “calculator” into a living decision tool rather than a one-time estimate.
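The simulation step can be sketched in a few lines. This is a minimal Monte Carlo version, assuming independent users, a pooled two-proportion z-test, and a per-arm daily traffic forecast. The names `simulate_power` and `power_curve` are hypothetical, and a production version would also inject day-to-day variation in the baseline rate estimated from historical data.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(baseline_rate, relative_lift, n_per_arm,
                   alpha=0.05, n_sims=200, seed=0):
    """Fraction of simulated tests that detect the given relative lift."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    treatment_rate = baseline_rate * (1 + relative_lift)
    detections = 0
    for _sim in range(n_sims):
        control = sum(rng.random() < baseline_rate for _ in range(n_per_arm))
        treated = sum(rng.random() < treatment_rate for _ in range(n_per_arm))
        pooled = (control + treated) / (2 * n_per_arm)
        se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        if se > 0 and abs(treated - control) / n_per_arm / se > z_crit:
            detections += 1
    return detections / n_sims


def power_curve(baseline_rate, relative_lift, daily_users_per_arm,
                checkpoints=(3, 7, 14, 28), **sim_kwargs):
    """Simulated power at each checkpoint day, given a per-arm traffic forecast."""
    curve = {}
    cumulative = 0
    for day, users in enumerate(daily_users_per_arm, start=1):
        cumulative += users
        if day in checkpoints:
            curve[day] = simulate_power(baseline_rate, relative_lift,
                                        cumulative, **sim_kwargs)
    return curve
```

Running `power_curve` for several candidate lifts and reading off the smallest lift that clears your power target at each checkpoint gives a simulated MDE band. It is slower than a closed-form formula, but it extends naturally to ramped allocation or noisier weekend traffic.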
For teams using a unified analytics platform or tools like Amplitude analytics, it’s simple to automate: generate daily MDE curves, annotate ramp changes and seasonality, and expose a dashboard that tracks detectable lift as a function of time and traffic. Pair this with pre-registered stopping rules and a simple communication routine so stakeholders know exactly when and why you’ll decide.
Beyond top-of-funnel conversion, this mindset is critical for retention analysis and revenue outcomes where effects materialize over weeks or months. Plan MDE bands per horizon—early activation, Day-7 retention, and longer-term LTV—so product discovery and product-led growth bets aren’t prematurely judged on the wrong timeline.
The takeaway is simple: retire the illusion of a one-number sample size. Embrace dynamic MDE curves that reflect how your data actually behaves, make faster and more confident calls, and keep empowered product teams focused on outcomes over outputs. Your experiments—and your roadmap—will move with more speed, less drama, and far better signal.
Inspired by this post on Amplitude – Perspectives.











