Why are static A/B test calculators unreliable for product experiments?

Static calculators rely on fragile assumptions such as constant baseline conversion, balanced traffic, no seasonality, no peeking, and a fixed stopping point. The post argues that real product data changes over time, so a single sample size number can create false confidence and lead to underpowered tests.

What is a dynamic MDE curve?

A dynamic Minimum Detectable Effect curve shows what lift a team can reliably detect as traffic and variance change over time. Instead of asking for one required sample size, it helps teams evaluate detectable effects at decision points such as 3, 7, 14, and 28 days.

How can MDE-over-time views improve experiment decisions?

MDE-over-time views make stopping rules, expected detectable lift, and traffic assumptions visible to stakeholders. In the example from the post, this helped a team make a confident call in nine days while avoiding a premature rollback.

What inputs are needed to implement dynamic MDE planning?

The post recommends forecasting traffic by day, estimating variance from historical data, and simulating power curves across relevant effect sizes. It also recommends guardrail metrics such as latency, error rates, and sample ratio mismatch.

How should teams avoid p-hacking when checking experiments over time?

The post recommends using sequential testing or Bayesian monitoring rather than informal repeated peeking. It also calls for pre-registered stopping rules and a communication routine so stakeholders know when and why a decision will be made.

Can analytics platforms automate dynamic MDE updates?

Yes. The post says teams using a unified analytics platform or tools like Amplitude analytics can generate daily MDE curves, annotate ramp changes and seasonality, and expose dashboards that track detectable lift over time.

Why do dynamic MDE bands matter for retention and revenue metrics?

Retention and revenue effects often appear over weeks or months, not immediately after a click or activation event. The post recommends planning separate MDE bands for horizons such as early activation, Day-7 retention, and longer-term LTV.

Stop Trusting Static A/B Test Calculators: Why You Need Dynamic MDE Curves Over Time

After years of running experiments at scale, I’ve learned that the quickest way to stall product momentum is to rely on static A/B test calculators that promise certainty from a single sample size number. Real-world data rarely behaves like those calculators assume, and that gap quietly erodes decision quality, speed, and stakeholder trust.

Read about the issues with current A/B test calculators and why experimenters need to see a range of MDEs over time, not a static sample size

Most calculators hard-code fragile assumptions: a constant baseline conversion rate, balanced traffic allocation, independent and identically distributed sessions, no seasonality, no peeking, no novelty effects, and a fixed-horizon stop. They often use normal approximations that break at low counts and ignore the realities of traffic ramping, SRM (sample ratio mismatch), and mid-test product updates. The result is a deceptively precise sample size that fits the math, not the environment.

In practice, product teams peek, traffic fluctuates by day of week, acquisition mixes shift, and funnel variance changes as users move from click to activation to retention. These conditions make “the” required sample size a moving target, not a constant. Treating a static figure as a guarantee leads to underpowered tests, false confidence, and rushed stops that inflate false positives.
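To make the peeking problem concrete, here is a minimal simulation sketch of an A/A test (no true effect) that is checked once per day. It is an illustration under simplifying assumptions, not a model of any particular product: each day contributes an equal-sized batch of data, the test statistic is approximated as a running sum of standard normal increments, and the names `false_positive_rates`, `n_peeks`, and `z_crit` are my own.

```python
import random
from math import sqrt


def false_positive_rates(n_sims=2000, n_peeks=14, z_crit=1.96, seed=7):
    """Compare false positive rates in simulated A/A tests.

    Returns (fixed_rate, peeking_rate):
      fixed_rate   -> test evaluated once, after the final day
      peeking_rate -> test declared "significant" if ANY daily check crosses z_crit
    Assumes equal daily batches, so the z-statistic after k days is the sum of
    k independent N(0, 1) increments divided by sqrt(k).
    """
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        crossed_at_any_peek = False
        z = 0.0
        for k in range(1, n_peeks + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / sqrt(k)
            if abs(z) > z_crit:
                crossed_at_any_peek = True
        peeking_hits += crossed_at_any_peek
        fixed_hits += abs(z) > z_crit  # only the final look counts here
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed, peeking = false_positive_rates()
print(f"fixed-horizon false positive rate: {fixed:.1%}")
print(f"daily-peeking false positive rate: {peeking:.1%}")
```

With a nominal 5% threshold, the fixed-horizon rate should land near 5%, while the "stop at the first significant peek" rate comes out several times higher. That gap is the inflation a static calculator silently assumes away.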

The alternative is to manage Minimum Detectable Effect dynamically. Instead of anchoring on a single number, I plan with a range of MDEs over time—power curves that show what lift we can reliably detect after 3, 7, 14, and 28 days as traffic accrues. This reframes the question from “How big should my sample be?” to “What effect sizes can we detect at each decision point given our forecasted traffic and variance?”
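As a rough sketch of what such a curve looks like, the snippet below converts a daily traffic forecast into an approximate relative MDE at each checkpoint. It assumes a 50/50 split, a binary conversion metric, a two-sided test at 5% significance with 80% power, and the usual normal approximation. The forecast numbers, the 10% baseline, and the function name `mde_by_checkpoint` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde_by_checkpoint(daily_users, baseline_rate, checkpoints=(3, 7, 14, 28),
                      alpha=0.05, power=0.80):
    """Approximate relative MDE at each checkpoint for a 50/50 two-arm test.

    daily_users: forecasted total users entering the experiment per day.
    Uses the normal approximation for a two-sided two-proportion z-test:
        absolute MDE ~ (z_{1-alpha/2} + z_{power}) * sqrt(2 * p * (1 - p) / n_per_arm)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    curve = {}
    for day in checkpoints:
        if day > len(daily_users):
            continue  # forecast does not reach this checkpoint
        n_per_arm = sum(daily_users[:day]) / 2
        standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
        absolute_mde = (z_alpha + z_power) * standard_error
        curve[day] = absolute_mde / baseline_rate  # express as relative lift
    return curve


# Hypothetical forecast: four weeks, 2,000 users on weekdays and 1,200 on weekends.
forecast = ([2000] * 5 + [1200] * 2) * 4
curve = mde_by_checkpoint(forecast, baseline_rate=0.10)
for day, relative_mde in curve.items():
    print(f"day {day:>2}: detectable relative lift ~ {relative_mde:.1%}")
```

The output is a small table rather than a single number, which is the point: stakeholders can see which lifts are realistically detectable at each decision point, and how that changes if the forecast does.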

At HighLevel, this approach changed our experimentation culture. For example, an onboarding flow test initially “required” three weeks according to a static calculator. Our MDE-over-time view showed we could detect a meaningful 4–6% lift within a week under expected weekday traffic, but only 8–10% on weekends due to volatility. We set a sequential schedule for interim checks, aligned stakeholders on stopping rules, and made a confident call in nine days—saving a sprint and avoiding a premature rollback.

Implementing dynamic MDEs is straightforward: forecast traffic by day, estimate variance from historical data, and simulate power curves across relevant effect sizes. Layer in sequential testing or Bayesian monitoring to avoid p-hacking, include guardrail metrics (e.g., latency, error rates, SRM), and publish an MDE band that updates as data arrives. This transforms your “calculator” into a living decision tool rather than a one-time estimate.
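Of the guardrails listed above, SRM is the easiest to automate. Below is a minimal sketch of an SRM check using a chi-square goodness-of-fit test on the two arm counts. The function names and the 0.001 alert threshold are assumptions for illustration (a strict threshold is a common convention, not something prescribed here), and the example counts are made up.

```python
from math import erfc, sqrt


def srm_p_value(control_users, treatment_users, expected_treatment_share=0.5):
    """P-value of a chi-square goodness-of-fit test on the observed arm counts.

    A small p-value suggests the observed split differs from the planned
    allocation, i.e. a possible sample ratio mismatch.
    """
    total = control_users + treatment_users
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # With two arms there is 1 degree of freedom, and the chi-square
    # survival function at x reduces to erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


def has_srm(control_users, treatment_users, expected_treatment_share=0.5,
            threshold=0.001):
    """Flag a likely SRM. The threshold is an assumed convention, tune as needed."""
    p_value = srm_p_value(control_users, treatment_users, expected_treatment_share)
    return p_value < threshold


# Hypothetical counts for a planned 50/50 split.
print(has_srm(5000, 5050))  # small imbalance, consistent with chance
print(has_srm(5000, 5400))  # large imbalance, worth investigating before reading results
```

Running a check like this alongside the daily MDE update means a broken assignment or logging issue is caught before anyone interprets the lift.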

For teams using a unified analytics platform or tools like Amplitude analytics, it’s simple to automate: generate daily MDE curves, annotate ramp changes and seasonality, and expose a dashboard that tracks detectable lift as a function of time and traffic. Pair this with pre-registered stopping rules and a simple communication routine so stakeholders know exactly when and why you’ll decide.

Beyond top-of-funnel conversion, this mindset is critical for retention analysis and revenue outcomes where effects materialize over weeks or months. Plan MDE bands per horizon—early activation, Day-7 retention, and longer-term LTV—so product discovery and product-led growth bets aren’t prematurely judged on the wrong timeline.
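One way to sketch per-horizon bands is to account for maturation lag: a user only counts toward Day-7 retention once seven days have passed since they entered the test. The example below reuses the same normal-approximation MDE formula for binary metrics. The baseline rates, the flat traffic forecast, and the names `relative_mde` and `mde_for_horizon` are hypothetical. A revenue or LTV metric would need an estimated variance in place of the `p * (1 - p)` term, which is left out here.

```python
from math import sqrt
from statistics import NormalDist

# Two-sided 5% significance and 80% power (assumed defaults).
Z_TOTAL = NormalDist().inv_cdf(0.975) + NormalDist().inv_cdf(0.80)


def relative_mde(n_per_arm, baseline_rate):
    """Relative MDE for a binary metric under the normal approximation."""
    standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return Z_TOTAL * standard_error / baseline_rate


def mde_for_horizon(daily_users, baseline_rate, lag_days, calendar_day):
    """Relative MDE on a given calendar day for a metric with a maturation lag.

    Only users who entered at least lag_days ago have an observable outcome.
    Returns None if no cohort has matured yet.
    """
    matured_days = calendar_day - lag_days
    if matured_days <= 0:
        return None
    n_per_arm = sum(daily_users[:matured_days]) / 2  # 50/50 split assumed
    return relative_mde(n_per_arm, baseline_rate)


# Hypothetical metrics: (baseline rate, days until the outcome is observable).
horizons = {"activation": (0.30, 0), "day7_retention": (0.15, 7)}
forecast = [1500] * 28  # hypothetical flat traffic forecast

for name, (rate, lag) in horizons.items():
    for day in (7, 14, 28):
        mde = mde_for_horizon(forecast, rate, lag, day)
        label = "n/a" if mde is None else f"{mde:.1%}"
        print(f"{name:>15} day {day:>2}: {label}")
```

Under these assumptions, Day-7 retention has no readable result at day 7 and a noticeably wider band than activation at day 14, which is exactly why the two should not share a single stopping date.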

The takeaway is simple: retire the illusion of a one-number sample size. Embrace dynamic MDE curves that reflect how your data actually behaves, make faster and more confident calls, and keep empowered product teams focused on outcomes over outputs. Your experiments—and your roadmap—will move with more speed, less drama, and far better signal.

Inspired by this post on Amplitude – Perspectives.