Tag: A/B testing

  • From Vision to Execution: Building Agentic, Data‑Driven Products with Real‑World Rigor

    When I consider where product development is headed, one statement captures the mandate: "Eric Carlson is a Principal AI Engineer helping to shape and build Amplitude's next generation vision of agentic and data driven product development." That vision matches how I lead teams: anchoring strategy in behavioral analytics while enabling agentic AI to act on insights with speed, safety, and measurable impact.

    Translating that vision into execution starts with clarity of outcomes. I frame driver trees that connect customer value to leading indicators (activation, engagement depth, and retention), then instrument product telemetry in Amplitude so behavioral analytics can surface the moments that matter. From there, we operationalize learning with A/B testing and feature flags, ensuring each hypothesis gets a fair, observable run and that we can safely ramp what works.

    Agentic AI changes the operating model. Instead of static dashboards, we design autonomous workflows that observe signals, reason over context, and take action, grounded in a retrieval-first pipeline and governed by eval-driven development. This demands that product managers be fluent with LLMs and practical prompt engineering, and that the AI strategy be rigorous about data governance, privacy-by-design, and risk scoring so agents remain trustworthy under real-world conditions.

    Cross-functional cadence is everything. I partner closely with Principal AI Engineers and product trios to blend continuous discovery with execution: rapid user interviews to reveal intent, opportunity solution trees to prioritize, and outcomes vs output OKRs to align incentives. The result is a system where insights are unified, decisions are explainable, and agents improve through tight feedback loops across analytics, experimentation, and production telemetry.

    If you’re building toward an agentic, data-driven future, invest in a unified analytics platform, shorten the path from signal to action, and measure learning velocity as carefully as feature delivery. With the right foundations, agentic AI becomes more than a feature—it becomes a force multiplier for product strategy, customer value, and sustainable growth.


    Inspired by this post on Amplitude – Perspectives.


  • Principal Product Manager Playbook: Strategy, Discovery, and Measurable Impact That Lasts

    I’ve spent my career building products that move the needle, and as a Principal Product Manager and product leader at HighLevel, I focus on the work that compounds: clear strategy, rigorous discovery, and measurable outcomes. My role is to turn ambition into traction by aligning vision with execution, then proving impact with data, not anecdotes.

    Great product strategy starts with customer value and ends with business results. I frame the narrative around a defensible value proposition, clarify points of parity and points of differentiation, and translate that into driver trees tied to outcomes vs output OKRs. This creates line-of-sight from our roadmap to the metrics that matter: Net Revenue Retention (NRR), activation, retention, and expansion. Teams then know exactly why their work counts.

    Discovery is continuous, not a phase. I partner in product trios to run continuous discovery through customer interviews, journey mapping, and an opportunity solution tree that separates signal from noise. By keeping a weekly cadence of learning, we reduce risk early, refine problem statements, and ensure we’re solving the highest-leverage jobs to be done for our customers.

    Evidence beats opinion, so I obsess over instrumentation and experimentation. I rely on Amplitude analytics for behavioral analytics, cohorting, funnel health, and retention analysis, and I validate hypotheses with A/B testing designed around a minimum detectable effect (MDE). With feature flags, we decouple deployment from release, ramp value safely, and learn fast without exposing the entire base to risk.

    Execution only works when planning is pragmatic and transparent. I run product roadmapping and sprint planning as living systems informed by discovery insights and real usage data. That means tighter stakeholder management, clearer trade-offs, and fewer surprises for go-to-market partners—so we ship confidently and tell a crisp story from beta through scale.

    I also apply modern AI practices where they create real leverage. For exploration and prototyping, I use generative AI and practical LLM workflows to accelerate research synthesis, scenario mapping, and content generation, always with human-in-the-loop judgment, data governance, and privacy-by-design as non-negotiables.

    The result is a disciplined, human-centered, and data-powered approach. I build empowered product teams that learn faster than the market, align on few-but-mighty bets, and compound outcomes over outputs. That’s how a Principal Product Manager consistently turns strategy into durable, product-led growth.


    Inspired by this post on Amplitude – Perspectives.


  • Unlock High-Leverage PM Work: 5 Claude Cowork Playbooks to Turbocharge Your Strategy

    In my role leading product teams, I’m relentless about freeing time for high-leverage work—clarifying strategy, sharpening positioning, and unblocking execution. Claude Cowork has become a reliable AI partner in that mission, helping me automate repeatable tasks while preserving judgment for the decisions that matter most.

    Get 5 playbooks to automate common product management tasks with Claude Cowork and free yourself for higher-leverage PM work.

    When I say “playbooks,” I mean structured, repeatable workflows that turn messy inputs into crisp outputs without sacrificing rigor. With agentic AI, LLM fluency, and thoughtful prompt engineering, these playbooks plug directly into my product roadmapping and sprint planning process, accelerating discovery, analysis, and stakeholder alignment.

    Playbook 1: Continuous discovery synthesis. I route raw customer interviews, support threads, and behavioral analytics into Claude Cowork to cluster themes, extract Jobs-to-Be-Done, and propose opportunity areas. It drafts an initial opportunity solution tree with clear problem statements, target outcomes, and candidate solutions, which I then refine with the team. This shortens the loop between customer interviews and actionable insights while preserving the nuance that continuous discovery requires.

    Playbook 2: Strategy-to-roadmap alignment. Starting from our product strategy and target outcomes, I ask Claude Cowork to translate goals into a prioritized roadmap, calling out outcomes vs output OKRs and showing driver trees that connect initiatives to measurable impact. It flags dependencies and suggests stakeholder management touchpoints, making the narrative behind prioritization transparent and easier to socialize across product trios and leadership.

    Playbook 3: Experiment design and A/B testing. To move from ideas to evidence, I have Claude Cowork generate testable hypotheses, success metrics, and guardrails for A/B testing. It produces experiment briefs, checks statistical assumptions like minimum detectable effect (MDE), and suggests instrumentation plans for tools such as Amplitude analytics. I use these drafts to speed up reviews without compromising on methodological rigor.

    Playbook 4: Launch communications and in-product guidance. After we ship, I leverage Claude Cowork to assemble UX writing, release notes, and in-app guides tailored to user segments. It proposes short product tours, contextual tooltips, and support macros that keep messaging consistent across Pendo or Intercom while reinforcing our value proposition. The result is faster, more cohesive go-to-market execution with fewer round-trips.

    Playbook 5: AI risk, governance, and quality checks. Before anything goes live, I use Claude Cowork to run structured reviews for data governance, privacy-by-design, and AI risk management. It helps draft acceptance criteria, red-team prompts for edge cases, and an eval-driven development checklist so the team can track model behavior and mitigate regressions over time. These safeguards maintain trust as we scale AI workflows across the product surface.

    To make these playbooks sing, I seed Claude Cowork with a retrieval-first pipeline of canonical docs—vision, strategy, OKRs, analytics dashboards, and definition-of-done checklists—plus prompt templates tuned for our voice and review standards. Tight context window management, explicit role instructions, and lightweight evaluations keep outputs accurate, auditable, and on-brand.

    The impact has been compounding: faster discovery-to-decision cycles, clearer roadmaps tied to outcomes, stronger experiments, and launch content that lands. Most importantly, the team spends more time on creative problem solving and stakeholder partnership, not manual synthesis or formatting. If you’re ready to reclaim your calendar and elevate your product strategy, start with these five Claude Cowork playbooks and iterate from there.


    Inspired by this post on Amplitude – Perspectives.


  • How I Orchestrate Growth & AI at Amplitude to Ignite Viral Product Engagement

    I lead Growth & AI at Amplitude, where I focus on viral and core growth strategies, user acquisition, and product engagement. My north star is to architect durable growth loops that compound over time while elevating the customer experience—from the first onboarding moment to deep, habitual use.

    Day to day, I combine Amplitude analytics and behavioral analytics to power product-led growth. By instrumenting the right events, mapping activation journeys, and running disciplined A/B testing, I drive user activation and accelerate time-to-value. That work extends into onboarding, in-app guides, and retention analysis, ensuring we optimize not just for acquisition but also for sustainable engagement and expansion.

    On the AI front, I define and execute an AI strategy that responsibly applies generative AI and LLMs to increase experimentation velocity and personalize experiences at scale. This includes deploying intelligent nudges, next-best actions, and adaptive UX while honoring privacy-by-design and strong data governance practices. The outcome is a feedback-rich system that learns from user behavior and continuously improves product-market fit signals.

    My playbook is simple but rigorous: align on a clear North Star, translate it into activation and retention metrics, size experiments around a minimum detectable effect (MDE), and iterate fast with product trios. I use an opportunity solution tree to prioritize bets, validate with continuous discovery, and then harden winning patterns into repeatable growth loops. This approach keeps teams focused on outcomes, not output, and creates a shared language across product, design, data, and engineering.

    If you’re exploring how to scale product-led growth with AI, this is the path I follow: turn rich product analytics into actionable insights, test with scientific precision, and ship experiences that feel personal, timely, and trustworthy. The result is a growth engine that compounds—driving efficient acquisition, stronger activation, and enduring product engagement.


    Inspired by this post on Amplitude – Best Practices.


  • How a Digital Analytics Visionary Shapes My Product Strategy for Growth, Retention & Monetization

    Data has always been my compass for building products that customers love and businesses depend on. Few sentences distill that imperative as crisply as the one below—and it continues to inform how I prioritize, experiment, and scale outcomes across the roadmap.

    Krista is a digital analytics leader, product strategist, and industry evangelist. She helps businesses use data to drive growth, retention, and monetization.

    That mandate mirrors how I run product: leverage behavioral analytics to uncover patterns, translate those insights into hypotheses, and validate them through rigorous A/B testing. I start by instrumenting the user journey end to end, then use cohort analysis, funnel diagnostics, and retention analysis to pinpoint where activation, engagement, or monetization is stalling. From there, I map driver trees to connect inputs (feature adoption, time-to-value, onboarding friction) to outcomes (retention, conversion, revenue), so every experiment has a clear line of sight to business impact.

    On experimentation, I hold the bar high: define the minimum detectable effect (MDE) up front, ensure clean experiment design, and size samples to reduce noise. I combine Amplitude analytics with qualitative signals from continuous discovery to prioritize tests that move the needle, not just the vanity metrics. When a variant wins, I don’t stop at the lift—I track downstream effects on user activation, long-term retention, and monetization, ensuring we’re compounding gains rather than optimizing in silos.

    For product-led growth, I focus on the moments that matter most: first-value, aha, and habit formation. Journey mapping helps me identify the shortest, clearest path to value, while targeted in-app experiences and contextual nudges accelerate activation without adding friction. Every iteration feeds a learning loop—measure, learn, and ship—so we can pursue step-change outcomes, not incremental tweaks.

    Ultimately, the craft is in translating analytics into action. When teams can trace a feature idea to a specific behavioral pattern, test it with a well-powered A/B experiment, and observe durable improvements in retention and revenue, momentum takes care of itself. That’s how I operationalize data to deliver growth, retention, and monetization at scale.


    Inspired by this post on Amplitude – Best Practices.


  • 4 Costly Agent Analytics Myths—And the Data-Backed Metrics I Rely on Instead

    In my work with product, operations, and support leaders, I’m often asked to help make sense of Agent Analytics—what to track, how to attribute outcomes, and where to invest. After reviewing countless dashboards and running experiments across human agents and AI agents, I’ve learned that some of the most common measurement beliefs are precisely the ones that lead teams astray.

    What comes up in conversation with leaders about Agent Analytics, and why not everything is what it seems.

    Below, I unpack four pervasive myths I encounter and share the data-centered practices I use to replace them. My goal is simple: help you upgrade the way you measure performance so you can improve customer outcomes, accelerate learning, and scale impact with confidence.

    Myth 1: “Lower average handle time (AHT) means higher performance.” AHT is useful but incomplete. When teams optimize solely for speed, they often push complexity into repeat contacts, reopens, or escalations. In the data, that shows up as a weak or negative relationship between lower AHT and durable outcomes like first contact resolution (FCR), customer effort, or revenue per conversation.

    Reality and what I measure instead: I right-size speed by pairing AHT with intent-level resolution and recontact rate. For simple intents (password reset, billing address update), shorter is usually better. For complex intents (tiered troubleshooting, multi-step verification), “right-speeding” wins—slightly longer interactions that prevent rework. Practically, that means segmenting by intent complexity using behavioral analytics, tracking weighted “intent resolution rate,” and monitoring repeat-contact windows (24–168 hours) to catch downstream pain.
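
    As a rough illustration of the repeat-contact windows described above, here is a minimal Python sketch. It assumes each contact is a hypothetical `(customer_id, intent, timestamp)` tuple and counts a contact as "recontacted" when the same customer returns with the same intent inside the window. The function name and record shape are illustrative assumptions, not a schema from any particular tool.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def recontact_rate_by_intent(contacts, window_hours):
    """Share of contacts per intent that are followed by another contact
    from the same customer, with the same intent, within window_hours.

    contacts: iterable of (customer_id, intent, timestamp) tuples
    (an assumed record shape for this sketch).
    """
    by_key = defaultdict(list)
    for customer_id, intent, ts in contacts:
        by_key[(customer_id, intent)].append(ts)

    window = timedelta(hours=window_hours)
    totals = defaultdict(int)
    recontacted = defaultdict(int)
    for (_, intent), stamps in by_key.items():
        stamps.sort()
        for i, ts in enumerate(stamps):
            totals[intent] += 1
            # Only the next contact matters: if it falls outside the
            # window, every later one does too.
            if i + 1 < len(stamps) and stamps[i + 1] - ts <= window:
                recontacted[intent] += 1
    return {intent: recontacted[intent] / totals[intent] for intent in totals}

# Illustrative data only.
t0 = datetime(2024, 1, 1, 9)
sample_contacts = [
    ("c1", "password_reset", t0),
    ("c1", "password_reset", t0 + timedelta(hours=12)),
    ("c2", "password_reset", t0),
    ("c3", "troubleshooting", t0),
    ("c3", "troubleshooting", t0 + timedelta(hours=72)),
]
rates_24h = recontact_rate_by_intent(sample_contacts, 24)
rates_168h = recontact_rate_by_intent(sample_contacts, 168)
```

    Comparing the 24-hour and 168-hour results per intent shows whether a fast interaction simply deferred the problem: here the troubleshooting recontact only appears in the longer window.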

    Myth 2: “AI agent containment tells the whole story.” A high containment rate can mask failure modes such as unresolved intent, silent abandonment, or low-quality handoffs that frustrate customers and spike human workload later.

    Reality and what I measure instead: I break containment into three parts for voice and chat flows: (1) intent resolution without escalation, (2) graceful handoff quality when escalation is necessary, and (3) post-handoff efficiency and satisfaction. For voice AI agent experiences, I also track escalation clarity (did the transcript summarize history and intent?), time-to-human, and customer satisfaction on the combined interaction. This provides a fuller view of how effective the customer support AI strategy is and avoids over-crediting automation for partial wins.

    Myth 3: “Quality is subjective, so it can’t be measured at scale.” Teams often default to sporadic QA because they assume it can’t be standardized across channels or agent types. The result is noisy feedback loops and stalled coaching.

    Reality and what I measure instead: Quality becomes measurable when it’s grounded in observable behaviors linked to outcomes. I use a rubric anchored in behavioral analytics (e.g., verified customer need, correct resolution path, policy compliance, empathy markers) and validate it via correlation with FCR, recontact, and retention analysis. To scale, I combine calibrated human reviews with AI-assisted scoring, check inter-rater reliability weekly, and use driver trees to connect quality levers to business results. This creates a consistent, coachable signal for both human agents and AI flows.
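
    One common way to check inter-rater reliability is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an assumption about how such a weekly check could look; the post does not say which statistic is used, and the labels are made up for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label
    # if each labeled independently at their own base rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Illustrative rubric scores: a human reviewer vs. an AI-assisted scorer.
human_scores = ["pass", "pass", "fail", "fail"]
ai_scores = ["pass", "fail", "fail", "fail"]
kappa = cohens_kappa(human_scores, ai_scores)
```

    Tracking this value week over week gives an early warning when the AI-assisted scorer and calibrated human reviewers start to drift apart.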

    Myth 4: “If the dashboard is green after launch, we’ve won.” Early wins can reflect novelty effects, cherry-picked routing, or short-term incentives that don’t persist. Declaring victory too soon locks in fragile gains and hides regressions across cohorts.

    Reality and what I measure instead: I treat go-live as the start of learning. I use A/B testing with a clear minimum detectable effect (MDE), stagger ramps, and hold out stable control cohorts for at least one full demand cycle. I track outcomes vs output OKRs—focusing on intent resolution, customer effort, and revenue/customer health over vanity metrics. I also monitor seasonality and channel mix shifts inside a unified analytics platform to ensure improvements generalize beyond the first week.

    How I operationalize this day to day: (1) define intents and complexity upfront, (2) unify journey data across channels, (3) instrument resolution and recontact rigorously, (4) apply driver trees to isolate what actually moves outcomes, and (5) iterate via disciplined experiments rather than sweeping changes. This approach aligns product and operations, speeds up coaching, and ensures AI investments compound rather than decay.

    If you’re rethinking your Agent Analytics stack, start by replacing each myth with a sharper metric: pair AHT with intent-level resolution, pair containment with handoff quality and satisfaction, pair QA with outcome-linked rubrics, and pair green dashboards with robust experiments. The payoff is a measurement system that earns trust, guides better decisions, and consistently improves customer and business results.


    Inspired by this post on Pendo – Best Practices.


  • Make AI Search Count: Convert Every Query into Revenue with Visibility, Sentiment, and Action

    In my role leading product strategy at HighLevel, I’ve learned that AI search is one of the most overlooked growth levers in a modern product stack. When we treat every query as a moment to understand intent, reduce friction, and guide users to value, AI search stops being a utility and starts becoming a compounding engine for product-led growth.

    "Turn AI search into a growth channel with AI visibility, sentiment analysis, revenue impact, and content recommendations in one place."

    That single line has become a practical blueprint for how I operationalize AI Strategy: make what users ask visible, interpret how they feel, quantify what converts, and continually recommend better content. AI visibility tells me which intents we serve well (and where we fail). Sentiment analysis connects experience to emotion. Revenue impact closes the loop with attribution. Content recommendations ensure we don’t just diagnose gaps—we close them.

    Under the hood, I anchor this on a retrieval-first pipeline that marries behavioral analytics with a unified analytics platform. This lets me trace the path from query to outcome: how users phrase needs, which results earn clicks, where drop-offs happen, and which experiences correlate with activation, retention, and expansion. With that signal, I can prioritize high-leverage content updates, tune relevance, and decide when agentic AI should step in with guided workflows rather than static results.

    Measurement has to be rigorous. I rely on eval-driven development to benchmark intent coverage and answer quality, then confirm impact with A/B testing designed around a clear minimum detectable effect. We test ranking tweaks, prompt variants, and new answer types (short snippets vs. deep dives) to isolate what actually moves activation and Net Revenue Retention. If it doesn’t change behavior or dollars, it’s noise.

    The operating model matters as much as the model weights. Cross-functional product trios pair continuous discovery and journey mapping with a lightweight content audit cadence. The CRO role partners with data science to align search KPIs to revenue goals, and solutions engineering ensures CRM integration and downstream systems reflect what users discover. This keeps the system honest: every improvement is traceable from insight to impact.

    Finally, governance and scale are non‑negotiable. Privacy-by-design, clear data governance, and observability protect trust while feature flags and CI/CD let us iterate safely. When the fundamentals are strong, we can confidently expand into richer experiences—like proactive recommendations, in-app guides, and voice AI agent handoffs—without sacrificing reliability or compliance.

    If your AI search still feels like a black box, it’s time to turn it into a transparent, revenue-linked growth channel. Make the work visible, measure what matters, and let sentiment and behavior guide the roadmap. The payoff is real: better answers, faster activation, and a content system that learns—and sells—every day.


    Inspired by this post on Amplitude – Best Practices.


  • Principal Product Manager Playbook: Strategy, Leadership, and Execution That Scales

    I’ve learned that the Principal Product Manager role is the crucible where strategy, execution, and leadership meet. It’s less about owning a backlog and more about owning an outcome—aligning a portfolio of bets to a clear vision, then guiding empowered product teams to deliver measurable impact at pace.

    Unlike a Senior PM who may anchor a single area or a Group PM who often has direct people management, I operate as a force multiplier. I set product strategy, shape cross-functional operating rhythms, mentor PMs and product trios, and influence executives and partners—without relying on formal authority. The bar is outcomes over output, clarity over activity, and learning over certainty.

    My first move is to define a crisp North Star and the driver tree beneath it. I translate company goals into outcomes using outcomes vs output OKRs, ensuring every roadmap item ties to a measurable lever (conversion, retention, activation, expansion). This structure prevents feature factory drift and creates a shared language for prioritization and trade-offs.

    Discovery is continuous, not a phase. I run weekly customer interviews, synthesize insights with journey mapping, and map opportunities with an opportunity solution tree so teams solve the right problems before building the right solutions. I use the Kano Model to calibrate expectations on “delighters” versus “must-haves,” and I document assumptions so we can invalidate them early instead of discovering them late.

    Data sharpens judgment. I rely on Amplitude analytics for behavioral analytics, retention analysis, and funnel diagnostics, pairing this with A/B testing to validate causal impact. I size experiments with minimum detectable effect (MDE) to reduce false negatives, and I instrument leading indicators to shorten feedback loops—so we can pivot weeks earlier, not quarters later.

    Execution is where strategy earns its keep. I plan in outcomes-based quarters and deliver in two-week sprints, keeping a living roadmap that reflects new learning. Product trios (PM, design, engineering) co-own problem framing and solution shaping, while I maintain stakeholder management with transparent trade-offs and crisp decision records. This balance preserves autonomy while ensuring alignment.

    High standards spread through coaching. I mentor PMs on writing testable bets, crafting compelling problem statements, and telling a metrics-first narrative. I champion empowered product teams because autonomy plus accountability consistently outperforms mandate-driven delivery—and because it attracts and retains top talent.

    As scope scales, so does storytelling. I align leaders through a brief, repeatable operating cadence: monthly business reviews tied to driver trees, quarterly OKRs grounded in outcomes, and QBRs aligned to those OKRs to keep customer-facing teams in lockstep. I choose first principles decision making for high-ambiguity calls, and I make risks explicit early.

    Go-to-market is part of product, not an afterthought. I partner with marketing and customer success to craft value propositions, then validate them in-product with in-app guides and product tours. We define user activation precisely, instrument it, and iterate messaging and onboarding until time-to-value collapses. This is how product-led growth compounds.

    Technical excellence reduces product risk. I advocate for feature flags to decouple deployment from release, CI/CD to increase deployment frequency, and observability to catch regressions fast. These practices make experimentation cheaper and safer, which in turn makes bold bets possible.

    My 30-60-90 framework is simple. In 30 days, clarify outcomes, baselines, and constraints; in 60, run discovery sprints and ship the first experiments; in 90, land two to three measurable wins, prune low-signal bets, and scale the operating cadence. The goal is momentum with meaning—evidence, not theater.

    At HighLevel, I’ve seen that the Principal Product Manager unlocks leverage by combining strategic clarity with disciplined learning and empathetic leadership. When we align on outcomes, instrument for truth, and empower teams, we don’t just ship features—we shift the trajectory of the business.


    Inspired by this post on Amplitude – Best Practices.


  • Inside AI Product Management at Amplitude: How Leaders Turn Data into Better Products

    When I think about the impact of AI on product management, one line sums it up for me: "Spencer Whittaker is a senior AI product manager at Amplitude. He focuses on using AI to advance Amplitude's mission of helping companies build better products." That focus on outcomes reflects how I frame AI Strategy—grounding every model and workflow in customer value and product-led growth.

    In practice, that means pairing Amplitude analytics and behavioral analytics with A/B testing and continuous discovery. I lean on eval-driven development to keep models honest, and I coach product managers on LLM techniques so teams can prototype safely while we protect signal. Using a unified analytics platform clarifies what to build next and how to iterate faster.

    On teams I lead, product discovery stays tightly coupled to AI workflows: we map hypotheses to metrics, design experiments, and close the loop with instrumentation before we ship. That discipline turns AI from a demo into durable value, accelerating activation, retention, and feature adoption without sacrificing quality. A pragmatic AI product toolbox keeps us focused on measurable outcomes, not just novel capabilities.

    If you’re building with AI today, take a page from leaders pushing the craft forward: start with clear outcomes, connect your data in a unified analytics platform, and let A/B testing and continuous discovery guide your roadmap. With the right foundations—Amplitude analytics, behavioral analytics, and a sharp AI Strategy—you’ll transform insight into impact and build better products, faster.


    Inspired by this post on Amplitude – Perspectives.


  • AI Experimentation Mastery: How I Test Faster, Tame Variability, and Ship with Confidence

    I’ve learned that the fastest path to durable AI impact is a disciplined experimentation engine: one that moves quickly, reduces ambiguity, and earns trust with evidence. My goal isn’t just to ship models—it’s to ship measurable outcomes with repeatable rigor.

    AI experimentation for product teams. Here’s how to test AI features, choose the right metrics, handle variability, and make data-driven decisions.

    I start every AI initiative by framing a clear decision: what must be true for this feature to be worth building, and how will we know quickly? From there, I map driver trees that connect user value to measurable signals, so every test clarifies both impact and risk, not just accuracy.

    Success criteria come next. I translate aspirations into testable thresholds, define leading and lagging indicators, and size tests with minimum detectable effect (MDE) so we don’t confuse noise for signal. This keeps us honest about sample sizes, power, and the real cost of waiting for certainty.
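
    To make the link between MDE, power, and sample size concrete, here is a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion test. The function name and the example baseline are assumptions for illustration; a real experiment platform may use a different method.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute lift of
    mde_abs over baseline_rate (two-sided z-test on two proportions)."""
    if not 0 < baseline_rate < 1 or not 0 < baseline_rate + mde_abs < 1:
        raise ValueError("rates must stay strictly between 0 and 1")
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # power requirement
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)

# Illustrative: 10% baseline conversion, 2-point vs. 1-point absolute MDE.
n_two_points = sample_size_per_arm(0.10, 0.02)
n_one_point = sample_size_per_arm(0.10, 0.01)
```

    Halving the MDE roughly quadruples the required sample, which is the "real cost of waiting for certainty" in practical terms.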

    Before I touch production traffic, I run eval-driven development. I curate golden datasets that reflect real user complexity, codify rubrics for correctness, safety, tone, and latency, and automate scoring so improvements are reproducible—not anecdotal. This gives the team a stable baseline to iterate prompts, tools, and policies with confidence.
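
    A golden-dataset harness can be very small. The sketch below is one possible shape, not the actual tooling behind this post: `GoldenExample`, the keyword-based correctness check, and the word-count proxy for length are all simplifying assumptions, and `model` is any callable that maps a prompt to text.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class GoldenExample:
    prompt: str
    expected_keywords: tuple[str, ...]  # facts the answer must mention
    max_words: int                      # crude stand-in for a length rubric

def score_output(example: GoldenExample, output: str) -> dict[str, bool]:
    """Apply each rubric criterion to one model output."""
    text = output.lower()
    return {
        "correctness": all(k.lower() in text for k in example.expected_keywords),
        "length": len(output.split()) <= example.max_words,
    }

def run_eval(model: Callable[[str], str],
             dataset: Sequence[GoldenExample]) -> dict[str, float]:
    """Return the pass rate per criterion across the golden dataset."""
    if not dataset:
        raise ValueError("golden dataset is empty")
    passes: dict[str, int] = {}
    for example in dataset:
        for criterion, passed in score_output(example, model(example.prompt)).items():
            passes[criterion] = passes.get(criterion, 0) + int(passed)
    return {criterion: count / len(dataset) for criterion, count in passes.items()}

# Illustrative dataset and a stub model standing in for a real LLM call.
golden_set = [
    GoldenExample("How do I reset my password?", ("reset", "link"), 20),
    GoldenExample("Can I get a refund?", ("refund",), 5),
]

def stub_model(prompt: str) -> str:
    return "Use the reset link we email you."

baseline_scores = run_eval(stub_model, golden_set)
```

    Storing `baseline_scores` alongside the prompt and policy version makes later changes comparable against the same examples.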

    Model behavior is inherently stochastic, so I deliberately control variability. I document temperature, top-p, and seed strategies; I compare deterministic settings for regression checks versus sampled settings for user-facing creativity; and I test sensitivity across content lengths and edge cases. This reduces flakiness and prevents surprise regressions during CI/CD.
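
    One simple way to quantify flakiness is to re-run the same prompt under different seeds and measure how often the most common output appears. The sketch below assumes a hypothetical `generate(prompt, seed)` callable; the two stub models are stand-ins so the example runs without any LLM API.

```python
import random
from collections import Counter
from typing import Callable, Iterable

def consistency_rate(generate: Callable[[str, int], str],
                     prompt: str, seeds: Iterable[int]) -> float:
    """Share of runs that produced the single most common output."""
    outputs = [generate(prompt, seed) for seed in seeds]
    if not outputs:
        raise ValueError("need at least one seed")
    return Counter(outputs).most_common(1)[0][1] / len(outputs)

def fake_sampled_model(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a sampled call: output depends on the seed.
    return random.Random(seed).choice(["answer A", "answer B", "answer C"])

def fake_deterministic_model(prompt: str, seed: int) -> str:
    # Hypothetical stand-in for a pinned configuration: ignores the seed.
    return "answer A"

pinned = consistency_rate(fake_deterministic_model, "Summarize this ticket", range(20))
sampled = consistency_rate(fake_sampled_model, "Summarize this ticket", range(50))
```

    A regression suite can require a rate of 1.0 under the pinned settings while tolerating a lower, documented rate for the sampled, user-facing settings.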

    When it’s time to learn from real users, I favor A/B testing with thoughtful guardrails. I run holdouts, cap exposure with feature flags, and protect core experience metrics like retention and time-to-value. For ranking and retrieval changes, I’ll use interleaving or switchback tests to isolate effects from seasonality and traffic mix.
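
    Capping exposure behind a flag while keeping a stable holdout is often done with deterministic hashing. The sketch below is a generic illustration under that assumption, not the API of any specific feature-flag product; `bucket`, `assign`, and the flag name are hypothetical.

```python
import hashlib

def bucket(user_id: str, salt: str) -> float:
    """Map a user to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000

def assign(user_id: str, flag_name: str, exposure: float,
           holdout: float = 0.05) -> str:
    """Return 'holdout', 'treatment', or 'control' for this user.

    exposure is the share of non-holdout users who see the feature.
    The holdout uses its own salt, so it stays fixed as exposure ramps.
    """
    if bucket(user_id, f"{flag_name}:holdout") < holdout:
        return "holdout"
    return "treatment" if bucket(user_id, flag_name) < exposure else "control"

# Illustrative ramp from 10% to 50% exposure.
users = [f"user-{i}" for i in range(2000)]
at_10 = {u for u in users if assign(u, "new_ranker", 0.10) == "treatment"}
at_50 = {u for u in users if assign(u, "new_ranker", 0.50) == "treatment"}
```

    Because each user's bucket never changes, everyone in `at_10` is still in `at_50`: ramping up only adds users, and rolling back to zero exposure removes the feature for all of them.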

    To handle LLM variability online, I aggregate outcomes over multiple prompts per cohort, use stratified bucketing to balance power users and new accounts, and track confidence intervals over time instead of snapshot p-values. This approach turns noisy model outputs into stable product signals.
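
    For tracking intervals rather than snapshot p-values, a simple starting point is a normal-approximation confidence interval on the difference in conversion rates. This is a minimal sketch with made-up counts; it ignores stratification and repeated looks at the data, which a production readout would need to account for.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_control: int, n_control: int,
                             conv_treatment: int, n_treatment: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for (treatment rate - control rate)."""
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    std_err = sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_t - p_c
    return diff - z * std_err, diff + z * std_err

# Illustrative counts: 100/1000 control vs. 130/1000 treatment conversions.
low, high = diff_confidence_interval(100, 1000, 130, 1000)
flat_low, flat_high = diff_confidence_interval(100, 1000, 100, 1000)
```

    Plotting `low` and `high` at each readout shows whether the interval is narrowing around a real lift or still straddling zero.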

    Instrumentation fuels everything. I rely on behavioral analytics to trace user intent, effort, and satisfaction across flows, and I wire up Amplitude analytics for event schemas, funnel drop-offs, and cohort comparisons. Clear event taxonomies and naming discipline make it far easier to separate model quality from UX friction.
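
    Naming discipline is easier to keep when it is checked automatically. The sketch below assumes a hypothetical convention (snake_case `object_action` event names and three required properties); none of these names come from Amplitude or from this post, and a real taxonomy would define its own.

```python
import re

# Assumed convention for illustration: lowercase words joined by underscores,
# at least two words, e.g. "summary_generated".
EVENT_NAME = re.compile(r"^[a-z]+(_[a-z]+)+$")

# Hypothetical properties that let analysis split model quality from UX friction.
REQUIRED_PROPS = {"user_id", "model_version", "latency_ms"}


def validate_event(name: str, props: dict) -> list[str]:
    """Return a list of taxonomy problems; an empty list means the event passes."""
    problems = []
    if not EVENT_NAME.fullmatch(name):
        problems.append(f"event name '{name}' is not snake_case object_action")
    missing = REQUIRED_PROPS - props.keys()
    if missing:
        problems.append(f"missing properties: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    print(validate_event("SummaryGenerated", {"user_id": "u1"}))
```

    Running a check like this in CI or at the tracking wrapper keeps malformed events out of dashboards before they need cleanup.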

    Risk is part of the work, so I bake in AI risk management early. I include toxicity and PII checks in my offline evals, monitor safety metrics in every A/B, and set rollback criteria tied to user harm and system costs. Privacy-by-design, audit logs, and runtime safeguards aren’t afterthoughts—they’re acceptance criteria.
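
    Rollback criteria are most useful when they are written down as data before launch. A minimal sketch, with hypothetical metric names and thresholds chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """A metric that must stay at or below max_value during the experiment."""
    metric: str
    max_value: float


def breached(observed: dict[str, float], guardrails: list[Guardrail]) -> list[str]:
    """Return the guardrail metrics that call for a rollback.

    A metric missing from the observations counts as breached, so a broken
    monitoring pipeline fails closed rather than silently passing.
    """
    failures = []
    for g in guardrails:
        value = observed.get(g.metric)
        if value is None or value > g.max_value:
            failures.append(g.metric)
    return failures


if __name__ == "__main__":
    # Illustrative thresholds only; real limits come from the team's risk review.
    rules = [Guardrail("toxicity_rate", 0.001), Guardrail("cost_per_request_usd", 0.02)]
    print(breached({"toxicity_rate": 0.004, "cost_per_request_usd": 0.01}, rules))
```

    Treating the output as a rollback trigger, rather than a dashboard annotation, is what turns the guardrail into an acceptance criterion.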

    The operating cadence matters as much as the math. I run continuous discovery with customer interviews to keep the test queue grounded in real jobs-to-be-done, and I align product trios on hypotheses, success metrics, and stop-loss rules before launch. Weekly readouts keep decisions crisp, and post-ship learning cycles feed the next iteration.

    Finally, I invest in upskilling the team. We run internal workshops on LLMs for product managers, standardize experiment templates, and maintain a living playbook so new experiments start at 80% instead of 0%. The result: faster learning loops, safer bets, and more confident shipping.


    Inspired by this post on Product School.


  • Pretotyping vs. Prototyping: How I Validate Ideas Fast and Build Products Customers Love


    I learned early in my career that beautiful prototypes don’t save you when you’re solving the wrong problem. What does save you is separating market risk from solution risk and choosing the fastest, lowest-cost way to get evidence. That’s why I rely on pretotyping to test demand in days and prototyping to refine usability and feasibility once I see a strong signal. The result: faster cycles, fewer wasted sprints, and products customers genuinely want.

    Pretotyping vs. prototyping explained: differences, benefits, examples, and when to use each approach to validate ideas before you build.

    Here’s how I define the two in practice. Pretotyping answers, “Should we build this at all?” Its goal is to validate real user intent and behavior with the lightest-weight artifact possible—often before any code. Think painted-door (fake door) experiments, Wizard-of-Oz flows powered by humans behind the scenes, concierge tests, landing-page smoke tests with waitlists or preorders, and simple A/B testing to gauge click-through intent. It optimizes for time-to-signal and cost-to-learn.

    Prototyping answers, “Can we build this well?” and “How should it work?” Once demand is evidenced, I prototype to de-risk solution details: usability, architecture, performance, and integration. This might include interactive UI models, high-fidelity flows, technical spikes, or service stubs. Here, I optimize for learning about user experience and technical feasibility without fully committing to production.

    When should you use each? If your biggest unknown is market risk—whether customers care at all—start with pretotyping. If your biggest unknown is solution risk—how to deliver an experience that’s usable, reliable, and scalable—move to prototyping. In other words, validate the “right thing” before you perfect the “thing right.”

    My decision rule is simple: identify the dominant risk, then pick the smallest experiment that can credibly invalidate it. For market risk, I look for evidence of behavior, not opinions: clicks on a painted door, signups on a landing page, willingness to pay (deposits, preorders), or sustained repeat usage in a Wizard-of-Oz flow. For solution risk, I look for task completion, time-on-task, error rates, and qualitative friction from usability sessions with a realistic prototype.

    Concrete examples from recent work help illustrate the difference. When exploring a new analytics insight, I shipped a fake door inside our product nav; a simple tooltip explained the concept and captured interest. Click-through rate, conversion to a short explainer, and waitlist signups told me whether the value proposition resonated before building anything. For a complex AI-assisted workflow, I ran a Wizard-of-Oz experiment: users experienced the end-to-end flow while our team manually handled the “AI” behind the curtain. That gave us real engagement data and edge cases to inform the prototype and later the MVP.

    Metrics matter. I set a clear hypothesis with a guardrail on sample size and a minimum detectable effect I’d consider actionable. For pretotyping, I focus on time-to-first-signal, intent conversion (CTR to interest, interest to signup), cost-per-qualified-lead, and evidence of willingness to pay. For prototyping, I prioritize task success rates, usability severity findings, and qualitative insights that materially change the design or technical approach. Above all, I avoid vanity metrics and anchor decisions to outcomes, not output.
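
    For small pretotype samples, an interval on the intent-conversion rate helps avoid overreading a handful of clicks. The sketch below uses the Wilson score interval, which behaves better than the plain normal approximation at low counts; the function name and the example numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, trials: int,
                    level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a conversion rate such as painted-door CTR."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical painted-door result: 30 clicks from 400 impressions.
    low, high = wilson_interval(30, 400)
    print(f"CTR plausibly between {low:.1%} and {high:.1%}")
```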

    My repeatable playbook looks like this: (1) Frame the problem and value proposition in one crisp sentence. (2) Choose the leanest pretotyping method that can reveal real behavior. (3) Define success metrics and a decision rule before you run the test. (4) Launch quickly, instrument well, and let the test run long enough for the data to be credible. (5) If demand is strong, promote to a prototype to refine UX and de-risk the technical approach; if not, iterate the proposition or stop. This keeps product discovery continuous and ensures roadmapping and sprint planning are guided by evidence.
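
    Steps (3) and (5) can be captured as a decision rule that is fixed before launch. This is a sketch under assumptions: the rule compares a confidence interval on the intent metric with a pre-registered threshold, and the three outcome labels are hypothetical wording rather than terms from the playbook.

```python
def decide(ci_low: float, ci_high: float, threshold: float) -> str:
    """Apply a pre-registered decision rule to an interval on the intent metric.

    ci_low and ci_high bound the observed rate (for example waitlist signup
    rate); threshold is the minimum rate agreed on before the test started.
    """
    if not ci_low <= ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low >= threshold:
        return "promote to prototype"   # demand is credibly above the bar
    if ci_high < threshold:
        return "iterate or stop"        # demand is credibly below the bar
    return "keep collecting"            # interval still straddles the bar


if __name__ == "__main__":
    # Hypothetical interval and threshold, for illustration only.
    print(decide(0.053, 0.105, threshold=0.04))
```

    Writing the threshold down first is the point: the same data cannot be reinterpreted after the fact to justify a favorite idea.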

    There are ethical guardrails I never skip. Painted doors must set correct expectations once clicked, and the waitlists or learn-more pages behind them must be honest and respectful. For Wizard-of-Oz and concierge tests, I’m explicit about data handling and provide timely follow-up. Trust compounds when experiments are transparent and user time is valued.

    Tooling can accelerate the cycle without diluting rigor. I often use lightweight design systems and no-code automations to stitch together realistic flows, and I’ll leverage gen AI for product prototyping to generate copy, microinteractions, or data scaffolding. But the principle remains: don’t over-invest until evidence earns the investment. Empowered product teams thrive when they optimize for learning velocity, not feature velocity.

    If you’ve ever felt the tension between shipping fast and shipping right, this approach resolves it. Pretotype to prove the market; prototype to perfect the solution. Do that consistently and you’ll spend more time delivering outcomes customers value—and far less time debating outputs.


    Inspired by this post on Product School.


  • How AI Product Leaders Drive Better Products: My Take on Amplitude’s Mission and Impact


    I’m constantly studying how AI is elevating product organizations, and Amplitude offers a compelling example of how to turn data into durable, customer-centered outcomes.

    Spencer Whittaker is a senior AI product manager at Amplitude. He focuses on using AI to advance Amplitude's mission of helping companies build better products.

    From my vantage point leading product teams, that focus translates into practical AI Strategy across behavioral analytics and Amplitude analytics: turning raw event streams into decision-ready insights that accelerate product-led growth and continuous discovery.

    In my own roadmap reviews, the highest-impact patterns are consistent: pair A/B testing with eval-driven development, coach PMs on LLMs for product managers to sharpen problem framing, and amplify signal quality through thoughtful instrumentation and journey mapping. When these practices come together, empowered product teams ship with confidence and reduce time-to-learning.

    Equally important are the guardrails: clear build-vs-buy criteria for gen AI components, privacy-by-design and data governance from day one, and a crisp measurement model that ties experiments to activation, retention analysis, and customer success outcomes.

    Practically, this means instrumenting hypotheses with the right metrics, setting a minimum detectable effect (MDE) where relevant, and looping insights back into the opportunity solution tree so the next sprint is smarter than the last. This disciplined rhythm separates hype from durable value.
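
    When traffic is fixed, it can help to turn the question around and ask what MDE a test can realistically support. A minimal sketch, assuming a two-sided two-proportion z-test and the common simplification that both arms share the baseline variance; the function name and defaults are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def achievable_mde(baseline: float, n_per_arm: int,
                   alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate smallest absolute lift detectable with n_per_arm users per arm."""
    if not 0 < baseline < 1 or n_per_arm <= 0:
        raise ValueError("baseline must be in (0, 1) and n_per_arm positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sqrt(2 * baseline * (1 - baseline) / n_per_arm)


if __name__ == "__main__":
    # Hypothetical inputs: 10% baseline activation, 4,000 users per arm.
    print(f"{achievable_mde(0.10, 4000):.4f}")
```

    If the achievable MDE is larger than any lift the team would plausibly see, the experiment is better replaced by a bigger change or a different metric.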

    Seeing peers push this mission forward reinforces a core belief of mine: when AI helps teams find the right problems faster, we build products people truly love—and we do it responsibly, repeatably, and at scale.


    Inspired by this post on Amplitude – Best Practices.

