Tag: behavioral analytics

  • Inside AI Product Management at Amplitude: How Leaders Turn Data into Better Products

    Inside AI Product Management at Amplitude: How Leaders Turn Data into Better Products

    When I think about the impact of AI on product management, one line sums it up for me: "Spencer Whittaker is a senior AI product manager at Amplitude. He focuses on using AI to advance Amplitude's mission of helping companies build better products." That focus on outcomes reflects how I frame AI Strategy—grounding every model and workflow in customer value and product-led growth.

    In practice, that means pairing Amplitude analytics and behavioral analytics with A/B testing and continuous discovery. I lean on eval-driven development to keep models honest, and I coach LLMs for product managers techniques so teams can prototype safely while we protect signal. Using a unified analytics platform clarifies what to build next and how to iterate faster.

    On teams I lead, product discovery stays tightly coupled to AI workflows: we map hypotheses to metrics, design experiments, and close the loop with instrumentation before we ship. That discipline turns AI from a demo into durable value, accelerating activation, retention, and feature adoption without sacrificing quality. A pragmatic AI product toolbox keeps us focused on measurable outcomes, not just novel capabilities.

    If you’re building with AI today, take a page from leaders pushing the craft forward: start with clear outcomes, connect your data in a unified analytics platform, and let A/B testing and continuous discovery guide your roadmap. With the right foundations—Amplitude analytics, behavioral analytics, and a sharp AI Strategy—you’ll transform insight into impact and build better products, faster.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • What’s New with Amplitude Agents: Faster Releases, Smarter Insights, and Must‑Try Upgrades

    What’s New with Amplitude Agents: Faster Releases, Smarter Insights, and Must‑Try Upgrades

    I’ve been deep in the work of turning agentic AI from a promising idea into reliable, measurable outcomes. Today, I want to share a concise, practitioner’s update on what’s new with Amplitude Agents—and, more importantly, how to get real value fast using proven product management techniques.

    We launched AI Agents a few weeks ago. We’ve been shipping pretty fast since then, so we wanted to loop you in on what’s new and what’s worth trying.

    Rapid releases only matter if they translate into user value. My approach is to treat every agent improvement as a learning opportunity: instrument it, set clear success metrics, run controlled experiments, and iterate. This eval-driven development mindset keeps us honest about what’s truly working in the wild.

    If you’re trying Amplitude Agents now, start with a narrowly scoped, high-signal workflow where success is unambiguous—think a single journey with a clear “done” state. Connect the experience to your unified analytics platform so you can see the full picture across events, funnels, and cohorts. In practice, I lean on Amplitude analytics and Agent Analytics to make this visibility effortless.

    Define how you’ll measure impact before you ship. Identify activation and completion events, baseline them, and then A/B test your agentic AI flow against the status quo. Behavioral analytics will show whether users are discovering the agent, sticking with it, and returning for more. When the story in the data is clean, it’s much easier to scale the win.

    Hardening matters as much as headlines. As you expand use, apply sensible guardrails—input validation, clear prompts, and transparent handoffs to deterministic flows when confidence is low. Pair this with observability so you can spot anomalies early and recover gracefully. These practices reduce risk while preserving the speed and creativity that make AI workflows powerful.

    Once the basics are working, dig into adoption patterns: segment by cohort, study user activation paths, and run retention analysis to find where the agent is truly changing behavior. These insights shape roadmap priorities and help you invest in the moments that drive durable value.

    We’ll keep shipping quickly and sharing practical guidance. If you have feedback, experiments to showcase, or questions about instrumentation, send them our way—I use that signal to refine our next set of improvements and learning agendas. Expect more short, focused updates and deeper dives on evaluation frameworks, prompt strategies, and rollout playbooks.

    In short: keep it scoped, instrument everything, test deliberately, and let the data guide your next move. That’s how Amplitude Agents becomes not just new, but indispensable.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Why Your Product Needs a Smarter Support Agent: Data-Driven, Agentic AI That Truly Helps

    Why Your Product Needs a Smarter Support Agent: Data-Driven, Agentic AI That Truly Helps

    Your product deserves a support experience that does more than point users to a help article. In my work leading product teams, I’ve seen how an intelligent, in-product assistant can reduce friction, accelerate user activation, and create the kind of product-led growth that traditional support channels struggle to deliver. The bar is higher now: customers expect immediate, context-aware help that feels proactive, measurable, and trustworthy.

    When I evaluate support solutions, I look for three capabilities: an assistant that truly knows the user’s context, can act on their behalf to resolve issues end-to-end, and can prove the impact with rigorous measurement. Anything less is just another interface to your knowledge base. The shift to agentic AI makes this possible—if it’s grounded in behavioral analytics and integrated with your unified analytics platform.

    Learn more about Amplitude AI Assistant. Our in-product support agent knows your users, acts on their behalf, and measures whether it actually helped.

    That promise resonates with how I design AI Strategy: start with data fidelity, not dialog. When an assistant is wired into Amplitude analytics and behavioral analytics, it can understand where a user is in the journey, the features they have (or haven’t) adopted, and which nudges or in-app guides historically drive success. This is the foundation for precise, contextual help—surfacing the right product tours at the right moments and removing guesswork.

    Knowing users isn’t enough; the assistant must act. With agentic AI, the assistant can execute safe, auditable steps on a user’s behalf—updating settings, triggering a workflow, or guiding a multi-step configuration—rather than handing off a to-do back to the customer. Done well, this reduces time-to-value and support tickets while aligning with a thoughtful customer support ai strategy that respects permissions, privacy-by-design, and clear guardrails.

    Equally important is measurement. I expect every AI touchpoint to demonstrate lift: faster time-to-resolution, higher feature adoption, improved retention, and lower churn. This is where robust A/B testing, Agent Analytics, and retention analysis come in—so we can quantify the assistant’s contribution against meaningful product outcomes, not vanity metrics. If we can’t measure it, we can’t manage it.

    Operationally, I advise teams to pilot with narrowly scoped, high-impact journeys and iterate with tight feedback loops. Instrument the assistant’s actions and outcomes, set minimum detectable effect thresholds for experiments, and continually refine prompts and playbooks. Tie insights back to your unified analytics platform so learnings inform roadmap choices and reinforce a durable product-led growth motion.

    In short, the next generation of in-product support will be built on data-rich context, agentic execution, and rigorous proof of value. That’s the standard I hold my teams to—and the experience users deserve when they ask for help.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • How AI Product Leaders Drive Better Products: My Take on Amplitude’s Mission and Impact

    How AI Product Leaders Drive Better Products: My Take on Amplitude’s Mission and Impact

    I’m constantly studying how AI is elevating product organizations, and Amplitude offers a compelling example of how to turn data into durable, customer-centered outcomes.

    Spencer Whittaker is a senior AI product manager at Amplitude. He focuses on using AI to advance Amplitude's mission of helping companies build better products.

    From my vantage point leading product teams, that focus translates into practical AI Strategy across behavioral analytics and Amplitude analytics: turning raw event streams into decision-ready insights that accelerate product-led growth and continuous discovery.

    In my own roadmap reviews, the highest-impact patterns are consistent: pair A/B testing with eval-driven development, coach PMs on LLMs for product managers to sharpen problem framing, and amplify signal quality through thoughtful instrumentation and journey mapping. When these practices come together, empowered product teams ship with confidence and reduce time-to-learning.

    Equally important are the guardrails: clear build vs buy criteria for gen ai components, privacy-by-design and data governance from day one, and a crisp measurement model that ties experiments to activation, retention analysis, and customer success outcomes.

    Practically, this means instrumenting hypotheses with the right metrics, setting a minimum detectable effect (MDE) where relevant, and looping insights back into the opportunity solution tree so the next sprint is smarter than the last. This disciplined rhythm separates hype from durable value.

    Seeing peers push this mission forward reinforces a core belief of mine: when AI helps teams find the right problems faster, we build products people truly love—and we do it responsibly, repeatably, and at scale.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Build vs. Buy for Churn Prediction: My Proven Playbook for Faster Retention and ROI

    Build vs. Buy for Churn Prediction: My Proven Playbook for Faster Retention and ROI

    Churn is the silent tax on growth, and I treat churn prediction as a core product capability—not a side project. Over the years, I’ve led teams through multiple implementations across different data maturities and go-to-market motions, and the same question keeps returning at kickoff: what’s the smartest path to impact now and defensibility later?

    “Should you build or buy your churn prediction model?” The right answer depends on time-to-value, data readiness, available talent, and whether churn prediction is a true differentiator for your product strategy or simply a must-have capability to power customer success and product-led growth.

    When speed and coverage matter most, I start by evaluating category platforms that pair behavioral analytics with activation. As one example, vendors emphasize immediate business outcomes such as integrations, in-app guides, and workflow triggers that help you act on risk signals fast—without waiting months for model training or data engineering.

    Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.

    Buying makes sense when you need rapid time-to-value, opinionated best practices, and a unified analytics platform to operationalize insights through product tours, in-app guides, and CRM integration. In these cases, I’m optimizing for coverage, consistent signal quality, and ease of activation for customer success—so the team can focus on interventions, not infrastructure.

    Building is compelling when churn prediction is a source of competitive differentiation or you have proprietary signals others can’t access. If your product generates unique behavioral data, requires custom anomaly detection or explainability constraints, or must blend usage telemetry with domain-specific risk scoring, a tailored model can raise precision and unlock novel retention levers.

    My hybrid approach has become a reliable playbook: buy first to establish a strong baseline and close the activation loop, then selectively build where proprietary data and context yield outsized gains. I use retention analysis to identify high-signal behaviors, then iterate with A/B testing and a clear minimum detectable effect (MDE) to validate uplift before committing engineering capacity.

    Total cost of ownership is non-negotiable. I account for more than license or training costs: ongoing data engineering, feature pipeline maintenance, model monitoring for drift, and AI risk management all add up. Strong data governance, privacy-by-design, and regulatory compliance must be baked in—whether I build, buy, or blend both.

    Activation determines real ROI. Predictions that don’t flow into customer success workflows, lifecycle messaging, or in-product nudges rarely move Net Recurring Revenue (NRR). I prioritize tight integrations that enable targeted experiments—journey mapping, contextual tooltips, and timely outreach—to reduce friction and increase user engagement at the moments that matter.

    My quick decision test: buy if time-to-value and adoption are the immediate goals; build if proprietary signals and explainability are core strategic assets; blend if you want fast wins now with room to differentiate later. Answering the build vs. buy question through this lens consistently improves retention, accelerates product-led growth, and keeps teams focused on the customer experience rather than plumbing.


    Inspired by this post on Pendo – Perspectives.


    Book a consult png image
  • Inside Amplitude’s AI Platform: Powerful Lessons for Product Leaders Shaping Analytics

    Inside Amplitude’s AI Platform: Powerful Lessons for Product Leaders Shaping Analytics

    Every so often, a single line captures the essence of platform thinking at scale. "Vinay is a Staff AI Engineer at Amplitude. He builds the foundational AI platforms that empower internal innovation and help define the future of AI analytics." That statement crystallizes the mandate many of us share: create durable AI capabilities that compound value across teams, products, and customers.

    When I think about "foundational AI platforms" in the context of Amplitude analytics and behavioral analytics, I see more than infrastructure. I see a product strategy choice: invest in a unified analytics platform that lowers the cost of experimentation, increases the trustworthiness of insights, and speeds time-to-learning for empowered product teams. That’s the engine behind sustainable product-led growth.

    For me, the platform blueprint starts with three layers: high-quality data foundations (schema design, governance, lineage), model lifecycle rigor (evaluation, observability, versioning), and safe, self-serve interfaces that meet teams where they work. Without strong data governance and clear accountability, even the smartest gen ai features struggle to gain adoption. With them, platform scalability and reliability become a competitive advantage—not just an operational checkbox.

    Empowering internal innovation requires thoughtful constraints. I’ve seen the best teams pair self-serve tooling with guardrails: templates for use cases, bias and risk checks, and well-documented pathways from prototype to production. This balance turns AI Strategy from a slide into a system—one that helps teams decide when to build vs buy, how to measure value, and how to retire what no longer serves the roadmap.

    Looking ahead, the future of AI analytics is about making intelligence ambient. That means stitching together event data, product usage, and customer context so insights surface exactly when decisions are made. It also means bringing gen ai responsibly into the workflow—summarizing behavior, explaining anomalies, and suggesting next best actions—while maintaining transparency and auditability.

    My practical takeaways: invest early in shared components that everyone can use (feature stores, evaluation harnesses, data contracts); standardize interfaces so teams ship faster with fewer handoffs; and measure platform outcomes with product metrics, not just infrastructure metrics. Done well, this approach compounds: faster cycles, higher confidence, and a steady drumbeat of wins that reinforce a culture of learning.

    In short, building the right AI foundations is how we unlock scale, create leverage for every team, and keep our edge in a dynamic market. That one line about building foundational AI platforms isn’t just a role description—it’s a north star for any product leader serious about shaping the next era of analytics.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

    How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

    I set out to solve a deceptively simple problem: help our teams ask product questions in plain English and get trustworthy, analysis-grade answers—fast. That required more than a powerful model; it demanded agents that genuinely understand the language of product analytics, from behavioral analytics nuances to the messy reality of event taxonomies, funnels, and cohorts. In this post, I share how we engineered agentic AI that speaks our domain fluently and turns questions into decisions.

    The core challenge wasn’t data volume or dashboard sprawl; it was semantics. Different teams said “activation,” “onboarding,” or “first value” and meant overlapping but distinct things. Our PMs, analysts, and engineers navigated a maze of synonyms across Amplitude analytics, Pendo, and our unified analytics platform. Generic LLMs stumbled on these nuances, so we built a shared ontology—driver trees anchored to a clear North Star—with canonical definitions for activation, retention, and conversion, plus consistent event naming and cohort logic.

    We started with a rigorous metric catalog: every KPI linked to its drivers, exact formulas, cohorts, and time windows; every event mapped to a product taxonomy; every dashboard and SQL snippet versioned with ownership and lineage. That catalog became the ground truth for agents. We embedded data governance and privacy-by-design from the start—permissioning for fields and queries, PII redaction, and scoped access that reflected how product teams actually work.

    Next, we built a retrieval-first pipeline to ground the agents in our corpus before generation. We indexed metric definitions, dashboards, experiment readouts, runbooks, and high-signal Slack threads so the agent could cite relevant artifacts, not just predict plausible text. With careful context window management and prompt engineering, the agent retrieves definitions and prior analyses, then plans multi-step actions: run a query, compare cohorts, check “minimum detectable effect (MDE)” for an A/B test, and summarize findings with references.

    Architecturally, we treated this as “Agent Analytics”: an orchestrator that selects tools based on intent—querying Amplitude analytics or Pendo for behavioral paths and funnels, hitting our warehouse for cohort tables, or pulling experiment metadata and anomaly detection alerts. Tool use is permission-aware, auditable, and designed to fail safe. The agent’s outputs include citations back to the exact definitions, dashboards, and SQL used, so reviewers can validate and iterate.

    Quality came from eval-driven development, not intuition. We built a gold set of representative product questions (activation inflections, retention analysis by segment, funnel drop-offs after feature launches) and scored the agent on faithfulness to definitions, numerical accuracy, latency, and actionability. We incorporated regression checks to catch drifts after schema changes, and we tuned prompts to reduce overconfident answers and push for clarifying questions when context was missing.

    Safety and reliability were non-negotiable. We layered AI risk management with role-based access, guardrails that block destructive queries, and risk scoring for unfamiliar joins or sudden spikes in metric deltas. The agent logs every step—what it retrieved, which tools it called, and why—so analysts can replay and refine the chain of thought with transparent provenance.

    The payoff: product teams now self-serve nuanced questions in minutes instead of days, and our analysts spend more time on discovery than report wrangling. Retention analysis improved as the agent standardized cohort logic; conversion investigations accelerated thanks to consistent funnel definitions; and cross-functional decisions aligned around the same driver trees and shared language. Most importantly, the agent turned ambiguous asks into structured analyses that stand up to scrutiny.

    For fellow product leaders, my lesson is simple: start with semantics, not models. A crisp ontology, disciplined taxonomy, and clear ownership will outperform a flashy stack riddled with ambiguity. Avoid technology FOMO; favor retrieval-first grounding, small sharp tools, and continuous discovery with your product trios. When your organization speaks a common analytics language, agents can finally think with you, not just for you.

    Next, we’re extending the agent’s planning skills to recommend experiment designs, estimate power and “minimum detectable effect (MDE),” and propose driver-tree-informed bet sizing. We’re also tightening feedback loops so every accepted answer, edit, or override strengthens the retrieval corpus and evaluations. The vision: a calm, reliable layer that makes rigorous product analytics feel conversational—and helps teams move from questions to confident action.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

    How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

    I set out to solve a deceptively simple problem: help our teams ask product questions in plain English and get trustworthy, analysis-grade answers—fast. That required more than a powerful model; it demanded agents that genuinely understand the language of product analytics, from behavioral analytics nuances to the messy reality of event taxonomies, funnels, and cohorts. In this post, I share how we engineered agentic AI that speaks our domain fluently and turns questions into decisions.

    The core challenge wasn’t data volume or dashboard sprawl; it was semantics. Different teams said “activation,” “onboarding,” or “first value” and meant overlapping but distinct things. Our PMs, analysts, and engineers navigated a maze of synonyms across Amplitude analytics, Pendo, and our unified analytics platform. Generic LLMs stumbled on these nuances, so we built a shared ontology—driver trees anchored to a clear North Star—with canonical definitions for activation, retention, and conversion, plus consistent event naming and cohort logic.

    We started with a rigorous metric catalog: every KPI linked to its drivers, exact formulas, cohorts, and time windows; every event mapped to a product taxonomy; every dashboard and SQL snippet versioned with ownership and lineage. That catalog became the ground truth for agents. We embedded data governance and privacy-by-design from the start—permissioning for fields and queries, PII redaction, and scoped access that reflected how product teams actually work.

    Next, we built a retrieval-first pipeline to ground the agents in our corpus before generation. We indexed metric definitions, dashboards, experiment readouts, runbooks, and high-signal Slack threads so the agent could cite relevant artifacts, not just predict plausible text. With careful context window management and prompt engineering, the agent retrieves definitions and prior analyses, then plans multi-step actions: run a query, compare cohorts, check “minimum detectable effect (MDE)” for an A/B test, and summarize findings with references.

    Architecturally, we treated this as “Agent Analytics”: an orchestrator that selects tools based on intent—querying Amplitude analytics or Pendo for behavioral paths and funnels, hitting our warehouse for cohort tables, or pulling experiment metadata and anomaly detection alerts. Tool use is permission-aware, auditable, and designed to fail safe. The agent’s outputs include citations back to the exact definitions, dashboards, and SQL used, so reviewers can validate and iterate.

    Quality came from eval-driven development, not intuition. We built a gold set of representative product questions (activation inflections, retention analysis by segment, funnel drop-offs after feature launches) and scored the agent on faithfulness to definitions, numerical accuracy, latency, and actionability. We incorporated regression checks to catch drifts after schema changes, and we tuned prompts to reduce overconfident answers and push for clarifying questions when context was missing.

    Safety and reliability were non-negotiable. We layered AI risk management with role-based access, guardrails that block destructive queries, and risk scoring for unfamiliar joins or sudden spikes in metric deltas. The agent logs every step—what it retrieved, which tools it called, and why—so analysts can replay and refine the chain of thought with transparent provenance.

    The payoff: product teams now self-serve nuanced questions in minutes instead of days, and our analysts spend more time on discovery than report wrangling. Retention analysis improved as the agent standardized cohort logic; conversion investigations accelerated thanks to consistent funnel definitions; and cross-functional decisions aligned around the same driver trees and shared language. Most importantly, the agent turned ambiguous asks into structured analyses that stand up to scrutiny.

    For fellow product leaders, my lesson is simple: start with semantics, not models. A crisp ontology, disciplined taxonomy, and clear ownership will outperform a flashy stack riddled with ambiguity. Avoid technology FOMO; favor retrieval-first grounding, small sharp tools, and continuous discovery with your product trios. When your organization speaks a common analytics language, agents can finally think with you, not just for you.

    Next, we’re extending the agent’s planning skills to recommend experiment designs, estimate power and “minimum detectable effect (MDE),” and propose driver-tree-informed bet sizing. We’re also tightening feedback loops so every accepted answer, edit, or override strengthens the retrieval corpus and evaluations. The vision: a calm, reliable layer that makes rigorous product analytics feel conversational—and helps teams move from questions to confident action.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Stop Forcing AI to Prove ROI: A Product Leader’s Playbook to Measure Real Business Value

    Stop Forcing AI to Prove ROI: A Product Leader’s Playbook to Measure Real Business Value

    Every planning cycle, I feel the drumbeat: “Show me the AI ROI—this quarter.” The pressure is real, especially when boards and CFOs expect immediate payback. Yet when I review stalled initiatives across teams and peers, the pattern is consistent: most companies treat AI like a feature to ship, not a system to manage. That mindset almost guarantees we measure the wrong things, declare victory (or failure) too early, and miss the durable value AI can create.

    Here’s the core problem I see: we leap to solution and skip the counterfactual. Without a baseline, a clear control, or a defined “what would have happened otherwise,” we’re guessing. We also fixate on lagging, financial KPIs that move slowly (revenue, cost, risk), then use outputs—not outcomes—as OKRs. If we don’t align on outcomes vs output OKRs upfront, the best team in the world can still optimize for activity over impact.

    My AI Strategy starts from a simple truth: value shows up along three vectors—revenue, cost, and risk—on different timelines. In the near term, we must validate leading indicators (adoption, engagement, activation) that ladder to those vectors through a transparent driver tree. Over time, those drivers compound into the lagging KPIs finance cares about. When we make the driver tree explicit, everyone can see how model precision, response time, and workflow integration roll up to conversion lift, case deflection, time-to-resolution, or reduced exposure.

    To make this rigorous, I run a five-step playbook. First, define the decision and business outcome in plain terms. Second, instrument the baseline with behavioral analytics on a unified analytics platform—tools like Amplitude analytics or Pendo help expose friction points we’ll later target. Third, create a counterfactual using A/B testing and specify a minimum detectable effect (MDE) so we know how long to run and how much traffic we need. Fourth, quantify costs (training, inference, integration, change management) and include AI risk management, privacy-by-design, and data governance up front. Fifth, lock a measurement plan that connects leading indicators to lagging ROI through the driver tree.

    Most AI initiatives don’t fail on model quality—they fail on adoption. If the workflow isn’t smoother, trust isn’t earned, or value isn’t obvious, users revert. That’s why I invest early in onboarding, in-app guides, product tours, and thoughtful tooltip design to reduce the time-to-first-value. Then I watch user activation, retention analysis, and task completion to ensure the assistive experience is not just novel—it’s habit-forming.

    For generative use cases, eval-driven development is non-negotiable. I maintain offline evaluations for accuracy and safety, and online evaluations for business impact. Retrieval-first pipeline health, context window management, and prompt engineering affect reliability; so do latency and grounding quality. We ship behind feature flags, measure guardrail effectiveness, and tighten feedback loops from human-in-the-loop reviews into model updates—continuously.

    On the business side, I avoid “AI theater” by structuring benefits like a CFO. Revenue: increased conversion or expansion driven by better recommendations, faster sales cycles, or higher trial activation. Cost: case deflection, agent time saved, fewer escalations, and lower rework. Risk: reduced exposure via automated checks, anomaly detection, and consistent policy application. If any claim can’t be tied to measured deltas—via A/B testing or strong quasi-experiments—it doesn’t go in the deck.

    Build vs buy deserves the same discipline. I map platform scalability, governance requirements, and total cost of ownership against time-to-impact. Teams often underestimate integration and maintenance drag; a pragmatic mix of bought components with thin custom layers can accelerate outcomes while keeping options open. The goal isn’t to own every layer—it’s to own the learning loop and the differentiated experience.

    I also remind teams that tooling should serve the strategy, not replace it. I’ve seen concise, effective messaging that captures the point: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” The words are compelling because they reflect the three-vector value model and the adoption imperative. The same standard should apply to any AI initiative we propose.

    If you’re under pressure to prove ROI, shift the conversation: lead with the driver tree, specify your counterfactual, and anchor on leading indicators you can move in weeks—not quarters. Then connect those to the lagging KPIs finance expects over time. When we manage AI like a product—grounded in evidence, experimentation, and user-centered adoption—we don’t have to force ROI. We compound it.


    Inspired by this post on Pendo – Perspectives.


    Book a consult png image
  • Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    I’ve learned the hard way that sample size calculators can be both empowering and deceptive. They feel wonderfully precise, but they’re only as trustworthy as the assumptions you feed them. When I lead A/B testing at scale, I treat the calculator as a planning tool, not a verdict—then I systematically validate the assumptions behind it so our decisions stay rigorous and our roadmap stays credible.

    At a minimum, most calculators assume you know your baseline rate, your “minimum detectable effect (MDE),” your desired statistical power, and your significance level. They also quietly assume independent observations, clean randomization, stable traffic quality, and a fixed test horizon with no peeking. If any of those break, the “right” sample size can be wildly wrong—and the test conclusions can nudge teams toward the wrong product or go-to-market bet.

    Baseline and variance come first for me. I estimate the baseline conversion (and volatility) from recent behavior using behavioral analytics, sanity-check it across key segments, and look for seasonality. Tools like Amplitude analytics help me spot anomalies, bots, or instrumentation drift. If baseline is unstable or highly skewed, I either stabilize it with longer lookbacks or narrow the target segment to reduce noise.

    Setting the “minimum detectable effect (MDE)” is where product strategy meets statistics. I work backward from an outcome that actually matters: the revenue, retention, or activation uplift that justifies the opportunity cost of building and running the experiment. If that effect size is implausible given historic lift and variance, I rethink the scope or stack changes into a sequenced set of learning experiments rather than overpromising a single moonshot.

    For power and alpha, I default to 80–90% power and a 5% significance level unless the downside risk of a false positive is unusually high, in which case I tighten alpha. I choose one-tailed tests only when we would not act on a negative result and we’ve explicitly pre-registered that decision; otherwise, two-tailed is safer for real-world ambiguity.

    Randomization and independence are where many tests quietly fail. I randomize at the user level (not session or pageview), guard against cross-device contamination, and ensure consistent exposure via feature flags. If there’s shared context—say, team-based usage or geographic clustering—I account for it via cluster randomization or acknowledge the inflated variance it can introduce.

    Traffic allocation integrity is non-negotiable. I monitor for sample ratio mismatch by comparing observed group splits to the intended allocation and immediately pause if they drift. When SRM appears, the root cause is often instrumentation gaps, eligibility filters applied asymmetrically, or caching layers. Fixing that early preserves trust in every test that follows.

    Fixed-horizon math assumes no peeking. If stakeholders need continuous reads, I use sequential testing methods with alpha spending or always-valid approaches designed for ongoing monitoring. If we commit to a fixed horizon, we stay disciplined: no early looks, no midstream metric swaps, no retrofitted hypotheses.

    Multiple comparisons can quietly inflate false positives. I predeclare one primary metric to decide, define guardrail metrics to protect experience and revenue, and apply appropriate corrections (for example, controlling the false discovery rate) when testing many variants or slicing results by numerous segments.

    Duration and seasonality matter more than most roadmaps admit. I run through full business cycles (at least one complete week for daily patterns, longer for B2B buying rhythms), plan for novelty effects, and watch for behavior settling after initial exposure. If the intervention changes long-run behavior, I extend the measurement window or add a post-test holdout to capture durable impact.

    Not all metrics are binomial. For revenue, time-on-task, or heavy-tailed distributions, I confirm variance assumptions, use robust estimators or bootstrapping, and consider variance reduction methods like CUPED to improve power without overextending duration. The calculator’s simplicity should not mask the data’s complexity.

    Finally, I connect experimentation to product outcomes. I map hypotheses to a driver tree, ensure each test ladders to activation, retention, or monetization, and document assumptions up front so we learn even when results are null. The result is a culture that respects the math and moves faster precisely because we trust our reads.

    Here’s the practical checklist I use before pressing “Start”: validate baseline and variance from recent behavior; set an MDE tied to meaningful business impact; choose power and alpha explicitly; confirm user-level randomization and stable exposure; watch for sample ratio mismatch; align on fixed-horizon vs sequential testing; predeclare a single primary metric and guardrails; run long enough to cover seasonality; use robust methods for non-binomial metrics; and write a brief pre-read so the whole team commits to the plan.

    When we honor these assumptions, sample size calculators become sharp instruments rather than blunt ones. You’ll ship fewer misleading wins, avoid costly false negatives, and build a repeatable experimentation engine that compounds learning—and results—over time.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Inside Amplitude’s ML Playbook: Practical Strategies for Smarter A/B Tests and Growth

    Inside Amplitude’s ML Playbook: Practical Strategies for Smarter A/B Tests and Growth

    I’m continually asked how machine learning can make product analytics more actionable. Drawing from Amplitude analytics in real-world settings, I’ve distilled what matters most for product teams that want faster, smarter decisions without sacrificing rigor.

    When I design experiments, I start with minimum detectable effect (MDE) to size samples correctly and avoid costly, inconclusive tests. I pair that with disciplined A/B testing hygiene—clear hypotheses, thoughtful stop rules, and guardrails for key metrics—so results translate into credible product strategy choices instead of noisy dashboards.

    For growth and retention, I map behavioral analytics to activation and long-term value. Driver trees help me connect feature adoption to revenue or retention, and anomaly detection keeps me from overreacting to outliers when seasonality or data quality shift.

    I segment cohorts by user intent and lifecycle stage, measure user activation with crisp event definitions, and monitor leading indicators across a unified analytics platform. This keeps cross-functional conversations grounded, accelerates product-led growth, and reduces the risk of optimizing for vanity metrics.

    Operationally, that means building self-serve views that flag MDE-ready experiments, surface retention analysis by cohort, and trigger anomaly detection alerts only when the signal outpaces noise. The payoff is fewer meetings debating data quality and more time shipping value.

    If you’re leveling up your analytics stack, start by tightening experimentation basics, instrumenting activation and retention with behavioral analytics, and wiring in anomaly detection as a safety net. You won’t just move faster—you’ll learn faster, and with the confidence to bet big when the data earns your trust.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Unlock Confident Decisions with Bayesian Statistics: Smarter A/B Tests from Small Samples

    Unlock Confident Decisions with Bayesian Statistics: Smarter A/B Tests from Small Samples

    Shipping great products is a game of making high‑quality decisions under uncertainty. In my role leading product management, I’ve seen teams stall when classic methods demand huge sample sizes before we can say anything useful. Bayesian statistics has become my go‑to approach for turning sparse data into clear, decision‑ready insights—especially when traffic is limited or experimentation windows are tight.

    Understand Bayesian statistics vs. frequentist methods and learn how Bayesian approaches improve experiment insights with small sample sizes.

    Here’s why I rely on it in A/B testing: frequentist methods focus on p‑values and long‑run error rates, which are tough to translate into action. With a Bayesian lens, I can express outcomes as intuitive probabilities—“Variant B has a 92% chance to outperform A”—and use credible intervals to communicate likely ranges of impact. That clarity reduces decision friction and helps the team move faster with confidence.

    Bayesian methods shine when sample sizes are small and the minimum detectable effect (MDE) of a frequentist test would be impractically large. I incorporate prior knowledge—historical conversion trends, seasonality, and learnings from related experiments—to stabilize noisy early data. Done thoughtfully, priors improve estimate quality without overfitting; I always run sensitivity checks to ensure the posterior is driven by the data we’re observing, not wishful thinking.

    In practice, my workflow is straightforward. I set a prior from historical performance in Amplitude analytics, run the experiment, and update the posterior daily. I track the probability of superiority, expected lift, and a credible interval that the CRO role can rally around. When the probability of a meaningful win crosses a pre‑agreed threshold, we ship. When it doesn’t, we bank the learning and move on—no prolonged debates about p‑values that few stakeholders truly understand.

    This approach also strengthens product discovery. By using behavioral analytics and retention analysis as informative priors, I can evaluate early signals from narrower cohorts—new geographies, niche segments, or enterprise accounts—where traffic is scarce. The result is faster iteration in product‑led growth environments, even when a full‑funnel test would take weeks to reach frequentist significance.

    Operationally, I treat Bayesian experimentation as part of a unified analytics platform strategy. The same posterior machinery that powers A/B testing can support anomaly detection during releases, quantify risk in phased rollouts, and estimate lift from in‑app guides or product tours. Because results are framed in plain language probabilities, cross‑functional teams make better, faster decisions aligned to outcomes rather than outputs.

    A few guardrails keep me honest. I preregister decision rules (stop/go thresholds, guardrail metrics), run prior sensitivity analyses, and document assumptions alongside results. That discipline prevents overconfidence, improves reproducibility, and builds trust with leadership.

    If your experiments are bottlenecked by low traffic or you’re tired of waiting weeks for a binary “significant/not significant,” consider a Bayesian upgrade. You’ll get earlier readouts, clearer stakeholder communication, and a repeatable path to compounding learning—without sacrificing rigor.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image