Category: Product Management

  • AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook

    AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook

    I’ve led multiple AI agent launches, and the single most reliable way I’ve found to ship with confidence is to treat evaluations as a product capability, not a side project. When we make AI quality measurable, predictable, and comparable over time, we move faster, reduce risk, and build trust with customers and stakeholders.

    Learn how product managers use AI evaluations to measure agent quality. Covers traces, LLM judges, offline evals, online evals, and how to connect evals to product outcomes.

    Why does this matter so much in product management? Because agent quality is only meaningful when it drives adoption, satisfaction, and revenue. I use eval-driven development to align the day-to-day iteration of prompts, policies, and workflows with business outcomes like activation, retention, and Net Recurring Revenue (NRR). That alignment turns AI quality from an abstract notion into a roadmap lever.

    First, traces. Traces are the spine of evaluation for agentic AI: they capture inputs, intermediate steps, tools invoked, and final responses. I instrument traces to make reasoning visible—what the agent tried, where it hesitated, and why it chose a path. With that visibility, I can compare prompts, policies, and tools, and I can teach the team to fix the root cause instead of patching symptoms. This is also where Agent Analytics becomes real: we move from anecdotes to observable behavior trends across cohorts and use cases.

    Next, LLM judges. I use model-as-judge to score qualities like helpfulness, coherence, or adherence to brand and policy. The trick is calibration. I pair LLM judges with a small, high-quality human-labeled set to ground the scale, then monitor drift as models, prompts, or data shift. LLM judges help me evaluate at speed, but I still spot-check edge cases and highly regulated flows to balance efficiency with risk controls.

    Offline evals come first. Before I expose users to changes, I run fixed test suites representing core scenarios, failure modes, and edge cases. I include golden examples, adversarial prompts, and domain-specific queries. Metrics cover task success, factuality, safety, latency, and cost. This is where prompt engineering and retrieval quality are tuned; if I’m using a retrieval-first pipeline, I evaluate evidence quality separately from generation so improvements are attributable and reproducible.

    Online evals follow to validate real-world performance. I roll changes out behind feature flags and use A/B testing to compare variants under production conditions. I track conversation outcomes, tool success rates, fallbacks to human support, and user satisfaction. These online signals close the loop on whether an offline improvement actually compounds value in the product—critical for product-led growth.

    Connecting evals to product outcomes is non-negotiable. I map quality signals to a driver tree: from per-turn scores (helpfulness, safety, latency) up to session-level outcomes (task completion, deflection, revenue intent), and finally to product KPIs (activation, retention, NRR). With this structure, I can set thresholds for launch gates, prioritize roadmap items that move the biggest levers, and build dashboards that leadership understands at a glance.

    A few lessons learned. Start with a minimal but durable test set and grow it as you discover new failure modes. Version everything—prompts, tools, and datasets—so you can reproduce wins. Beware metric drift when you swap models or update prompts. Blend human review where the cost of error is high. Above all, make evaluations part of your AI workflows and sprint rituals so quality improves continuously, not sporadically.

    If you’re just getting started, begin with traces and a small offline suite, add LLM judges for scale, then prove impact with a focused online experiment. Within a few cycles, you’ll have a living evaluation system that guides decisions, accelerates delivery, and gives your team—and your customers—confidence in every AI release.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • From Solutions Engineering to PMM Leadership: Darshil Gandhi’s Playbook for Amplitude’s Edge

    From Solutions Engineering to PMM Leadership: Darshil Gandhi’s Playbook for Amplitude’s Edge

    I look for product marketing leaders who translate market noise into clear decisions that move roadmap, revenue, and relationships. In that context, Darshil Gandhi exemplifies how competitive rigor and technical depth can sharpen product strategy and accelerate go-to-market strategy across empowered product teams.

    Darshil leads competitive intelligence, partner product marketing and technical marketing at Amplitude. He is a former solutions engineering team principal.

    That blend matters: a solutions engineering mindset grounds messaging in real implementation details, while competitive intelligence and partner product marketing align product positioning, points of parity, and competitive differentiation with what buyers actually evaluate. At a company centered on Amplitude analytics, that cross-functional view helps transform behavioral data into a crisp value proposition customers can feel in evaluations and expansions.

    In practice, I prioritize a few patterns when partnering with leaders who span these domains: align on a single competitive narrative using driver trees that connect capabilities to outcomes; use Amplitude analytics to validate claims and win themes; co-create partner playbooks that make integrations repeatable; and ensure technical marketing closes the loop by pressure-testing demos, docs-as-code, and reference architectures with field feedback. This strengthens stakeholder management across sales, solutions engineering, and product trios, reducing ambiguity and speeding decisions.

    The net effect is clarity: sharper differentiation in the field, cleaner handoffs between teams, and faster feedback cycles that de-risk launches. It’s a model I trust when stakes are high—use the truth of implementation to tell a compelling story, then let the market confirm it.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • We Open-Sourced Our AI Skills Library: Reusable Skills to Supercharge Product Velocity

    We Open-Sourced Our AI Skills Library: Reusable Skills to Supercharge Product Velocity

    We open-sourced our AI Skills library. Here's what we built, why we built it, and how to use it. I’m sharing the approach we’ve used to move faster with more confidence across product discovery, prototyping, and production—while keeping governance, safety, and measurement front and center.

    What we built is a modular, open-source library of “skills” for agentic AI and LLM-powered workflows—things like retrieval and grounding, summarization, classification, tool-use, data enrichment, safety guardrails, and evaluation harnesses. Each skill follows consistent interfaces and conventions so teams can compose them like building blocks, swap implementations without breaking flows, and standardize best practices across products.

    Why we built it is simple: we kept rebuilding the same core capabilities across experiments and teams. Standardizing these skills accelerates time-to-value, reduces integration risk, and helps product trios collaborate with a common language. It also lets us scale what works—prompt patterns, eval datasets, telemetry—so every new initiative starts on third base instead of at bat.

    How to use it in practice: start by running a quick-start example to see a baseline skill chain in action. Then compose your own flow by selecting skills (for example, retrieval + summarization + tool call), configure them with environment variables and guardrails, and wire in evaluation datasets. From there, instrument the pipeline with metrics so you can compare variants and promote the best-performing chain to your main app or API.

    In a typical stack, the library dovetails with analytics and experimentation: ship skill variants behind feature flags, measure impact with A/B testing, and observe runtime behavior with logs and traces. CI/CD hooks let you run evals pre-merge, and production dashboards keep an eye on latency, cost, and outcome quality. This creates a virtuous loop where ideas move from prototype to production with clear evidence.

    Common use cases include customer support summarization and triage, lead scoring and enrichment, anomaly detection in product telemetry, and automated content workflows. Because the skills are composable, you can try multiple retrieval-first strategies, swap prompt templates, or add tools (search, RAG, calculators, connectors) without rewriting everything from scratch.

    Governance and safety are built in. Guardrails handle PII redaction, content policy checks, and rate limiting; configs make it easy to enforce privacy-by-design; and evaluation harnesses encourage an eval-driven development culture. The result is faster iteration without sacrificing data governance or reliability.

    If you want to contribute, add a new skill, improve prompts, share eval datasets, or open an issue with a scenario you want supported. The roadmap focuses on richer retrieval adapters, better test fixtures, and deeper observability so teams can debug and optimize complex chains with confidence.

    I’m excited to see how you’ll use the library to accelerate your roadmap. Clone it, run a quick start, and compose your first workflow today—then measure, iterate, and scale what works. I’ll keep sharing patterns, learnings, and updates as we grow the skills catalog and sharpen the tooling.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Director of Product, Growth & AI at Amplitude: My Playbook for Viral Growth and Engagement

    Director of Product, Growth & AI at Amplitude: My Playbook for Viral Growth and Engagement

    I see the Director of Product, Growth & AI at Amplitude as a mandate to operationalize "viral and core growth strategies, user acquisition, and product engagement" with precision. From my vantage point, that means building a rigorous, metrics-first operating system grounded in Amplitude analytics and product-led growth principles, then layering in an AI Strategy that personalizes experiences without sacrificing control or safety.

    I start by defining a clear North Star Metric and mapping a driver tree to expose causal levers across acquisition, activation, engagement, retention, and monetization. With behavioral analytics and cohort analysis, I quantify which user behaviors correlate with long-term value. I operationalize rapid experimentation through A/B testing with sensible minimum detectable effect (MDE) thresholds, guardrail metrics, and sequential testing to ensure we move fast while preserving measurement integrity.

    For "viral and core growth strategies," I lean on durable growth loops more than one-off hacks. Viral loops might include collaboration invites, user-generated content, and shareable artifacts that make the product more valuable as it spreads. Core growth centers on frictionless activation: guided onboarding, in-app guides, product tours, progressive disclosure, and judicious tooltip design that connects users to the ‘aha’ moment quickly. Session replay and funnel instrumentation help isolate friction and systematically remove it.

    On user acquisition, I connect performance channels and go-to-market strategy tightly to in-product activation. Rather than optimizing for clicks, I optimize for post-signup behaviors that predict retention. This includes improving landing page-message-product congruence, refining qualification (so top-of-funnel aligns with downstream value), and orchestrating lifecycle messaging that nudges users toward key activation milestones.

    To deepen product engagement, I focus on leading indicators of retention and feature adoption. I segment by jobs-to-be-done and intent, then personalize in-app prompts to surface the right capability at the right moment. Retention analysis, pathing, and funnel breakouts inform which nudges to deploy and where—whether that’s smarter checklists, contextual education, or lightweight in-product interventions that turn sporadic usage into reliable habits.

    AI raises the ceiling on what’s possible here. With a thoughtful AI Strategy, I use gen ai to personalize onboarding flows, recommend next-best actions based on behavioral signals, and summarize complex activity patterns into actionable insights for the team. I maintain strict measurement: every AI intervention ships behind feature flags, is evaluated through controlled experiments, and adheres to privacy-by-design principles. The outcome is a system that learns continuously while staying aligned to business and user outcomes.

    Execution is where strategy becomes real. I rely on empowered product trios, continuous discovery with customers, and outcome-focused roadmaps that tie directly to the driver tree. This keeps the organization moving in sync: engineering prioritizes the highest-signal experiments, design accelerates comprehension and task success, and product ensures each release strengthens the core loop rather than adding ornamental features.

    Ultimately, the blueprint is simple and disciplined: anchor on "viral and core growth strategies, user acquisition, and product engagement," quantify what matters with behavioral analytics, and iterate through well-instrumented experiments. Combine that with targeted AI augmentation, and you create a compounding growth engine that is both measurable and resilient.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Supercharge Insights with Amplitude Agent Connectors: Connect Notion, Slack, Linear & More

    Supercharge Insights with Amplitude Agent Connectors: Connect Notion, Slack, Linear & More

    I’ve led enough multi-tool product organizations to know how quickly momentum erodes when insights and actions live in different places. When my teams bounce between Notion, Atlassian, Slack, Linear, and analytics dashboards, we pay a real tax in context switching. That’s why I’m excited about what Amplitude is enabling with Agent Connectors—bringing our daily work and our data-driven decisions into one fluid, agentic AI workflow.

    Connect Notion, Atlassian, Slack, Linear, and more to Amplitude's Global Agent. Get richer analysis and take action across tools without leaving Amplitude.

    Practically, this means I can treat Amplitude analytics as a unified analytics platform where analysis and execution finally meet. Instead of exporting charts or copying insights into docs, I can drive Agent Analytics directly from the same surface where I manage behavioral analytics, reducing friction and accelerating decisions. For my product strategy, that’s a meaningful shift—from “insight later” to “insight-to-action now.”

    Here’s how I’d use it on a typical day: I ask the agent to synthesize signals from recent feature usage, spotlight anomalies, and then draft a concise summary for our Slack channel. In the same flow, I can prompt it to reference our Notion specs for context and queue next steps in Linear, keeping Atlassian stakeholders looped in without any extra swiveling between tabs. The value isn’t just faster execution; it’s tighter alignment across teams because the analysis and the plan live together.

    From an operating model perspective, this is how I scale AI workflows responsibly. I can define clear prompts, approval paths, and ownership so the agent augments—not replaces—expert judgment. Data governance and permissions remain front and center: the agent sees what your teams are allowed to see, and we maintain auditability on critical workflow steps. The outcome is a trustworthy, repeatable system that compounds learning over time.

    If you’re exploring agentic AI for product teams, start small and instrument your ROI. Pick one or two connectors (Slack and Notion are great first choices), define a measurable workflow—like pushing weekly retention insights and creating prioritized follow-ups in Linear—and iterate using continuous discovery. In my experience, the first wins appear as reduced time-to-insight, fewer meetings to align, and faster cycle time from observation to shipped change.

    The big picture is simple: bring your work to your analytics, and your analytics to your work. With Agent Connectors, Amplitude’s Global Agent helps close the loop from understanding behavior to taking action—without leaving the place where your insights are born.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Broken Procurement Is Costing You Talent: A Product Leader’s Playbook for Speed and Sanity

    Broken Procurement Is Costing You Talent: A Product Leader’s Playbook for Speed and Sanity

    Procurement should accelerate value, not suffocate it. Listening to this episode, I found myself nodding (and wincing) through a painfully familiar story about how well-intended controls morph into barriers that keep great expertise out. As a product leader responsible for speed, outcomes, and brand experience, I see procurement as a direct mirror of culture—and an often overlooked part of the product operating system.

    In the conversation, Teresa is cranky—and honestly, she has every right to be. She’s simultaneously juggling seven speaking engagement contracts, and six of them have become a part-time job in themselves—think 80-page ethics policies, 800-question security forms, and Multi-Factor Authentication (MFA) questions asked 17 different times. Meanwhile, the one company that just put her fee on a credit card? Scheduled, confirmed, and done in two weeks. That contrast is the whole story: friction repels talent; clarity and simplicity attract it.

    Petra adds her own horror story—filling out 12 identical Word document forms—and together they surface a deeper truth I’ve seen across organizations: broken vendor processes don’t just frustrate consultants; they stop companies from getting the expertise they actually need. And despite what many assume, company size isn’t the deciding factor—leadership intent and process ownership are.

    If you’ve ever wondered why a training got canceled, why a speaker backed out, or why your team can’t seem to bring in outside experts, this is likely the culprit: procurement theater. Repetitive forms, unbounded scope creep, and sprawling security reviews create drag that outlasts any short-term legal or compliance gain. The opportunity cost—lost learning, slower progress, and talent that simply says no—is enormous.

    One detail that stood out: with CEO-level buy-in, a legal review timeline collapsed from four months to 10 days. I’ve seen the same thing. Executive sponsorship is the fastest procurement tool there is, and it reveals what the organization truly values. If you can compress the path when a leader cares, you can redesign the path so it’s always faster—without compromising real risk management.

    I also loved the clarity of a simple policy from the episode: Teresa’s new policy is straightforward—her paperwork, credit card payment, no vendor setup—or no speaking engagement. That’s not obstinance; it’s a bright-line test for whether an organization respects expert time and understands total cost. The best experts have options, and friction filters them out first.

    Here’s how I operationalize this in product-led organizations. Tier risk by engagement type (e.g., one-hour talk vs. long-term software vendor) and match the process to the risk. Offer a credit-card fast lane with standard, plain-English terms for low-risk work. Eliminate duplicate data entry and kill redundant questionnaires. Use a single, secure intake that auto-fills known fields. Track cycle time end to end, and publish SLAs for legal, InfoSec, and finance. Most importantly, make vendor experience a first-class metric—because it is a brand experience.

    Security and compliance matter, but they must be right-sized. If you’re buying a keynote, you’re not buying data processing—so why the 800-question security review? Calibrate controls to actual data access and system interaction. The episode even references AWS DynamoDB and GuardDuty, plus Claude Code—helpful reminders that your stack context matters, but not every purchase touches it. Don’t conflate deep technical diligence for a SaaS integration with a simple, no-data engagement.

    There’s a reason the classic film Office Space gets a nod—it’s the perfect metaphor for what happens when well-meaning governance calcifies. Bureaucracy compounds over time, usually after adverse events, until startups—or any team that still moves fast—run circles around you. Procurement that treats experts like adversaries won’t win the race that actually matters: learning faster than the market.

    If you want the full story, listen to the episode here: Spotify (https://open.spotify.com/episode/2JHnTvnZX2WcFczml7ozKY?ref=producttalk.org) | Apple Podcasts (https://podcasts.apple.com/kh/podcast/procurement/id1794203808?i=1000770701690&ref=producttalk.org). It’s cathartic, but more importantly, it’s a blueprint for fixing what’s broken.

    Mentioned in the episode: Hire Teresa to Speak (https://www.producttalk.org/hire-teresa-to-speak/), AWS DynamoDB (https://aws.amazon.com/dynamodb/?ref=producttalk.org), GuardDuty (https://aws.amazon.com/guardduty/?ref=producttalk.org), Claude Code (https://www.claude.com/product/claude-code?ref=producttalk.org), and Office Space (https://en.wikipedia.org/wiki/Office_Space?ref=producttalk.org).

    I’d love to hear your experiences and fixes. Where does your procurement flow break, how do you measure cycle time today, and what would it take to create a vendor experience you’d be proud to put your brand on? Drop your thoughts below and let’s trade playbooks.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Decode How Amplitude AI Thinks: Proven Workflows to Get Actionable, High-Accuracy Results

    Decode How Amplitude AI Thinks: Proven Workflows to Get Actionable, High-Accuracy Results

    I’ve learned that the fastest way to unlock better AI outcomes is to understand how the system reasons, then partner with it deliberately. In product organizations, that means treating AI like a capable collaborator with a transparent process, clear inputs, rigorous checks, and measurable success criteria. When I work this way, my teams ship insights and experiments faster—and with far fewer surprises.

    Discover how Amplitude AI thinks and best practices for working with it. Partner with AI at each step of its process for more accurate, actionable outputs.

    Here’s the mental model I use. AI moves through a series of steps: clarify the goal, ingest context, retrieve and rank relevant information, reason through candidate solutions, draft an answer, self-critique, and refine. My job is to actively guide each step. I define the objective precisely, supply high-signal context, specify constraints, ask for structured reasoning, and require a quality bar before anything ships to stakeholders.

    Start by setting intent and success criteria. I write a one-sentence objective (“what problem are we solving now”), then define the evaluation rubric (“what good looks like”) up front. This small habit powers eval-driven development: it keeps AI outputs aligned with product goals, not just plausible-sounding text. I’ll often include target metrics and guardrails, such as confidence thresholds or required evidence from “Amplitude analytics.”

    Next, I curate the context. For analytics use cases, I provide event taxonomies, metric definitions, segments, and recent behavioral analytics trends to ground the model. A retrieval-first pipeline helps here: I scope the corpus, trim noise, and apply context window management so the model sees only what’s essential. The result is sharper, faster answers that map to our real data model and “unified analytics platform.”

    Then I shape the prompt. I use concise role framing, 1–3 high-quality exemplars, and explicit constraints (format, length, tone, citation requirements). I also ask the model to show its reasoning with a short, labeled scratchpad and to state uncertainties. This is practical prompt engineering—not magic—designed to make reasoning inspectable and reproducible across “AI workflows.”

    When tools are available, I encourage agentic AI patterns: let the system plan, call functions, and iterate. With “Amplitude AI,” I ask it to propose the next best analysis (e.g., segment drill-down, funnel step attribution, or anomaly detection), run it, summarize findings, then reflect on whether the next step changes. If you’re using “Amplitude MCP,” formalize these actions as callable tools so the model can chain them reliably.

    Quality is never an afterthought. I build lightweight evaluations into every loop: compare the model’s output against the rubric, check factual grounding, and A/B test alternative prompts for clarity and conversion where appropriate. Over time, these evaluations become our regression suite, giving us confidence as data, prompts, or model versions evolve. This discipline keeps LLMs for product managers aligned with shifting business priorities.

    Finally, I turn insights into action. I ask “Amplitude AI” for decision-ready artifacts—clear hypotheses, prioritized opportunities, and concrete next steps owners can execute. I require the model to cite the specific supporting events or segments and to flag assumptions. That last step is crucial: it invites human judgment where it matters and prevents automation from outpacing accountability.

    This approach doesn’t slow teams down; it speeds them up with focus. By guiding each step—intent, context, reasoning, tools, and evaluation—you transform AI from a black box into a reliable copilot. The payoff is tangible: clearer insights, faster cycles, and outputs stakeholders trust the first time they see them.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Join Me in June: Master Opportunity-First Product Strategy with Continuous Discovery Habits

    Join Me in June: Master Opportunity-First Product Strategy with Continuous Discovery Habits

    I’m celebrating the five-year anniversary of Continuous Discovery Habits by inviting you to read it with me this June. As someone who leads product management and coaches product trios, I’ve seen how a shared discovery practice tightens alignment, speeds up learning, and drives outcomes. This month, we’ll go deep on prioritizing opportunities—not solutions—and I’ll guide you step by step so you can apply the ideas on your own team.

    Each month, I’m releasing an in-depth reading guide that includes:

    We’ll discuss each month’s reading in the comments, and we’ll gather quarterly on a live call to unpack real-world applications, trade wins and missteps, and keep the momentum going.

    Joining late? No problem. I monitor the comments on each reading guide throughout the year. Start with the current month or go back to January—whatever works for you. Ask for help, share what’s working, and connect with other readers at any point.

    If you want to participate, grab a copy of the book (or dust off your old copy), share the “Spread the Love” videos with your team, block time for the exercises, and register for the community sessions. Let’s do this.

    This Month’s Reading

    Chapter:

    Estimated reading time: ~16 minutes

    This month's chapter will introduce you to:

    Need a copy? Grab the book

    Share the Love with Friends and Colleagues

    We learn best in community. Use these short videos to spread the key ideas across your product trios, engineering partners, and stakeholders. Invite them to read along with you so your discovery cadence—and your product strategy—advance together.

    Reflect & Discuss What You Read

    When we reflect and discuss what we read, we absorb more and apply it faster. This chapter challenges a deeply ingrained habit: prioritizing solutions. I’ve been in those meetings—spreadsheets full of features, heated roadmap debates, and a creeping sense that we’re optimizing outputs rather than outcomes. The shift to opportunity-first thinking changed how my teams frame bets, sequence discovery, and communicate product strategy.

    Individual Reflection

    Team Discussion

    Put It Into Practice

    This month is all about shifting from solution-first to opportunity-first thinking. These short, focused exercises will help your product trio practice opportunity prioritization and improve decision speed without sacrificing product discovery rigor.

    Exercise: Map Your Roadmap to Opportunities

    Time: 45 minutesDo this: With your product trio

    Take your current roadmap or backlog and work backwards. For each planned feature or solution:

    This exercise often reveals that you're either:

    Use these insights to inform your next prioritization conversation.

    Exercise: Practice Two-Way Door Thinking

    Time: 30 minutesDo this: With your product trio

    Choose 3-5 recent or upcoming product decisions. For each one, discuss:

    The goal is to calibrate your team's decision-making speed. Two-way door decisions should be made quickly with "just enough" evidence. One-way door decisions deserve more deliberation and data.

    Go Deeper: Additional Reading

    If you prefer an audio summary of this month’s reading, including the book chapters and the following resources, I’ve included an audio version for members at the bottom of this post.

    Related In-Depth Guides

    Supplementary Reading

    Related Courses

    Our Live Discussion Schedule

    Our live discussion sessions are for registered members. Sessions are not recorded. Invitations will go out two weeks before the scheduled event—reserve time now.

    Audio Summary

    Prefer to listen? Stream the audio overview here: June — Prioritizing Opportunities (audio).

    Ready to put continuous discovery into action? Grab the book, share the videos with your team, schedule the exercises, and join the community sessions. Opportunity-first product strategy is a muscle we can build together.

    The chapters we will be readingA preview of the most important concepts we'll be learning aboutShort videos you can share with friends and colleagues to help spread the ideasIndividual and team discussion questions to help you absorb and engage with the readingTeam exercises to help you put the ideas into practiceAdditional reading to help you go deeper on the core ideasChapter 7: Prioritizing Opportunities, Not SolutionsWhy product strategy happens in the opportunity space, not the solution spaceHow to focus on one target opportunity at a time to deliver value iterativelyUsing the tree structure to simplify prioritization decisionsThe four criteria for assessing opportunities: sizing, market factors, company factors, and customer factorsWhy treating prioritization as a messy, subjective decision leads to better outcomes than scoring formulasThe concept of two-way door decisions and how they apply to opportunity prioritizationWork on one small opportunity at a time – Reduce your batch sizeGetting started with compare and contrast decisions – Choose the right target opportunityTurn big intractable problems into smaller, more solvable problems – The power of decompositionThink about your team's current roadmap or backlog. How much of your time is spent prioritizing features versus understanding and prioritizing customer opportunities? What would change if you flipped that ratio?Reflect on the last time you made a product decision. Did you treat it as a one-way door (irreversible) or a two-way door (reversible)? How did that framing affect your decision-making process and timeline?Consider the four assessment criteria (opportunity sizing, market factors, company factors, customer factors). Which of these does your team currently emphasize most? Which do you tend to overlook or underweight?As a team, list the top 5-10 items on your current roadmap or backlog. For each one, try to identify the underlying customer opportunity it addresses. If you can't clearly articulate the opportunity, what does that tell you about how you're making decisions?The chapter argues against scoring formulas (like RICE or ICE) for prioritization, calling them "made-up math." If your team uses a scoring system, discuss: What is it really measuring? Does it help you make better decisions, or does it just make subjective decisions feel more objective?Walk through a recent prioritization decision. Did you assess options in isolation ("should we build this?") or compare and contrast them? How might your decision have been different with a compare-and-contrast approach?Identify the customer opportunity it's meant to addressWrite it as something a customer might say (e.g., "I can't find anything to watch" not "We need better search")Look for patterns: Are multiple solutions addressing the same opportunity? Are some solutions disconnected from any clear customer need?Spreading yourself thin across too many opportunitiesOver-investing in a single opportunity with multiple solutionsBuilding solutions with no clear opportunity attachedIs this a one-way door decision (hard to reverse) or a two-way door decision (easy to reverse)?If it's a two-way door, what's the smallest step we could take to learn whether we're on the right track?What would we need to see to know we made the wrong choice?If we realize we're wrong, how quickly could we course-correct?Opportunity Solution Trees: Visualize Your Discovery to Stay Aligned and Drive OutcomesCustomer Interviews: Uncover Hidden Insights from Every ConversationPrioritize Opportunities, Not Solutions7 Key Benefits of Using Opportunity Solution TreesProduct in Practice: How 2-Way Door Decisions Helped Simply Business Learn FastProduct in Practice: Getting Started with Opportunity Solution Trees at SuperAwesomeProduct Discovery Fundamentals: Learn a structured and sustainable approach to continuous discovery.Tuesday, June 16, 2026: 9am-10am PDTThursday, September 17, 2026: 9am-10am PDTWednesday, December 16, 2026: 9am-10am PST


    Inspired by this post on Product Talk.


    Book a consult png image
  • Stop Support Tickets Before They Start: How AI Unsticks Users and Lifts Conversions

    Stop Support Tickets Before They Start: How AI Unsticks Users and Lifts Conversions

    Every moment of friction in a product carries a hidden cost: attention drifts, motivation wanes, and the next click becomes a support ticket—or worse, silent churn. Over the years, I’ve learned to treat “stuck” as an urgent product signal, not just an operational nuisance. When we unstick users in the flow, we protect revenue, brand trust, and the momentum that powers product-led growth.

    Learn how Amplitude’s Global Support team uses AI Assistant to reduce support tickets, prevent user churn, and increase conversions.

    I reference that line often because it captures a proven pattern: meet users where confusion peaks and resolve it instantly. In my practice, the formula is straightforward—pair behavioral analytics and session replay with a just-in-time AI Assistant, routed by clear driver trees. This transforms support from reactive firefighting into a proactive, in-product experience that accelerates onboarding and boosts user activation.

    Here’s how I operationalize it. First, I use Amplitude analytics and behavioral analytics to surface high-friction steps—pages with elevated drop-off, loops, or rage clicks. Session replay clarifies the “why” behind the numbers, while cohort and retention analysis reveal who’s most at risk. Then I deploy targeted in-app guides and tooltip design to preempt known pitfalls, while an AI Assistant handles real-time questions with context from our knowledge base and product docs.

    The AI Assistant is more than a chatbot. With well-structured AI workflows, it detects intent, pulls precise snippets from docs-as-code, and handles routine issues instantly. When complexity spikes, it executes a graceful handoff to consultative support via Intercom or a Zendesk integration—preserving conversation history and sentiment cues—so humans spend time where judgment matters. This hybrid model keeps response times low without sacrificing quality.

    To de-risk changes, I lean on A/B testing and feature flags. I measure time-to-value, activation rate, and funnel conversion as leading indicators, while tracking ticket deflection, CSAT, and NRR as trailing indicators. The goal isn’t just fewer tickets; it’s faster learning loops and a compounding improvement in user outcomes. When we see activation curves steepen and onboarding friction flatten, we know the system is working.

    Practically, I start with the top three friction points in onboarding, implement narrow in-app guides, and deploy the AI Assistant with strict guardrails and clear escalation paths. Weekly reviews align product, customer success, and solutions engineering around shared telemetry—so we tune prompts, content, and UI patterns together. Over time, I’ve seen ticket volume decline meaningfully, while conversion and retention rise as users experience fewer dead ends.

    If you’re evaluating where to begin, identify the moments where confusion compounds—pricing configuration, integrations, and data mapping are common culprits. Then introduce targeted, context-aware help right where users hesitate. You’ll not only prevent “every stuck user” from turning into a ticket—you’ll convert friction into confidence, and confidence into growth.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • How I Champion Platform Excellence: Lessons in Analytics, Scalability, and Product-Led Growth

    How I Champion Platform Excellence: Lessons in Analytics, Scalability, and Product-Led Growth

    I’m continually inspired by platform specialists who champion their analytics platforms end to end. When I study their work, I look for the connective tissue between strategy and execution—how behavioral analytics informs decisions, how a unified analytics platform reduces tool sprawl, and how great documentation and enablement convert insights into habit across product, engineering, and go-to-market teams.

    What consistently stands out is the rigor behind the scenes: clear data governance, privacy-by-design, and instrumentation standards that keep events trustworthy as products evolve. Platform scalability isn’t just about throughput; it’s about guardrails—naming conventions, schema versioning, and lineage—that let teams move quickly without sacrificing integrity. These are the unsung details that make insights reliable and repeatable at scale.

    I also pay close attention to how experimentation gets operationalized. Thoughtful A/B testing, well-scoped feature flags, and crisp definitions of “minimum detectable effect (MDE)” ensure that experiments produce signal instead of noise. Driver trees, opportunity solution trees, and continuous discovery keep teams anchored on outcomes, while retention analysis translates curiosity into durable growth. This is the backbone of product-led growth: small, fast bets tied to measurable behavioral shifts.

    Reliability and insight quality go hand in hand. Observability for event pipelines, anomaly detection to surface data drift, and targeted session replay help teams debug both product experience and analytics instrumentation. Paired with Web Vitals and clear ownership models, these practices shorten feedback loops, reduce blind spots, and keep platform credibility high—because trust is the real KPI behind every dashboard.

    In my own practice, I translate these lessons into roadmaps that balance discovery with delivery, and align solutions engineering, product, and design around the same north-star metrics. The result is a culture where platform champions don’t just advocate for tools—they enable outcomes. If you’re scaling an analytics stack or elevating your product strategy, these principles will help you move faster, with confidence, and make every insight count.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • AI Broke Your A/B Tests: 3 Proven Shifts to Rebuild a Resilient Experimentation Program

    AI Broke Your A/B Tests: 3 Proven Shifts to Rebuild a Resilient Experimentation Program

    I’ve watched a once-reliable A/B testing playbook buckle under the weight of generative AI. Traffic patterns aren’t stable, LLMs update behind the scenes, prompts evolve weekly, and personalization reshapes cohorts mid-flight. The result is non-stationary data, diluted statistical power, and “wins” that don’t replicate in production. If your experimentation program feels slower, noisier, and less trustworthy, you’re not imagining it—and you’re not alone.

    Learn why running more tests isn’t the answer to AI, and the three ways mature teams are shifting their experimentation programs.

    First, I’ve shifted from test volume to an evaluation stack—what I call eval-driven development. Instead of defaulting to production A/B tests, we front-load learning with offline evaluations (golden sets, synthetic scenarios), automated regressions on prompts and policies, and pre-production canaries. We size experiments with a clear minimum detectable effect (MDE), use sequential or Bayesian methods to handle drift, and reserve full A/B runs for hypotheses with sufficient power and operational readiness. This layered approach accelerates decisions, reduces traffic waste, and restores trust in effect sizes.

    Second, I’ve re-anchored our metrics and governance for AI-era reliability. We define a driver tree that links value creation to guardrail metrics such as latency, hallucination rate, cost per request, safety incidents, and user trust proxies. Persistent holdouts and long-lived control cohorts protect against platform-wide regressions, while anomaly detection highlights model or data shifts before they corrupt reads. Strong instrumentation—behavioral analytics, consistent event semantics, and product telemetry wired into Amplitude analytics—keeps our feedback loop tight and auditable.

    Third, we rebuilt rollout mechanics to make delivery experimentation-native. Feature flags, progressive delivery, and targeted canaries let us test safely in production while gating exposure by segment, risk, or policy. Shadow mode and offline replay provide signal before real users see risk. Multi-armed bandits help with exploration when goals are clear and guardrails are enforced, but we resist over-rotating to bandits when measurement is fragile. Tightly integrating experiments into CI/CD and observability shortens the cycle from hypothesis to validated outcome.

    In practice, here’s how I operationalize this shift. In 30 days, I audit the backlog, kill or consolidate tests that can’t meet MDE, and establish a minimal evaluation harness for prompts, policies, and safety checks. By 60 days, guardrail metrics are live with persistent holdouts and feature flags across AI surfaces. By 90 days, the team runs a balanced portfolio: offline evals for fast iteration, canaries for risk, and selective A/B testing for strategic bets—supported by continuous discovery to keep hypotheses grounded in real customer needs.

    AI didn’t eliminate the need for experimentation; it raised the bar for rigor. By moving from volume to validity, from vanity lifts to guardrailed outcomes, and from monolithic launches to progressive delivery, I’ve seen experimentation regain its edge—fewer false positives, faster cycles, and clearer signal on what truly drives impact. That’s how we turn a brittle testing culture into a resilient, learning system built for LLMs and beyond.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • How I Build High-Impact Experimentation Programs with Amplitude: Proven Practices at Scale

    How I Build High-Impact Experimentation Programs with Amplitude: Proven Practices at Scale

    I build experimentation programs to drive measurable outcomes, not just dashboards. In my product leadership work, I’ve seen how the right operating model turns experimentation into a reliable growth engine—especially when paired with the analytical depth of Amplitude. My goal is to help teams move from ad-hoc tests to a disciplined system that compounds learning and impact.

    Rigor starts with clarity. I translate strategic goals into testable hypotheses using driver trees, then structure A/B testing with a defined minimum detectable effect (MDE), guardrail metrics, and pre-registered decision criteria. This reduces p-hacking, shortens debate cycles, and makes outcomes auditable. I’m equally deliberate about risk: we monitor sample ratio mismatch, use feature flags for safe rollouts, and align on outcomes vs output OKRs so we celebrate business impact, not vanity wins.

    Amplitude analytics is my backbone for behavioral analytics at every step. I instrument clean event taxonomies, build funnels and cohorts to track user activation and retention analysis, and centralize experiment readouts in a unified analytics platform. This lets product trios quickly see how treatments shift behavior, where friction hides, and which moments matter most for product-led growth. The result is a trusted, shared source of truth that accelerates continuous discovery.

    At enterprise scale, governance matters as much as math. I often point to lessons inspired by Peacock’s experimentation program: standard naming conventions, centralized QA, CI/CD integration, and an active community of practice. Those practices keep velocity high without sacrificing validity, and they make wins repeatable across teams and surfaces.

    Operationally, I anchor the program in clear roles (data, engineering, design, product), templates for hypotheses and readouts, and a tight feedback loop from deploy to decision. With Amplitude, solutions engineering partnerships, and disciplined experiment hygiene, teams learn faster, ship safer, and build products customers love. That’s how experimentation becomes a strategic capability—not a side project.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image