Tag: product management leadership

  • AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook

    AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook

    I’ve led multiple AI agent launches, and the single most reliable way I’ve found to ship with confidence is to treat evaluations as a product capability, not a side project. When we make AI quality measurable, predictable, and comparable over time, we move faster, reduce risk, and build trust with customers and stakeholders.

    Learn how product managers use AI evaluations to measure agent quality. Covers traces, LLM judges, offline evals, online evals, and how to connect evals to product outcomes.

    Why does this matter so much in product management? Because agent quality is only meaningful when it drives adoption, satisfaction, and revenue. I use eval-driven development to align the day-to-day iteration of prompts, policies, and workflows with business outcomes like activation, retention, and Net Recurring Revenue (NRR). That alignment turns AI quality from an abstract notion into a roadmap lever.

    First, traces. Traces are the spine of evaluation for agentic AI: they capture inputs, intermediate steps, tools invoked, and final responses. I instrument traces to make reasoning visible—what the agent tried, where it hesitated, and why it chose a path. With that visibility, I can compare prompts, policies, and tools, and I can teach the team to fix the root cause instead of patching symptoms. This is also where Agent Analytics becomes real: we move from anecdotes to observable behavior trends across cohorts and use cases.

    Next, LLM judges. I use model-as-judge to score qualities like helpfulness, coherence, or adherence to brand and policy. The trick is calibration. I pair LLM judges with a small, high-quality human-labeled set to ground the scale, then monitor drift as models, prompts, or data shift. LLM judges help me evaluate at speed, but I still spot-check edge cases and highly regulated flows to balance efficiency with risk controls.

    Offline evals come first. Before I expose users to changes, I run fixed test suites representing core scenarios, failure modes, and edge cases. I include golden examples, adversarial prompts, and domain-specific queries. Metrics cover task success, factuality, safety, latency, and cost. This is where prompt engineering and retrieval quality are tuned; if I’m using a retrieval-first pipeline, I evaluate evidence quality separately from generation so improvements are attributable and reproducible.

    Online evals follow to validate real-world performance. I roll changes out behind feature flags and use A/B testing to compare variants under production conditions. I track conversation outcomes, tool success rates, fallbacks to human support, and user satisfaction. These online signals close the loop on whether an offline improvement actually compounds value in the product—critical for product-led growth.

    Connecting evals to product outcomes is non-negotiable. I map quality signals to a driver tree: from per-turn scores (helpfulness, safety, latency) up to session-level outcomes (task completion, deflection, revenue intent), and finally to product KPIs (activation, retention, NRR). With this structure, I can set thresholds for launch gates, prioritize roadmap items that move the biggest levers, and build dashboards that leadership understands at a glance.

    A few lessons learned. Start with a minimal but durable test set and grow it as you discover new failure modes. Version everything—prompts, tools, and datasets—so you can reproduce wins. Beware metric drift when you swap models or update prompts. Blend human review where the cost of error is high. Above all, make evaluations part of your AI workflows and sprint rituals so quality improves continuously, not sporadically.

    If you’re just getting started, begin with traces and a small offline suite, add LLM judges for scale, then prove impact with a focused online experiment. Within a few cycles, you’ll have a living evaluation system that guides decisions, accelerates delivery, and gives your team—and your customers—confidence in every AI release.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Crafting Beloved Tech Brands: My Moonshot Marketing Playbook for the Post-LLM Era

    I spend a lot of my time asking a deceptively simple question: what does excellent marketing actually look like in 2026? From the vantage point of product leadership, the answer isn’t a spreadsheet or a channel plan—it’s a feeling. Beloved tech brands earn the benefit of the doubt, create gravity around their roadmap, and make customers proud to belong. That kind of momentum is not an accident; it’s a system.

    Here’s the hard truth I’ve learned building and scaling products: giving teams different goals creates dysfunction. When brand, demand gen, product marketing, and comms run on fragmented OKRs, you manufacture internal headwinds. “Marketing is one engine – not separate pieces.” One strategy, one narrative, one set of outcomes—expressed through different craft disciplines and time horizons.

    That unity of purpose clarifies executive roles, too. The real difference between an SVP and a CMO is scope and narrative ownership. A great CMO architects the whole system—portfolio allocation, brand architecture, integrated go-to-market strategy, and the bar for creative taste—while refusing to get dragged into decisions they should never be making (for example, approving every headline or micromanaging channel tactics). Leaders should decide the outcomes, standards, and constraints; teams should control the craft.

    On portfolio design, I run marketing like a portfolio of moonshots. You need a healthy mix: proven programs that compound, emergent bets that learn fast, and a small set of true moonshots that can change the slope of the curve. The point isn’t bravado; it’s risk-balanced exploration. If everything ships safely, you’re under-investing in differentiation. If everything is a swing for the fences, you’re not building a repeatable growth engine.

    This is where taste becomes a strategic advantage. “Ubiquity is the opposite of cool.” If you want to be beloved, you cannot treat every channel, audience, and moment as equal. Early on, selective distribution, distinctive creative codes, and tight community loops create status and meaning. Later, you scale without sanding off the edges that made the product special.

    Why do a few companies build a flywheel of momentum while others stall? They align story, product, and distribution. The product earns trust, the narrative creates aspiration, and the go-to-market strategy ensures the right customers experience both at the right time. Then perception cycles kick in—the Silicon Valley clock turns—and irrational optimism or skepticism can amplify signals. The antidote is compounding proof: consistent product shipping, community advocacy, and creative that makes people care.

    Scaling taste across an organization is teachable. I codify brand principles, narrative guardrails, and examples of “right” versus “almost right.” I replace abstract feedback with decision rubrics—what we keep, kill, or revise and why. I run recurring creative reviews with a small cross-functional council, so judgment compounds. Taste can’t be fully automated, but it can be operationalized: shared references, a story bible, and a high bar for craft that’s explicit, not mystical.

    In a post-LLM world, the fundamentals haven’t changed—but the frontier has. Generative tools supercharge iteration and research, yet the artistry never really left. You still need a point of view, a tension worth resolving, and a value proposition that’s felt, not just stated. Can taste be encoded in software? Parts of it—pattern libraries, style constraints, data-driven feedback—absolutely. But the spark that makes work unforgettable remains human: judgment, risk tolerance, and the courage to ship something that might not fit the playbook.

    That’s why telling an optimistic, yet realistic story about AI matters. Over-automation drains humanity; under-automation wastes potential. The best work pairs AI Strategy with craft leadership: LLMs for rapid exploration, humans for narrative decisions and ethical judgment. Your message should show how AI expands customer agency, not just efficiency.

    The brand-versus-growth debate is a false choice. The right story accelerates pipeline, and the right demand programs reinforce the brand. Look at Apple’s discipline around product truth and design codes, or Google Chrome’s “The Web Is What You Make of It (Dear Sophie)” for proof that emotion and utility can co-exist. Notion, Pinterest, Square, HubSpot, and Harley-Davidson show how community, identity, and product-led growth interlock when the company knows exactly what it stands for.

    When it comes to launches, I’ve learned that announcement videos full of humans, lack humanity. Overproduced gloss often dilutes the truth customers seek: what problem does this solve, how quickly can I feel the value, and why does it matter now? Real users, real context, and a crisp arc from problem to promise will outperform most theatrics.

    Practically, I architect my week to protect taste and outcomes. Early-week for strategy, portfolio reviews, and cross-functional alignment; mid-week for deep creative and product marketing work; late-week for decision clears and postmortems. I time-box “disruptive energy”—space to chase non-obvious ideas—and I guard it like any critical meeting. Without protected cycles for exploration, the urgent will always suffocate the important.

    If there’s a single takeaway: playbooks are obsolete, but the fundamentals are not. The channels change; the psychology doesn’t. Run one engine. Allocate a true portfolio. Scale taste with rigor. In the AI era, make people care. That’s how beloved tech brands are built—and how they endure.


    Book a consult png image
  • Broken Procurement Is Costing You Talent: A Product Leader’s Playbook for Speed and Sanity

    Broken Procurement Is Costing You Talent: A Product Leader’s Playbook for Speed and Sanity

    Procurement should accelerate value, not suffocate it. Listening to this episode, I found myself nodding (and wincing) through a painfully familiar story about how well-intended controls morph into barriers that keep great expertise out. As a product leader responsible for speed, outcomes, and brand experience, I see procurement as a direct mirror of culture—and an often overlooked part of the product operating system.

    In the conversation, Teresa is cranky—and honestly, she has every right to be. She’s simultaneously juggling seven speaking engagement contracts, and six of them have become a part-time job in themselves—think 80-page ethics policies, 800-question security forms, and Multi-Factor Authentication (MFA) questions asked 17 different times. Meanwhile, the one company that just put her fee on a credit card? Scheduled, confirmed, and done in two weeks. That contrast is the whole story: friction repels talent; clarity and simplicity attract it.

    Petra adds her own horror story—filling out 12 identical Word document forms—and together they surface a deeper truth I’ve seen across organizations: broken vendor processes don’t just frustrate consultants; they stop companies from getting the expertise they actually need. And despite what many assume, company size isn’t the deciding factor—leadership intent and process ownership are.

    If you’ve ever wondered why a training got canceled, why a speaker backed out, or why your team can’t seem to bring in outside experts, this is likely the culprit: procurement theater. Repetitive forms, unbounded scope creep, and sprawling security reviews create drag that outlasts any short-term legal or compliance gain. The opportunity cost—lost learning, slower progress, and talent that simply says no—is enormous.

    One detail that stood out: with CEO-level buy-in, a legal review timeline collapsed from four months to 10 days. I’ve seen the same thing. Executive sponsorship is the fastest procurement tool there is, and it reveals what the organization truly values. If you can compress the path when a leader cares, you can redesign the path so it’s always faster—without compromising real risk management.

    I also loved the clarity of a simple policy from the episode: Teresa’s new policy is straightforward—her paperwork, credit card payment, no vendor setup—or no speaking engagement. That’s not obstinance; it’s a bright-line test for whether an organization respects expert time and understands total cost. The best experts have options, and friction filters them out first.

    Here’s how I operationalize this in product-led organizations. Tier risk by engagement type (e.g., one-hour talk vs. long-term software vendor) and match the process to the risk. Offer a credit-card fast lane with standard, plain-English terms for low-risk work. Eliminate duplicate data entry and kill redundant questionnaires. Use a single, secure intake that auto-fills known fields. Track cycle time end to end, and publish SLAs for legal, InfoSec, and finance. Most importantly, make vendor experience a first-class metric—because it is a brand experience.

    Security and compliance matter, but they must be right-sized. If you’re buying a keynote, you’re not buying data processing—so why the 800-question security review? Calibrate controls to actual data access and system interaction. The episode even references AWS DynamoDB and GuardDuty, plus Claude Code—helpful reminders that your stack context matters, but not every purchase touches it. Don’t conflate deep technical diligence for a SaaS integration with a simple, no-data engagement.

    There’s a reason the classic film Office Space gets a nod—it’s the perfect metaphor for what happens when well-meaning governance calcifies. Bureaucracy compounds over time, usually after adverse events, until startups—or any team that still moves fast—run circles around you. Procurement that treats experts like adversaries won’t win the race that actually matters: learning faster than the market.

    If you want the full story, listen to the episode here: Spotify (https://open.spotify.com/episode/2JHnTvnZX2WcFczml7ozKY?ref=producttalk.org) | Apple Podcasts (https://podcasts.apple.com/kh/podcast/procurement/id1794203808?i=1000770701690&ref=producttalk.org). It’s cathartic, but more importantly, it’s a blueprint for fixing what’s broken.

    Mentioned in the episode: Hire Teresa to Speak (https://www.producttalk.org/hire-teresa-to-speak/), AWS DynamoDB (https://aws.amazon.com/dynamodb/?ref=producttalk.org), GuardDuty (https://aws.amazon.com/guardduty/?ref=producttalk.org), Claude Code (https://www.claude.com/product/claude-code?ref=producttalk.org), and Office Space (https://en.wikipedia.org/wiki/Office_Space?ref=producttalk.org).

    I’d love to hear your experiences and fixes. Where does your procurement flow break, how do you measure cycle time today, and what would it take to create a vendor experience you’d be proud to put your brand on? Drop your thoughts below and let’s trade playbooks.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Join Me in June: Master Opportunity-First Product Strategy with Continuous Discovery Habits

    Join Me in June: Master Opportunity-First Product Strategy with Continuous Discovery Habits

    I’m celebrating the five-year anniversary of Continuous Discovery Habits by inviting you to read it with me this June. As someone who leads product management and coaches product trios, I’ve seen how a shared discovery practice tightens alignment, speeds up learning, and drives outcomes. This month, we’ll go deep on prioritizing opportunities—not solutions—and I’ll guide you step by step so you can apply the ideas on your own team.

    Each month, I’m releasing an in-depth reading guide that includes:

    We’ll discuss each month’s reading in the comments, and we’ll gather quarterly on a live call to unpack real-world applications, trade wins and missteps, and keep the momentum going.

    Joining late? No problem. I monitor the comments on each reading guide throughout the year. Start with the current month or go back to January—whatever works for you. Ask for help, share what’s working, and connect with other readers at any point.

    If you want to participate, grab a copy of the book (or dust off your old copy), share the “Spread the Love” videos with your team, block time for the exercises, and register for the community sessions. Let’s do this.

    This Month’s Reading

    Chapter:

    Estimated reading time: ~16 minutes

    This month's chapter will introduce you to:

    Need a copy? Grab the book

    Share the Love with Friends and Colleagues

    We learn best in community. Use these short videos to spread the key ideas across your product trios, engineering partners, and stakeholders. Invite them to read along with you so your discovery cadence—and your product strategy—advance together.

    Reflect & Discuss What You Read

    When we reflect and discuss what we read, we absorb more and apply it faster. This chapter challenges a deeply ingrained habit: prioritizing solutions. I’ve been in those meetings—spreadsheets full of features, heated roadmap debates, and a creeping sense that we’re optimizing outputs rather than outcomes. The shift to opportunity-first thinking changed how my teams frame bets, sequence discovery, and communicate product strategy.

    Individual Reflection

    Team Discussion

    Put It Into Practice

    This month is all about shifting from solution-first to opportunity-first thinking. These short, focused exercises will help your product trio practice opportunity prioritization and improve decision speed without sacrificing product discovery rigor.

    Exercise: Map Your Roadmap to Opportunities

    Time: 45 minutesDo this: With your product trio

    Take your current roadmap or backlog and work backwards. For each planned feature or solution:

    This exercise often reveals that you're either:

    Use these insights to inform your next prioritization conversation.

    Exercise: Practice Two-Way Door Thinking

    Time: 30 minutesDo this: With your product trio

    Choose 3-5 recent or upcoming product decisions. For each one, discuss:

    The goal is to calibrate your team's decision-making speed. Two-way door decisions should be made quickly with "just enough" evidence. One-way door decisions deserve more deliberation and data.

    Go Deeper: Additional Reading

    If you prefer an audio summary of this month’s reading, including the book chapters and the following resources, I’ve included an audio version for members at the bottom of this post.

    Related In-Depth Guides

    Supplementary Reading

    Related Courses

    Our Live Discussion Schedule

    Our live discussion sessions are for registered members. Sessions are not recorded. Invitations will go out two weeks before the scheduled event—reserve time now.

    Audio Summary

    Prefer to listen? Stream the audio overview here: June — Prioritizing Opportunities (audio).

    Ready to put continuous discovery into action? Grab the book, share the videos with your team, schedule the exercises, and join the community sessions. Opportunity-first product strategy is a muscle we can build together.

    The chapters we will be readingA preview of the most important concepts we'll be learning aboutShort videos you can share with friends and colleagues to help spread the ideasIndividual and team discussion questions to help you absorb and engage with the readingTeam exercises to help you put the ideas into practiceAdditional reading to help you go deeper on the core ideasChapter 7: Prioritizing Opportunities, Not SolutionsWhy product strategy happens in the opportunity space, not the solution spaceHow to focus on one target opportunity at a time to deliver value iterativelyUsing the tree structure to simplify prioritization decisionsThe four criteria for assessing opportunities: sizing, market factors, company factors, and customer factorsWhy treating prioritization as a messy, subjective decision leads to better outcomes than scoring formulasThe concept of two-way door decisions and how they apply to opportunity prioritizationWork on one small opportunity at a time – Reduce your batch sizeGetting started with compare and contrast decisions – Choose the right target opportunityTurn big intractable problems into smaller, more solvable problems – The power of decompositionThink about your team's current roadmap or backlog. How much of your time is spent prioritizing features versus understanding and prioritizing customer opportunities? What would change if you flipped that ratio?Reflect on the last time you made a product decision. Did you treat it as a one-way door (irreversible) or a two-way door (reversible)? How did that framing affect your decision-making process and timeline?Consider the four assessment criteria (opportunity sizing, market factors, company factors, customer factors). Which of these does your team currently emphasize most? Which do you tend to overlook or underweight?As a team, list the top 5-10 items on your current roadmap or backlog. For each one, try to identify the underlying customer opportunity it addresses. If you can't clearly articulate the opportunity, what does that tell you about how you're making decisions?The chapter argues against scoring formulas (like RICE or ICE) for prioritization, calling them "made-up math." If your team uses a scoring system, discuss: What is it really measuring? Does it help you make better decisions, or does it just make subjective decisions feel more objective?Walk through a recent prioritization decision. Did you assess options in isolation ("should we build this?") or compare and contrast them? How might your decision have been different with a compare-and-contrast approach?Identify the customer opportunity it's meant to addressWrite it as something a customer might say (e.g., "I can't find anything to watch" not "We need better search")Look for patterns: Are multiple solutions addressing the same opportunity? Are some solutions disconnected from any clear customer need?Spreading yourself thin across too many opportunitiesOver-investing in a single opportunity with multiple solutionsBuilding solutions with no clear opportunity attachedIs this a one-way door decision (hard to reverse) or a two-way door decision (easy to reverse)?If it's a two-way door, what's the smallest step we could take to learn whether we're on the right track?What would we need to see to know we made the wrong choice?If we realize we're wrong, how quickly could we course-correct?Opportunity Solution Trees: Visualize Your Discovery to Stay Aligned and Drive OutcomesCustomer Interviews: Uncover Hidden Insights from Every ConversationPrioritize Opportunities, Not Solutions7 Key Benefits of Using Opportunity Solution TreesProduct in Practice: How 2-Way Door Decisions Helped Simply Business Learn FastProduct in Practice: Getting Started with Opportunity Solution Trees at SuperAwesomeProduct Discovery Fundamentals: Learn a structured and sustainable approach to continuous discovery.Tuesday, June 16, 2026: 9am-10am PDTThursday, September 17, 2026: 9am-10am PDTWednesday, December 16, 2026: 9am-10am PST


    Inspired by this post on Product Talk.


    Book a consult png image
  • AI Broke Your A/B Tests: 3 Proven Shifts to Rebuild a Resilient Experimentation Program

    AI Broke Your A/B Tests: 3 Proven Shifts to Rebuild a Resilient Experimentation Program

    I’ve watched a once-reliable A/B testing playbook buckle under the weight of generative AI. Traffic patterns aren’t stable, LLMs update behind the scenes, prompts evolve weekly, and personalization reshapes cohorts mid-flight. The result is non-stationary data, diluted statistical power, and “wins” that don’t replicate in production. If your experimentation program feels slower, noisier, and less trustworthy, you’re not imagining it—and you’re not alone.

    Learn why running more tests isn’t the answer to AI, and the three ways mature teams are shifting their experimentation programs.

    First, I’ve shifted from test volume to an evaluation stack—what I call eval-driven development. Instead of defaulting to production A/B tests, we front-load learning with offline evaluations (golden sets, synthetic scenarios), automated regressions on prompts and policies, and pre-production canaries. We size experiments with a clear minimum detectable effect (MDE), use sequential or Bayesian methods to handle drift, and reserve full A/B runs for hypotheses with sufficient power and operational readiness. This layered approach accelerates decisions, reduces traffic waste, and restores trust in effect sizes.

    Second, I’ve re-anchored our metrics and governance for AI-era reliability. We define a driver tree that links value creation to guardrail metrics such as latency, hallucination rate, cost per request, safety incidents, and user trust proxies. Persistent holdouts and long-lived control cohorts protect against platform-wide regressions, while anomaly detection highlights model or data shifts before they corrupt reads. Strong instrumentation—behavioral analytics, consistent event semantics, and product telemetry wired into Amplitude analytics—keeps our feedback loop tight and auditable.

    Third, we rebuilt rollout mechanics to make delivery experimentation-native. Feature flags, progressive delivery, and targeted canaries let us test safely in production while gating exposure by segment, risk, or policy. Shadow mode and offline replay provide signal before real users see risk. Multi-armed bandits help with exploration when goals are clear and guardrails are enforced, but we resist over-rotating to bandits when measurement is fragile. Tightly integrating experiments into CI/CD and observability shortens the cycle from hypothesis to validated outcome.

    In practice, here’s how I operationalize this shift. In 30 days, I audit the backlog, kill or consolidate tests that can’t meet MDE, and establish a minimal evaluation harness for prompts, policies, and safety checks. By 60 days, guardrail metrics are live with persistent holdouts and feature flags across AI surfaces. By 90 days, the team runs a balanced portfolio: offline evals for fast iteration, canaries for risk, and selective A/B testing for strategic bets—supported by continuous discovery to keep hypotheses grounded in real customer needs.

    AI didn’t eliminate the need for experimentation; it raised the bar for rigor. By moving from volume to validity, from vanity lifts to guardrailed outcomes, and from monolithic launches to progressive delivery, I’ve seen experimentation regain its edge—fewer false positives, faster cycles, and clearer signal on what truly drives impact. That’s how we turn a brittle testing culture into a resilient, learning system built for LLMs and beyond.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • The Counterintuitive Playbook for CLI Agents: Why Ruthless Subtraction Beats Feature Creep

    The Counterintuitive Playbook for CLI Agents: Why Ruthless Subtraction Beats Feature Creep

    I’ve learned the hard way that the fastest path to a reliable command-line agent is radical subtraction. "In the last month of developing Amplitude Wizard CLI, we cut more than we added. Learn less is more when it comes to building CLI agents." That decision was less about minimalism and more about product strategy: constraints sharpen behavior, clarify intent, and raise trust.

    When I evaluate agentic AI systems, especially those that act on developer environments, I start by asking what the agent must never do. By establishing hard guardrails first, the design naturally converges on an opinionated, safe, and teachable interface. Every additional flag, tool, or permission expands the blast radius; every removal shortens the path to first success.

    For CLI agents, the most valuable product choice is a narrow toolset with sane defaults. Opinionated workflows reduce cognitive load and failure modes, while clear human override points keep users in control. I prefer a bias toward idempotent actions, reversible changes, and explicit confirmation gates for anything destructive. If a feature can’t explain itself in a single, crisp sentence in the help text, it likely doesn’t belong.

    Security and reliability flow from limits. Progressive permissioning, scoped credentials, and time-bounded tokens prevent the agent from wandering. Dry-run modes build confidence without side effects. When a user can reason about what the agent will and won’t do, adoption accelerates—and support tickets plummet.

    Observability is the other half of trust. I instrument "Agent Analytics" across every run: inputs, tool choices, durations, outcomes, and error patterns. Those signals reveal where the agent gets confused, which steps users abandon, and which prompts need pruning. With that loop in place, "less is more" stops being a philosophy and becomes an evidence-backed operating model.

    I anchor the roadmap in eval-driven development. Before adding a capability, I define a measurable task, a success threshold, and the smallest viable interface to reach it. If the capability can’t lift completion rate, time-to-first-success, or re-run stability, it waits. That simple discipline protects the experience from feature creep and preserves velocity in CI/CD.

    Under the hood, I design for a retrieval-first pipeline and careful context window management. The agent should fetch only the minimally relevant facts, present a compact plan, and execute predictably. Thoughtful prompt engineering helps—but prompts are not a substitute for clear boundaries, deterministic tool contracts, and robust error handling.

    Documentation is product. I maintain docs-as-code with runnable examples that mirror the golden paths. When the docs and the CLI disagree, the CLI changes—never the docs. This creates an internal forcing function: if we can’t document it simply, we probably shouldn’t ship it.

    My litmus test for any proposed addition is simple: does this make the mental model smaller? If not, cut it, make it progressive, or hide it behind a clearly named subcommand. Defaults should be boring, safe, and fast. Advanced power should be opt-in and discoverable without overwhelming new users.

    The paradox of agentic AI is that capability grows as surface area shrinks. By removing distractions, we amplify signal, increase repeatability, and earn the right to add the next carefully chosen step. The result is a CLI agent that feels sharp, dependable, and—most importantly—useful on day one.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Inside Growth Engineering at Amplitude: My Playbook to Accelerate Product-Led Growth with Analytics

    Inside Growth Engineering at Amplitude: My Playbook to Accelerate Product-Led Growth with Analytics

    I’m often asked how leading growth teams turn insights into compounding business results. Few organizations illustrate this better than the Growth Engineering team at Amplitude. Drawing from their example and my own experience, I’ve distilled a practical playbook that any product organization can use to move faster, learn smarter, and scale impact.

    At the core is a disciplined blend of behavioral analytics and rapid experimentation. Amplitude analytics, as part of a unified analytics platform, enables precise event instrumentation, cohorting, and funnel analysis that surface where activation and retention truly break down. When I combine those signals with qualitative insights, I can prioritize fewer, higher-leverage bets that directly improve user activation and long-term retention.

    My growth loop always starts with clearly stated hypotheses, success metrics, and A/B testing power considerations, including a defined minimum detectable effect (MDE). I pair feature flags with staged rollouts to de-risk changes and accelerate iteration without compromising stability. This cadence turns every release into a learning opportunity, compounding knowledge across teams and time.

    Cross-functional execution is non-negotiable. I rely on tight “product trios” collaboration—product, engineering, and design—so we can ship small, measurable changes quickly, observe outcomes, and then widen scope with confidence. The Growth Engineering mindset keeps us grounded in real user behavior, not assumptions, and ensures our roadmap is fueled by evidence rather than opinion.

    Consider onboarding. Instead of a single redesign, I prefer a series of targeted experiments—tweaking progressive disclosure, refining tooltip design, and adding in-app guides where users predictably stall. Each test is instrumented end to end, from first action to activation event, and validated via retention analysis to confirm that short-term lifts turn into durable habit formation.

    When prioritizing, I map ideas to driver trees tied to our North Star metric. Behavioral analytics tell me which levers—time-to-value, depth-of-use, or frequency—will yield the biggest gain. That clarity focuses engineering effort on interventions that actually shift outcomes, not just outputs.

    If you’re building your own Growth Engineering capability, start with three moves: instrument ruthlessly so you can trust your signals, adopt feature flags to speed safe experimentation, and hold teams accountable to measurable, user-centric outcomes. Do this consistently and you’ll feel the compounding effect—faster learning cycles, stronger product-market fit signals, and a durable engine for product-led growth.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Is Technology Still Net Positive? A Product Leader’s Reckoning and Playbook for Humane Growth

    Is Technology Still Net Positive? A Product Leader’s Reckoning and Playbook for Humane Growth

    I’ve spent my career building products on top of the internet, championing social media, and now scaling AI. Lately, I keep returning to an uncomfortable but necessary question: are we still building a net positive future—or have we drifted into something else entirely?

    A recent long-form conversation in my podcast queue challenged me to do a deeper self-audit. If you want to hear the debate that sparked this reflection, you can listen on: Spotify | Apple Podcasts. What follows is my synthesis as a product management leader: the hard truths, the hopeful paths forward, and the practical actions I’m taking with my teams.

    The moment that hit me hardest was a family member’s blunt assessment that the internet has become “net negative.” That phrase landed like a wake-up call—a reminder that those of us inside tech often operate in an echo chamber. We see our roadmaps, our metrics, our progress; the rest of the world experiences the second-order effects. As a leader, I have to seek out those outside-in perspectives with the same rigor I apply to any product discovery practice.

    Another truth I can’t ignore: somewhere along the way, parts of our industry slid from “make people’s lives better” to “extract maximum value at any human cost.” You can see it in incentives that prioritize growth at all costs, in waves of layoffs that treat people as an expense line, and in platform behaviors that resemble a modern tycoon era. This isn’t just a moral critique—it’s a product strategy risk. Extractive models erode trust, weaken retention, and invite regulatory and reputational headwinds that no amount of optimization can out-execute.

    The loneliness crisis is real, and technology has too often replaced human connection instead of augmenting it. Spend a week in San Francisco and you’ll notice what I call “isolation by design”—QR-code menus, autonomous Waymos, frictionless everything, but fewer genuine human moments. It’s efficient, yes, but alienating. No algorithm can substitute for physical touch, care, and community. As builders, we should design products that create on-ramps to real-world connection, not cul-de-sacs of infinite scroll.

    We still have agency. “Don’t be evil” shouldn’t be a nostalgic slogan; it should be a minimum bar. Responsible product management means being a citizen of the ecosystems we influence: naming trade-offs clearly, instrumenting for externalities, and building AI risk management into our operating cadence. It also means stepping outside the industry narrative to ask neighbors, parents, teachers, and small business owners how our products actually land in their lives.

    One idea that gives me hope is “mom and pop tech”: AI-enabled, hyper-local tools crafted for specific neighborhoods and communities. Think “inch wide, mile deep”—software that solves a real problem for a defined community rather than chasing a horizontal total addressable market. Consider ride share. The extractive platform playbook maximized liquidity but squeezed drivers and frayed local fabric. A community-owned alternative could optimize for safety, fair wages, and neighborhood vitality over blitz-scaled margins. That’s civic tech with a viable product strategy.

    I’m also watching how social norms evolve. At a recent Elternabend at a German primary school, parents collectively agreed to delay smartphones until age 11 or 12—a striking shift from just five years ago when many 7–8 year olds had devices. Culture moves, sometimes faster than we expect. Product-led growth that ignores cultural momentum (or ethical guardrails) is fragile growth.

    So what do we do on Monday morning? First, rebuild our discovery muscles outside the echo chamber: continuous discovery with the people most affected by our products, not just our power users. Second, measure what matters: add well-being, community impact, and qualitative trust signals to the same dashboards that track activation and retention. Third, resist technology FOMO—choose fewer bets and go deeper, especially where AI can be applied responsibly to unlock real-world value. Fourth, cultivate communities of practice that normalize responsible experimentation, privacy-by-design, and transparent communication. Finally, narrate the change: as product people, we are educators as much as we are builders; our stories shape what teams believe is possible.

    If you’re looking for frameworks to anchor this work, revisit classics like Bowling Alone: The Collapse and Revival of American Community for context on social capital, and pair that with modern conversations on local resilience and community spaces. The future isn’t written yet. With clear principles, careful incentives, and the courage to narrow our scope in service of depth, we can still build technology that strengthens the bonds that make life worth living.

    I’d love to hear how you’re approaching this in your organization—especially examples of “mom and pop tech,” AI Strategy in service of community, or product strategies that trade a little scale for a lot of human good. Join the conversation in the comments.


    Inspired by this post on Product Talk.


    Book a consult png image
  • From Ed‑Tech Roots to Core Analytics: Product Leadership Lessons Inspired by Amplitude

    From Ed‑Tech Roots to Core Analytics: Product Leadership Lessons Inspired by Amplitude

    I often look to Amplitude and its core analytics product when I’m coaching teams and refining our own product strategy. The discipline required to turn raw event streams into actionable behavioral analytics mirrors what I expect from empowered product teams: precise instrumentation, clear decision points, and a relentless focus on outcomes.

    Some of the most effective product managers I meet began their careers in the ed-tech and recruiting space. That early-stage, resource-constrained environment cultivates sharp prioritization instincts and a comfort with ambiguity—muscles that translate directly into building scalable analytics capabilities without losing speed or customer empathy.

    In my practice, I anchor discovery and roadmap decisions in driver trees that connect north-star outcomes to measurable input metrics. That structure keeps product trios aligned on the questions that matter: What behaviors predict retention? Where does user activation stall? Which experiments will meaningfully shift our core metrics? Paired with continuous discovery, this approach ensures we ship learnings—not just features.

    Tactically, I encourage teams to combine Amplitude analytics with a unified analytics platform mindset: centralize event taxonomy, standardize cohort definitions, and operationalize retention analysis alongside acquisition and activation. When we treat analytics as a product, not a tool, we unlock faster iteration loops, smarter A/B testing, and clearer trade-offs between depth and breadth in our product surface area.

    Product-led growth hinges on narratives supported by evidence. I’ve found that clear opportunities emerge when we map journeys, quantify friction with session replay and funnels, and then validate solution ideas through small, reversible bets. This is where outcome-based roadmapping shines: we commit to moving a metric, not to a specific feature, and we let the data guide sequencing.

    At the leadership level, I focus on execution readiness: crisp problem statements, decision logs, and CI/CD practices that reduce batch size and increase deployment frequency. The goal isn’t shipping more; it’s compounding learning. When teams internalize this mindset, analytics stops being a dashboard and becomes a competitive advantage.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Old-School Selling Beats PLG in the AI Era: My GTM Playbook for 8‑Day Enterprise Deals

    Old-school, in-person selling is having a renaissance in the AI era, and I’ve seen why up close. From leading product and go-to-market teams through hypergrowth, I keep returning to one lesson: enterprise buyers still reward the teams who show up, orchestrate change management, and own outcomes end-to-end. The tech has changed; the human dynamics haven’t.

    Has the sales playbook changed in the AI era? The tools are faster and the surface area is bigger, but the core motion remains the same: “showing up” beats letting the marketplace decide. That’s why in-person enterprise rollouts still beat product-led motions, especially when the stakes include security, governance, and cross-functional adoption. You win by reducing organizational risk, not by assuming free trials will do the heavy lifting.

    Great enterprise sellers collapse silos. They sell to engineers and executives in one motion, pairing deeply technical validation with crisp business narratives. In my org, that means every high-velocity pilot has a dual thread: hands-on, eval-driven proof for the builders and a value architecture for the budget owners. When those motions run in parallel, time-to-value plummets and procurement friction fades.

    Selling to AI-native buyers who grew up on ChatGPT changes tempo, not fundamentals. The same seller, different tempo: 8 weeks vs. 8 business days. These buyers evaluate fast, expect clear ROI, and push for automation-first workflows. How AI-native buyers handle build vs. buy decisions comes down to build for differentiation and buy for acceleration. If you make procurement feel like product—frictionless, instrumented, and transparent—you’ll meet their bar.

    Process matters, but humanity wins. Building a robust sales process that still leaves room for unscripted moments is where trust is formed. I’ll never forget the story of the rep who taught a champion’s son guitar over Zoom—an unscripted moment that cemented a partnership. The lesson: raise the floor without capping the ceiling. Equip every rep with repeatable plays, then celebrate the creative instincts that make champions out of customers.

    In early GTM, why the three highest-leverage early sales hires aren’t sellers at all resonates with my experience. I prioritize a solutions engineer who can de-risk integration, a forward-deployed operator who can run the first rollout like a product manager, and a customer success lead who designs adoption paths from day zero. Together, they compress the value journey from proof to production.

    Compensation design shapes your talent market. The case for outsized commission accelerators for star sellers — and the kind of person they attract is real: magnets for competitors who close complex, multi-threaded deals and thrive with ownership. But beware: why too much process narrows the kind of seller you attract. Over-script it and you filter out the very people who can navigate ambiguity with customers.

    Under the hood, instrumenting the funnel from stage zero to close keeps the system honest. I track intent signals before pipeline, conversion by persona and use case, proof milestones, and time-to-value in production. The three pillars of GTM excellence for me are repeatable discovery, referenceable outcomes, and relentless enablement. And inside the leadership team, building peers who are 80% aligned, not 100% preserves healthy tension while keeping execution fast.

    AI is expanding the definition of enablement—whether AI is changing what good enablement looks like isn’t a theoretical question anymore. I see world-class teams arming reps with retrieval-first knowledge bases, sandbox environments, and objection libraries that evolve weekly. Meanwhile, selling against direct and implied competitors at once is the norm: your battlecard must cover “do nothing,” internal tools, adjacent categories, and new AI entrants—while you still remember why in-person enterprise rollouts still beat product-led motions for durable adoption.

    Planning horizons tighten in AI markets. How far out should a GTM leader be planning? I work a dual cadence: a rolling 6-week operating plan that’s ruthlessly tactical and a 2–3 quarter roadmap for coverage, enablement, and category storytelling. What a normal week looks like in hypergrowth blends customer time, pipeline triage, onboarding and enablement, deal engineering, and process tuning—always with one or two high-conviction bets that could bend the curve.

    References: Ahead: https://www.ahead.com; Amazon: https://www.amazon.com; Anthropic: https://www.anthropic.com; Attio: https://www.attio.com; Augment Code: https://www.augmentcode.com/; Cognition: https://cognition.ai; Cursor: https://cursor.com; Dani McCabe: https://www.linkedin.com/in/danielle-mccabe/; Datadog: https://www.datadoghq.com; GitHub Copilot: https://github.com/features/copilot; HubSpot: https://www.hubspot.com; Jeremy Powers: https://www.linkedin.com/in/jeremypowers/; JPMorgan: https://www.jpmorgan.com; Matt McClernan: https://www.linkedin.com/in/mattmcclernan/; MongoDB: https://www.mongodb.com; Nicole Rettinger: https://www.linkedin.com/in/nicole-rettinger-23b20465/; Notion: https://www.notion.com; OpenAI: https://openai.com; Parag Agrawal: https://www.linkedin.com/in/paragagr/; Parallel: https://parallel.ai; Snowflake: https://www.snowflake.com; University of Chicago: https://www.uchicago.edu; Windsurf: https://windsurf.com

    If you’re scaling an AI product today, pair a disciplined sales-led growth engine with the best of product-led growth: fast paths to proof, hands-on validation for builders, executive-level value mapping, and human moments that turn customers into advocates. That’s how you compress an eight-week cycle into five business days—and keep the expansion flywheel spinning.


    Book a consult png image
  • My Always‑On AI Team: How I Get Claude Agents to Tackle Work While I’m Offline

    My Always‑On AI Team: How I Get Claude Agents to Tackle Work While I’m Offline

    Most mornings I wake up to a to-do list that’s already been updated—because my always-on team of agentic AI assistants has been working while I sleep. I rely on Claude to orchestrate these agents so routine prep, follow-ups, and retrospectives never slip through the cracks.

    When a podcast recording hits my calendar, my podcast-manager agent (powered by Claude) automatically creates a podcast-interview-prep task with a concise summary of who I’m interviewing and what they are building. It also creates a transcript review document with the correct share settings. After the recording, it adds a task to my to-do list to share the transcript with the podcast participants.

    For sales, my sales-admin agent (also powered by Claude) prepares a sales-meeting-prep task with notes on who I’m meeting with, where they are in the sales process, and what I need to move the deal forward. After the call, it generates clear next-step tasks so momentum doesn’t stall.

    Every week, my coding-manager agent (still powered by Claude) compiles a report from my prior week’s coding sessions and offers targeted tips. It flags recurring mistakes or dead ends, shows how to avoid them, and suggests ways to work better with Claude. It’s the retrospective I never skip.

    In this walkthrough, I’ll explain how I get Claude to complete tasks for me while I’m away from the computer—and how I designed the system to balance power, safety, and cost control.

    I first explored this approach after seeing the rapid growth of OpenClaw. OpenClaw is an open-source "agent harness" that lets you configure personalized agents to act on your behalf. It’s incredibly promising, but the early wave of enthusiasm also revealed pitfalls: complex safety configuration, overly broad machine access (browser, terminal, files, credentials), third-party skills of varying quality, and surprise usage bills.

    After hearing one too many horror stories about wasted hours and unexpected charges, I set out to design a safer, more predictable way to capture the benefits of OpenClaw while managing risk and spend. That’s what led to my current agent setup.

    For transparency: I’m a long-time practitioner and a genuine fan of Claude Code. I have not received any compensation from Anthropic for writing about my approach. If that ever changes, I will disclose it—both because it’s required by the FTC in the U.S. and because it’s simply the right thing to do.

    An Overview of How My Agent Team Works

    Today, I run three specialized agents: a podcast manager, a sales admin, and a coding manager. As I invest more, I expect this team to grow—because the pattern scales cleanly across use cases.

    This system runs on four core components that keep everything reliable, auditable, and cost-aware.

    First, agent identity. I use a simple but powerful convention: an identity markdown file that tells the agent who it is, where its task folder lives, and provides context for the types of tasks it will do. This keeps scope tight and intent explicit—critical for safety and predictable automation.

    Second, the scheduler. I’m using MacOS’s built-in scheduler (via LaunchAgents). This is like cron, but runs with all your user permissions on Mac. That means I can run all of this under my Claude Code Max subscription or my ChatGPT/Codex subscription. The result is a dependable heartbeat for my AI workflows without relying on fragile cloud glue.

    Third, tasks. Each agent owns a dedicated folder of tasks. A task is a markdown file with frontmatter. That structure makes work items easy to create, parse, review, and version—perfect for repeatable automation with a human-in-the-loop safety net.

    Fourth, scripts. Each agent has its own scripts folder with utilities it can call on demand or that run on a schedule. These scripts are small, composable, and transparent—so I can evolve capabilities without ballooning risk or complexity.

    Agent identity, tasks, and scripts are saved in Obsidian—not Claude Code skills or agents. The scheduler runs on my always-on Mac Mini. The benefit of this is it just works across all of my devices and I can seamlessly switch between Claude Code, Codex—or any other coding CLI—as I need to. All it takes is updating my script that the scheduler uses.

    In practice, this architecture delivers exactly what I want from agentic AI: clarity of responsibility, strong guardrails, and outcomes that compound. My podcast manager keeps interviews buttoned up, my sales admin removes administrative drag, and my coding manager turns lessons learned into steady skill gains—all while I focus on higher-leverage product management work.

    If you’re considering a similar setup, start with a single agent and a narrow task, then expand. Keep identities crisp, scripts small, and schedules explicit. With that foundation, you’ll get the benefits of automation and delegation—without surrendering control.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Inside My Pricing Playbook: Building Value-Based Packaging That Balances Growth and Profit

    Inside My Pricing Playbook: Building Value-Based Packaging That Balances Growth and Profit

    Pricing looks deceptively simple from the outside; inside, it’s anything but. Over the years, I’ve learned that every price tag is really a strategic statement about value, priorities, and the future we’re building toward.

    At Fin, pricing and packaging (P&P) is more than a finishing touch. It’s a research problem, a forecasting challenge, a commercial decision, and ultimately, a strategic statement, requiring deep cross-functional work. We must balance the needs and wants of our customers, the value delivered by our product, and the broader vision we are building towards.

    Our approach keeps evolving as our product and market mature. I treat it as a living system—continuously informed by research, GTM learning, and customer behavior, never "set and forget."

    Here’s how I run the process in practice, especially when we launch something new that needs to be monetized, like Fin, our AI Agent. The work moves from qualitative discovery to quantitative validation to commercial modeling, with tight partnership across product, research, data science, finance, GTM, and engineering.

    Step 1: Foundational research

    I start by talking to buyers to understand their mental models of value. How do they define ROI? What pricing models do they expect in this category? What feels intuitive, and what feels off? This early discovery shapes two crucial choices: the pricing model and the pricing metric.

    The pricing model is the overall structure; value-based, usage-based, access-based, fixed fee, and so on. With Fin, we chose a value-based model: you only pay when Fin delivers value. Our research clearly showed that buyers don’t want to pay for usage, they want to pay for results.

    The pricing metric is the unit of value within that model, the unit we anchor pricing to. For Fin, the pricing metric is “outcomes.” An outcome is defined by Fin successfully handling a customer service query.

    Small definitional changes can dramatically alter how customers perceive value, so I obsess over details. Buyers rarely hand us the “right” model; they reveal how they evaluate value, and I translate that into a model and metric that align with their goals and expectations.

    Throughout, I loop in execs, finance, GTM, and engineering to ensure alignment before proceeding. Pricing choices cut across the business; they can’t be made in isolation.

    Step 2: Willingness to pay

    Once we have a model and metric, I quantify what the market will bear. This is where rigorous willingness-to-pay (WTP) research comes in, grounded in the language we validated through the qualitative work.

    Here’s the kind of framing I use in surveys to keep things concrete and consistent with our model and metric:

    You would only pay when Fin delivers an outcome (→ the model). An outcome is counted when the AI Agent resolves a customer query with no further help needed (→ the metric). Would you be willing to pay $X per outcome for Fin?

    The foundational qual is so important as a first step. It helps us decide what we should be asking about before we start asking how much people will pay. Without the qual ground work, you risk building a very convincing answer to the wrong question.

    The goal isn’t to find a perfect price. That doesn’t exist. The goal is to ground our discussions in the reality of the market.

    I use methods like Gabor-Granger and Van Westendorp to understand WTP and to shape a demand curve that informs strategy, not just a single number.

    This chart shows us what percentage of the market is willing to buy the product at various price points. The demand curve shows that 69% of buyers were willing to pay for the product at $0.86 per outcome, whereas only 39% were willing to pay at $1.42.

    The dashed line shows the price point at which revenue for the business would be maximized (by multiplying adoption by the dollar amount).

    This allows us to debate knotty questions like: What’s the right balance between growth and revenue? How sensitive is demand to price changes? At what price do we start losing the market? If we wanted to increase adoption, would lowering our prices by $X make a meaningful difference?

    Those conversations help me weigh customer value and business outcomes side by side. At this stage, decisions feel more tangible, but I don’t finalize a price until I’ve modeled the operational realities.

    Step 3: Modeling

    By now I have a validated model, a clear metric, and a strong WTP signal. Next I translate theory into a commercially workable plan—this is where data science and finance are indispensable.

    I start with a list price aligned to our strategy and commercial goals. Then I adjust for likely discounting to estimate realized price. Next, I analyze beta usage to project outcomes per customer by segment and derive average ARR. I combine usage projections with WTP to model attach rates across conservative-to-optimistic scenarios. Finally, I connect the dots in our long-range plan—logos, ARR, margins—iterating until the numbers and narrative cohere.

    The modeling step is important because willingness-to-pay data is somewhat theoretical. It reflects intent, not behavior. Modeling helps us bridge that gap.

    The goal of this step is to land on a price point recommendation, alongside forecasts for ARR and adoption. It allows us to understand the real business impact of the decisions we’re making.

    Alongside all of this, we need to ensure any decision we make falls in line with our pricing principles and broader business objectives.

    Step 4: Sign-off and execution

    With the analysis complete, I consolidate everything into a clear P&P recommendation for executive approval. Once approved, the real work begins: enabling sales, communicating changes to customers, instrumenting ROI proof points, and monitoring performance so we can learn and iterate.

    Do we run the full process every time?

    Not always. This is the ideal process, and I apply it end-to-end for the most material decisions. In reality, time and resource constraints require judgment; rigor should mirror impact. When uncertainty crops up midstream, I run scrappier, targeted research rather than forcing a linear path.

    The ongoing challenge

    As Fin’s breadth has expanded, our pricing system has had to evolve, too. For a while, modular pricing worked well—each product had its own logic tied to a crisp outcome. As we add more products, more Agent capabilities, and more outcomes, the question shifts from “what is the right P&P for this one product?” to “how does everything fit together into a coherent pricing system?”

    We must recognize that pricing isn’t something you set once and leave alone. As products evolve, especially in a world where AI is rapidly changing how value is created and delivered, it’s important to regularly step back and review the bigger picture, not just the component parts.

    For example, outcome-based pricing has served us well, particularly when our products were tightly tied to clear, measurable outcomes. But as our products become more varied, and as we continue building toward a broader platform, it becomes less straightforward to apply a single model cleanly everywhere.

    The challenge becomes less about replacing one model with another, and more about continually looking up and asking: what pricing philosophy best reflects the value we’re delivering today? And how do we deliver that philosophy in a way that still feels right for customers?

    In short, there is no finish line, pricing is never “done” – and that’s exactly how it should be.

    Why this work matters

    Pricing and packaging is often noticeable only when it goes wrong. A confusing model, a bad metric, or a price that feels disconnected from value. And we hear about those quickly.

    When pricing is done well, it becomes nearly invisible—but it still does a lot of work. It shapes how people perceive value, clarifies what they’re paying for, and makes the product easier to sell, easier to buy, and easier to scale. Most importantly, it forces us to be honest about what the product is really worth. That’s why I take it so seriously—and why I treat pricing as a product in its own right.


    Inspired by this post on The Intercom Blog.


    Book a consult png image