Category: Product Management

  • Turn Clicks into Revenue: How I Connect Behavior to Conversions with Persisted Properties

    Every revenue story starts with a behavior: a tap, a scroll, a search, an “aha” moment. My job is to make sure we don’t just see those moments—we connect them directly to purchases so marketing, growth, and product can act with confidence.

    "Learn how Amplitude’s persisted properties and session analytics help marketing and growth teams connect behavioral data to purchase outcomes without engineering support." That sentence captures the promise I look for in a modern analytics stack: attribution that endures across sessions and analysis that moves at the pace of experimentation.

    Here’s how I frame it. Persisted properties let me carry forward the critical context behind a user’s journey—campaign touchpoints, audience attributes, and key in-product actions—so when a conversion happens, I can see the exact trail of behaviors that preceded it. Instead of losing signal between anonymous exploration and account creation, I keep the connective tissue intact and attribute outcomes to the interactions that truly mattered.
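
    To make the mechanics concrete, here is a minimal Python sketch of the idea. It is not Amplitude's SDK or its actual persisted-property implementation. The `PersistedContext` class, property names such as `first_touch_campaign`, the first-touch/last-touch rule, and the plain dictionary standing in for an identity merge at signup are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersistedContext:
    """Hypothetical per-visitor context that must survive across sessions."""
    first_touch: Dict[str, str] = field(default_factory=dict)
    last_touch: Dict[str, str] = field(default_factory=dict)
    key_actions: List[str] = field(default_factory=list)

    def record_touchpoint(self, campaign: str, source: str) -> None:
        touch = {"campaign": campaign, "source": source}
        # First touch is written once and never overwritten.
        if not self.first_touch:
            self.first_touch = dict(touch)
        # Last touch is replaced by every new touchpoint.
        self.last_touch = dict(touch)

    def record_action(self, action: str) -> None:
        # Keep each key in-product action once, in first-seen order.
        if action not in self.key_actions:
            self.key_actions.append(action)


def identify(store: Dict[str, PersistedContext], anonymous_id: str,
             user_id: str) -> PersistedContext:
    """Move the anonymous visitor's context to the new account (hypothetical merge)."""
    ctx = store.pop(anonymous_id, PersistedContext())
    store[user_id] = ctx
    return ctx


def purchase_event(ctx: PersistedContext, user_id: str, revenue: float) -> Dict[str, object]:
    """Stamp the conversion with the persisted context so attribution survives signup."""
    return {
        "event_type": "purchase",
        "user_id": user_id,
        "revenue": revenue,
        "first_touch_campaign": ctx.first_touch.get("campaign"),
        "last_touch_campaign": ctx.last_touch.get("campaign"),
        "key_actions_before_purchase": list(ctx.key_actions),
    }
```

    Keeping both first and last touch is one reasonable choice: it lets marketing compare the campaign that started a journey with the one that closed it, without committing to a single attribution model up front.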

    Session analytics completes the picture. By understanding how users navigate within each visit—where they hesitate, what they repeat, and which micro-conversions predict success—I can link behavioral analytics to revenue outcomes with far greater precision. In practice, this means better funnels, smarter cohorts, and faster iteration cycles inside Amplitude analytics. When appropriate, I’ll also pair findings with session replay for qualitative context, but the core decision loops are driven by quantifiable behavior patterns.
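
    Session analytics starts with deciding where one visit ends and the next begins. The sketch below is a minimal Python illustration, not how Amplitude defines sessions: it assumes a 30-minute inactivity timeout and a list of `(timestamp, event_name)` pairs per user, then compares how often an outcome occurs in sessions with and without a candidate micro-conversion. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumption: a session ends after 30 minutes of inactivity.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events: List[Tuple[datetime, str]],
               timeout: timedelta = SESSION_TIMEOUT) -> List[List[str]]:
    """Split one user's events into sessions, each a list of event names in time order."""
    sessions: List[List[str]] = []
    last_ts = None
    for ts, name in sorted(events):
        # Open a new session on the first event or after a gap longer than the timeout.
        if last_ts is None or ts - last_ts > timeout:
            sessions.append([])
        sessions[-1].append(name)
        last_ts = ts
    return sessions


def outcome_rates(sessions: List[List[str]], micro_event: str,
                  outcome_event: str) -> Tuple[float, float]:
    """Outcome rate in sessions with vs. without a candidate micro-conversion."""
    with_micro = [s for s in sessions if micro_event in s]
    without_micro = [s for s in sessions if micro_event not in s]

    def rate(group: List[List[str]]) -> float:
        return sum(outcome_event in s for s in group) / len(group) if group else 0.0

    return rate(with_micro), rate(without_micro)
```

    A large gap between the two rates flags a micro-conversion worth investigating; on its own it is a correlation, not proof that the micro-conversion causes the outcome.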

    My operating rhythm is straightforward: I start by defining the purchase outcome clearly, then identify the minimal set of properties that must persist to tell the full attribution story. From there, I instrument events and validate that each persisted property is captured reliably across the journey. With clean inputs, I build conversion funnels, use cohorts to isolate high-intent behaviors, and apply driver analysis to surface the behaviors most associated with conversion. Because driver analysis shows correlation rather than causation, I then validate the strongest candidates with experiments. That’s how I isolate the behaviors that consistently generate qualified leads and high-value activations.
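
    For readers who want to see what a funnel computes, here is a minimal Python sketch of a strictly ordered funnel over per-user event lists. It is a simplification under stated assumptions (events already in time order, no conversion window, no property filters, hypothetical event names such as `signup` and `activate`), not Amplitude's funnel logic.

```python
from typing import Dict, List, Sequence


def funnel_counts(user_events: Dict[str, List[str]], steps: Sequence[str]) -> List[int]:
    """Users reaching each step of a strictly ordered funnel.

    Each user's events are assumed to be in time order; a step only counts
    after all previous steps were completed.
    """
    counts = [0] * len(steps)
    for events in user_events.values():
        reached = 0
        for name in events:
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        # A user who reached step k counts toward steps 0..k-1.
        for i in range(reached):
            counts[i] += 1
    return counts


def step_conversion(counts: List[int]) -> List[float]:
    """Conversion rate from each step to the next."""
    return [counts[i + 1] / counts[i] if counts[i] else 0.0
            for i in range(len(counts) - 1)]
```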

    The impact is both strategic and immediate. Marketing can test offers and channels with a unified analytics platform and know which touchpoints lift conversion, not just clicks. Growth can optimize user activation flows based on the behaviors that truly predict upgrade. Product can prioritize the moments that drive retention instead of chasing vanity metrics. Most importantly, teams move from opinion to evidence without waiting in an engineering queue.

    In my experience, the real unlock comes when we use persisted properties to bridge pre-signup exploration with post-signup intent. That’s where product-led growth takes off: we can trace the first meaningful action to a downstream expansion event, tie it to a specific campaign or in-app guide, and then double down confidently. The result isn’t just better dashboards—it’s a tighter feedback loop between hypothesis, experiment, and measurable revenue impact.

    If you’re aiming to connect behavior to outcomes with clarity and speed, lean into persisted properties and session analytics. You’ll empower teams to discover the “moments that matter,” attribute them accurately to conversions, and iterate toward a repeatable growth engine—without slowing down your roadmap or depending on engineering for every new question.


    Inspired by this post on Amplitude – Best Practices.


  • Building AI-Era GTM and Analytics That Make Tough Calls Simple: A Product Leader’s Playbook

    I build "GTM and analytics products for the AI era—tools that make hard calls simple." That guiding principle shapes how I design systems, prioritize roadmaps, and lead teams: we earn speed by engineering clarity. My north star is straightforward—turn noisy signals into trusted insights that move the business, without adding friction for customers or chaos for teams.

    In practice, this starts with behavioral analytics. Whether you're using Amplitude analytics or a homegrown stack, the goal is the same: a unified analytics platform that captures clean events, enforces a clear taxonomy, and maps behaviors to outcomes. I focus on journey mapping, activation and retention analysis, and honest attribution so that every GTM motion ladders to real product usage, not vanity metrics.

    Decisions should be testable and reversible. I operationalize experimentation with A/B testing, feature flags, and guardrailed rollouts. Minimum detectable effect, power analyses, and anomaly detection aren’t academic exercises; they’re the foundation for credible learnings. When a result is unclear, we tighten hypotheses, shrink blast radius, and iterate quickly—biasing for learning while protecting the customer experience.
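
    Minimum detectable effect and power translate directly into sample size. Here is a minimal Python sketch, assuming a conversion-rate metric, an absolute MDE, and a two-sided two-proportion z-test with the usual normal approximation; other tests and corrections give somewhat different numbers, and `sample_size_per_arm` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per arm to detect an absolute lift `mde` over a `baseline` conversion rate.

    Uses the normal approximation for a two-sided, two-proportion z-test.
    """
    p1, p2 = baseline, baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / mde ** 2)
```

    With a 10% baseline and a 2-percentage-point MDE at alpha = 0.05 and 80% power, this works out to about 3,840 users per arm; halving the MDE roughly quadruples the requirement, which is why the MDE deserves a decision before the test starts.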

    AI changes the surface area of product work, but it doesn’t change the discipline. I treat LLMs as a product capability, not a shortcut: eval-driven development, clear success criteria, and human-in-the-loop feedback remain non-negotiable. Privacy-by-design and data governance shape what we build; responsible prompts, retrieval strategies, and safety checks shape how it behaves in the wild. When the model is uncertain, the product should be honest about it and offer a graceful fallback.

    Great GTM is a system, not a launch day. I connect product strategy to go-to-market strategy through product-led growth loops: in-app guides that meet users where they are, onboarding that accelerates time-to-value, and signals that identify true qualified intent. Driver trees tie adoption to monetization so that marketing, sales, and success work from the same picture—making trade-offs visible and reversible.

    Execution is where clarity compounds. Continuous discovery with product trios keeps problems crisp and solutions grounded in user truth. Product roadmapping and sprint planning follow outcome-first principles: fewer projects, clearer intents, stronger accountability. When teams can trace every backlog item to a metric that matters, they move faster with less oversight—and deliver results that stand up to scrutiny.

    When we do all of this well, decisions feel simple because the work behind them is rigorous. That’s the promise of modern GTM and analytics in the AI era: no theatrics, just dependable systems that turn possibilities into predictable progress.


    Inspired by this post on Amplitude – Best Practices.


  • Inside Lorikeet’s Dual-Agent Support: AI Humility, Faster Resolutions, and Safer Guardrails

    I keep asking myself a simple, high-stakes question: what does it take to build an AI customer support agent that actually knows when it can't help — and says so?

    Recently, I dug into how Jamie Hall (Co-founder & CTO), Xharmagne Carandang, and Rona Wang at Lorikeet are answering that question for enterprises in regulated industries. Their target outcome is refreshingly concrete: an agent that responds like the best customer support you’ve ever had — one that knows you, gets things fixed, and hands off gracefully when it’s out of its depth.

    What resonated first was the honesty about early missteps. The team explored reflection tools and information dashboards before a healthcare startup reframed the job-to-be-done with a blunt directive: just help us clear the inbox. The earliest prototype wasn’t flashy — a command-line script spitting out a CSV — yet it paved the way for a scalable, measurable foundation.

    Today, the system runs on a dual-agent architecture: a Concierge that handles customer tickets end-to-end, and a Coach that helps customers configure, test, and continuously improve it. That split is more than a technical choice; it’s a product strategy that separates operational resolution from the meta-work of quality, guardrails, and evaluation.

    The backbone principle is "AI humility" — defaulting to a human handoff when uncertain. In practice, this isn’t about avoiding responsibility; it’s about preserving trust. When an agent signals uncertainty, it protects brand equity and customer experience while still accelerating the path to resolution.
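
    The source does not describe how Lorikeet implements this, so the following is only a minimal Python sketch of the general pattern under stated assumptions: the agent's draft carries a confidence score between 0 and 1, a policy-sensitivity flag comes from some upstream check, and the 0.75 threshold is a placeholder to be tuned with evals. `Draft` and `route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict

# Placeholder threshold; in practice it would be tuned per domain with evals.
HANDOFF_THRESHOLD = 0.75


@dataclass
class Draft:
    answer: str
    confidence: float               # hypothetical score in [0, 1]
    policy_sensitive: bool = False  # hypothetical flag from an upstream guardrail check


def route(draft: Draft, threshold: float = HANDOFF_THRESHOLD) -> Dict[str, str]:
    """Send the draft only when confident; otherwise hand off and say so."""
    if draft.policy_sensitive or draft.confidence < threshold:
        reason = "policy_sensitive" if draft.policy_sensitive else "low_confidence"
        return {
            "action": "handoff_to_human",
            "reason": reason,
            "customer_message": "I want to get this right, so I'm bringing in a teammate who can help.",
        }
    return {"action": "reply", "reason": "confident", "customer_message": draft.answer}
```

    Returning a `reason` with every handoff is what makes it possible to measure handoffs as a quality signal rather than simply counting them as failures.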

    Lorikeet integrates with Zendesk and Intercom instead of replacing them. That decision respects the entrenched workflows and analytics ecosystems support leaders already rely on, and it reduces adoption friction while enhancing existing queues, macros, and reporting.

    The UX has evolved from a workflow builder to a conversational interface — and yet the blank chat box is still hard. Guardrails, prompts, and example-led onboarding help teams get started without forcing them to be prompt engineers. When you’re aiming for low cognitive load, a hybrid of guided steps and conversational nudges works better than a pure canvas.

    One of the most nuanced patterns is "resolution in the loop": how human agents unblock the AI without taking over a ticket. Instead of a full manual escalation, humans can provide a targeted nudge — a missing piece of data, a policy citation, a link to a system of record — and let the Concierge finish the job. That collaboration preserves productivity while keeping humans in the quality loop.

    Guardrails turned out to be deeply domain-specific — a cannabis company’s support tickets famously broke the team’s first approach. That’s a crucial lesson for regulated industries: policy nuance often lives in the edge cases. Lorikeet responded by making customer-configurable guardrails a first-class capability through the Coach interface.

    Even more interesting, they’re flipping the configuration workflow so customers define "what good looks like" before they ever write a standard operating procedure. By anchoring configuration in outcomes and test cases rather than prose SOPs, teams move faster, reduce ambiguity, and get to measurable quality earlier.
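
    Defining "what good looks like" as test cases can be as simple as pairing each ticket with the expected action and the facts a good reply must contain. The Python sketch below assumes an agent exposed as a function that returns an `action` and a `message`; `EvalCase`, `run_evals`, and the toy agent in the test are hypothetical and far cruder than a production eval suite.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Agent = Callable[[str], Dict[str, str]]


@dataclass
class EvalCase:
    ticket: str
    expected_action: str  # e.g. "refund" or "handoff"
    must_mention: List[str] = field(default_factory=list)


def passes(result: Dict[str, str], case: EvalCase) -> bool:
    """A case passes when the action matches and every required term appears."""
    message = result.get("message", "").lower()
    return (result.get("action") == case.expected_action
            and all(term.lower() in message for term in case.must_mention))


def run_evals(agent: Agent, cases: List[EvalCase]) -> float:
    """Pass rate of the agent over the outcome-defined cases."""
    if not cases:
        return 0.0
    return sum(passes(agent(case.ticket), case) for case in cases) / len(cases)
```

    Because the cases describe outcomes rather than procedures, the same suite keeps working when the underlying SOP or prompt is rewritten.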

    The platform leans into eval-driven development: using AI to diagnose failure modes in traces and automatically suggest fixes. A "Trace Diagnosis Agent" surfaces root causes and remediation paths, shrinking the feedback loop from discovery to improvement.

    Culturally, the product engineering cadence is customer-obsessed: every engineer asks weekly what they learned from a customer. That lightweight ritual is a forcing function for continuous discovery and keeps prioritization tethered to real-world tickets, not just internal hypotheses.

    Here’s how I translate these lessons for any customer support AI strategy in regulated environments. First, ship with opinionated "AI humility" and measure handoffs as a quality feature, not a failure. Second, separate resolution from configuration via a dual-agent architecture so each can evolve independently. Third, integrate where your customers already work (Zendesk, Intercom) to accelerate time-to-value. Fourth, make guardrails domain-native and customer-configurable, and start with evals that define "what good looks like". Finally, invest in trace analysis and automatic fix suggestions to shorten the learning cycle.

    If you’re scaling support in healthcare, financial services, or any high-stakes domain, these patterns are practical, defensible, and ready to operationalize. Build the Concierge to resolve, empower the Coach to continuously improve, and let "resolution in the loop" bind humans and agents into one reliable system of service.


    Inspired by this post on Product Talk.


  • The Counterintuitive Playbook for CLI Agents: Why Ruthless Subtraction Beats Feature Creep

    I’ve learned the hard way that the fastest path to a reliable command-line agent is radical subtraction. "In the last month of developing Amplitude Wizard CLI, we cut more than we added. Learn less is more when it comes to building CLI agents." That decision was less about minimalism and more about product strategy: constraints sharpen behavior, clarify intent, and raise trust.

    When I evaluate agentic AI systems, especially those that act on developer environments, I start by asking what the agent must never do. By establishing hard guardrails first, the design naturally converges on an opinionated, safe, and teachable interface. Every additional flag, tool, or permission expands the blast radius; every removal shortens the path to first success.

    For CLI agents, the most valuable product choice is a narrow toolset with sane defaults. Opinionated workflows reduce cognitive load and failure modes, while clear human override points keep users in control. I prefer a bias toward idempotent actions, reversible changes, and explicit confirmation gates for anything destructive. If a feature can’t explain itself in a single, crisp sentence in the help text, it likely doesn’t belong.

    Security and reliability flow from limits. Progressive permissioning, scoped credentials, and time-bounded tokens prevent the agent from wandering. Dry-run modes build confidence without side effects. When a user can reason about what the agent will and won’t do, adoption accelerates—and support tickets plummet.
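
    Here is a minimal Python sketch of a dry-run mode and a confirmation gate using the standard `argparse` module. The program name `agent-cli`, the `--dry-run` and `--yes` flags, and the `plan_changes` planner are hypothetical; they are not the Amplitude Wizard CLI's actual interface.

```python
import argparse
from typing import Callable, List, Optional


def plan_changes(target: str) -> List[str]:
    # Hypothetical planner: describes what the agent would do, without doing it.
    return [f"create {target}/analytics.config", f"add tracking call to {target}/app.py"]


def main(argv: Optional[List[str]] = None,
         confirm: Callable[[str], str] = input) -> int:
    parser = argparse.ArgumentParser(prog="agent-cli")  # hypothetical tool name
    parser.add_argument("target", help="Directory the agent may modify.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Print the plan without changing anything.")
    parser.add_argument("--yes", action="store_true",
                        help="Skip the confirmation prompt.")
    args = parser.parse_args(argv)

    steps = plan_changes(args.target)
    for step in steps:
        print(f"[plan] {step}")
    if args.dry_run:
        return 0  # no side effects in dry-run mode
    # Explicit confirmation gate unless the user opted out with --yes.
    if not args.yes and confirm("Apply these changes? [y/N] ").strip().lower() != "y":
        print("Aborted; nothing was changed.")
        return 1
    for step in steps:
        print(f"[apply] {step}")  # placeholder for real, idempotent actions
    return 0
```

    The plan is always printed first, so a dry run and a real run show the same steps; the only difference is whether they execute.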

    Observability is the other half of trust. I instrument "Agent Analytics" across every run: inputs, tool choices, durations, outcomes, and error patterns. Those signals reveal where the agent gets confused, which steps users abandon, and which prompts need pruning. With that loop in place, "less is more" stops being a philosophy and becomes an evidence-backed operating model.
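
    One lightweight way to capture those signals is to wrap every tool the agent can call. The Python sketch below is an illustration, not the Agent Analytics product: it records the tool name, outcome, error type, and duration of each call into an in-memory list standing in for a real analytics sink. `instrumented`, `RUN_LOG`, and `error_patterns` are hypothetical names.

```python
import functools
import time
from collections import Counter
from typing import Any, Callable, Dict, List

# Stand-in for a real analytics sink; each dict describes one tool call.
RUN_LOG: List[Dict[str, Any]] = []


def instrumented(tool_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that logs outcome, error type, and duration of every call."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {"tool": tool_name, "outcome": "success", "error": None}
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["outcome"] = "error"
                record["error"] = type(exc).__name__
                raise  # never swallow the failure; just make it observable
            finally:
                record["duration_ms"] = (time.perf_counter() - start) * 1000
                RUN_LOG.append(record)
        return wrapper
    return decorator


def error_patterns(log: List[Dict[str, Any]]) -> Counter:
    """Count failures by (tool, error type) to see where the agent gets stuck."""
    return Counter((r["tool"], r["error"]) for r in log if r["outcome"] == "error")
```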

    I anchor the roadmap in eval-driven development. Before adding a capability, I define a measurable task, a success threshold, and the smallest viable interface to reach it. If the capability can’t lift completion rate, time-to-first-success, or re-run stability, it waits. That simple discipline protects the experience from feature creep and preserves velocity in CI/CD.

    Under the hood, I design for a retrieval-first pipeline and careful context window management. The agent should fetch only the minimally relevant facts, present a compact plan, and execute predictably. Thoughtful prompt engineering helps—but prompts are not a substitute for clear boundaries, deterministic tool contracts, and robust error handling.
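
    As an illustration of fetching only the minimally relevant facts, here is a minimal Python sketch that ranks candidate snippets and keeps the best ones that fit a token budget. It deliberately substitutes crude word overlap for embeddings and counts words instead of model tokens; `build_context`, `min_score`, and the example snippets are hypothetical.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    # Crude word tokenizer; real pipelines should count the model's own tokens.
    return re.findall(r"[a-z0-9_]+", text.lower())


def relevance(query: str, snippet: str) -> float:
    """Fraction of distinct query words that appear in the snippet."""
    q = set(tokens(query))
    return len(q & set(tokens(snippet))) / len(q) if q else 0.0


def build_context(query: str, snippets: List[str], budget: int,
                  min_score: float = 0.25) -> List[str]:
    """Greedily keep the most relevant snippets that fit within the token budget."""
    ranked = sorted(snippets, key=lambda s: relevance(query, s), reverse=True)
    chosen: List[str] = []
    used = 0
    for snippet in ranked:
        cost = len(tokens(snippet))
        if relevance(query, snippet) >= min_score and used + cost <= budget:
            chosen.append(snippet)
            used += cost
    return chosen
```

    Greedy selection is a simple choice: when the budget is tight it may skip a large, highly relevant snippet in favor of a smaller one, which is exactly the kind of trade-off worth reviewing in traces.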

    Documentation is product. I maintain docs-as-code with runnable examples that mirror the golden paths. When the docs and the CLI disagree, the CLI changes—never the docs. This creates an internal forcing function: if we can’t document it simply, we probably shouldn’t ship it.

    My litmus test for any proposed addition is simple: does this make the mental model smaller? If not, cut it, make it progressive, or hide it behind a clearly named subcommand. Defaults should be boring, safe, and fast. Advanced power should be opt-in and discoverable without overwhelming new users.

    The paradox of agentic AI is that capability grows as surface area shrinks. By removing distractions, we amplify signal, increase repeatability, and earn the right to add the next carefully chosen step. The result is a CLI agent that feels sharp, dependable, and—most importantly—useful on day one.


    Inspired by this post on Amplitude – Perspectives.


  • Inside Growth Engineering at Amplitude: My Playbook to Accelerate Product-Led Growth with Analytics

    I’m often asked how leading growth teams turn insights into compounding business results. Few organizations illustrate this better than the Growth Engineering team at Amplitude. Drawing from their example and my own experience, I’ve distilled a practical playbook that any product organization can use to move faster, learn smarter, and scale impact.

    At the core is a disciplined blend of behavioral analytics and rapid experimentation. Amplitude analytics, as part of a unified analytics platform, enables precise event instrumentation, cohorting, and funnel analysis that surface where activation and retention truly break down. When I combine those signals with qualitative insights, I can prioritize fewer, higher-leverage bets that directly improve user activation and long-term retention.

    My growth loop always starts with clearly stated hypotheses, success metrics, and A/B testing power considerations, including a defined minimum detectable effect (MDE). I pair feature flags with staged rollouts to de-risk changes and accelerate iteration without compromising stability. This cadence turns every release into a learning opportunity, compounding knowledge across teams and time.
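
    Staged rollouts usually rely on deterministic bucketing, so a user keeps the same experience across sessions and a 10% rollout can grow to 50% without reshuffling anyone. The Python sketch below shows that common hashing approach under my own assumptions (SHA-256 over `flag_key:user_id`, 10,000 buckets); it is not the algorithm of any particular feature-flag product, and the function names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, flag_key: str) -> float:
    """Map a user to a stable bucket in [0, 100) for one flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 10_000 / 100


def is_enabled(user_id: str, flag_key: str, rollout_percent: float) -> bool:
    """Users below the rollout percentage see the feature; raising it only adds users."""
    return rollout_bucket(user_id, flag_key) < rollout_percent
```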

    Cross-functional execution is non-negotiable. I rely on tight “product trios” collaboration—product, engineering, and design—so we can ship small, measurable changes quickly, observe outcomes, and then widen scope with confidence. The Growth Engineering mindset keeps us grounded in real user behavior, not assumptions, and ensures our roadmap is fueled by evidence rather than opinion.

    Consider onboarding. Instead of a single redesign, I prefer a series of targeted experiments—tweaking progressive disclosure, refining tooltip design, and adding in-app guides where users predictably stall. Each test is instrumented end to end, from first action to activation event, and validated via retention analysis to confirm that short-term lifts turn into durable habit formation.
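
    To check that an onboarding lift turns into habit, compare retention between variants, not just activation. Here is a minimal Python sketch, assuming "day-N retention" means active again exactly N days after first activity (one of several common definitions) and that activity is available as a set of dates per user; the function names are hypothetical.

```python
from datetime import date
from typing import Dict, Set


def day_n_retention(activity: Dict[str, Set[date]], n: int) -> float:
    """Share of users active again exactly n days after their first active day."""
    cohort = {user: days for user, days in activity.items() if days}
    if not cohort:
        return 0.0
    retained = 0
    for days in cohort.values():
        first = min(days)
        if any((d - first).days == n for d in days):
            retained += 1
    return retained / len(cohort)


def retention_lift(control: Dict[str, Set[date]],
                   treatment: Dict[str, Set[date]], n: int) -> float:
    """Difference in day-n retention between treatment and control cohorts."""
    return day_n_retention(treatment, n) - day_n_retention(control, n)
```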

    When prioritizing, I map ideas to driver trees tied to our North Star metric. Behavioral analytics tell me which levers—time-to-value, depth-of-use, or frequency—will yield the biggest gain. That clarity focuses engineering effort on interventions that actually shift outcomes, not just outputs.

    If you’re building your own Growth Engineering capability, start with three moves: instrument ruthlessly so you can trust your signals, adopt feature flags to speed safe experimentation, and hold teams accountable to measurable, user-centric outcomes. Do this consistently and you’ll feel the compounding effect—faster learning cycles, stronger product-market fit signals, and a durable engine for product-led growth.


    Inspired by this post on Amplitude – Perspectives.


  • Is Technology Still Net Positive? A Product Leader’s Reckoning and Playbook for Humane Growth

    I’ve spent my career building products on top of the internet, championing social media, and now scaling AI. Lately, I keep returning to an uncomfortable but necessary question: are we still building a net positive future—or have we drifted into something else entirely?

    A recent long-form conversation in my podcast queue challenged me to do a deeper self-audit. If you want to hear the debate that sparked this reflection, it is available on Spotify and Apple Podcasts. What follows is my synthesis as a product management leader: the hard truths, the hopeful paths forward, and the practical actions I’m taking with my teams.

    The moment that hit me hardest was a family member’s blunt assessment that the internet has become “net negative.” That phrase landed like a wake-up call—a reminder that those of us inside tech often operate in an echo chamber. We see our roadmaps, our metrics, our progress; the rest of the world experiences the second-order effects. As a leader, I have to seek out those outside-in perspectives with the same rigor I apply to any product discovery practice.

    Another truth I can’t ignore: somewhere along the way, parts of our industry slid from “make people’s lives better” to “extract maximum value at any human cost.” You can see it in incentives that prioritize growth at all costs, in waves of layoffs that treat people as an expense line, and in platform behaviors that resemble a modern tycoon era. This isn’t just a moral critique—it’s a product strategy risk. Extractive models erode trust, weaken retention, and invite regulatory and reputational headwinds that no amount of optimization can out-execute.

    The loneliness crisis is real, and technology has too often replaced human connection instead of augmenting it. Spend a week in San Francisco and you’ll notice what I call “isolation by design”—QR-code menus, autonomous Waymos, frictionless everything, but fewer genuine human moments. It’s efficient, yes, but alienating. No algorithm can substitute for physical touch, care, and community. As builders, we should design products that create on-ramps to real-world connection, not cul-de-sacs of infinite scroll.

    We still have agency. “Don’t be evil” shouldn’t be a nostalgic slogan; it should be a minimum bar. Responsible product management means being a citizen of the ecosystems we influence: naming trade-offs clearly, instrumenting for externalities, and building AI risk management into our operating cadence. It also means stepping outside the industry narrative to ask neighbors, parents, teachers, and small business owners how our products actually land in their lives.

    One idea that gives me hope is “mom and pop tech”: AI-enabled, hyper-local tools crafted for specific neighborhoods and communities. Think “inch wide, mile deep”—software that solves a real problem for a defined community rather than chasing a horizontal total addressable market. Consider ride share. The extractive platform playbook maximized liquidity but squeezed drivers and frayed local fabric. A community-owned alternative could optimize for safety, fair wages, and neighborhood vitality over blitz-scaled margins. That’s civic tech with a viable product strategy.

    I’m also watching how social norms evolve. At a recent parents’ evening (Elternabend) at a German primary school, parents collectively agreed to delay smartphones until age 11 or 12, a striking shift from just five years ago, when many 7- to 8-year-olds had devices. Culture moves, sometimes faster than we expect. Product-led growth that ignores cultural momentum (or ethical guardrails) is fragile growth.

    So what do we do on Monday morning? First, rebuild our discovery muscles outside the echo chamber: continuous discovery with the people most affected by our products, not just our power users. Second, measure what matters: add well-being, community impact, and qualitative trust signals to the same dashboards that track activation and retention. Third, resist technology FOMO—choose fewer bets and go deeper, especially where AI can be applied responsibly to unlock real-world value. Fourth, cultivate communities of practice that normalize responsible experimentation, privacy-by-design, and transparent communication. Finally, narrate the change: as product people, we are educators as much as we are builders; our stories shape what teams believe is possible.

    If you’re looking for frameworks to anchor this work, revisit classics like Bowling Alone: The Collapse and Revival of American Community for context on social capital, and pair that with modern conversations on local resilience and community spaces. The future isn’t written yet. With clear principles, careful incentives, and the courage to narrow our scope in service of depth, we can still build technology that strengthens the bonds that make life worth living.

    I’d love to hear how you’re approaching this in your organization, especially examples of “mom and pop tech,” AI strategy in service of community, or product strategies that trade a little scale for a lot of human good. Join the conversation in the comments.


    Inspired by this post on Product Talk.


  • Prompt Like a Pro: Three Battle-Tested Tips for Amplitude Global Agent Success

    When I guide teams building agentic AI features, I’ve seen a single prompt turn Amplitude Global Agent into either a world-class analyst or a well-meaning rambler. The difference isn’t magic—it’s method. With the right structure and iteration, we consistently get faster, clearer insights that stand up to product and analytics scrutiny.

    AI has become remarkably capable, but results still depend on the quality of your prompts. Here are the three practices I rely on when prompting Amplitude Global Agent.

    Tip 1 — Define the role, goal, and guardrails. I begin every prompt by stating the agent’s role (for example: “You are a product analyst”), the business objective (“identify activation drop-offs by cohort”), and the boundaries (“use only Amplitude analytics events and properties provided; return JSON with metric, segment, timeframe”). This simple pattern reduces ambiguity, improves context window management, and yields outputs I can compare across runs.

    Tip 2 — Ground the model with concrete context and examples. Agent outputs improve dramatically when I supply the exact data it should reference: event names, properties, segments, filters, and timeframes. I often include a short example (one ideal question and one ideal answer) to anchor tone, structure, and depth. Think retrieval-first pipeline: feed the agent authoritative snippets (definitions, dashboards, prior queries) rather than hoping it guesses. That’s how I reduce hallucinations and make results reproducible.

    Tip 3 — Iterate with measurement, not vibes. I version prompts, A/B test variants, and log inputs/outputs so I can score quality with lightweight evals (accuracy against known answers, clarity, and actionability). Over time, a small library of “winning” prompts emerges for common AI workflows—activation analysis, retention cohorts, anomaly detection—so the team can move from tinkering to repeatable performance. This is where Agent Analytics practices pay off: we inspect outcomes, not just outputs.
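
    Versioning and scoring prompts does not require heavy tooling. Below is a minimal Python sketch under stated assumptions: the model is any function from a rendered prompt to an answer, each prompt version is a template with a `{question}` placeholder, and accuracy is a crude check that the known answer appears in the response. `GoldenCase`, `score_prompt`, and `best_version` are hypothetical names, not part of Amplitude Global Agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A model call: takes a fully rendered prompt, returns the agent's answer.
Model = Callable[[str], str]


@dataclass
class GoldenCase:
    question: str
    expected: str  # a known answer the response must contain


def score_prompt(model: Model, template: str, cases: List[GoldenCase]) -> float:
    """Accuracy of one prompt version against questions with known answers."""
    if not cases:
        return 0.0
    hits = 0
    for case in cases:
        answer = model(template.format(question=case.question))
        hits += case.expected.lower() in answer.lower()
    return hits / len(cases)


def best_version(model: Model, templates: Dict[str, str],
                 cases: List[GoldenCase]) -> str:
    """Return the version id with the highest score (ties go to the first listed)."""
    scores = {version: score_prompt(model, t, cases) for version, t in templates.items()}
    return max(scores, key=lambda v: scores[v])
```

    Because the sketch uses `str.format`, any literal braces in a template (for example, a JSON example) must be doubled.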

    A practical starter structure I use: Role and Audience; Objective and Success Criteria; Data Context (events, properties, segments, timeframe); Constraints (sources, methods, privacy); Output Format (tables/JSON, fields, length); Examples (one good Q/A); and Fallbacks (what to do when data is insufficient). Even written as plain language, that scaffold reliably steers Amplitude Global Agent to precise, defensible answers.
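
    Writing the scaffold as code keeps a team's prompts consistent and makes gaps visible before anything is sent. Here is a minimal Python sketch, assuming prompts are assembled as plain-text sections in the order listed in the starter structure; `build_prompt` and the example content (event names, segment, timeframe) are hypothetical placeholders.

```python
from typing import Dict

# Section order follows the starter structure described above.
SECTIONS = [
    "Role and Audience",
    "Objective and Success Criteria",
    "Data Context",
    "Constraints",
    "Output Format",
    "Examples",
    "Fallbacks",
]


def build_prompt(parts: Dict[str, str]) -> str:
    """Assemble a prompt from named sections, refusing to proceed with gaps."""
    missing = [name for name in SECTIONS if not parts.get(name, "").strip()]
    if missing:
        raise ValueError("Missing prompt sections: " + ", ".join(missing))
    return "\n\n".join(f"{name}:\n{parts[name].strip()}" for name in SECTIONS)


# Hypothetical example; event names, segment, and timeframe are placeholders.
EXAMPLE_PARTS = {
    "Role and Audience": "You are a product analyst writing for the growth team.",
    "Objective and Success Criteria": "Identify where new users drop off before activation; name the top two steps.",
    "Data Context": "Events: Sign Up, Onboarding Completed, First Report Created. Segment: plan = trial. Timeframe: last 30 days.",
    "Constraints": "Use only the events and properties listed. Do not reference individual users.",
    "Output Format": "JSON with fields metric, segment, timeframe, finding.",
    "Examples": 'Q: Where do trial users stall? A: {"metric": "onboarding conversion", "segment": "trial", "timeframe": "last 30 days", "finding": "largest drop after Sign Up"}',
    "Fallbacks": "If the data is insufficient, say which event or property is missing instead of guessing.",
}
```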

    The emotional arc here is familiar: when the agent nails a complex funnel question in one pass, the team gets that “oh wow” moment; when it meanders, morale dips. Clear prompting turns those spikes of delight into a steady cadence of wins—less rework, faster learning loops, and cleaner handoffs from discovery to delivery. In short, invest in prompt engineering once, and you compound gains across every analysis session.

    If you’re just getting started, pick one critical question (for example, activation or retention), apply the three tips above, and commit to two to three prompt iterations with scoring. Within a single sprint, you’ll have a robust template you can reuse and adapt—helping Amplitude Global Agent deliver trustworthy insights at the speed your product strategy demands.


    Inspired by this post on Amplitude – Perspectives.


  • Beyond Accuracy: How I Evaluate AI Customer Service Agents That Delight and Scale

    When teams evaluate AI Agent options for customer service, I often see the rigor aimed at the wrong subset of criteria. After leading and observing dozens of proof of concept (POC) efforts with our customers and prospects, I understand why performance—accuracy scores, resolution rates, and benchmark tests on curated datasets—soaks up most of the attention. But those indicators alone won’t guarantee success once you leave the sandbox and face real customers.

    If your POC only proves that the AI “works,” you’re missing the bigger picture. Here’s what else I look for to make the best long-term decision.

    How does it handle your real-world setup?

    Performance is table stakes, but it has to reflect the messiness of an actual support environment. The best-performing Agents don’t just get answers right—they exhibit resilient, human-like behavior under pressure. I watch how the Agent behaves when it doesn’t know an answer: does it recover or spiral? Does it stay on track through multi-step requests, and how gracefully does it hand off to human agents? If your knowledge base depends on a retrieval-first pipeline, test cross-source retrieval and grounding—not just single-document lookups.

    When I build evaluation scenarios, I put the Agent through its paces with a broad, realistic mix:

    • Multi-turn queries that require the Agent to carry context across a conversation, not just answer isolated questions.
    • Vague or fragmented inputs, like typos, grammatical errors, and incomplete questions, because that’s how customers actually write.
    • Edge cases and sensitive scenarios, like billing disputes, frustrated customers, and questions that sit at the boundary of what the Agent is trained on.
    • Different phrasings of the same question. An Agent that handles one version well but fails on a rephrasing has a comprehension problem, not a knowledge gap: the answer exists, but the Agent doesn’t recognize the question.
    • Queries that require pulling from multiple knowledge sources. Real issues are rarely answered by a single help article, and an Agent that can only handle single-source questions will hit a ceiling fast.
    • Multilingual conversations, if your customer base requires it. Performance can vary significantly across languages and it’s better to discover that in testing than in production.

    This preparation is worth the effort. Any Agent can look impressive in a demo; what matters is how it holds up as part of your team, serving your customers in production.

    What does it feel like to interact with the Agent?

    Two AI Agents can post the same quantitative scores (resolution rate, containment rate, and more) and still deliver very different customer experiences. Resolution rate tells me whether the Agent finishes conversations; it says nothing about how customers felt during them. I deliberately assess the experience, not just the outcome, because conversation design shapes trust and brand perception.

    Here’s what I look for to ensure the AI Agent is enjoyable to interact with:

    • Is the tone natural and on-brand, or does it feel robotic and generic?
    • Does it build trust early in the conversation, or does it create friction that makes customers want to immediately request a human?
    • When it doesn’t know the answer, does it handle that gracefully?
    • When it hands off to a human, is that transition seamless, or does the customer feel abandoned?

    As George Dilthey at Clay put it when evaluating their AI setup: “Keep what’s important to your business up front and center. For us, that was transparency and control over the customer experience.”

    That framing is exactly right. The Agent represents your brand in every conversation. Customers don’t experience “accuracy,” they experience conversations. An Agent that’s technically accurate but tonally off-brand will erode customer trust over time.

    I make the experience dimension explicit in my POCs. I have people on my team—and when possible, a small cohort of real customers—interact with the Agent under realistic conditions. Then I ask how it felt, not just whether it worked.

    Can you keep improving it after launch?

    This is the dimension most teams don’t evaluate at all, and it’s possibly the most important one. Choosing an Agent that works today and can keep improving the customer experience over time requires more than a functional demo. You’re buying a system that must get better every week, not just during the first sprint.

    The feedback loop

    Can your team easily review conversations and identify where the Agent is underperforming? Can you pinpoint specific gaps (missing knowledge, incorrect tone, poor handoff decisions) and act on them quickly? The faster the loop between “something isn’t working” and “we’ve fixed it,” the more value compounds over time. In practice, that means instrumenting conversations, leveraging Agent Analytics, tagging misroutes and tone slips, and running targeted evals on known failure modes.
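    One way to make that loop concrete is to tag each reviewed conversation with the gaps it shows, then rank the gaps by how often they occur. The sketch below is a minimal illustration that assumes conversations are exported with a hypothetical `tags` field; the tag names mirror the gaps listed above and are not an Agent Analytics schema.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical failure-mode taxonomy, mirroring the gaps named in the text.
FAILURE_MODES = {"missing_knowledge", "tone_slip", "misroute", "poor_handoff"}

@dataclass
class ReviewedConversation:
    id: str
    tags: set[str] = field(default_factory=set)

def failure_mode_report(conversations: list[ReviewedConversation]) -> list[tuple[str, float]]:
    """Return (failure mode, share of conversations affected), most common first."""
    if not conversations:
        return []
    counts = Counter(
        tag
        for convo in conversations
        for tag in convo.tags
        if tag in FAILURE_MODES  # ignore tags outside the agreed taxonomy
    )
    total = len(conversations)
    return [(tag, count / total) for tag, count in counts.most_common()]

if __name__ == "__main__":
    sample = [
        ReviewedConversation("c1", {"missing_knowledge"}),
        ReviewedConversation("c2", {"missing_knowledge", "tone_slip"}),
        ReviewedConversation("c3", set()),
        ReviewedConversation("c4", {"misroute"}),
    ]
    for mode, share in failure_mode_report(sample):
        print(f"{mode}: {share:.0%}")
```

    The top entry is the gap to fix first; re-running the report after each fix shows whether the share actually fell.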

    The speed of iteration

    When you identify a gap, how quickly can you address it? This is partly a question of tooling (how easy is it to update knowledge, refine guidance, adjust behavior?) and partly a question of team capability. The teams getting the most out of AI are the ones that have changed how they operate and made continuous improvement part of their everyday work. They’ve committed for the long term, not just for the first few weeks after launching their AI Agent. We treat this as eval-driven development: automate evaluations that mirror real tickets, tighten prompt engineering and retrieval settings, and ship small fixes daily.
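    A minimal sketch of the eval-harness idea, assuming each case pairs a real (anonymized) ticket with facts a good answer must contain. The `agent` callable and the simple substring grader are placeholders for illustration; a real setup would call the actual Agent and would likely use richer graders.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    ticket: str               # customer message taken from a real (anonymized) ticket
    must_include: list[str]   # facts a good answer must mention

def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> tuple[float, list[EvalCase]]:
    """Run every case through the agent; return the pass rate and the failing cases."""
    failures = []
    for case in cases:
        answer = agent(case.ticket).lower()
        # A case passes only if every required fact appears in the answer.
        if not all(fact.lower() in answer for fact in case.must_include):
            failures.append(case)
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures

if __name__ == "__main__":
    # Stand-in agent for illustration only; replace with a call to your real Agent.
    def toy_agent(message: str) -> str:
        if "refund" in message.lower():
            return "Refunds are processed within 5 business days."
        return "Let me connect you with a teammate."

    cases = [
        EvalCase("How long does a refund take?", ["5 business days"]),
        EvalCase("Can I change my billing date?", ["billing date"]),
    ]
    rate, failed = run_evals(toy_agent, cases)
    print(f"pass rate: {rate:.0%}; failing: {[c.ticket for c in failed]}")
```

    Re-running the same cases after every knowledge, prompt, or retrieval change is what makes small daily fixes safe: if the pass rate drops, the change regressed something.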

    The vendor partnership

    The vendor behind the Agent matters just as much as the solution itself. You’re choosing a partner for transformation that will help you evolve how your business delivers customer experience. Ask:

    • How does customer feedback influence the product roadmap, and can they show you examples?
    • If you have feedback on limitations or weaknesses, do they engage transparently or get defensive?
    • What kind of support will you get post-launch?
    • Are they shaping where AI customer experience is going, or reacting to what others are building?

    How a vendor responds to those questions tells you more about the long-term relationship than any benchmark result.

    What a good POC proves

    If your POC only proves “the AI works,” you haven’t done enough. A strong proof of concept tests performance in realistic conditions, evaluates the experience from the customer’s perspective, and validates the system that will support continuous improvement after launch. Done well, it sets you up for long-term operational success and builds organizational AI readiness—not just a flashy demo.


    Inspired by this post on The Intercom Blog.


  • From Ed‑Tech Roots to Core Analytics: Product Leadership Lessons Inspired by Amplitude


    I often look to Amplitude and its core analytics product when I’m coaching teams and refining our own product strategy. The discipline required to turn raw event streams into actionable behavioral analytics mirrors what I expect from empowered product teams: precise instrumentation, clear decision points, and a relentless focus on outcomes.

    Some of the most effective product managers I meet began their careers in the ed-tech and recruiting space. That early-stage, resource-constrained environment cultivates sharp prioritization instincts and a comfort with ambiguity—muscles that translate directly into building scalable analytics capabilities without losing speed or customer empathy.

    In my practice, I anchor discovery and roadmap decisions in driver trees that connect north-star outcomes to measurable input metrics. That structure keeps product trios aligned on the questions that matter: What behaviors predict retention? Where does user activation stall? Which experiments will meaningfully shift our core metrics? Paired with continuous discovery, this approach ensures we ship learnings—not just features.
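    To make the driver-tree idea concrete, here is a small sketch in which a north-star metric is the product of its input metrics. The metric names and numbers are hypothetical, and real driver trees often mix sums and ratios; the point is that each leaf is an input a product trio can move and measure.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Optional

@dataclass
class Driver:
    name: str
    value: Optional[float] = None                      # set on leaves (measured inputs)
    children: list["Driver"] = field(default_factory=list)

    def compute(self) -> float:
        # Assumption: a parent metric is the product of its children (count x rate x rate).
        if not self.children:
            if self.value is None:
                raise ValueError(f"leaf metric {self.name!r} has no value")
            return self.value
        return prod(child.compute() for child in self.children)

if __name__ == "__main__":
    # Hypothetical tree: retained users = new signups x activation rate x week-4 retention.
    north_star = Driver("weekly_retained_users", children=[
        Driver("new_signups", 10_000),
        Driver("activation_rate", 0.40),
        Driver("week4_retention", 0.25),
    ])
    baseline = north_star.compute()
    north_star.children[1].value = 0.44   # what if an experiment lifts activation by 10%?
    print(round(baseline), round(north_star.compute()))
```

    Changing one leaf and recomputing shows how much a given experiment could move the north star, which helps decide which question the trio should chase next.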

    Tactically, I encourage teams to combine Amplitude analytics with a unified analytics platform mindset: centralize event taxonomy, standardize cohort definitions, and operationalize retention analysis alongside acquisition and activation. When we treat analytics as a product, not a tool, we unlock faster iteration loops, smarter A/B testing, and clearer trade-offs between depth and breadth in our product surface area.
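    Standardizing cohort definitions means computing retention one way, everywhere. As a minimal sketch, assuming events are available as (user_id, date) pairs, here is N-day retention by first-seen cohort. It illustrates a single shared definition; it is not how Amplitude computes its retention charts.

```python
from collections import defaultdict
from datetime import date, timedelta

def n_day_retention(events: list[tuple[str, date]], n: int) -> dict[date, float]:
    """Share of each first-seen cohort that was active again exactly n days later."""
    active_days: dict[str, set[date]] = defaultdict(set)
    for user_id, day in events:
        active_days[user_id].add(day)

    # Cohort = the first day each user was seen.
    cohorts: dict[date, list[str]] = defaultdict(list)
    for user_id, days in active_days.items():
        cohorts[min(days)].append(user_id)

    target = timedelta(days=n)
    return {
        start: sum((start + target) in active_days[user] for user in users) / len(users)
        for start, users in sorted(cohorts.items())
    }

if __name__ == "__main__":
    events = [
        ("a", date(2024, 1, 1)), ("a", date(2024, 1, 8)),
        ("b", date(2024, 1, 1)),
        ("c", date(2024, 1, 2)), ("c", date(2024, 1, 9)),
    ]
    for cohort, rate in n_day_retention(events, 7).items():
        print(cohort, f"{rate:.0%}")
```

    Cohorts younger than N days read as zero in this sketch, so a production version would exclude them until their day N has passed.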

    Product-led growth hinges on narratives supported by evidence. I’ve found that clear opportunities emerge when we map journeys, quantify friction with session replay and funnels, and then validate solution ideas through small, reversible bets. This is where outcome-based roadmapping shines: we commit to moving a metric, not to a specific feature, and we let the data guide sequencing.

    At the leadership level, I focus on execution readiness: crisp problem statements, decision logs, and CI/CD practices that reduce batch size and increase deployment frequency. The goal isn’t shipping more; it’s compounding learning. When teams internalize this mindset, analytics stops being a dashboard and becomes a competitive advantage.


    Inspired by this post on Amplitude – Perspectives.


  • Stop Losing Users: How a Second Message and Prompt Audit Drive 2–3x Retention


    Default prompts are quietly sabotaging agent retention. I learned this the hard way while reviewing early funnels for our voice and chat agents—engagement looked great at the greeting, but the moment the agent stopped after a single reply, the conversation flatlined. The fix wasn’t a fancy LLM trick; it was a disciplined second message and a rigorous audit of defaults across every entry point.

    When an AI agent opens with a generic, low-friction greeting and then waits, users hesitate. Cognitive load rises, intent stays fuzzy, and drop-off follows. A thoughtful second message—delivered quickly, with clarity and options—reduces ambiguity and gives people a low-effort path to progress. It’s a small behavioral nudge that pays off in outsized retention gains.

    Here’s the pattern that consistently works for me. First, keep the initial default prompt short, confident, and specific to the channel and task domain. Then ship a fast follow-up if the user hesitates for a few seconds. That second message should clarify what the agent can do, present 2–3 concrete choices, and invite free-form input. I’ve repeatedly seen this simple sequence unlock a 2–3x retention lift in early sessions, especially for first-time users.
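    A minimal sketch of that timing logic, assuming an async chat transport where `send` posts a message and incoming user messages land on an `asyncio.Queue`. The copy, the three options, and the 4-second default are hypothetical placeholders, not values from the source.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical copy for a support chat; tune per channel and top intents.
GREETING = "Hi! I can help with orders, returns, and account questions."
SECOND_MESSAGE = (
    "Not sure where to start? Pick one, or just type your question:\n"
    "1) Track an order  2) Start a return  3) Update my account"
)

async def open_conversation(
    send: Callable[[str], Awaitable[None]],
    replies: "asyncio.Queue[str]",
    hesitation_s: float = 4.0,
) -> str:
    """Send the greeting; send the second message only if the user hesitates."""
    await send(GREETING)
    try:
        # Wait briefly for the user's first message.
        return await asyncio.wait_for(replies.get(), timeout=hesitation_s)
    except asyncio.TimeoutError:
        # Hesitation: clarify scope, offer concrete choices, invite free text.
        await send(SECOND_MESSAGE)
        return await replies.get()

async def _demo() -> None:
    async def send(text: str) -> None:
        print(f"agent: {text}")

    replies: "asyncio.Queue[str]" = asyncio.Queue()

    async def hesitant_user() -> None:
        await asyncio.sleep(0.5)   # user waits longer than the hesitation window
        await replies.put("Start a return")

    user_task = asyncio.create_task(hesitant_user())
    reply = await open_conversation(send, replies, hesitation_s=0.2)
    await user_task
    print(f"user: {reply}")

if __name__ == "__main__":
    asyncio.run(_demo())
```

    Users who answer right away never see the second message, so the nudge only reaches the people who were about to stall.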

    Auditing default prompts is where the leverage lives. I inventory every ingress—web widget, IVR, SMS, in-app, help center—and catalogue the exact default system, developer, and user-facing prompts. Then I inspect turn-1 and turn-2 transcripts in Agent Analytics to quantify where users stall: time-to-first-intent, clarification rate, option selection rate, and completion. This makes the drop-off visible and turns “vibes” into data we can A/B test.
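    A sketch of how those stall metrics might be computed once turn-1 and turn-2 transcripts are exported, assuming each session has already been labeled with the fields below. The field names are hypothetical, not an Agent Analytics schema; grouping by ingress keeps the comparison per entry point, as the audit requires.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Session:
    ingress: str                     # e.g. "web_widget", "sms", "ivr"
    first_intent_s: Optional[float]  # seconds until the user stated an intent; None if never
    clarification_asked: bool        # agent had to ask a clarifying question in turns 1-2
    option_selected: bool            # user picked one of the offered choices
    completed: bool                  # task finished

def stall_report(sessions: list[Session]) -> dict[str, dict[str, float]]:
    """Per-ingress funnel metrics for the first two turns."""
    by_ingress: dict[str, list[Session]] = {}
    for s in sessions:
        by_ingress.setdefault(s.ingress, []).append(s)

    report = {}
    for ingress, group in by_ingress.items():
        n = len(group)
        intents = [s.first_intent_s for s in group if s.first_intent_s is not None]
        report[ingress] = {
            "sessions": n,
            "median_time_to_first_intent_s": median(intents) if intents else float("nan"),
            "intent_rate": len(intents) / n,
            "clarification_rate": sum(s.clarification_asked for s in group) / n,
            "option_selection_rate": sum(s.option_selected for s in group) / n,
            "completion_rate": sum(s.completed for s in group) / n,
        }
    return report

if __name__ == "__main__":
    demo = [
        Session("web_widget", 3.0, False, True, True),
        Session("web_widget", None, False, False, False),
        Session("sms", 8.0, True, False, True),
    ]
    for ingress, metrics in stall_report(demo).items():
        print(ingress, metrics)
```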

    Designing the second message is a conversation design exercise, not a copy tweak. My recipe: empathize with the user’s likely uncertainty, constrain scope so the agent appears capable, and apply choice architecture. For voice AI agents, I keep it shorter, use confirmation questions, and bias toward read-back for accuracy. For chat, I include tappable options and examples that mirror top intents. The goal is momentum without feeling pushy.

    Operationally, I run controlled A/B tests on default and second-message variants, sized to a realistic minimum detectable effect. I segment by source (ad, organic, support), device, and use case, because the winning prompt for sales qualification rarely matches the one for customer support. With proper instrumentation in our analytics stack, we track retention curves over the first 3–5 sessions, not just single-session reply rates, to avoid optimizing for chatter over outcomes.
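    To size those tests, a common approach is the normal-approximation sample size for comparing two proportions. The sketch assumes the success metric is a rate (for example, turn-2 continuation) and uses a two-sided test at alpha = 0.05 with 80% power by default; the baseline and lift in the example are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(baseline: float, relative_mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users per arm for a two-sided test comparing two proportions (normal approximation)."""
    if relative_mde == 0:
        raise ValueError("relative_mde must be nonzero")
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("both rates must fall strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

if __name__ == "__main__":
    # Hypothetical: 20% turn-2 continuation today, hoping to detect a 10% relative lift.
    print(sample_size_per_arm(0.20, 0.10))
```

    For that hypothetical 20% to 22% change, this comes out to roughly 6,500 users per arm, so low-traffic entry points may need a larger MDE or a longer test window.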

    Strong prompt engineering underpins the experience. I keep system prompts stable and explicit about persona, tone, and refusal behavior; manage the context window so examples don’t drown live intent; and use a retrieval-first pipeline when domain knowledge matters. The most expensive mistake I see is shipping defaults like “How can I help you?” without guardrails or examples—great for demos, bad for real users.

    If you’re starting fresh, begin with a prompt audit this week: list all defaults, map them to top intents, and pair each with a channel-appropriate second message. Instrument the funnel, launch two variants, and set a crisp success metric (e.g., turn-2 continuation rate to task start, then task completion). This is one of those rare changes that is simple to ship and compounds across onboarding, activation, and long-term retention.

    The takeaway is straightforward: don’t let your best work stall after the first reply. A disciplined second message and a focused default prompt audit will lift engagement, reduce ambiguity, and create the kind of early momentum that sustains retention over time.


    Inspired by this post on Amplitude – Perspectives.


  • Supercharge Core Web Vitals with Amplitude’s Global Agent: Faster Rankings, Happier Users


    I measure product health by a simple equation: speed plus clarity equals trust. That’s why I prioritize Core Web Vitals and search performance together—because the fastest path to better UX and higher rankings is a closed loop between measurement, diagnosis, and action. Standardizing on Amplitude’s Global Agent with Amplitude AI Agents let my teams compress that loop from weeks to hours, and in many cases, to minutes.

    The goal is to track web vitals and page rankings faster with Amplitude AI Agents, then use that visibility to improve your site’s user experience and SEO rankings. That sounds ambitious, but with the right instrumentation and analytics workflow, it becomes a repeatable operating rhythm rather than a one-off project.

    Here’s what changed for us with Amplitude’s Global Agent: a single, consistent way to capture performance signals across pages and journeys, unified context for every session, and a lightweight footprint that doesn’t get in the way of speed. By centralizing measurement, we eliminated blind spots and gave product, growth, and engineering one shared truth for Core Web Vitals and behavioral analytics.

    My practical playbook is straightforward:
    1) Establish a performance baseline for Core Web Vitals on key templates and critical user paths.
    2) Segment results by device, location, acquisition channel, and content type to surface where users actually feel the friction.
    3) Connect those vitals to downstream behaviors (scroll depth, engagement, and conversion) so we prioritize fixes that move business outcomes, not just lab scores.
    4) Use feature flags and A/B testing to ship improvements safely and quantify uplift.
    5) Close the loop with Agent Analytics to keep learnings visible and actionable.
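    The baseline and segmentation steps can be sketched as a per-segment 75th-percentile calculation, since Core Web Vitals are assessed at p75 against Google’s published thresholds (LCP 2.5 s / 4 s, INP 200 ms / 500 ms, CLS 0.1 / 0.25). The sketch assumes field samples have already been exported as (segment, metric, value) rows; that row shape is an assumption, not the Amplitude Global Agent’s data model.

```python
from collections import defaultdict
from math import ceil

# Google's Core Web Vitals thresholds: "good" if <= first value, "poor" if > second.
# LCP and INP are in milliseconds; CLS is unitless.
THRESHOLDS = {"LCP": (2500, 4000), "INP": (200, 500), "CLS": (0.1, 0.25)}

def p75(values: list[float]) -> float:
    ordered = sorted(values)
    return ordered[ceil(0.75 * len(ordered)) - 1]   # nearest-rank 75th percentile

def rating(metric: str, value: float) -> str:
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    return "needs improvement" if value <= poor else "poor"

def vitals_by_segment(samples: list[tuple[str, str, float]]) -> dict[tuple[str, str], tuple[float, str]]:
    """Group (segment, metric, value) samples and rate each group's p75."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for segment, metric, value in samples:
        grouped[(segment, metric)].append(value)
    result = {}
    for (segment, metric), values in grouped.items():
        value = p75(values)
        result[(segment, metric)] = (value, rating(metric, value))
    return result

if __name__ == "__main__":
    samples = [("mobile", "LCP", v) for v in (1800, 2600, 3900, 4200)]
    samples += [("desktop", "LCP", v) for v in (1200, 1500, 1800, 2000)]
    for (segment, metric), (value, label) in sorted(vitals_by_segment(samples).items()):
        print(f"{segment:<8} {metric}: p75={value} ({label})")
```

    Segments rated "needs improvement" or "poor" are the candidates to connect to downstream behavior before deciding which fix to ship first.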

    Operationally, we rely on anomaly detection to flag regressions early, CI/CD guardrails to prevent performance slips at deploy time, and observability plus session replay to accelerate root-cause analysis. This combination reduces mean time to resolution, protects page experience during fast iteration cycles, and helps us avoid trading UX for speed—or vice versa.
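    One simple form of the anomaly detection mentioned above is a trailing z-score on each day’s p75: flag a day when it sits several standard deviations above the recent baseline. The 14-day window and threshold of 3 are hypothetical defaults to tune, and this stands in for, rather than reproduces, any built-in anomaly detection in Amplitude.

```python
from statistics import mean, stdev

def flag_regressions(daily_p75: list[float], window: int = 14, z_threshold: float = 3.0) -> list[int]:
    """Indexes of days whose p75 is more than z_threshold standard deviations above the trailing window.

    Only upward moves are flagged, because higher LCP, INP, and CLS values are worse.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate spread")
    flagged = []
    for i in range(window, len(daily_p75)):
        history = daily_p75[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any increase as a regression.
            if daily_p75[i] > mu:
                flagged.append(i)
            continue
        if (daily_p75[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    # Hypothetical daily LCP p75 values in milliseconds, with a spike on the last day.
    series = [2400, 2500] * 7 + [2460, 3200]
    print(flag_regressions(series))
```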

    The strategic benefit is compounding: better Core Web Vitals improve user perception and increase engagement, which strengthens SEO signals and, ultimately, page rankings. With a unified analytics platform in place, we can spotlight the few improvements that create outsized gains, then scale those patterns across the site with confidence.

    If your roadmap includes faster pages, stronger rankings, and happier users, align your teams around this simple loop: measure precisely, diagnose quickly, experiment safely, and learn continuously. Amplitude’s Global Agent and Amplitude AI Agents give you the instrumentation and insight to make that loop your competitive advantage.


    Inspired by this post on Amplitude – Best Practices.


  • My Always‑On AI Team: How I Get Claude Agents to Tackle Work While I’m Offline


    Most mornings I wake up to a to-do list that’s already been updated—because my always-on team of agentic AI assistants has been working while I sleep. I rely on Claude to orchestrate these agents so routine prep, follow-ups, and retrospectives never slip through the cracks.

    When a podcast recording hits my calendar, my podcast-manager agent (powered by Claude) automatically creates a podcast-interview-prep task with a concise summary of who I’m interviewing and what they are building. It also creates a transcript review document with the correct share settings. After the recording, it adds a task to my to-do list to share the transcript with the podcast participants.

    For sales, my sales-admin agent (also powered by Claude) prepares a sales-meeting-prep task with notes on who I’m meeting with, where they are in the sales process, and what I need to move the deal forward. After the call, it generates clear next-step tasks so momentum doesn’t stall.

    Every week, my coding-manager agent (still powered by Claude) compiles a report from my prior week’s coding sessions and offers targeted tips. It flags recurring mistakes or dead ends, shows how to avoid them, and suggests ways to work better with Claude. It’s the retrospective I never skip.

    In this walkthrough, I’ll explain how I get Claude to complete tasks for me while I’m away from the computer—and how I designed the system to balance power, safety, and cost control.

    I first explored this approach after seeing the rapid growth of OpenClaw. OpenClaw is an open-source "agent harness" that lets you configure personalized agents to act on your behalf. It’s incredibly promising, but the early wave of enthusiasm also revealed pitfalls: complex safety configuration, overly broad machine access (browser, terminal, files, credentials), third-party skills of varying quality, and surprise usage bills.

    After hearing one too many horror stories about wasted hours and unexpected charges, I set out to design a safer, more predictable way to capture the benefits of OpenClaw while managing risk and spend. That’s what led to my current agent setup.

    For transparency: I’m a long-time practitioner and a genuine fan of Claude Code. I have not received any compensation from Anthropic for writing about my approach. If that ever changes, I will disclose it—both because it’s required by the FTC in the U.S. and because it’s simply the right thing to do.

    An Overview of How My Agent Team Works

    Today, I run three specialized agents: a podcast manager, a sales admin, and a coding manager. As I invest more, I expect this team to grow—because the pattern scales cleanly across use cases.

    This system runs on four core components that keep everything reliable, auditable, and cost-aware.

    First, agent identity. I use a simple but powerful convention: an identity markdown file that tells the agent who it is and where its task folder lives, and gives it context for the types of tasks it will do. This keeps scope tight and intent explicit, which is critical for safety and predictable automation.

    Second, the scheduler. I’m using macOS’s built-in scheduler, launchd, via LaunchAgents. It works like cron, but jobs run with your full user permissions on the Mac, so the coding CLIs run under my own logged-in sessions. That means I can run all of this under my Claude Code Max subscription or my ChatGPT/Codex subscription. The result is a dependable heartbeat for my AI workflows without relying on fragile cloud glue.
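    For readers who haven’t used launchd: a LaunchAgent is a property-list file in ~/Library/LaunchAgents that tells macOS what to run and when. This sketch generates one with Python’s standard `plistlib`; the label, script path, and daily 6:00 schedule are hypothetical, and the source doesn’t say which schedule keys the author actually uses.

```python
import plistlib
from pathlib import Path

def launch_agent_plist(label: str, program_args: list[str], hour: int, minute: int = 0) -> bytes:
    """Build a LaunchAgent property list that runs program_args every day at hour:minute."""
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        "StandardOutPath": f"/tmp/{label}.out.log",
        "StandardErrorPath": f"/tmp/{label}.err.log",
    }
    return plistlib.dumps(job)   # XML plist bytes

if __name__ == "__main__":
    # Hypothetical label and runner script for one agent.
    label = "com.example.podcast-manager"
    plist = launch_agent_plist(
        label, ["/usr/bin/python3", "/Users/me/agents/run_agent.py", "podcast-manager"], hour=6
    )
    target = Path.home() / "Library" / "LaunchAgents" / f"{label}.plist"
    print(f"Would write {len(plist)} bytes to {target}")
```

    After writing the file, it can be loaded with `launchctl bootstrap gui/$(id -u) <path>` (or the older `launchctl load <path>`); one plist per agent keeps each schedule explicit.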

    Third, tasks. Each agent owns a dedicated folder of tasks. A task is a markdown file with frontmatter. That structure makes work items easy to create, parse, review, and version—perfect for repeatable automation with a human-in-the-loop safety net.
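    To show why frontmatter makes tasks easy to parse, here is a minimal reader for simple `key: value` frontmatter. The field names (`agent`, `status`) and the `todo` value are hypothetical; the source doesn’t specify its schema, and real frontmatter is YAML, which a library such as PyYAML handles more fully than this line-based parser.

```python
from dataclasses import dataclass

@dataclass
class Task:
    meta: dict[str, str]
    body: str

def parse_task(text: str) -> Task:
    """Split a markdown task into simple `key: value` frontmatter and its body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return Task({}, text)   # no frontmatter: the whole file is the body
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return Task(meta, "\n".join(lines[i + 1:]).strip())
        key, sep, value = line.partition(":")   # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---'")

def pending(tasks: list[Task], agent: str) -> list[Task]:
    """Tasks assigned to this agent that are still open (hypothetical fields)."""
    return [t for t in tasks if t.meta.get("agent") == agent and t.meta.get("status") == "todo"]

if __name__ == "__main__":
    sample = "---\nagent: podcast-manager\nstatus: todo\n---\nPrepare the interview summary.\n"
    task = parse_task(sample)
    print(task.meta, "|", task.body)
```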

    Fourth, scripts. Each agent has its own scripts folder with utilities it can call on demand or that run on a schedule. These scripts are small, composable, and transparent—so I can evolve capabilities without ballooning risk or complexity.

    Agent identity, tasks, and scripts are saved in Obsidian, not as Claude Code skills or agents. The scheduler runs on my always-on Mac Mini. The benefit is that everything just works across all of my devices, and I can switch between Claude Code, Codex, or any other coding CLI as needed. All it takes is updating the script the scheduler runs.
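    As a rough illustration of that swap, here is a hypothetical runner the scheduler could call. It assumes an `identity.md` file and a `tasks/` folder per agent (the source describes these pieces but not their file names), and it uses `claude -p` and `codex exec` as the non-interactive entry points; verify those against each CLI’s current docs. It defaults to a dry run that only returns the commands, and it does not skip completed tasks, which a real runner would.

```python
import subprocess
import tempfile
from pathlib import Path

# Swapping CLIs is a one-line change here. Commands are assumptions to verify per CLI.
CLI_COMMANDS = {
    "claude": ["claude", "-p"],   # Claude Code non-interactive (print) mode
    "codex": ["codex", "exec"],   # Codex CLI non-interactive mode
}

def build_prompt(identity_md: str, task_md: str) -> str:
    """Combine the agent's identity file with one task into a single prompt."""
    return f"{identity_md.strip()}\n\n## Task\n\n{task_md.strip()}"

def run_agent(agent_dir: Path, cli: str = "claude", dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one CLI command per task file in agent_dir/tasks."""
    identity = (agent_dir / "identity.md").read_text()
    commands = [
        [*CLI_COMMANDS[cli], build_prompt(identity, task.read_text())]
        for task in sorted((agent_dir / "tasks").glob("*.md"))
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=agent_dir, check=False, timeout=1800)
    return commands

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        agent_dir = Path(tmp)
        (agent_dir / "tasks").mkdir()
        (agent_dir / "identity.md").write_text("You are my podcast manager.")
        (agent_dir / "tasks" / "prep.md").write_text("Prepare the interview summary.")
        for cmd in run_agent(agent_dir, cli="codex"):
            print(cmd[:2], f"{len(cmd[2])} chars of prompt")
```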

    In practice, this architecture delivers exactly what I want from agentic AI: clarity of responsibility, strong guardrails, and outcomes that compound. My podcast manager keeps interviews buttoned up, my sales admin removes administrative drag, and my coding manager turns lessons learned into steady skill gains—all while I focus on higher-leverage product management work.

    If you’re considering a similar setup, start with a single agent and a narrow task, then expand. Keep identities crisp, scripts small, and schedules explicit. With that foundation, you’ll get the benefits of automation and delegation—without surrendering control.


    Inspired by this post on Product Talk.

