Tag: prompt engineering

  • We Open-Sourced Our AI Skills Library: Reusable Skills to Supercharge Product Velocity

    We open-sourced our AI Skills library. Here's what we built, why we built it, and how to use it. I’m sharing the approach we’ve used to move faster with more confidence across product discovery, prototyping, and production—while keeping governance, safety, and measurement front and center.

    What we built is a modular, open-source library of “skills” for agentic AI and LLM-powered workflows—things like retrieval and grounding, summarization, classification, tool-use, data enrichment, safety guardrails, and evaluation harnesses. Each skill follows consistent interfaces and conventions so teams can compose them like building blocks, swap implementations without breaking flows, and standardize best practices across products.

    Why we built it is simple: we kept rebuilding the same core capabilities across experiments and teams. Standardizing these skills accelerates time-to-value, reduces integration risk, and helps product trios collaborate with a common language. It also lets us scale what works—prompt patterns, eval datasets, telemetry—so every new initiative starts on third base instead of at bat.

    How to use it in practice: start by running a quick-start example to see a baseline skill chain in action. Then compose your own flow by selecting skills (for example, retrieval + summarization + tool call), configure them with environment variables and guardrails, and wire in evaluation datasets. From there, instrument the pipeline with metrics so you can compare variants and promote the best-performing chain to your main app or API.
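
    As a minimal sketch of what "compose your own flow" could look like, the example below chains two stand-in skills. The dict-in, dict-out `Skill` interface and the names `retrieve`, `summarize`, and `compose` are assumptions made for illustration; they are not the library's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical interface: a skill takes a context dict and returns an updated copy.
Skill = Callable[[Dict[str, Any]], Dict[str, Any]]


def retrieve(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in retrieval skill: keep documents that share a word with the query."""
    words = set(ctx["query"].lower().split())
    hits = [doc for doc in ctx["corpus"] if words & set(doc.lower().split())]
    return {**ctx, "documents": hits}


def summarize(ctx: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in summarization skill: keep the first sentence of each document."""
    summary = " ".join(doc.split(".")[0].strip() + "." for doc in ctx["documents"])
    return {**ctx, "summary": summary}


def compose(*skills: Skill) -> Skill:
    """Chain skills so the output context of one feeds the next."""
    def chain(ctx: Dict[str, Any]) -> Dict[str, Any]:
        for skill in skills:
            ctx = skill(ctx)
        return ctx
    return chain


pipeline = compose(retrieve, summarize)
result = pipeline({
    "query": "refund policy",
    "corpus": [
        "Refund requests close in 30 days. Contact support.",
        "Shipping takes a week.",
    ],
})
print(result["summary"])
```

    Because every skill shares one signature, swapping `summarize` for another implementation does not change how the chain is wired.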

    In a typical stack, the library dovetails with analytics and experimentation: ship skill variants behind feature flags, measure impact with A/B testing, and observe runtime behavior with logs and traces. CI/CD hooks let you run evals pre-merge, and production dashboards keep an eye on latency, cost, and outcome quality. This creates a virtuous loop where ideas move from prototype to production with clear evidence.

    Common use cases include customer support summarization and triage, lead scoring and enrichment, anomaly detection in product telemetry, and automated content workflows. Because the skills are composable, you can try multiple retrieval-first strategies, swap prompt templates, or add tools (search, RAG, calculators, connectors) without rewriting everything from scratch.

    Governance and safety are built in. Guardrails handle PII redaction, content policy checks, and rate limiting; configs make it easy to enforce privacy-by-design; and evaluation harnesses encourage an eval-driven development culture. The result is faster iteration without sacrificing data governance or reliability.
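
    To make the PII redaction guardrail concrete, here is a small sketch using regular expressions. The patterns and the `redact_pii` name are illustrative assumptions, not the library's guardrail; real redaction usually needs broader coverage than two patterns.

```python
import re

# Illustrative patterns only: they catch common email and phone shapes, not every case.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


print(redact_pii("Mail ana@example.com or call +1 415-555-0100."))
```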

    If you want to contribute, add a new skill, improve prompts, share eval datasets, or open an issue with a scenario you want supported. The roadmap focuses on richer retrieval adapters, better test fixtures, and deeper observability so teams can debug and optimize complex chains with confidence.

    I’m excited to see how you’ll use the library to accelerate your roadmap. Clone it, run a quick start, and compose your first workflow today—then measure, iterate, and scale what works. I’ll keep sharing patterns, learnings, and updates as we grow the skills catalog and sharpen the tooling.


    Inspired by this post on Amplitude – Perspectives.


  • Decode How Amplitude AI Thinks: Proven Workflows to Get Actionable, High-Accuracy Results

    I’ve learned that the fastest way to unlock better AI outcomes is to understand how the system reasons, then partner with it deliberately. In product organizations, that means treating AI like a capable collaborator with a transparent process, clear inputs, rigorous checks, and measurable success criteria. When I work this way, my teams ship insights and experiments faster—and with far fewer surprises.

    Discover how Amplitude AI thinks and best practices for working with it. Partner with AI at each step of its process for more accurate, actionable outputs.

    Here’s the mental model I use. AI moves through a series of steps: clarify the goal, ingest context, retrieve and rank relevant information, reason through candidate solutions, draft an answer, self-critique, and refine. My job is to actively guide each step. I define the objective precisely, supply high-signal context, specify constraints, ask for structured reasoning, and require a quality bar before anything ships to stakeholders.

    Start by setting intent and success criteria. I write a one-sentence objective (“what problem are we solving now”), then define the evaluation rubric (“what good looks like”) up front. This small habit powers eval-driven development: it keeps AI outputs aligned with product goals, not just plausible-sounding text. I’ll often include target metrics and guardrails, such as confidence thresholds or required evidence from “Amplitude analytics.”

    Next, I curate the context. For analytics use cases, I provide event taxonomies, metric definitions, segments, and recent behavioral analytics trends to ground the model. A retrieval-first pipeline helps here: I scope the corpus, trim noise, and apply context window management so the model sees only what’s essential. The result is sharper, faster answers that map to our real data model and “unified analytics platform.”
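
    One way to picture "trim noise" and context window management is a greedy packer that keeps the highest-scoring snippets that fit a budget. This is a sketch under assumptions: relevance scores are already available, word count stands in for token count, and `pack_context` is a hypothetical helper.

```python
from typing import List, Tuple


def pack_context(snippets: List[Tuple[float, str]], budget_words: int) -> List[str]:
    """Keep the highest-scoring snippets whose combined word count fits the budget."""
    chosen: List[str] = []
    used = 0
    for _score, text in sorted(snippets, key=lambda item: item[0], reverse=True):
        cost = len(text.split())  # word count as a rough proxy for tokens
        if used + cost <= budget_words:
            chosen.append(text)
            used += cost
    return chosen


snippets = [
    (0.9, "signup_completed fires after email verification"),
    (0.2, "office lunch menu for friday"),
    (0.7, "activation means first dashboard created within 24 hours"),
]
context = pack_context(snippets, budget_words=13)
print(context)
```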

    Then I shape the prompt. I use concise role framing, 1–3 high-quality exemplars, and explicit constraints (format, length, tone, citation requirements). I also ask the model to show its reasoning with a short, labeled scratchpad and to state uncertainties. This is practical prompt engineering—not magic—designed to make reasoning inspectable and reproducible across “AI workflows.”

    When tools are available, I encourage agentic AI patterns: let the system plan, call functions, and iterate. With “Amplitude AI,” I ask it to propose the next best analysis (e.g., segment drill-down, funnel step attribution, or anomaly detection), run it, summarize findings, then reflect on whether the next step changes. If you’re using “Amplitude MCP,” formalize these actions as callable tools so the model can chain them reliably.

    Quality is never an afterthought. I build lightweight evaluations into every loop: compare the model’s output against the rubric, check factual grounding, and A/B test alternative prompts for clarity and conversion where appropriate. Over time, these evaluations become our regression suite, giving us confidence as data, prompts, or model versions evolve. This discipline keeps product managers’ LLM work aligned with shifting business priorities.

    Finally, I turn insights into action. I ask “Amplitude AI” for decision-ready artifacts—clear hypotheses, prioritized opportunities, and concrete next steps owners can execute. I require the model to cite the specific supporting events or segments and to flag assumptions. That last step is crucial: it invites human judgment where it matters and prevents automation from outpacing accountability.

    This approach doesn’t slow teams down; it speeds them up with focus. By guiding each step—intent, context, reasoning, tools, and evaluation—you transform AI from a black box into a reliable copilot. The payoff is tangible: clearer insights, faster cycles, and outputs stakeholders trust the first time they see them.


    Inspired by this post on Amplitude – Perspectives.


  • The Counterintuitive Playbook for CLI Agents: Why Ruthless Subtraction Beats Feature Creep

    I’ve learned the hard way that the fastest path to a reliable command-line agent is radical subtraction. "In the last month of developing Amplitude Wizard CLI, we cut more than we added. Learn less is more when it comes to building CLI agents." That decision was less about minimalism and more about product strategy: constraints sharpen behavior, clarify intent, and raise trust.

    When I evaluate agentic AI systems, especially those that act on developer environments, I start by asking what the agent must never do. By establishing hard guardrails first, the design naturally converges on an opinionated, safe, and teachable interface. Every additional flag, tool, or permission expands the blast radius; every removal shortens the path to first success.

    For CLI agents, the most valuable product choice is a narrow toolset with sane defaults. Opinionated workflows reduce cognitive load and failure modes, while clear human override points keep users in control. I prefer a bias toward idempotent actions, reversible changes, and explicit confirmation gates for anything destructive. If a feature can’t explain itself in a single, crisp sentence in the help text, it likely doesn’t belong.

    Security and reliability flow from limits. Progressive permissioning, scoped credentials, and time-bounded tokens prevent the agent from wandering. Dry-run modes build confidence without side effects. When a user can reason about what the agent will and won’t do, adoption accelerates—and support tickets plummet.
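
    The dry-run mode and confirmation gate described above can be sketched with Python's standard `argparse`. The `agent` program name, the `plan`/`apply` actions, and the `--yes` flag are hypothetical; this is not the Amplitude Wizard CLI's interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A deliberately narrow interface: two actions and two flags."""
    parser = argparse.ArgumentParser(prog="agent", description="Hypothetical CLI agent.")
    parser.add_argument("action", choices=["plan", "apply"],
                        help="plan shows changes; apply makes them")
    parser.add_argument("--dry-run", action="store_true",
                        help="report what would change without side effects")
    parser.add_argument("--yes", action="store_true",
                        help="skip the confirmation prompt for apply")
    return parser


def run(argv, confirm=input) -> str:
    """Return the outcome; destructive work happens only after explicit confirmation."""
    args = build_parser().parse_args(argv)
    if args.action == "plan" or args.dry_run:
        return "dry-run: no changes made"
    if not args.yes and confirm("Apply changes? [y/N] ").strip().lower() != "y":
        return "aborted"
    return "applied"


print(run(["apply", "--dry-run"]))
```

    The safe path is the default: anything other than an explicit "y" or `--yes` leaves the environment untouched.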

    Observability is the other half of trust. I instrument "Agent Analytics" across every run: inputs, tool choices, durations, outcomes, and error patterns. Those signals reveal where the agent gets confused, which steps users abandon, and which prompts need pruning. With that loop in place, "less is more" stops being a philosophy and becomes an evidence-backed operating model.

    I anchor the roadmap in eval-driven development. Before adding a capability, I define a measurable task, a success threshold, and the smallest viable interface to reach it. If the capability can’t lift completion rate, time-to-first-success, or re-run stability, it waits. That simple discipline protects the experience from feature creep and preserves velocity in CI/CD.

    Under the hood, I design for a retrieval-first pipeline and careful context window management. The agent should fetch only the minimally relevant facts, present a compact plan, and execute predictably. Thoughtful prompt engineering helps—but prompts are not a substitute for clear boundaries, deterministic tool contracts, and robust error handling.

    Documentation is product. I maintain docs-as-code with runnable examples that mirror the golden paths. When the docs and the CLI disagree, the CLI changes—never the docs. This creates an internal forcing function: if we can’t document it simply, we probably shouldn’t ship it.

    My litmus test for any proposed addition is simple: does this make the mental model smaller? If not, cut it, make it progressive, or hide it behind a clearly named subcommand. Defaults should be boring, safe, and fast. Advanced power should be opt-in and discoverable without overwhelming new users.

    The paradox of agentic AI is that capability grows as surface area shrinks. By removing distractions, we amplify signal, increase repeatability, and earn the right to add the next carefully chosen step. The result is a CLI agent that feels sharp, dependable, and—most importantly—useful on day one.


    Inspired by this post on Amplitude – Perspectives.


  • Prompt Like a Pro: Three Battle-Tested Tips for Amplitude Global Agent Success

    When I guide teams building agentic AI features, I’ve seen a single prompt turn Amplitude Global Agent into either a world-class analyst or a well-meaning rambler. The difference isn’t magic—it’s method. With the right structure and iteration, we consistently get faster, clearer insights that stand up to product and analytics scrutiny.

    AI has gotten really good, but success still depends on the quality of your prompts. Explore three best practices for prompting in Amplitude Global Agent.

    Tip 1 — Define the role, goal, and guardrails. I begin every prompt by stating the agent’s role (for example: “You are a product analyst”), the business objective (“identify activation drop-offs by cohort”), and the boundaries (“use only Amplitude analytics events and properties provided; return JSON with metric, segment, timeframe”). This simple pattern reduces ambiguity, improves context window management, and yields outputs I can compare across runs.

    Tip 2 — Ground the model with concrete context and examples. Agent outputs improve dramatically when I supply the exact data it should reference: event names, properties, segments, filters, and timeframes. I often include a short example—one ideal question and one ideal answer—to anchor tone, structure, and depth. Think retrieval-first pipeline: feed the agent authoritative snippets (definitions, dashboards, prior queries) rather than hoping it guesses. That’s how I cut hallucinations and make results reproducible for LLMs for product managers.

    Tip 3 — Iterate with measurement, not vibes. I version prompts, A/B test variants, and log inputs/outputs so I can score quality with lightweight evals (accuracy against known answers, clarity, and actionability). Over time, a small library of “winning” prompts emerges for common AI workflows—activation analysis, retention cohorts, anomaly detection—so the team can move from tinkering to repeatable performance. This is where Agent Analytics practices pay off: we inspect outcomes, not just outputs.
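
    As a rough sketch of scoring versioned prompts against known answers, the snippet below compares two logged prompt variants. The questions, answers, and version names are made-up illustrative values, and exact-match accuracy is only one possible eval.

```python
from typing import Dict


def accuracy(outputs: Dict[str, str], known_answers: Dict[str, str]) -> float:
    """Share of questions where the logged output matches the known answer."""
    correct = sum(
        1 for question, expected in known_answers.items()
        if outputs.get(question, "").strip().lower() == expected.strip().lower()
    )
    return correct / len(known_answers)


# Made-up eval set and logged outputs, for illustration only.
known_answers = {
    "Which step has the largest drop-off?": "checkout",
    "Which cohort retains best?": "annual plan",
}
logged_outputs = {
    "prompt_v1": {
        "Which step has the largest drop-off?": "Checkout",
        "Which cohort retains best?": "monthly plan",
    },
    "prompt_v2": {
        "Which step has the largest drop-off?": "checkout",
        "Which cohort retains best?": "Annual plan",
    },
}

scores = {version: accuracy(out, known_answers) for version, out in logged_outputs.items()}
best_version = max(scores, key=scores.get)
print(scores, best_version)
```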

    A practical starter structure I use: Role and Audience; Objective and Success Criteria; Data Context (events, properties, segments, timeframe); Constraints (sources, methods, privacy); Output Format (tables/JSON, fields, length); Examples (one good Q/A); and Fallbacks (what to do when data is insufficient). Even written as plain language, that scaffold reliably steers Amplitude Global Agent to precise, defensible answers.
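
    The scaffold above can be turned into a small template builder that refuses to produce a prompt when a section is missing. `build_prompt` and the example field values are assumptions for illustration; the section names come from the list above.

```python
from typing import Dict

SECTIONS = [
    "Role and Audience",
    "Objective and Success Criteria",
    "Data Context",
    "Constraints",
    "Output Format",
    "Examples",
    "Fallbacks",
]


def build_prompt(fields: Dict[str, str]) -> str:
    """Assemble the sections in a fixed order; fail loudly if any is empty."""
    missing = [name for name in SECTIONS if not fields.get(name, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    return "\n\n".join(f"{name}:\n{fields[name].strip()}" for name in SECTIONS)


# Example values are placeholders, not a recommended prompt.
fields = {
    "Role and Audience": "You are a product analyst writing for a growth team.",
    "Objective and Success Criteria": "Identify activation drop-offs by cohort.",
    "Data Context": "Events, properties, segments, and timeframe go here.",
    "Constraints": "Use only the events and properties provided.",
    "Output Format": "JSON with metric, segment, timeframe.",
    "Examples": "One ideal question and one ideal answer.",
    "Fallbacks": "If data is insufficient, say so and list what is missing.",
}
prompt = build_prompt(fields)
print(prompt)
```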

    The emotional arc here is familiar: when the agent nails a complex funnel question in one pass, the team gets that “oh wow” moment; when it meanders, morale dips. Clear prompting turns those spikes of delight into a steady cadence of wins—less rework, faster learning loops, and cleaner handoffs from discovery to delivery. In short, invest in prompt engineering once, and you compound gains across every analysis session.

    If you’re just getting started, pick one critical question (for example, activation or retention), apply the three tips above, and commit to two to three prompt iterations with scoring. Within a single sprint, you’ll have a robust template you can reuse and adapt—helping Amplitude Global Agent deliver trustworthy insights at the speed your product strategy demands.


    Inspired by this post on Amplitude – Perspectives.


  • Stop Losing Users: How a Second Message and Prompt Audit Drive 2–3x Retention

    Default prompts are quietly sabotaging agent retention. I learned this the hard way while reviewing early funnels for our voice and chat agents—engagement looked great at the greeting, but the moment the agent stopped after a single reply, the conversation flatlined. The fix wasn’t a fancy LLM trick; it was a disciplined second message and a rigorous audit of defaults across every entry point.

    When an AI agent opens with a generic, low-friction greeting and then waits, users hesitate. Cognitive load rises, intent stays fuzzy, and drop-off follows. A thoughtful second message—delivered quickly, with clarity and options—reduces ambiguity and gives people a low-effort path to progress. It’s a small behavioral nudge that pays off in outsized retention gains.

    Here’s the pattern that consistently works for me. First, keep the initial default prompt short, confident, and specific to the channel and task domain. Then ship a fast follow-up if the user hesitates for a few seconds. That second message should clarify what the agent can do, present 2–3 concrete choices, and invite free-form input. I’ve repeatedly seen this simple sequence unlock a 2–3x retention lift in early sessions, especially for first-time users.
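
    The second-message sequence can be sketched as two small functions: one decides when to follow up, the other builds the message. The five-second threshold, the function names, and the wording are assumptions; the post only says "a few seconds".

```python
from typing import List


def should_follow_up(seconds_idle: float, user_has_replied: bool,
                     threshold_seconds: float = 5.0) -> bool:
    """Send the second message only if the user is still silent after the threshold."""
    return (not user_has_replied) and seconds_idle >= threshold_seconds


def second_message(capability: str, options: List[str]) -> str:
    """Clarify what the agent can do, offer 2-3 choices, and invite free-form input."""
    if not 2 <= len(options) <= 3:
        raise ValueError("offer 2 or 3 concrete choices")
    choices = "; ".join(f"{i}) {option}" for i, option in enumerate(options, start=1))
    return (f"I can help with {capability}. For example: {choices}. "
            "Or just tell me what you need in your own words.")


if should_follow_up(seconds_idle=6.0, user_has_replied=False):
    print(second_message("billing questions", ["check an invoice", "update a card"]))
```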

    Auditing default prompts is where the leverage lives. I inventory every ingress—web widget, IVR, SMS, in-app, help center—and catalogue the exact default system, developer, and user-facing prompts. Then I inspect turn-1 and turn-2 transcripts in Agent Analytics to quantify where users stall: time-to-first-intent, clarification rate, option selection rate, and completion. This makes the drop-off visible and turns “vibes” into data we can A/B test.
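
    To show how a stall point becomes a number, the sketch below computes a turn-2 continuation rate from transcripts. The definition used here (among sessions where the agent sent a second message, the share where the user then replied) and the list-of-speakers transcript shape are assumptions, not an Agent Analytics export format.

```python
from typing import List


def continued_after_second_message(turns: List[str]) -> bool:
    """True if any user turn appears after the agent's second message."""
    agent_messages = 0
    for speaker in turns:
        if speaker == "agent":
            agent_messages += 1
        elif agent_messages >= 2:
            return True
    return False


def turn2_continuation_rate(sessions: List[List[str]]) -> float:
    """Share of sessions with a second agent message where the user kept going."""
    reached = [s for s in sessions if s.count("agent") >= 2]
    if not reached:
        return 0.0
    return sum(continued_after_second_message(s) for s in reached) / len(reached)


sessions = [
    ["agent", "agent", "user"],  # replied after the second message
    ["agent", "agent"],          # stalled after the second message
    ["agent", "user"],           # replied to the greeting; no second message sent
]
print(turn2_continuation_rate(sessions))
```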

    Designing the second message is a conversation design exercise, not a copy tweak. My recipe: empathize with the user’s likely uncertainty, constrain scope so the agent appears capable, and apply choice architecture. For voice AI agents, I keep it shorter, use confirmation questions, and bias toward read-back for accuracy. For chat, I include tappable options and examples that mirror top intents. The goal is momentum without feeling pushy.

    Operationally, I run controlled A/B tests on default and second-message variants, sized to a realistic minimum detectable effect. I segment by source (ad, organic, support), device, and use case, because the winning prompt for sales qualification rarely matches the one for customer support. With proper instrumentation in our analytics stack, we track retention curves over the first 3–5 sessions, not just single-session reply rates, to avoid optimizing for chatter over outcomes.
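
    "Sized to a realistic minimum detectable effect" can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a sketch, assuming a two-sided test on an absolute lift; the 20% baseline and 5-point lift are made-up inputs, and `sample_size_per_variant` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, mde: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant to detect an absolute lift of `mde`."""
    p1, p2 = baseline_rate, baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


# Made-up inputs: 20% baseline continuation rate, 5-point absolute lift.
print(sample_size_per_variant(0.20, 0.05))
```

    Halving the detectable effect roughly quadruples the required sample, which is why segment-level tests often need longer run times.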

    Strong prompt engineering underpins the experience. I keep system prompts stable and explicit about persona, tone, and refusal behavior; manage the context window so examples don’t drown live intent; and use a retrieval-first pipeline when domain knowledge matters. The most expensive mistake I see is shipping defaults like “How can I help you?” without guardrails or examples—great for demos, bad for real users.

    If you’re starting fresh, begin with a prompt audit this week: list all defaults, map them to top intents, and pair each with a channel-appropriate second message. Instrument the funnel, launch two variants, and set a crisp success metric (e.g., turn-2 continuation rate to task start, then task completion). This is one of those rare changes that is simple to ship and compounds across onboarding, activation, and long-term retention.

    The takeaway is straightforward: don’t let your best work stall after the first reply. A disciplined second message and a focused default prompt audit will lift engagement, reduce ambiguity, and create the kind of early momentum that sustains retention over time.


    Inspired by this post on Amplitude – Perspectives.


  • My Always‑On AI Team: How I Get Claude Agents to Tackle Work While I’m Offline

    Most mornings I wake up to a to-do list that’s already been updated—because my always-on team of agentic AI assistants has been working while I sleep. I rely on Claude to orchestrate these agents so routine prep, follow-ups, and retrospectives never slip through the cracks.

    When a podcast recording hits my calendar, my podcast-manager agent (powered by Claude) automatically creates a podcast-interview-prep task with a concise summary of who I’m interviewing and what they are building. It also creates a transcript review document with the correct share settings. After the recording, it adds a task to my to-do list to share the transcript with the podcast participants.

    For sales, my sales-admin agent (also powered by Claude) prepares a sales-meeting-prep task with notes on who I’m meeting with, where they are in the sales process, and what I need to move the deal forward. After the call, it generates clear next-step tasks so momentum doesn’t stall.

    Every week, my coding-manager agent (still powered by Claude) compiles a report from my prior week’s coding sessions and offers targeted tips. It flags recurring mistakes or dead ends, shows how to avoid them, and suggests ways to work better with Claude. It’s the retrospective I never skip.

    In this walkthrough, I’ll explain how I get Claude to complete tasks for me while I’m away from the computer—and how I designed the system to balance power, safety, and cost control.

    I first explored this approach after seeing the rapid growth of OpenClaw. OpenClaw is an open-source "agent harness" that lets you configure personalized agents to act on your behalf. It’s incredibly promising, but the early wave of enthusiasm also revealed pitfalls: complex safety configuration, overly broad machine access (browser, terminal, files, credentials), third-party skills of varying quality, and surprise usage bills.

    After hearing one too many horror stories about wasted hours and unexpected charges, I set out to design a safer, more predictable way to capture the benefits of OpenClaw while managing risk and spend. That’s what led to my current agent setup.

    For transparency: I’m a long-time practitioner and a genuine fan of Claude Code. I have not received any compensation from Anthropic for writing about my approach. If that ever changes, I will disclose it—both because it’s required by the FTC in the U.S. and because it’s simply the right thing to do.

    An Overview of How My Agent Team Works

    Today, I run three specialized agents: a podcast manager, a sales admin, and a coding manager. As I invest more, I expect this team to grow—because the pattern scales cleanly across use cases.

    This system runs on four core components that keep everything reliable, auditable, and cost-aware.

    First, agent identity. I use a simple but powerful convention: an identity markdown file that tells the agent who it is, where its task folder lives, and provides context for the types of tasks it will do. This keeps scope tight and intent explicit—critical for safety and predictable automation.

    Second, the scheduler. I’m using macOS’s built-in scheduler (via LaunchAgents). It works like cron, but it runs with all of my user permissions on the Mac. That means I can run all of this under my Claude Code Max subscription or my ChatGPT/Codex subscription. The result is a dependable heartbeat for my AI workflows without relying on fragile cloud glue.
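
    For readers who have not written a LaunchAgent before, here is a sketch that generates a job definition with Python's standard `plistlib`. The label, script path, log path, and schedule are hypothetical placeholders, not my actual configuration. `Label`, `ProgramArguments`, `StartCalendarInterval`, `StandardOutPath`, and `StandardErrorPath` are standard launchd keys.

```python
import plistlib
from typing import List


def launch_agent_plist(label: str, program_args: List[str],
                       hour: int, minute: int, log_path: str) -> bytes:
    """Build a launchd job that runs a command once a day at hour:minute."""
    job = {
        "Label": label,
        "ProgramArguments": list(program_args),
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
        "StandardOutPath": log_path,
        "StandardErrorPath": log_path,
    }
    return plistlib.dumps(job)


# Hypothetical values for illustration.
plist_bytes = launch_agent_plist(
    label="com.example.podcast-manager",
    program_args=["/bin/zsh", "/Users/me/agents/run_agent.sh", "podcast-manager"],
    hour=6,
    minute=30,
    log_path="/tmp/podcast-manager.log",
)
print(plist_bytes.decode("utf-8"))
```

    A file like this would normally be saved under `~/Library/LaunchAgents` and loaded with `launchctl`.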

    Third, tasks. Each agent owns a dedicated folder of tasks. A task is a markdown file with frontmatter. That structure makes work items easy to create, parse, review, and version—perfect for repeatable automation with a human-in-the-loop safety net.
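
    Here is a minimal sketch of reading a task file, assuming simple `key: value` frontmatter between `---` lines. The field names (`agent`, `status`, `due`) are hypothetical, and a real setup might use a YAML library instead of this hand-rolled parser.

```python
from typing import Dict, Tuple


def parse_task(text: str) -> Tuple[Dict[str, str], str]:
    """Split a markdown task into (frontmatter dict, body)."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    # No opening or closing delimiter: treat the whole file as body.
    if not stripped or stripped[0] != "---" or "---" not in stripped[1:]:
        return {}, text.strip()
    end = stripped.index("---", 1)
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


sample = """---
agent: podcast-manager
status: open
due: 2025-01-15
---
Share the transcript with the podcast participants.
"""
meta, body = parse_task(sample)
print(meta, body)
```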

    Fourth, scripts. Each agent has its own scripts folder with utilities it can call on demand or that run on a schedule. These scripts are small, composable, and transparent—so I can evolve capabilities without ballooning risk or complexity.

    Agent identity, tasks, and scripts are saved in Obsidian, not in Claude Code skills or agents. The scheduler runs on my always-on Mac Mini. The benefit is that the setup works across all of my devices, and I can switch between Claude Code, Codex, or any other coding CLI as I need to. All it takes is updating the script that the scheduler runs.

    In practice, this architecture delivers exactly what I want from agentic AI: clarity of responsibility, strong guardrails, and outcomes that compound. My podcast manager keeps interviews buttoned up, my sales admin removes administrative drag, and my coding manager turns lessons learned into steady skill gains—all while I focus on higher-leverage product management work.

    If you’re considering a similar setup, start with a single agent and a narrow task, then expand. Keep identities crisp, scripts small, and schedules explicit. With that foundation, you’ll get the benefits of automation and delegation—without surrendering control.


    Inspired by this post on Product Talk.


  • From PM to AI Engineer: RAG, Evals, and Discovery—The Surprising Playbook I’m Applying

    I just finished a standout conversation on AI engineering and product discovery that hit squarely at the questions I hear from product leaders every week: What does practical AI engineering actually look like for product managers, and how do we ramp without a traditional software background?


    Here’s the arc that resonated with me: a product leader goes from occasional tinkerer to spending 60% of her time on real engineering work—building AI-powered tools for continuous discovery, forming a licensing partnership with Vistaly, and quietly constructing "Teresa Bot," an AI discovery coach trained on everything she’s ever written. The journey is less about mastering every framework up front and more about structuring learning, tightening feedback loops, and shipping useful outcomes.

    The most energizing throughline is the myth-busting: you don’t need a deep engineering pedigree to operate in this space. Curiosity, rigorous discovery habits, and eval-driven development will take you further than brute-force coding. As one moment put it beautifully, "I know anything that I don't know how to do, Claude will teach me how to do. And Claude is infinitely patient." That captures the posture I expect modern PMs to adopt with LLMs and tools like Claude Code.

    On the nuts and bolts, the discussion gets concrete about AI engineering in practice: context engineering, prompt writing, RAG, observability, and evals. This is the real stack—think retrieval-first pipeline design, prompt engineering guardrails, instrumentation for model drift, and continuous, automated evals to protect behavior as you iterate. If you’ve been dabbling with context window management but haven’t formalized your test harnesses or dashboards, this is your cue.

    What I appreciated most is how directly discovery skills transfer. Framing assumptions, running tight customer interviews, mapping opportunity solution trees, and aligning stakeholders—these are precisely the muscles you need to shape problem spaces before you “vibe code” solutions. As one reflection nails it, "The moment I learned more about data science, all of my discovery work became so different." That’s the bridge from qualitative sense-making to measurable, model-centered learning.

    The partnership with Vistaly is also a smart build vs buy case study. Rather than reinvent infrastructure, the choice to license purpose-built opportunity solution tree software keeps focus on the differentiated layer—learning systems and product outcomes. As it’s put plainly: "I don't want to build all that stuff. I don't really want to be a software company. I'm almost set up like an AI researcher." Product leaders should internalize this lens for platform choices across their AI roadmaps.

    On "Teresa Bot," the implementation breadcrumbs are familiar and pragmatic: pair a solid retrieval-first pipeline (RAG) with clean content sources, keep prompts modular, enforce code review even for vibe coding, and stand up observability and evals early. I’ve had similar success using Claude Code for rapid iteration while treating every prompt and context change as a versioned artifact. That discipline pays dividends when you need to trace regressions or prove improvements.

    If you’re a PM ready to lean in, start small and systematic. Pick one high-signal discovery workflow, model the knowledge you already have, and wire up basic evals before you scale. Keep a lab notebook, use programmatic tests to gate deployments, and measure outcome movement—not just model cleverness. This is where LLMs for product managers move from novelty to execution readiness.

    Resources mentioned: Watch the episode on YouTube, Claude Code, Vistaly (opportunity solution tree software), Opportunity Solution Trees: Visualize Your Discovery to Stay Aligned and Drive Outcomes, Product Talk Academy, Just Now Possible Podcast, Vibe Coding Best Practices: Avoid the Doom Loop with Planning and Code Reviews, and the AI Evals for Engineers and PMs course on Maven.

    What stood out to you—RAG design choices, eval frameworks, or the discovery-to-engineering mindset shift? Drop your thoughts below; I’d love to learn how you’re applying these patterns in your own product roadmaps.


    Inspired by this post on Product Talk.


  • Stop Asking AI Anything: The 3 Outcome-Based Prompts That Unlock Real Product Insights

    Too often I watch teams ping a global agent with vague AMAs and then wonder why they get generic summaries instead of decisive guidance. When I lead product reviews, I push the team to treat AI like a partner in decision-making, not a trivia engine. That simple mindset shift transforms how quickly we move from questions to confident action.

    AI isn’t built for AMA (ask me anything). Get recommendations for outcome-based questions for the best results with Amplitude AI.

    In practice, outcome-based prompting means I don’t ask an agent to “analyze the data.” I ask it to help me reach a specific product decision, grounded in behavioral analytics and tied to OKRs that measure outcomes rather than output. To make that concrete, I always frame my prompts around three things.

    First, I state the outcome and metric. I name the business goal and the exact measure in Amplitude analytics that will validate success—activation rate, funnel conversion from A to B, or 8-week retention. I’ll reference the relevant events, segments, or driver trees so the agent has a crisp target. This is where product strategy meets measurement discipline.

    Second, I define the context and constraints. I specify the user cohort, the timeframe, and the surface area I care about—new self-serve signups in the last 30 days, first-session behavior on web only, or EU traffic where data governance rules apply. On a unified analytics platform, this context lets an agentic AI narrow its search to the highest-signal slices of behavioral analytics rather than pattern-matching across noise.

    Third, I declare the decision and deliverable. I tell the agent exactly what I will do next and the format I need to act: a ranked list of levers for an A/B testing plan, a recommended prompt engineering template for in-app guides, or a one-page brief I can hand to the growth team. Clear decisions lead to clear outputs; vague intents lead to vague answers.

    Operationally, I turn these three elements into reusable prompt templates, and I track their performance with Agent Analytics. I review traces to see which inputs drive the best recommendations, and I refine prompts the same way I iterate on product copy. For product managers working with LLMs, this is the craft: small, testable improvements that compound into outsized impact.

    Here’s a quick example. When I needed to lift user activation, I asked for a prioritized set of friction points blocking first-value within 24 hours for new self-serve accounts, based on last month’s data. I defined activation as completing event X within Y hours, asked the agent to analyze top drop-offs in the funnel, and requested an action plan with two experiment ideas and success thresholds. The response mapped behaviors to interventions, connected to retention analysis, and gave me a prompt engineering snippet for the onboarding nudge we shipped the same week.
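
    The activation definition in that example ("completing event X within Y hours") maps to a short calculation. The sketch below assumes you already have each user's signup time and first occurrence of event X; the users, timestamps, and the `activation_rate` helper are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict


def activation_rate(signups: Dict[str, datetime],
                    first_event_x: Dict[str, datetime],
                    window_hours: float) -> float:
    """Share of signed-up users whose first event X falls within the window."""
    if not signups:
        return 0.0
    window = timedelta(hours=window_hours)
    activated = sum(
        1 for user, signed_up in signups.items()
        if user in first_event_x
        and timedelta(0) <= first_event_x[user] - signed_up <= window
    )
    return activated / len(signups)


t0 = datetime(2025, 1, 1, 9, 0)
signups = {"u1": t0, "u2": t0, "u3": t0}
first_event_x = {
    "u1": t0 + timedelta(hours=2),   # inside a 24-hour window
    "u2": t0 + timedelta(hours=30),  # outside the window
}                                    # u3 never completed event X
print(activation_rate(signups, first_event_x, window_hours=24))
```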

    If your AI workflow still starts with “What does the data say?”, you’ll keep getting broad narratives. Start with outcomes, sharpen the context, and specify the decision you will make. That’s how Amplitude analytics, paired with agentic AI, stops being interesting and starts being indispensable.


    Inspired by this post on Amplitude – Perspectives.


  • No More Accidental Agents: How We Engineered Global Agent’s Helpful, Curious Personality

    No More Accidental Agents: How We Engineered Global Agent’s Helpful, Curious Personality

    Most teams ship AI agent personalities by accident—emergent quirks, brittle prompts, and uneven behavior. We refused to let that happen. From day one, we treated personality as a first-class product surface, one that should be designed, instrumented, and iterated with the same rigor as any core capability.

    Learn how we designed Global Agent’s personality and fine-tuned its inquisitiveness and helpfulness using Agent Analytics.

    In my role leading product at HighLevel, Inc., I framed our approach around agentic AI and conversation design: personality is not “flavor text”; it is the control system for how an agent interprets context, asks questions, and decides when to act. Our product strategy prioritized clarity, empathy, and consistency—so the agent would be curious enough to resolve ambiguity without becoming interrogatory, and helpful enough to move work forward without overstepping.

    We made that intent measurable. Using behavioral analytics, we defined operational signals such as clarification-question rate, resolution-path efficiency, and escalation quality. We combined eval-driven development with targeted A/B testing to compare prompt patterns and tool strategies, ensuring each change had a clear hypothesis and measurable outcome.
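
    To make one of these signals concrete, here is a minimal sketch of how a clarification-question rate might be computed from logged dialog turns. The `Turn` structure and the `is_clarification` label are assumptions for illustration; in practice that label would have to come from logging or a classifier.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str                    # "agent" or "user"
    is_clarification: bool = False  # hypothetical label, e.g. from a classifier


def clarification_question_rate(turns: list[Turn]) -> float:
    """Share of agent turns that are clarifying questions (0.0 if no agent turns)."""
    agent_turns = [t for t in turns if t.speaker == "agent"]
    if not agent_turns:
        return 0.0
    return sum(t.is_clarification for t in agent_turns) / len(agent_turns)


dialog = [
    Turn("user"),
    Turn("agent", is_clarification=True),
    Turn("user"),
    Turn("agent"),
    Turn("agent"),
    Turn("user"),
    Turn("agent", is_clarification=True),
]
rate = clarification_question_rate(dialog)
print(rate)  # 2 of the 4 agent turns are clarifications
```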

    To calibrate inquisitiveness, we mapped decision points where the agent should ask follow-ups versus proceed autonomously. Prompt engineering codified those thresholds, while a retrieval-first pipeline reduced unnecessary questions by improving context completeness up front. When the agent did ask, we constrained tone and cadence to keep queries concise, respectful, and progress-oriented.

    To enhance helpfulness, we prioritized precise action-taking and unambiguous guidance. Context window management preserved relevant facts without diluting intent, and guardrails aligned with AI risk management principles ensured the agent stayed within policy, privacy, and compliance boundaries. The result was an assistant that resolved more tasks end-to-end, with fewer stalls and clearer handoffs when human help was warranted.

    Agent Analytics became our nervous system. We instrumented every dialog turn to attribute outcomes to design choices, then used driver trees to connect micro-behaviors to macro results like time-to-resolution and customer satisfaction. This closed-loop view let us ship confidently, knowing which levers improved helpfulness, which sharpened curiosity, and which merely added noise.

    Process mattered as much as tooling. Product trios ran continuous discovery with customers to surface edge cases—ambiguous intents, multi-intent turns, and sensitive scenarios—while our engineering partners operationalized experiments with clean rollback paths. We favored small, testable changes over sweeping rewrites, building momentum and trust with each iteration.

    The payoff is a personality that feels consistent across use cases: curious when clarity is missing, decisive when action is obvious, and transparent when limits are reached. Users experience fewer dead ends, faster resolutions, and a brand voice that shows up the same way every time—because it was defined, measured, and improved on purpose.

    If you’re building agentic AI, don’t leave personality to chance. Treat it like a product: set clear outcomes, instrument deeply with Agent Analytics, and iterate with eval-driven development and A/B testing. That’s how curiosity becomes a feature, helpfulness becomes a habit, and your agent becomes reliably, intentionally excellent.


    Inspired by this post on Amplitude – Best Practices.


  • From Prototype to Production: How I Built Reliable AI-Generated Opportunity Solution Trees

    From Prototype to Production: How I Built Reliable AI-Generated Opportunity Solution Trees

    I just wrapped an all-out engineering sprint. That still sounds odd coming from me, because while I’ve written code on and off for years, I don’t self-identify as an engineer. I’m a product manager who used to be a designer. It’s been a long time since I wrote code for a living.

    But AI has expanded what’s just now possible—for our products, and for us. It’s pushed me to do more than I imagined. In that spirit, I want to share a recent engineering story. It includes technical details, and a year ago I couldn’t have done any of it. I learned it with the help of AI, and my aim is to show what’s now within reach.

    I’ve been building two services with a partner at Vistaly: AI-generated interview snapshots and AI-generated opportunity solution trees. We put out a call for alpha partners, received over 100 applicants, and selected eight design partners to start.

    Opportunity Solution Tree diagram with a blue Desired Outcome branching to green Opportunity nodes, yellow Solution nodes, and orange Assumption Tests for product discovery and AI workflows.
    A clear, color‑coded map from desired outcome to opportunities, solutions, and assumption tests—showing how to structure discovery work and prompt AI to generate, compare, and validate product ideas.

    Each team uploaded three customer interviews. I identified the key moments and opportunities and then generated an opportunity solution tree from those snapshots. I provide the AI services; Vistaly is building the UI and workflows around them.

    Early feedback was strong. Teams immediately asked to upload more interviews—exactly the kind of demand signal you hope to see—so we got to work making that possible.

    Dark interface screenshot of an opportunity solution tree with colored cards and dotted connectors, showing merged, moved, and evidence-added Opportunity notes about onboarding, support, and bot readiness.
    Go behind the scenes as AI turns raw feedback into a clear Opportunity Solution Tree. Linked cards reveal user needs—onboarding, support offload, and bot-readiness signals—so product teams can spot priorities and next steps at a glance.

    Updating an opportunity solution tree with new interview content is far harder than generating a new tree from scratch. I initially underestimated the complexity. Our goal wasn’t to produce a tree and declare it truth. We wanted teams to engage, correct, and collaborate with the AI—scaffolding cross-interview synthesis instead of doing it for them.

    To support that, we needed a way to communicate precisely how a tree would change after new interviews were added. We took inspiration from git diff and set out to build the equivalent for opportunity solution trees—step-by-step change sets that explain each proposed modification.

    Diagram of an opportunity solution tree with an Outcome node pointing to Opportunity A and Opportunity B; B branches to child opportunities and shows source evidence, labeled “Updates Can't Result in Data Loss.”
    A clear visual of AI‑generated opportunity solution trees: outcomes feed opportunities that branch into sub‑opportunities, while evidence is preserved. The structure ensures updates stay traceable and never cause data loss.

    That decision was right, but the lift was larger than I expected. It wasn’t enough to generate an updated tree; I also had to provide a clear, ordered walkthrough of what changed and why.

    I often see the same pattern with AI: it’s easy to get to an impressive prototype, but much harder to reach a production-grade product. That was exactly my experience here. My service actually comprised two sub-services: generating a new tree from scratch and updating an existing tree with new interviews. The first worked well in alpha; the second had to be built before anyone could add a fourth interview.

    Opportunity Solution Tree diagram: teal Outcome links to Opportunities A and B; Opportunities C and D branch under B; right panel lists the change set steps for adding nodes.
    Explore how an outcome expands into an Opportunity Solution Tree: Opportunities A and B stem from the goal, with C and D nested under B, while a concise change set tracks every node added along the way.

    On the surface, these services look similar. In reality, updates must preserve existing structure unless new evidence requires a change. You have to account for compound operations—merges, splits, deletes—while guaranteeing no data loss. Every node has source opportunities (supporting evidence from interviews) and children (tree sub-opportunities), and neither can be dropped.
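
    The no-data-loss rule can be sketched as a simple check. This is an illustration under assumptions, not the service's actual code: the `Node` structure, the `Tree` alias, and the `lost_items` helper are hypothetical names, and the tree is simplified to a mapping from opportunity name to its sources and children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    sources: set[str] = field(default_factory=set)   # evidence ids from interviews
    children: set[str] = field(default_factory=set)  # names of child opportunities


Tree = dict[str, Node]  # opportunity name -> node


def lost_items(before: Tree, after: Tree) -> tuple[set[str], set[str]]:
    """Return (sources, children) present somewhere in `before` but nowhere in `after`."""
    def collect(tree: Tree) -> tuple[set[str], set[str]]:
        sources: set[str] = set()
        children: set[str] = set()
        for node in tree.values():
            sources |= node.sources
            children |= node.children
        return sources, children

    before_sources, before_children = collect(before)
    after_sources, after_children = collect(after)
    return before_sources - after_sources, before_children - after_children


before = {"A": Node({"1", "2", "3"}, {"x", "y", "z"})}
# A valid split: everything A had still lives on either A or B.
after_ok = {"A": Node({"1", "2"}, {"x"}), "B": Node({"3"}, {"y", "z"})}
# An invalid update: child "z" was dropped along the way.
after_bad = {"A": Node({"1", "2"}, {"x"}), "B": Node({"3"}, {"y"})}

print(lost_items(before, after_ok))
print(lost_items(before, after_bad))
```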

    In classic AI fashion, I got a reasonable version working in a few days and shipped it to our design partners. One team quickly hit our beta limits and asked to convert to a paid subscription so they could keep going. They showed a willingness to pay, converted, and started uploading aggressively.

    Diagram of an Opportunity Solution Tree showing how parent 'Opportunity A' with children x, y, z is split into 'Opportunity A' and 'Opportunity B' to reassign evidence and connections.
    Watch an Opportunity Solution Tree evolve: the original parent A with x, y, z branches is split into A and B, shifting evidence while preserving links—mirroring how AI refines scope and structure in discovery.

    At the 14th, 15th, and 16th uploads, the cracks appeared. We saw odd behavior in some trees. The Vistaly team noticed that the change sets—the step-by-step instructions emitted by my service—didn’t always reconstruct the final tree my service also emitted. We needed those steps to match exactly, so teams could review and accept, modify, or reject each change with confidence.

    They flagged the issue the day I was flying to New Orleans for Jazz Fest. In hindsight, I’m glad I didn’t grasp the scope of what awaited me. I had roughly 80% of the work still to do to make tree updates rock solid. At least I got to enjoy the music first.

    Flowchart merging two opportunity solution trees: Opportunity B with children y and z, and Opportunity C with t, u, v, consolidated into one tree led by Opportunity C connected to five child opportunity nodes.
    From fragments to focus: this diagram shows how Opportunities B and C are merged into a single Opportunity Solution Tree, removing duplicates and unifying context so AI can rank and explore five related opportunities with clarity.

    Back home, I started diagnosing. My service was a pipeline: several LLM-driven steps followed by deterministic code to compare trees and produce change sets. As I dug in, I realized that approach was flawed. Tree diffs, unlike linear document diffs, are ambiguous.

    In a document, if I add a sentence, the diff shows an addition. If I delete a paragraph and rewrite it, the diff shows a removal and an addition. Simple. But trees are different. Suppose I split opportunity A into A and B, and later merge B with C. The split can disappear from the final diff.

    Diagram of an opportunity solution tree labeled 'Input Tree' showing an Outcome node branching to Opportunity A and C, each with child nodes x-z and t-v, with arrows indicating hierarchy.
    Peek inside our process: a simple opportunity solution tree maps an outcome to prioritized opportunities A and C with downstream options x-z and t-v. A clear snapshot of how AI organizes product discovery.

    When the model splits an opportunity, it must distribute A’s source opportunities and children between A and B. For instance, if A has source opportunities 1, 2, 3 and children x, y, z, after the split A might keep 1, 2, and x, while B takes 3, y, and z.

    Now suppose the model merges B into C. If C originally had source opportunities 4 and 5 and children t, u, v, then after the merge C now has source opportunities 3, 4, 5 and children t, u, v, y, z. When you compare the original and final trees, it looks like A somehow donated some evidence and children directly to C. The split and merge that explain why are invisible to a naive diff.
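
    The example above can be reproduced in a few lines. This is a minimal sketch with a hypothetical `naive_diff` helper and a simplified tree representation (a dict of sources and children per opportunity); it only compares the original and final trees, which is exactly why the intermediate node B never shows up.

```python
def naive_diff(before, after):
    """Compare two trees node by node, ignoring how the changes happened."""
    empty = {"sources": set(), "children": set()}
    diff = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, empty)
        new = after.get(name, empty)
        for key in ("sources", "children"):
            gained, lost = new[key] - old[key], old[key] - new[key]
            if gained or lost:
                diff[(name, key)] = {"gained": gained, "lost": lost}
    return diff


before = {
    "A": {"sources": {1, 2, 3}, "children": {"x", "y", "z"}},
    "C": {"sources": {4, 5}, "children": {"t", "u", "v"}},
}
# Final state after "split A into A and B" followed by "merge B into C".
after = {
    "A": {"sources": {1, 2}, "children": {"x"}},
    "C": {"sources": {3, 4, 5}, "children": {"t", "u", "v", "y", "z"}},
}

diff = naive_diff(before, after)
for change, detail in diff.items():
    print(change, detail)
```

    The diff reports that A lost source 3 and children y and z, and that C gained them, with no trace of the split or the merge that explain the move.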

    Opportunity Solution Tree diagram titled Output Tree: a blue Outcome node branches to green Opportunity A and Opportunity C, which expand to nodes x-v with arrows; Product Talk badge.
    See how an AI-generated Opportunity Solution Tree unfolds: one Outcome flows to Opportunities A and C, then into options x–v. Clean colors and arrows reveal the hierarchy from goal to opportunities at a glance.

    That was the core insight: we didn’t just need to show what changed—we needed to show why it changed. I had to reconstruct each move step-by-step. That meant getting the model to show its work, which opened a new can of worms.

    I refactored my prompts so the model produced both the final output and the exact change set it used to get there. The action language was explicit: add, delete, reframe, merge, split, and so on. Crucially, I asked the model to describe its moves in user-meaningful terms—“split A into A and B, then merge B into C”—not as opaque reassignments of sources and children.
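
    One way to picture a change set is as an ordered list of steps that can be replayed against the input tree. The sketch below is an assumption about the shape of such a format, not the actual contract with Vistaly: the step fields (`action`, `node`, `into`, `sources`, `children`) and the `apply_change_set` helper are hypothetical, and only split and merge are implemented.

```python
import copy


def apply_change_set(tree, change_set):
    """Replay each step on a copy of the tree and return the resulting tree."""
    tree = copy.deepcopy(tree)
    for step in change_set:
        action = step["action"]
        if action == "split":
            # Move the listed sources and children from `node` to a new node `into`.
            source_node = tree[step["node"]]
            moved_sources = set(step["sources"])
            moved_children = set(step["children"])
            source_node["sources"] -= moved_sources
            source_node["children"] -= moved_children
            tree[step["into"]] = {"sources": moved_sources, "children": moved_children}
        elif action == "merge":
            # Fold `node` into `into`, keeping all of its sources and children.
            gone = tree.pop(step["node"])
            tree[step["into"]]["sources"] |= gone["sources"]
            tree[step["into"]]["children"] |= gone["children"]
        else:
            raise ValueError(f"unknown action: {action}")
    return tree


before = {
    "A": {"sources": {1, 2, 3}, "children": {"x", "y", "z"}},
    "C": {"sources": {4, 5}, "children": {"t", "u", "v"}},
}
change_set = [
    {"action": "split", "node": "A", "into": "B", "sources": [3], "children": ["y", "z"]},
    {"action": "merge", "node": "B", "into": "C"},
]
result = apply_change_set(before, change_set)
print(result)
```

    Replaying the steps in order keeps the split and the merge visible as separate, reviewable moves, even though B no longer exists in the final tree.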

    Diagram of an AI-generated Opportunity Solution Tree: blue Outcome node with children Opportunity A and Opportunity B; B branches to Opportunity C and D. A right-hand list shows the change set for each step.
    Watch an opportunity solution tree take shape: start with the outcome, add opportunities A and B, then extend B to C and D. The paired change set makes every edit transparent—ideal for AI-assisted product discovery.

    For each LLM step, the model now emitted its recommendation and the corresponding change set. This helped, but it wasn’t perfect. After extensive testing and error analysis, two classes of errors emerged: (1) the model attempted an invalid move, and (2) the change set didn’t actually generate the recommendation.

    Category 1 felt like designing a game while the model played it creatively. For example, what happens when the model tries to merge a parent with a child? If opportunity A has children B, C, and D and the model merges A with B, the merge is directional. If the instruction is “keep A, delete B,” that works—the parent absorbs the child. But if the instruction is “keep B, delete A,” then C and D become orphans. These puzzles were solvable and even fun.
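
    The parent-and-child case can be expressed as a small validity rule. This sketch assumes one particular rule (reject a merge that deletes a parent into one of its own children while other children remain); the `merge_is_valid` helper and the `children_of` mapping are hypothetical.

```python
def merge_is_valid(children_of, keep, delete):
    """Reject a merge that would orphan the other children of a deleted parent.

    `children_of` maps each node name to the set of its child names.
    """
    deleted_children = children_of.get(delete, set())
    # Only the parent-into-child direction is a problem in this sketch.
    merging_parent_into_child = keep in deleted_children
    orphans = deleted_children - {keep}
    return not (merging_parent_into_child and orphans)


children_of = {"A": {"B", "C", "D"}}

# "Keep A, delete B": the parent absorbs the child.
print(merge_is_valid(children_of, keep="A", delete="B"))
# "Keep B, delete A": C and D would lose their parent.
print(merge_is_valid(children_of, keep="B", delete="A"))
```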

    Diagram of Opportunity Solution Tree merge rules: merging node B into parent A is allowed, while merging A into B is not because it would orphan opportunities B, C, and D.
    Visual explainer from Product Talk on AI-generated Opportunity Solution Trees. It contrasts an allowed merge (B into A) with a not-allowed merge (A into B) that leaves child opportunities orphaned, guiding safe hierarchy edits.

    Category 2 was harder. Despite prompt iterations, I could only push the discrepancy rate down to about 1 in 40 instances. With 10–20 LLM calls per run, and assuming each call fails independently, that meant roughly a quarter to 40% of runs still failed. Not acceptable for production. I hit a wall. A paying customer was waiting, and more design partners were queued up.
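
    How a small per-call error rate compounds across a run can be checked with quick arithmetic. The sketch below assumes each LLM call fails independently at the same rate, which is a simplification; correlated errors in real runs would change the result. The helper name `run_failure_rate` is hypothetical.

```python
def run_failure_rate(per_call_error: float, calls_per_run: int) -> float:
    """Probability that at least one call in a run yields a bad change set,
    assuming calls fail independently at the same rate."""
    return 1 - (1 - per_call_error) ** calls_per_run


for calls in (10, 20):
    print(calls, round(run_failure_rate(1 / 40, calls), 2))
```

    Under these assumptions the per-run failure rate grows quickly with the number of calls, which is why a per-call rate that sounds small can still be too high for production.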

    Next, I tried to correct the model’s mistakes with deterministic code. I had promised that my change sets would generate the output tree, so I wrote verifiers: detect conflicts (e.g., delete a node, then try to use it later), guard against data loss, prevent orphaned nodes, and more. Detection was straightforward; correction was not. Fixing issues required guessing the model’s intent. If the sequence said “delete A, then merge A with B,” should I remove A entirely or salvage A’s sources and children by merging into B? There were dozens of such cases with no unambiguous answer.
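
    Detection of the "delete a node, then try to use it later" conflict can be sketched in a few lines. This is an illustration only: the step format and the `find_conflicts` helper are hypothetical, and the check covers just delete and merge steps. Note that it reports the conflict but does not try to guess the intended fix.

```python
def find_conflicts(change_set):
    """Return a message for every step that uses a node removed by an earlier step."""
    removed = set()
    errors = []
    for i, step in enumerate(change_set, start=1):
        # A step may reference a node and, for merges, a target node.
        for key in ("node", "into"):
            name = step.get(key)
            if name in removed:
                errors.append(f"step {i}: {step['action']} uses deleted node {name!r}")
        if step["action"] in ("delete", "merge"):
            removed.add(step["node"])  # deleted or merged-away nodes no longer exist
    return errors


bad = [
    {"action": "delete", "node": "A"},
    {"action": "merge", "node": "A", "into": "B"},
]
good = [{"action": "merge", "node": "B", "into": "C"}]

print(find_conflicts(bad))
print(find_conflicts(good))
```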

    Workflow diagram titled 'My Simple Repair Loop' showing an iterative validation cycle: Generate the Change Set → Run the validation tool → Check Result, with branches to retry on failure or exit on pass.
    A step-by-step loop shows how changes are validated: generate a change set, run a validation tool, review the result, then repeat on failure and exit on pass—mirroring iterative work behind AI-built Opportunity Solution Trees.

    After 11 straight days of deep work—including weekends—I was exhausted. I dislike hustle culture; this isn’t how I design my life. But I was stuck, and then I had an insight.

    On a walk with my husband (also an engineer), I realized I could have the LLM repair its own mistakes. My data contract with Vistaly requires that the change set must generate the output tree. I had already built robust validation code. I knew exactly when a change set failed—and why. No amount of prompt tuning alone was fixing it. So I turned the validator into a tool for the model and created a simple agentic loop.

    The loop works like this: the model proposes a change set, calls the validation tool, and gets back a pass/fail plus specific feedback. If it fails, the model uses those instructions to repair the change set and calls the tool again. Iterate until success or a max number of turns.
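
    The shape of that loop can be sketched as follows. This is a minimal illustration in Python rather than the production code: the model call is replaced by a stub (`fake_model`), the validator is a toy (`fake_validator`), and all names and the step format are hypothetical.

```python
from typing import Callable


def repair_loop(
    propose: Callable[[list[str]], list[dict]],
    validate: Callable[[list[dict]], list[str]],
    max_turns: int = 3,
) -> tuple[list[dict], bool]:
    """Propose a change set, validate it, and feed errors back until it passes."""
    feedback: list[str] = []
    change_set: list[dict] = []
    for _ in range(max_turns):
        change_set = propose(feedback)   # model call (stubbed below)
        feedback = validate(change_set)  # validation tool: empty list means pass
        if not feedback:
            return change_set, True
    return change_set, False             # gave up after max_turns


def fake_model(feedback):
    # Stub: the first attempt contains a conflict; after feedback it is repaired.
    if not feedback:
        return [
            {"action": "delete", "node": "A"},
            {"action": "merge", "node": "A", "into": "B"},
        ]
    return [{"action": "merge", "node": "A", "into": "B"}]


def fake_validator(change_set):
    # Toy check: flag any step that uses a node deleted by an earlier step.
    deleted, errors = set(), []
    for i, step in enumerate(change_set, start=1):
        if step["node"] in deleted:
            errors.append(f"step {i} uses deleted node {step['node']}")
        if step["action"] == "delete":
            deleted.add(step["node"])
    return errors


result, ok = repair_loop(fake_model, fake_validator)
print(ok, result)
```

    The cap on turns matters: without it, a change set the model cannot repair would keep consuming calls without ever converging.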

    I prototyped in Node.js with a single model call, a verifier pass, and a repair attempt. At first, the loop didn’t converge—it just accumulated compute. I experimented with how to communicate errors, how much context to include, and how to sequence feedback. Eventually, it clicked: the model began fixing its own mistakes and typically returned a valid change set in one or two repairs. It was, in practice, eval-driven development applied to LLM outputs.

    I had already built an agent loop utility for another AI workflow, so I productionized quickly: model call, optional tool invocation, tool result returned to the model, repeat until the validator signals success or the loop times out. I integrated the new loop into the pipeline and shipped the revamped service to Vistaly on Monday at noon. They’re integrating now, and it will be in the hands of our design partners shortly. I’m relieved—and ready for a day off.

    Reflecting on the last two weeks, a few things stand out. First, I shed limiting beliefs about being an engineer. To make this reliable, I had to solve legitimately hard problems, and that feels good.

    Second, this was genuinely fun. Designing the action set and watching the model push those boundaries was like working through elegant puzzles. Models are incredibly creative, and harnessing that creativity with the right constraints is deeply satisfying.

    Third, I learned when I can and can’t trust Claude to write code for me. When Opus 4.6 came out, I gave Claude a much longer leash. After the past two weeks, Claude is back on a short leash. I found a lot of gaps in my implementation in areas where I simply trusted that Claude got it right, when in fact it hadn’t. If you don’t have the right infrastructure (planning, testing, code review), this can be disastrous. I’ll be investing more here and sharing what I learn.

    Finally, if this work had been spread over two months, it would have been thoroughly enjoyable. I’m discovering how much I like being an AI engineer. It feels like a new chapter where I can combine opportunity solution trees with modern AI engineering—and deliver real value to product teams doing continuous discovery.

    I’m excited to share more of what we’re building with Vistaly and to onboard more design partners soon. If you’re interested, get on the waiting list. And if you’ve been hesitant to stretch beyond your current skill set, I hope this story nudges you to take the first small step toward what’s just now possible.


    Inspired by this post on Product Talk.


  • Unlock High-Leverage PM Work: 5 Claude Cowork Playbooks to Turbocharge Your Strategy

    Unlock High-Leverage PM Work: 5 Claude Cowork Playbooks to Turbocharge Your Strategy

    In my role leading product teams, I’m relentless about freeing time for high-leverage work—clarifying strategy, sharpening positioning, and unblocking execution. Claude Cowork has become a reliable AI partner in that mission, helping me automate repeatable tasks while preserving judgment for the decisions that matter most.

    Get 5 playbooks to automate common product management tasks with Claude Cowork and free yourself for higher-leverage PM work.

    When I say “playbooks,” I mean structured, repeatable workflows that turn messy inputs into crisp outputs—without sacrificing rigor. With agentic AI, LLMs for product managers, and thoughtful prompt engineering, these playbooks plug directly into my product roadmapping and sprint planning process, accelerating discovery, analysis, and stakeholder alignment.

    Playbook 1: Continuous discovery synthesis. I route raw customer interviews, support threads, and behavioral analytics into Claude Cowork to cluster themes, extract Jobs-to-Be-Done, and propose opportunity areas. It drafts an initial opportunity solution tree with clear problem statements, target outcomes, and candidate solutions, which I then refine with the team. This shortens the loop between customer interviews and actionable insights while preserving the nuance that continuous discovery requires.

    Playbook 2: Strategy-to-roadmap alignment. Starting from our product strategy and target outcomes, I ask Claude Cowork to translate goals into a prioritized roadmap, calling out outcome vs. output OKRs and showing driver trees that connect initiatives to measurable impact. It flags dependencies and suggests stakeholder management touchpoints, making the narrative behind prioritization transparent and easier to socialize across product trios and leadership.

    Playbook 3: Experiment design and A/B testing. To move from ideas to evidence, I have Claude Cowork generate testable hypotheses, success metrics, and guardrails for A/B testing. It produces experiment briefs, checks statistical assumptions like minimum detectable effect (MDE), and suggests instrumentation plans for tools such as Amplitude analytics. I use these drafts to speed up reviews without compromising on methodological rigor.

    Playbook 4: Launch communications and in-product guidance. After we ship, I leverage Claude Cowork to assemble UX writing, release notes, and in-app guides tailored to user segments. It proposes short product tours, contextual tooltips, and support macros that keep messaging consistent across Pendo or Intercom while reinforcing our value proposition. The result is faster, more cohesive go-to-market execution with fewer round-trips.

    Playbook 5: AI risk, governance, and quality checks. Before anything goes live, I use Claude Cowork to run structured reviews for data governance, privacy-by-design, and AI risk management. It helps draft acceptance criteria, red-team prompts for edge cases, and an eval-driven development checklist so the team can track model behavior and mitigate regressions over time. These safeguards maintain trust as we scale AI workflows across the product surface.
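
    For the minimum detectable effect check mentioned in Playbook 3, a rough sample-size estimate is a useful sanity test on any drafted experiment brief. The sketch below uses the standard two-proportion normal approximation; the function name `sample_size_per_arm` and the default significance and power values are assumptions, and a real analysis should use the experimentation platform's own calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(
    baseline: float, mde: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Approximate users needed per arm to detect an absolute lift of `mde`
    on a conversion rate, using the two-proportion normal approximation."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


# Example: 20% baseline conversion, looking for a 2-point absolute lift.
print(sample_size_per_arm(baseline=0.20, mde=0.02))
```

    Halving the MDE roughly quadruples the required sample, which is often the quickest way to tell whether a proposed test is feasible with the available traffic.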

    To make these playbooks sing, I seed Claude Cowork with a retrieval-first pipeline of canonical docs—vision, strategy, OKRs, analytics dashboards, and definition-of-done checklists—plus prompt templates tuned for our voice and review standards. Tight context window management, explicit role instructions, and lightweight evaluations keep outputs accurate, auditable, and on-brand.

    The impact has been compounding: faster discovery-to-decision cycles, clearer roadmaps tied to outcomes, stronger experiments, and launch content that lands. Most importantly, the team spends more time on creative problem solving and stakeholder partnership, not manual synthesis or formatting. If you’re ready to reclaim your calendar and elevate your product strategy, start with these five Claude Cowork playbooks and iterate from there.


    Inspired by this post on Amplitude – Perspectives.


  • 5 Proven Agent Skills I Use to Automate Weekly Product Reviews with Claude, Cursor, and Codex

    5 Proven Agent Skills I Use to Automate Weekly Product Reviews with Claude, Cursor, and Codex

    Weekly product reviews are where strategy meets execution, and over the past year I’ve turned them into a high-signal, low-friction ritual by leaning on agentic AI. As VP of Product Management at HighLevel, Inc., I’ve standardized a set of agent skills that compress preparation time, surface the right insights, and keep PMs, engineers, and designers focused on decisions, not document wrangling.

    "Learn how our teams use agent skills with Claude, Cursor, and Codex to run product reviews as PMs, engineers, and designers. Here are 5 killer use cases for builders."

    Below, I walk through the five skills I rely on most in our weekly cadence—each one mapped to a clear product management outcome. They’re simple to set up, easy to govern, and aligned with core practices like continuous discovery, product roadmapping and sprint planning, and eval-driven development.

    Skill 1 — Backlog triage with signal extraction: I point an agent at fresh tickets, customer notes, and experiment results to cluster themes, tag impact, and flag regressions. Using a retrieval-first pipeline and Agent Analytics, the assistant ranks items by value, effort, and risk so our meeting starts with a prioritized, explainable shortlist instead of a raw queue.

    Skill 2 — PRD and spec synthesizer: Ahead of the review, an agent drafts a one-page PRD update from design diffs, git history, and decision logs. With Claude Code and Cursor, it highlights interface changes, acceptance criteria, and open questions, linking back to sources. The result is a crisp, auditable brief that keeps product trios aligned without re-litigating context.

    Skill 3 — Experiment and metrics analyzer: An analytics agent pulls A/B testing readouts, checks minimum detectable effect assumptions, and annotates anomalies. It turns raw telemetry into a narrative: what moved, by how much, and whether we trust it. This makes our discussion about tradeoffs, not spreadsheets, and speeds commitments on next steps.

    Skill 4 — Voice-of-customer synthesizer: The assistant clusters interviews, support threads, and NPS verbatims into jobs-to-be-done and pain themes. It proposes opportunity solution tree updates and calls out places where our roadmap diverges from customer signal. That keeps continuous discovery alive in the room—even when time is tight.

    Skill 5 — Roadmap and sprint planning co-pilot: After decisions, an agent converts outcomes into scoped backlog items, engineering tasks, and stakeholder updates. It drafts sprint goals, flags dependency risks, and aligns work to objectives. Because it’s grounded in the meeting record, it preserves intent while removing ambiguity.
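
    The ranking by value, effort, and risk described in Skill 1 could be approximated with a simple score. The sketch below is purely illustrative: the 1 to 5 scales, the `Item` structure, and the scoring formula are assumptions, not the method the agent actually uses.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    value: int   # 1 (low) to 5 (high)
    effort: int  # 1 (low) to 5 (high)
    risk: int    # 1 (low) to 5 (high)


def score(item: Item) -> float:
    """Hypothetical score: value discounted by combined effort and risk."""
    return item.value / (item.effort + item.risk)


def shortlist(items: list[Item], top_n: int = 3) -> list[tuple[str, float]]:
    """Return the top items with their rounded scores, highest first."""
    ranked = sorted(items, key=score, reverse=True)
    return [(item.title, round(score(item), 2)) for item in ranked[:top_n]]


backlog = [
    Item("Redesign settings", value=3, effort=4, risk=2),
    Item("Fix checkout regression", value=5, effort=2, risk=1),
    Item("Add CSV export", value=4, effort=2, risk=2),
]
print(shortlist(backlog, top_n=2))
```

    Returning the score alongside each title keeps the shortlist explainable, since reviewers can see why one item outranks another.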

    Under the hood, prompt engineering patterns and guardrails keep these workflows predictable: a retrieval-first pipeline for context, eval-driven development for quality checks, and role-specific prompts for PMs, engineers, and designers. With Claude Code I generate structured diffs and test scaffolds; with Cursor I accelerate code-review summaries; and with Codex I bootstrap utility scripts that keep the loop tight between insights and implementation.

    The payoff is tangible: higher decision velocity, fewer meetings to “re-clarify,” and clearer accountability across the product organization. Just as important, governance and privacy-by-design are built in—every agent logs rationale, cites sources, and respects data boundaries—so leaders can scale AI workflows confidently.

    If you’re looking to level up your product reviews, start with these five skills, measure impact with Agent Analytics, and iterate. Small automations compound quickly, and the more consistently you run them, the more your team’s attention shifts from preparing content to making better product decisions.


    Inspired by this post on Amplitude – Perspectives.

