Tag: Agent Analytics

  • AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook


    I’ve led multiple AI agent launches, and the single most reliable way I’ve found to ship with confidence is to treat evaluations as a product capability, not a side project. When we make AI quality measurable, predictable, and comparable over time, we move faster, reduce risk, and build trust with customers and stakeholders.

    Learn how product managers use AI evaluations to measure agent quality. Covers traces, LLM judges, offline evals, online evals, and how to connect evals to product outcomes.

    Why does this matter so much in product management? Because agent quality is only meaningful when it drives adoption, satisfaction, and revenue. I use eval-driven development to align the day-to-day iteration of prompts, policies, and workflows with business outcomes like activation, retention, and net revenue retention (NRR). That alignment turns AI quality from an abstract notion into a roadmap lever.

    First, traces. Traces are the spine of evaluation for agentic AI: they capture inputs, intermediate steps, tools invoked, and final responses. I instrument traces to make reasoning visible—what the agent tried, where it hesitated, and why it chose a path. With that visibility, I can compare prompts, policies, and tools, and I can teach the team to fix the root cause instead of patching symptoms. This is also where Agent Analytics becomes real: we move from anecdotes to observable behavior trends across cohorts and use cases.
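    To make this concrete, here is a minimal Python sketch of what a trace might record: one `Trace` per agent run and one `Span` per step, each with inputs, outputs, and timing. The class names and the `tool:` naming convention are hypothetical illustrations, not any particular vendor's tracing API; in production you would more likely emit the same fields through an observability or tracing library.

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Span:
    """One step in an agent run: a model call, a tool call, or the final answer."""
    name: str
    inputs: dict[str, Any]
    outputs: dict[str, Any] = field(default_factory=dict)
    started_at: float = field(default_factory=time.perf_counter)
    ended_at: float | None = None

    def finish(self, **outputs: Any) -> None:
        """Record the step's outputs and stop its clock."""
        self.outputs.update(outputs)
        self.ended_at = time.perf_counter()

    @property
    def duration_s(self) -> float:
        end = self.ended_at if self.ended_at is not None else time.perf_counter()
        return end - self.started_at


@dataclass
class Trace:
    """Every span recorded for a single agent run."""
    user_input: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list[Span] = field(default_factory=list)

    def start_span(self, name: str, **inputs: Any) -> Span:
        span = Span(name=name, inputs=inputs)
        self.spans.append(span)
        return span

    def tools_used(self) -> list[str]:
        # Hypothetical convention: tool spans are named "tool:<name>".
        return [s.name for s in self.spans if s.name.startswith("tool:")]


# Usage: record a hypothetical run with one retrieval step and a final answer.
trace = Trace(user_input="How do I export a chart?")
search = trace.start_span("tool:search_docs", query="export chart")
search.finish(hits=3)
answer = trace.start_span("final_response")
answer.finish(text="Open the chart menu and choose Export.")
print(trace.tools_used())  # ['tool:search_docs']
```

    Aggregating `tools_used()` and span durations across many traces is what turns individual runs into behavior trends by cohort or use case.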

    Next, LLM judges. I use model-as-judge to score qualities like helpfulness, coherence, or adherence to brand and policy. The trick is calibration. I pair LLM judges with a small, high-quality human-labeled set to ground the scale, then monitor drift as models, prompts, or data shift. LLM judges help me evaluate at speed, but I still spot-check edge cases and highly regulated flows to balance efficiency with risk controls.
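    One common way to ground a judge is to have it and your human reviewers label the same items, then measure chance-corrected agreement with Cohen's kappa. This sketch assumes binary pass/fail labels and uses invented data; the judge labels would come from your own model-as-judge prompt, which is not shown.

```python
from collections import Counter


def cohens_kappa(judge: list[str], human: list[str]) -> float:
    """Chance-corrected agreement between two raters who labeled the same items."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(judge)
    observed = sum(j == h for j, h in zip(judge, human)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    judge_freq, human_freq = Counter(judge), Counter(human)
    expected = sum(judge_freq[label] * human_freq[label] for label in judge_freq) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Illustrative labels for six conversations from a small human-labeled set.
human = ["pass", "pass", "fail", "pass", "fail", "pass"]
judge = ["pass", "pass", "fail", "fail", "fail", "pass"]
kappa = cohens_kappa(judge, human)
print(round(kappa, 3))  # 0.667
```

    Re-running the comparison whenever the judge model, its prompt, or the underlying data changes gives a simple drift signal: a falling kappa means the judge tracks your reviewers less closely than it did.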

    Offline evals come first. Before I expose users to changes, I run fixed test suites representing core scenarios, failure modes, and edge cases. I include golden examples, adversarial prompts, and domain-specific queries. Metrics cover task success, factuality, safety, latency, and cost. This is where prompt engineering and retrieval quality are tuned; if I’m using a retrieval-first pipeline, I evaluate evidence quality separately from generation so improvements are attributable and reproducible.
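    A minimal sketch of keeping retrieval and generation attributable in an offline suite: score retrieval with recall@k against golden evidence, score answers separately, and report latency alongside. The `EvalCase` and `RunResult` shapes and the substring check for answer correctness are simplifying assumptions; a real suite might use exact-match rules, a rubric, or an LLM judge for the answer check.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class EvalCase:
    question: str
    relevant_doc_ids: set[str]  # golden evidence the retriever should find
    expected_answer: str        # phrase a correct answer must contain (simplification)


@dataclass
class RunResult:
    retrieved_doc_ids: list[str]  # ranked retriever output for the case
    answer: str
    latency_s: float


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of golden documents that appear in the top-k retrieved results."""
    if not relevant:
        return 1.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def score_suite(cases: list[EvalCase], results: list[RunResult], k: int = 3) -> dict[str, float]:
    """Score retrieval and generation separately so changes stay attributable."""
    if len(cases) != len(results) or not cases:
        raise ValueError("need exactly one result per case")
    recalls = [recall_at_k(r.retrieved_doc_ids, c.relevant_doc_ids, k) for c, r in zip(cases, results)]
    correct = [c.expected_answer.lower() in r.answer.lower() for c, r in zip(cases, results)]
    return {
        "retrieval_recall_at_k": sum(recalls) / len(recalls),
        "answer_accuracy": sum(correct) / len(correct),
        "median_latency_s": median(r.latency_s for r in results),
    }


# Usage with two hypothetical golden cases.
cases = [
    EvalCase("How do I reset my password?", {"kb-12"}, "reset link"),
    EvalCase("Can I get a refund after 30 days?", {"kb-40", "kb-41"}, "not eligible"),
]
results = [
    RunResult(["kb-12", "kb-3"], "We'll email you a reset link.", 1.2),
    RunResult(["kb-40", "kb-9", "kb-7"], "Orders older than 30 days are not eligible for refunds.", 2.0),
]
report = score_suite(cases, results)
print(report)
```

    If answer accuracy drops while recall holds steady, the regression is likely in generation (prompt or model) rather than retrieval, and vice versa.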

    Online evals follow to validate real-world performance. I roll changes out behind feature flags and use A/B testing to compare variants under production conditions. I track conversation outcomes, tool success rates, fallbacks to human support, and user satisfaction. These online signals close the loop on whether an offline improvement actually compounds value in the product—critical for product-led growth.
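    For a binary online metric such as task completion, a two-proportion z-test is one standard way to check whether variant B's difference from A is larger than noise. This sketch uses only the standard library and illustrative counts; it assumes independent conversations, reasonably large samples, and a single analysis at the end of the test rather than repeated peeking.

```python
from math import erf, sqrt


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> tuple[float, float, float]:
    """Return (absolute lift of B over A, z statistic, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pool both arms under the null hypothesis that the true rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed with erf; the p-value covers both tails.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return p_b - p_a, z, p_value


# Illustrative counts: completed tasks out of conversations in each arm.
lift, z, p_value = two_proportion_z_test(420, 1000, 470, 1000)
print(f"lift={lift:.3f} z={z:.2f} p={p_value:.4f}")
```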

    Connecting evals to product outcomes is non-negotiable. I map quality signals to a driver tree: from per-turn scores (helpfulness, safety, latency) up to session-level outcomes (task completion, deflection, revenue intent), and finally to product KPIs (activation, retention, NRR). With this structure, I can set thresholds for launch gates, prioritize roadmap items that move the biggest levers, and build dashboards that leadership understands at a glance.
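    The lower tiers of such a driver tree can be rolled up mechanically. This sketch aggregates hypothetical per-turn and per-session signals into session-level rates and checks them against launch gates; the field names and threshold values are placeholders to replace with your own baselines, and linking these rates to activation, retention, or NRR would still require joining them to product analytics data.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    turn_helpfulness: list[float]  # per-turn judge scores in [0, 1]
    safety_flags: int              # count of turns flagged by a safety check
    task_completed: bool           # session-level outcome


# Placeholder launch gates: (metric, direction, threshold).
LAUNCH_GATES = [
    ("mean_helpfulness", "min", 0.80),
    ("unsafe_session_rate", "max", 0.01),
    ("task_completion_rate", "min", 0.60),
]


def roll_up(sessions: list[Session]) -> dict[str, float]:
    """Turn per-turn scores into session-level rates."""
    n = len(sessions)
    return {
        # Average turns within each session first, so long sessions don't dominate.
        "mean_helpfulness": mean(mean(s.turn_helpfulness) for s in sessions),
        "unsafe_session_rate": sum(s.safety_flags > 0 for s in sessions) / n,
        "task_completion_rate": sum(s.task_completed for s in sessions) / n,
    }


def failed_gates(metrics: dict[str, float]) -> list[str]:
    """Return the names of any gates the release does not clear."""
    failures = []
    for name, direction, threshold in LAUNCH_GATES:
        value = metrics[name]
        if (direction == "min" and value < threshold) or (direction == "max" and value > threshold):
            failures.append(name)
    return failures


sessions = [
    Session([0.9, 0.8], 0, True),
    Session([0.7, 0.9], 0, True),
    Session([0.85], 1, False),
]
metrics = roll_up(sessions)
print(failed_gates(metrics))  # ['unsafe_session_rate']
```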

    A few lessons learned. Start with a minimal but durable test set and grow it as you discover new failure modes. Version everything—prompts, tools, and datasets—so you can reproduce wins. Beware metric drift when you swap models or update prompts. Blend human review where the cost of error is high. Above all, make evaluations part of your AI workflows and sprint rituals so quality improves continuously, not sporadically.
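    One lightweight way to version an eval run is to hash everything that can change its outcome into a single fingerprint and store it with the scores. This is a sketch under assumptions: the inputs (model name, prompt text, tool specs, dataset rows) are hypothetical stand-ins, and they must be JSON-serializable for the hash to work.

```python
import hashlib
import json


def run_fingerprint(model: str, prompt: str, tools: dict, dataset: list[dict]) -> str:
    """Stable short hash of every input that can change an eval result."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "tools": tools, "dataset": dataset},
        sort_keys=True,         # key order must not change the hash
        separators=(",", ":"),  # canonical whitespace
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


tools = {"search_docs": {"max_results": 5}}
dataset = [{"question": "How do I reset my password?", "expected": "reset link"}]
v1 = run_fingerprint("model-a", "You are a helpful support agent.", tools, dataset)
v2 = run_fingerprint("model-a", "You are a concise support agent.", tools, dataset)
print(v1 != v2)  # True: a prompt edit yields a new version
```

    Storing the fingerprint next to each score makes it obvious when two results are not comparable, which is a cheap guard against the metric drift that follows model or prompt swaps.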

    If you’re just getting started, begin with traces and a small offline suite, add LLM judges for scale, then prove impact with a focused online experiment. Within a few cycles, you’ll have a living evaluation system that guides decisions, accelerates delivery, and gives your team—and your customers—confidence in every AI release.


    Inspired by this post on Amplitude – Perspectives.


  • Supercharge Insights with Amplitude Agent Connectors: Connect Notion, Slack, Linear & More


    I’ve led enough multi-tool product organizations to know how quickly momentum erodes when insights and actions live in different places. When my teams bounce between Notion, Atlassian, Slack, Linear, and analytics dashboards, we pay a real tax in context switching. That’s why I’m excited about what Amplitude is enabling with Agent Connectors—bringing our daily work and our data-driven decisions into one fluid, agentic AI workflow.

    Connect Notion, Atlassian, Slack, Linear, and more to Amplitude's Global Agent. Get richer analysis and take action across tools without leaving Amplitude.

    Practically, this means I can treat Amplitude analytics as a unified analytics platform where analysis and execution finally meet. Instead of exporting charts or copying insights into docs, I can drive Agent Analytics directly from the same surface where I manage behavioral analytics, reducing friction and accelerating decisions. For my product strategy, that’s a meaningful shift—from “insight later” to “insight-to-action now.”

    Here’s how I’d use it on a typical day: I ask the agent to synthesize signals from recent feature usage, spotlight anomalies, and then draft a concise summary for our Slack channel. In the same flow, I can prompt it to reference our Notion specs for context and queue next steps in Linear, keeping Atlassian stakeholders looped in without any extra swiveling between tabs. The value isn’t just faster execution; it’s tighter alignment across teams because the analysis and the plan live together.

    From an operating model perspective, this is how I scale AI workflows responsibly. I can define clear prompts, approval paths, and ownership so the agent augments—not replaces—expert judgment. Data governance and permissions remain front and center: the agent sees what your teams are allowed to see, and we maintain auditability on critical workflow steps. The outcome is a trustworthy, repeatable system that compounds learning over time.

    If you’re exploring agentic AI for product teams, start small and instrument your ROI. Pick one or two connectors (Slack and Notion are great first choices), define a measurable workflow—like pushing weekly retention insights and creating prioritized follow-ups in Linear—and iterate using continuous discovery. In my experience, the first wins appear as reduced time-to-insight, fewer meetings to align, and faster cycle time from observation to shipped change.

    The big picture is simple: bring your work to your analytics, and your analytics to your work. With Agent Connectors, Amplitude’s Global Agent helps close the loop from understanding behavior to taking action—without leaving the place where your insights are born.


    Inspired by this post on Amplitude – Best Practices.


  • The Ultimate Knowledge Management Playbook to Supercharge Your AI Service Agent and Scale Support


    AI in customer service is no longer experimental—it’s the standard. In my work leading product and customer experience teams, I’ve seen the shift firsthand, and the stakes have never been higher for getting the foundations right.

    Fin’s 2026 Customer Service Transformation Report found that 82% of senior leaders say their teams invested in AI for customer service over the last 12 months, with 87% planning to invest in 2026. Those investments pay off with 24/7 availability, multilingual support, major time savings, and faster resolutions. But there’s an unsung hero behind every AI-first support experience: knowledge management.

    A Service Agent is only as good as what we give it to work with. If we’re using an Agent, like Fin, to resolve customer queries end to end, it needs an extensive pool of knowledge to draw from. We have to feed it accurate answers on our product, features, policies, and troubleshooting. Without these, the Agent can’t do its job—and our team ends up handling repetitive queries that should be automated.


    In this guide, I’ll walk you through two phases of the journey. Phase 1 is about building a high-quality knowledge base from scratch or overhauling what you have. Phase 2 is about maintaining, optimizing, and scaling that knowledge so your AI performance keeps compounding over time.

    Definition: Knowledge management is the process of creating, organizing, sharing, and maintaining knowledge in your business.


    Your help center is the obvious example, but it’s only the tip of the iceberg. Effective knowledge management also means creating resources like FAQs, troubleshooting guides, onboarding and best-practice docs, internal support guidance, and learning materials that cover everything from everyday how‑tos to complex billing and account questions.

    It means identifying content gaps—missing troubleshooting steps, unclear policy explanations, outdated feature details, or unanswered edge cases—before your customers find them. It means implementing systems so both your Agent and your support reps can access the right information at the right time. And it means developing processes so your content stays in lockstep with product updates, policy changes, and bug fixes.


    Your knowledge base now fuels your entire support experience, not just self-serve. It’s the key to accurately answering complex questions, reducing handle time, and delighting customers across channels.

    Here’s the blunt truth I share with every team: your Agent is only as strong as what you feed it. A lack of information, messy structure, or stale documentation will tank accuracy and trust. No large language model (LLM) knows your business like you do. It doesn’t understand your customers’ needs, pain points, and use cases. That knowledge is unique to you and your organization, meaning you need to be the one to map it all out and make it available to your Agent.

    Screenshot: a knowledge base procedure titled 'Procedure: Damaged food order', showing when to use it, steps for verifying the customer’s evidence, an IF rule block, the next action (a reorder), tags, and Test, Save, and Set live controls.

    Every investment in knowledge also has compounding results. Think of it as a flywheel: when you improve your knowledge base, your Agent solves more cases and generates better data. That data shows you what to add, update, or refine next. The sooner you plant the seeds, the sooner you’ll harvest the returns.

    Consider a simple calculation. If it takes 30 minutes to write a troubleshooting article for a common issue, that half hour often saves hours for your support reps, who no longer need to handle that query. You can estimate impact by multiplying the average time to compose a response by the frequency of the query. For customers, multiply the number of customers who ask this question by their average time to resolution to quantify time saved. Then monitor Agent involvement rate, resolution rate, and automation rate to see the compounding effect.
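    The arithmetic above is easy to script. The figures below (a 6-minute average handle time, 120 occurrences a month, a 45-minute average time to resolution for customers) are invented for illustration, and the results are upper bounds that assume the new article lets the Agent resolve every instance of the question.

```python
def rep_hours_saved(avg_handle_minutes: float, monthly_frequency: int) -> float:
    """Support-rep time no longer spent on this query each month."""
    return avg_handle_minutes * monthly_frequency / 60


def customer_hours_saved(customers_asking: int, avg_resolution_minutes: float) -> float:
    """Customer time to resolution avoided each month if the Agent answers immediately."""
    return customers_asking * avg_resolution_minutes / 60


def payback_multiple(article_minutes: float, monthly_rep_hours: float) -> float:
    """How many times over one month of rep savings repays the writing time."""
    return monthly_rep_hours * 60 / article_minutes


reps = rep_hours_saved(avg_handle_minutes=6, monthly_frequency=120)
customers = customer_hours_saved(customers_asking=120, avg_resolution_minutes=45)
print(reps, customers, payback_multiple(article_minutes=30, monthly_rep_hours=reps))
# 12.0 90.0 24.0
```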


    Phase 1: Building your knowledge base is about getting your content durable and AI-ready. I start by prioritizing what to include, where to source it, and how to audit and triage before go‑live.

    Data-driven tools can surface the right starting points. For example, platforms like Fin can surface knowledge gaps from real customer conversations where help content is missing, unclear, duplicated, or contradictory. A centralized knowledge hub then becomes your single source of truth for both customer-facing and internal content, with audience controls to ensure your Agent only uses the right materials for the right users.


    Here’s how I prioritize content for the first wave. Support FAQs come first—billing changes, account updates, feature usage, troubleshooting, and policy questions. I mine the inbox and historical conversations to find the highest-frequency issues and turn them into crisp help articles the Agent can quote.

    Next, I build onboarding and setup guides so new customers reach value fast. I collaborate with customer success and product to document the fastest path to “first win,” and I ensure the Agent can reference those steps in chat and in‑product guidance.


    Then I add troubleshooting and advanced guides for deeper issues and power-user workflows. I pull in product managers, engineering, and success managers to capture deeper diagnostics, known limitations, and recommended workarounds—exactly the details that prevent escalations.

    Finally, I create content for specific use cases and customer segments. Different goals and configurations require contextual guidance, so I reflect language customers actually use and tailor examples to their jobs-to-be-done.


    When sourcing knowledge, I cast a wide net and consolidate it so the Agent and my team can use it reliably. That includes public help articles and troubleshooting guides; internal runbooks, escalation steps, and policy clarifications; curated snippets for short replies and exceptions; past conversations that expose gaps; relevant website pages; and documents like PDFs and DOCX with selectable text.

    Before anything goes live, I run a structured content audit. The goal is twofold: prevent the Agent from learning from outdated information, and expose gaps that will cause escalations. I divide content by product area, assign clear ownership, and set a time‑boxed review window to update, consolidate, or retire content. Shared ownership turns a daunting clean‑up into a manageable sprint.


    I also walk the customer journey myself—exactly as a new user would—so I can experience the Agent’s responses firsthand and spot missing topics or keywords. Where my platform supports it, I use preview and batch testing to validate coverage across common questions, then simulate more complex workflows to ensure handoffs and steps are properly defined before launch.

    After 30 days of Agent activity, I dive into the data. I look for topics driving handoffs to humans, articles correlated with low resolution rates or CSAT, and content that customers view but still escalate. Those signals tell me exactly what to write or refine next—and where to tighten conversation design or retrieval.
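    A per-topic report is one way to surface those signals after the first 30 days. This sketch assumes each conversation has already been tagged with a topic and records whether it was handed off, whether it was resolved, and an optional CSAT rating; the field names are hypothetical, not a specific platform's export format.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Conversation:
    topic: str
    handed_off: bool         # escalated to a human teammate
    resolved: bool
    csat: int | None = None  # 1-5 if the customer rated it


def topic_report(conversations: list[Conversation], min_volume: int = 20) -> list[dict]:
    """Per-topic handoff and resolution rates, worst handoff rate first."""
    by_topic: dict[str, list[Conversation]] = defaultdict(list)
    for conv in conversations:
        by_topic[conv.topic].append(conv)
    rows = []
    for topic, convs in by_topic.items():
        n = len(convs)
        if n < min_volume:
            continue  # too few conversations to trust the rates
        ratings = [c.csat for c in convs if c.csat is not None]
        rows.append({
            "topic": topic,
            "volume": n,
            "handoff_rate": sum(c.handed_off for c in convs) / n,
            "resolution_rate": sum(c.resolved for c in convs) / n,
            "avg_csat": sum(ratings) / len(ratings) if ratings else None,
        })
    return sorted(rows, key=lambda row: row["handoff_rate"], reverse=True)


sample = [
    Conversation("billing", True, False, 2),
    Conversation("billing", True, False),
    Conversation("billing", True, True, 3),
    Conversation("billing", False, True, 4),
    Conversation("login", False, True, 5),
    Conversation("login", False, True),
]
for row in topic_report(sample, min_volume=2):
    print(row["topic"], row["handoff_rate"], row["resolution_rate"])
```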


    Prioritization is where impact accelerates. I focus first on the content my team shares most: top help articles, troubleshooting steps, onboarding flows, and policies. I study conversation analytics to identify the most common questions, the longest handle times, and the lowest CX scores, then close those gaps with targeted content. I also review high‑view articles that haven’t been updated recently and refresh anything affected by changes to product, policies, or plans.
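    Finding high-view articles that are overdue for review is a simple filter once you can export article views, last-updated dates, and product areas. The field names, the thresholds, and the idea of passing in a set of recently changed product areas are assumptions for this sketch.

```python
from datetime import date


def refresh_queue(articles: list[dict], today: date, changed_areas: set[str],
                  min_views: int = 500, max_age_days: int = 180) -> list[str]:
    """Titles to review: popular-but-stale articles plus anything in a changed area."""
    queue = []
    for article in articles:
        stale = (today - article["last_updated"]).days > max_age_days
        popular = article["views_30d"] >= min_views
        affected = article["area"] in changed_areas
        if (stale and popular) or affected:
            queue.append(article)
    # Highest-traffic articles first, since they reach the most customers.
    queue.sort(key=lambda a: a["views_30d"], reverse=True)
    return [a["title"] for a in queue]


articles = [
    {"title": "Update payment method", "area": "billing", "views_30d": 2400, "last_updated": date(2024, 1, 10)},
    {"title": "Invite teammates", "area": "workspace", "views_30d": 900, "last_updated": date(2024, 11, 2)},
    {"title": "Export your data", "area": "data", "views_30d": 120, "last_updated": date(2023, 6, 1)},
    {"title": "Change plan", "area": "plans", "views_30d": 300, "last_updated": date(2024, 12, 1)},
]
print(refresh_queue(articles, today=date(2025, 1, 15), changed_areas={"plans"}))
```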

    Resourcing matters. Building a high-performing Service Agent shouldn’t be a side gig. I explicitly allocate weekly time for frontline reps, support specialists, and product partners to work on content requests and knowledge improvements. A baseline of 5–10 hours per person per week is practical, and it doubles as a powerful way to upskill the team for emerging AI roles.


    Writing for AI is writing for customers. I train the Agent to mirror the terms our customers use by analyzing search queries and real conversation language. I avoid internal jargon, expand acronyms, and clarify key concepts to eliminate ambiguity. When a topic invites yes/no answers, I restate the question and add the necessary context so the Agent doesn’t misinterpret shorthand. I always pair images or videos with clear explanatory text so the guidance is accessible and machine‑readable. And I structure content for scanning with crisp headings and short sections, avoiding hidden information that requires clicks to reveal.

    When I have bite‑size answers—common edge cases, policy clarifications, repetitive high‑volume queries—I collect them into focused internal snippets or compact FAQs so the Agent can retrieve and deliver precise answers quickly.

    Phase 2: Knowledge management is where the compounding value kicks in. Once live, I track the metrics that matter: resolution rate (conversations fully resolved by the Agent when it was involved), automation rate (total conversations handled by the Agent across overall volume), time saved (hours of manual work offloaded), Customer Experience (CX) Score comparisons across AI and human conversations, and CSAT parity.
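    These definitions translate directly into code. This sketch assumes each conversation record says whether the Agent was involved, whether the Agent resolved it without a human, and an optional CSAT score. Treating "handled by the Agent" as "resolved without a human" and estimating time saved from an average human handle time are simplifying assumptions, not an official metric definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Conv:
    agent_involved: bool
    resolved_by_agent: bool  # closed with no human teammate involved
    csat: int | None = None


def _avg_csat(group: list[Conv]) -> float | None:
    ratings = [c.csat for c in group if c.csat is not None]
    return sum(ratings) / len(ratings) if ratings else None


def support_metrics(convs: list[Conv], avg_human_handle_minutes: float) -> dict[str, float | None]:
    total = len(convs)
    involved = [c for c in convs if c.agent_involved]
    resolved = [c for c in involved if c.resolved_by_agent]
    needed_human = [c for c in convs if not (c.agent_involved and c.resolved_by_agent)]
    return {
        "involvement_rate": len(involved) / total,
        # Resolution rate: resolved by the Agent, out of conversations it joined.
        "resolution_rate": len(resolved) / len(involved) if involved else 0.0,
        # Automation rate: resolved by the Agent, out of all conversations.
        "automation_rate": len(resolved) / total,
        "hours_saved": len(resolved) * avg_human_handle_minutes / 60,
        # CSAT for Agent-resolved versus human-handled conversations.
        "csat_agent": _avg_csat(resolved),
        "csat_human": _avg_csat(needed_human),
    }


convs = (
    [Conv(True, True, 5)] * 3
    + [Conv(True, True, 4)] * 3
    + [Conv(True, False, 3)] * 2
    + [Conv(False, False, 4)] * 2
)
print(support_metrics(convs, avg_human_handle_minutes=8))
```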

    Then I put those learnings to work. Inevitably, some problems won’t be solvable on day one. That’s a gift—it shows me where to refine workflows, add clarifying steps, and strengthen knowledge depth. The richest insights often come from where the Agent struggles or escalates; those friction points become my highest‑ROI content tickets.

    Knowledge management is never one‑and‑done. As products, customers, and business goals evolve, so must the knowledge. I formalize an ongoing maintenance cadence with clear ownership, review intervals, and time blocks on the calendar. Wherever possible, I use AI‑assisted drafting to propose updates, summarize gaps, and accelerate review without sacrificing quality.

    To sustain momentum, I create a simple intake for content requests—often a lightweight ticket workflow inside our support tools—so anyone in support, success, sales, marketing, engineering, or product can flag gaps and propose improvements. The teams closest to customers usually spot the patterns first; a good intake system ensures we don’t lose those insights.

    I also bake knowledge work into every launch plan. New features, product updates, plans, and policies require Agent‑ready content at launch, not after. I partner with product, support, and product marketing to produce best practices and anticipated FAQs in advance, then I review early conversations post‑launch to spot recurring confusion and fast‑follow content needs.

    Brand consistency builds trust across every touchpoint. I standardize terminology for products, features, plans, and policies so the Agent, the help center, and human reps all speak the same language. I proof for tone, spelling, and grammar, and I use templates so content feels cohesive. I also include clear contact options for customers who need them—what channel to use, when to use it, and what to expect—so we maintain confidence even when escalation is required.
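    A small lint step can help enforce standardized terminology before content is published. The variant-to-canonical mapping below is a made-up example of a style guide; a real one would list your own product, feature, plan, and policy names.

```python
import re

# Hypothetical style guide: non-standard variants mapped to the canonical term.
CANONICAL_TERMS = {
    "work space": "workspace",
    "log-in": "sign-in",
    "pro tier": "Pro plan",
}


def terminology_issues(text: str) -> list[tuple[str, str, int]]:
    """Return (found variant, canonical term, character offset), in reading order."""
    issues = []
    for variant, canonical in CANONICAL_TERMS.items():
        pattern = rf"\b{re.escape(variant)}\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            issues.append((match.group(0), canonical, match.start()))
    return sorted(issues, key=lambda issue: issue[2])


draft = "Open your Work Space, then use the log-in page to upgrade to the Pro tier."
for found, canonical, offset in terminology_issues(draft):
    print(f"offset {offset}: replace '{found}' with '{canonical}'")
```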

    Clarity about audience matters, too. If certain content applies only to specific roles, plans, or regions, I label it explicitly and, where my platform supports it, target content so the Agent uses the right guidance for the right segment.
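    Audience targeting usually comes down to matching content labels against attributes of the person asking. This sketch assumes three hypothetical labels (roles, plans, regions), where an empty label set means the content applies to everyone; platforms that support targeting will have their own configuration for this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    roles: set[str] = field(default_factory=set)    # empty set = all roles
    plans: set[str] = field(default_factory=set)    # empty set = all plans
    regions: set[str] = field(default_factory=set)  # empty set = all regions


@dataclass
class Customer:
    role: str
    plan: str
    region: str


def is_eligible(article: Article, customer: Customer) -> bool:
    """An article applies when every non-empty label includes the customer's value."""
    checks = [
        (article.roles, customer.role),
        (article.plans, customer.plan),
        (article.regions, customer.region),
    ]
    return all(not allowed or value in allowed for allowed, value in checks)


def articles_for(customer: Customer, articles: list[Article]) -> list[str]:
    return [a.title for a in articles if is_eligible(a, customer)]


library = [
    Article("Reset your password"),
    Article("Configure SSO", roles={"admin"}, plans={"enterprise"}),
    Article("EU data residency", regions={"EU"}),
]
print(articles_for(Customer("admin", "enterprise", "US"), library))
print(articles_for(Customer("member", "starter", "EU"), library))
```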

    Finally, I connect the dots. When conversations, customer data, and knowledge live in one place, every interaction becomes an insight loop. A connected Agent turns support into a retrieval-first pipeline, making it far easier to diagnose issues, improve accuracy, and continuously raise the bar on customer experience.

    Behind every high-performing Agent is a rigorous, AI-friendly knowledge management practice. Treating knowledge as a core service function—not a project—creates systems that improve with every conversation. That’s how we transform support from a cost center into a compounding engine for customer satisfaction, operational efficiency, and growth.


    Inspired by this post on The Intercom Blog.


  • The Counterintuitive Playbook for CLI Agents: Why Ruthless Subtraction Beats Feature Creep


    I’ve learned the hard way that the fastest path to a reliable command-line agent is radical subtraction. As the Amplitude post puts it: "In the last month of developing Amplitude Wizard CLI, we cut more than we added." That decision was less about minimalism and more about product strategy: constraints sharpen behavior, clarify intent, and raise trust. When it comes to building CLI agents, less is more.

    When I evaluate agentic AI systems, especially those that act on developer environments, I start by asking what the agent must never do. By establishing hard guardrails first, the design naturally converges on an opinionated, safe, and teachable interface. Every additional flag, tool, or permission expands the blast radius; every removal shortens the path to first success.

    For CLI agents, the most valuable product choice is a narrow toolset with sane defaults. Opinionated workflows reduce cognitive load and failure modes, while clear human override points keep users in control. I prefer a bias toward idempotent actions, reversible changes, and explicit confirmation gates for anything destructive. If a feature can’t explain itself in a single, crisp sentence in the help text, it likely doesn’t belong.
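    Explicit confirmation gates and idempotent actions can be sketched in a few lines of Python. The `Tool` type, the `configure_project` and `reset_project` actions, and the in-memory `state` dict are hypothetical; in a real CLI, `confirm` might wrap `input()` so the user answers at the terminal, and state would live on disk or in a service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[], str]
    destructive: bool = False  # deletes or overwrites user state


def execute(tool: Tool, confirm: Callable[[str], bool]) -> str:
    """Run a tool, but only after explicit consent if it is destructive."""
    if tool.destructive and not confirm(f"{tool.name} cannot be undone. Continue? [y/N] "):
        return f"skipped {tool.name}"
    return tool.run()


state: dict[str, bool] = {}


def configure_project() -> str:
    """Idempotent: running it twice leaves the same state as running it once."""
    if state.get("configured"):
        return "already configured"
    state["configured"] = True
    return "configured"


def reset_project() -> str:
    state.clear()
    return "reset"


def decline(_prompt: str) -> bool:
    return False  # stand-in for a user who answers "no"


print(execute(Tool("configure_project", configure_project), decline))  # configured
print(execute(Tool("configure_project", configure_project), decline))  # already configured
print(execute(Tool("reset_project", reset_project, destructive=True), decline))  # skipped reset_project
```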

    Security and reliability flow from limits. Progressive permissioning, scoped credentials, and time-bounded tokens prevent the agent from wandering. Dry-run modes build confidence without side effects. When a user can reason about what the agent will and won’t do, adoption accelerates—and support tickets plummet.
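    Scoped, time-bounded credentials and a dry-run default can be combined in one small check. This is an illustrative sketch with hypothetical scope names such as `project:read`; real enforcement belongs on the server that issues and validates tokens, and `now` is passed explicitly only to keep the example deterministic.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScopedToken:
    scopes: frozenset[str]
    expires_at: float  # Unix timestamp in seconds

    @classmethod
    def issue(cls, scopes: set[str], ttl_s: float, now: float | None = None) -> ScopedToken:
        issued_at = time.time() if now is None else now
        return cls(frozenset(scopes), issued_at + ttl_s)

    def allows(self, scope: str, now: float | None = None) -> bool:
        current = time.time() if now is None else now
        return current < self.expires_at and scope in self.scopes


def apply_change(token: ScopedToken, scope: str, description: str,
                 apply: Callable[[], None], dry_run: bool = True,
                 now: float | None = None) -> str:
    """Check scope and expiry first; in dry-run mode, describe instead of acting."""
    if not token.allows(scope, now):
        return f"denied: {scope} is not granted or the token expired"
    if dry_run:
        return f"dry run: would {description}"
    apply()
    return f"done: {description}"


# Progressive permissioning: start with a read-only, 15-minute token.
token = ScopedToken.issue({"project:read"}, ttl_s=900, now=1_000.0)
print(token.allows("project:read", now=1_500.0))   # True
print(token.allows("project:write", now=1_500.0))  # False: never granted
print(token.allows("project:read", now=2_000.0))   # False: expired at 1900
```

    Issuing a broader token only after the user opts in, and keeping `dry_run=True` as the default, means the first run of any change shows its effect without causing it.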

    Observability is the other half of trust. I instrument "Agent Analytics" across every run: inputs, tool choices, durations, outcomes, and error patterns. Those signals reveal where the agent gets confused, which steps users abandon, and which prompts need pruning. With that loop in place, "less is more" stops being a philosophy and becomes an evidence-backed operating model.
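    Once each run emits structured events, finding where the agent gets confused is mostly aggregation. This sketch assumes one event dict per tool call with hypothetical fields `tool`, `duration_s`, `outcome` (`ok`, `error`, or `abandoned`), and an optional `error` message; the tool names are invented.

```python
from collections import Counter, defaultdict


def summarize_runs(events: list[dict]) -> dict[str, dict[str, float]]:
    """Per-tool call counts, error and abandonment rates, and average duration."""
    calls = Counter(e["tool"] for e in events)
    errors = Counter(e["tool"] for e in events if e["outcome"] == "error")
    abandoned = Counter(e["tool"] for e in events if e["outcome"] == "abandoned")
    durations: dict[str, list[float]] = defaultdict(list)
    for e in events:
        durations[e["tool"]].append(e["duration_s"])
    return {
        tool: {
            "calls": n,
            "error_rate": errors[tool] / n,
            "abandon_rate": abandoned[tool] / n,
            "avg_duration_s": sum(durations[tool]) / n,
        }
        for tool, n in calls.items()
    }


def top_errors(events: list[dict], limit: int = 3) -> list[tuple[str, int]]:
    """Most frequent error messages, a starting point for pruning prompts or tools."""
    return Counter(e["error"] for e in events if e.get("error")).most_common(limit)


events = [
    {"run_id": "r1", "tool": "detect_framework", "duration_s": 0.4, "outcome": "ok"},
    {"run_id": "r1", "tool": "install_sdk", "duration_s": 12.0, "outcome": "error", "error": "package manager not found"},
    {"run_id": "r2", "tool": "detect_framework", "duration_s": 0.6, "outcome": "ok"},
    {"run_id": "r2", "tool": "install_sdk", "duration_s": 3.0, "outcome": "abandoned"},
    {"run_id": "r3", "tool": "install_sdk", "duration_s": 9.0, "outcome": "error", "error": "package manager not found"},
]
summary = summarize_runs(events)
print(summary["install_sdk"])
print(top_errors(events))
```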

    I anchor the roadmap in eval-driven development. Before adding a capability, I define a measurable task, a success threshold, and the smallest viable interface to reach it. If the capability can’t lift completion rate, time-to-first-success, or re-run stability, it waits. That simple discipline protects the experience from feature creep and preserves velocity in CI/CD.
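    The discipline described above can be expressed as a gate over repeated eval runs. In this sketch each task is run several times and scored as "success" or "failure"; completion rate pools all runs, and re-run stability is the share of runs that agree with a task's most common outcome. The task names and the 0.9 and 0.95 thresholds are placeholders.

```python
from collections import Counter


def rerun_stability(outcomes: list[str]) -> float:
    """Share of repeated runs that match the most common outcome for one task."""
    _, most_common_count = Counter(outcomes).most_common(1)[0]
    return most_common_count / len(outcomes)


def capability_gate(results: dict[str, list[str]], min_completion: float = 0.9,
                    min_stability: float = 0.95) -> tuple[bool, float, float]:
    """Return (passes, completion_rate, worst_task_stability) for a proposed capability."""
    all_runs = [outcome for runs in results.values() for outcome in runs]
    completion = sum(outcome == "success" for outcome in all_runs) / len(all_runs)
    # Use the least stable task so one flaky workflow cannot hide behind an average.
    stability = min(rerun_stability(runs) for runs in results.values())
    return completion >= min_completion and stability >= min_stability, completion, stability


results = {
    "init_project": ["success"] * 5,
    "add_event_tracking": ["success", "success", "failure", "success", "success"],
}
print(capability_gate(results))  # (False, 0.9, 0.8)
```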

    Under the hood, I design for a retrieval-first pipeline and careful context window management. The agent should fetch only the minimally relevant facts, present a compact plan, and execute predictably. Thoughtful prompt engineering helps—but prompts are not a substitute for clear boundaries, deterministic tool contracts, and robust error handling.
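    Fetching only the minimally relevant facts can be sketched as greedy packing under a token budget: rank retrieved snippets by relevance, drop anything below a floor, and stop adding once the budget is spent. The relevance scores are assumed to come from an upstream retriever, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
def pack_context(snippets: list[tuple[str, float]], budget_tokens: int,
                 min_score: float = 0.3) -> list[str]:
    """Greedily keep the highest-scoring snippets that fit within the budget."""
    chosen: list[str] = []
    used = 0
    for text, score in sorted(snippets, key=lambda s: s[1], reverse=True):
        if score < min_score:
            break  # everything after this is even less relevant
        cost = len(text.split())  # word count as an approximate token count
        if used + cost <= budget_tokens:
            chosen.append(text)
            used += cost
    return chosen


snippets = [
    ("Supported frameworks: Next.js, React, Vue", 0.81),
    ("The API key lives in the project settings page", 0.92),
    ("Release notes from 2019", 0.20),
]
print(pack_context(snippets, budget_tokens=14))
print(pack_context(snippets, budget_tokens=10))
```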

    Documentation is product. I maintain docs-as-code with runnable examples that mirror the golden paths. When the docs and the CLI disagree, the CLI changes—never the docs. This creates an internal forcing function: if we can’t document it simply, we probably shouldn’t ship it.

    My litmus test for any proposed addition is simple: does this make the mental model smaller? If not, cut it, make it progressive, or hide it behind a clearly named subcommand. Defaults should be boring, safe, and fast. Advanced power should be opt-in and discoverable without overwhelming new users.

    The paradox of agentic AI is that capability grows as surface area shrinks. By removing distractions, we amplify signal, increase repeatability, and earn the right to add the next carefully chosen step. The result is a CLI agent that feels sharp, dependable, and—most importantly—useful on day one.


    Inspired by this post on Amplitude – Perspectives.


  • Speed-to-Lead Is Dead: How AI Agents End the Wait and Rebuild a High-Velocity Sales Org


    A prospect lands on our site, skims pricing, watches a demo, and clicks “contact sales.” For years, that’s where momentum died. They waited, and we built entire sales motions around managing that delay.

    We optimized for “speed-to-lead,” made it the hallmark of a high-performing sales development org, hired more SDRs, tuned routing rules, added shift coverage, and stared at response-time dashboards. Typical SLA targets were one hour for best-fit leads, four hours for core MQLs, forty-eight hours for everyone else. Those were considered good numbers.

    No one questioned the premise because the lag felt structural—shift scheduling, routing delays, and humans working 9–5. The fastest teams could only shrink the gap; nobody could remove it.

    An AI Agent closes it completely.

    When a prospect arrives today, the conversation can begin immediately. That single change reshapes how I design a sales org—how we staff it, what our team prioritizes, and the metrics we hold ourselves accountable for.

    Step outside our dashboards and look at the buyer experience. We spend heavily to drive traffic, then push visitors into forms and queues that add friction precisely when purchase intent peaks.

    Intent is highest the moment someone seeks out our product. If an SDR follows up two or three hours later, that buyer’s in another meeting, the urgency has faded, and the moment is gone. We still call it a lead; the buyer has already moved on.

    What AI changes

    Agents eliminate the structural constraints that made speed-to-lead a problem—shift scheduling, routing delays, CRM batch processing, the SDR being on another call. None of it applies anymore because every single lead can be engaged immediately, at any hour and in any language.

    The impact goes beyond response time. When an Agent engages at peak intent, qualification, discovery, and even an initial demo moment can unfold in a single, continuous conversation. The gated funnel collapses. There’s no reason to qualify someone today, schedule discovery for Thursday, and demo the following week when the conversation is already happening.

    The constraint the industry built around simply isn’t there anymore. We’re already seeing it with Fin, a Customer Agent. As sales leaders, we need to frame this differently.

    If speed-to-lead is no longer the constraint, the knock-on effects reach every part of the org.


    SDRs focus on moving deals forward. Instead of frontline triage, they double down on phone-based selling and relationship building, complex deal navigation, and multi-threaded engagement across stakeholders—the high-leverage work that used to get crowded out by the inbox.

    Pipeline gets more relevant. The old model rewarded volume: capture as many form fills as possible, respond fast, and sort quality later. When an Agent engages at the moment of intent, it qualifies during the conversation. Low-fit leads get filtered out before they reach the team, and high-fit prospects arrive with context—needs, timeline, stakeholders—instead of just a name and email.

    You measure outcomes, not response time. When first response is instant, different metrics matter. I anchor on three questions:

    1) Is the Agent doing the work? Completion rate, qualification rate, and contact capture rate indicate whether conversations reach clear outcomes and produce usable handoffs to the team.

    2) Is the work producing pipeline? Meetings booked and pipeline created through Agent-handled conversations are the leading indicators of revenue, not how fast someone followed up.

    3) Are buyers having a good experience? Conversation-level satisfaction matters more than ever because the Agent is the first interaction prospects have with your company. The experience it delivers is the first impression you make.

    These three questions reveal whether the motion is working. Time-to-first-response can’t.
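
    To make the three questions concrete, here is a minimal Python sketch that rolls Agent-handled conversations into one scorecard. The `Conversation` record and its field names are hypothetical, not a Fin or Intercom export format. Map them to whatever your conversation data actually contains.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Conversation:
    # Hypothetical per-conversation record; adapt field names to your own data.
    completed: bool          # reached a clear outcome
    qualified: bool          # met your qualification criteria
    contact_captured: bool   # usable contact details for the handoff
    meeting_booked: bool
    pipeline_usd: float      # pipeline attributed to this conversation
    csat: Optional[int]      # conversation rating (e.g. 1-5), None if unrated

def agent_scorecard(convs: List[Conversation]) -> Dict[str, float]:
    if not convs:
        raise ValueError("no conversations to score")
    n = len(convs)
    rated = [c.csat for c in convs if c.csat is not None]
    return {
        # 1) Is the Agent doing the work?
        "completion_rate": sum(c.completed for c in convs) / n,
        "qualification_rate": sum(c.qualified for c in convs) / n,
        "contact_capture_rate": sum(c.contact_captured for c in convs) / n,
        # 2) Is the work producing pipeline?
        "meetings_booked": float(sum(c.meeting_booked for c in convs)),
        "pipeline_usd": float(sum(c.pipeline_usd for c in convs)),
        # 3) Are buyers having a good experience?
        "avg_csat": sum(rated) / len(rated) if rated else float("nan"),
    }
```

    Time-to-first-response appears nowhere in the scorecard. Every field answers one of the three questions.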

    Sales orgs built hiring plans, workflows, and performance metrics around beating intent decay. That made sense when the lag was unavoidable. It isn’t anymore.

    An Agent is always on. It engages the moment a prospect arrives on your site, qualifies them in real time, and routes them to the right outcome without waiting for someone to be free. The lag the industry built itself around doesn’t exist when the conversation starts immediately.

    The companies leaning into this are investing in what happens after the conversation starts: how well the Agent qualifies, where it creates pipeline, and what SDRs should actually spend time on. What matters now is not how fast you respond, but what the conversation produces.

    Speed-to-lead made sense when the delay was structural. It isn’t anymore. If you’re re-architecting go-to-market, instrument Agent Analytics, revisit SDR charters, and tighten CRM integration so every qualified handoff is instant, traceable, and revenue-linked.


    Inspired by this post on The Intercom Blog.


  • Prompt Like a Pro: Three Battle-Tested Tips for Amplitude Global Agent Success

    When I guide teams building agentic AI features, I’ve seen a single prompt turn Amplitude Global Agent into either a world-class analyst or a well-meaning rambler. The difference isn’t magic—it’s method. With the right structure and iteration, we consistently get faster, clearer insights that stand up to product and analytics scrutiny.

    AI has gotten remarkably capable, but success still depends on the quality of your prompts. Here are the three prompting practices I rely on in Amplitude Global Agent.

    Tip 1 — Define the role, goal, and guardrails. I begin every prompt by stating the agent’s role (for example: “You are a product analyst”), the business objective (“identify activation drop-offs by cohort”), and the boundaries (“use only Amplitude analytics events and properties provided; return JSON with metric, segment, timeframe”). This simple pattern reduces ambiguity, improves context window management, and yields outputs I can compare across runs.
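
    A minimal Python sketch of Tip 1 follows. One helper assembles role, objective, and guardrails into a prompt. Another rejects any reply that is not a JSON object with the `metric`, `segment`, and `timeframe` fields the guardrail asks for. The function names are illustrative, and the sketch does not call Amplitude. It only shows the prompt-and-validate pattern.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("metric", "segment", "timeframe")  # the output contract from Tip 1

def build_prompt(role: str, objective: str, guardrails: List[str], question: str) -> str:
    """Role, goal, and guardrails first; the output contract last, right before the question."""
    lines = [f"You are a {role}.", f"Objective: {objective}", "Guardrails:"]
    lines += [f"- {rule}" for rule in guardrails]
    lines.append(f"Return only a JSON object with keys: {', '.join(REQUIRED_FIELDS)}.")
    lines.append(f"Question: {question}")
    return "\n".join(lines)

def parse_agent_output(raw: str) -> Dict[str, Any]:
    """Accept a reply only if it is a JSON object containing every required field."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

    Rejecting malformed replies at parse time is what makes runs comparable, because every accepted answer has the same shape.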

    Tip 2 — Ground the model with concrete context and examples. Agent outputs improve dramatically when I supply the exact data it should reference: event names, properties, segments, filters, and timeframes. I often include a short example (one ideal question and one ideal answer) to anchor tone, structure, and depth. Think retrieval-first pipeline: feed the agent authoritative snippets (definitions, dashboards, prior queries) rather than hoping it guesses. That’s how I cut hallucinations and make results reproducible.
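
    Tip 2's retrieval-first idea can be sketched in Python. Here, simple term overlap stands in for a real retriever (an embedding or search index would normally do this job), and the snippet names are made up for illustration. The prompt carries only the selected authoritative snippets plus one example question and answer.

```python
import re
from typing import Dict, List, Set

def _terms(text: str) -> Set[str]:
    # Lowercased word tokens; underscores kept so event names like signup_completed survive.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def select_snippets(question: str, snippets: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k snippet names ranked by term overlap, dropping snippets with no overlap."""
    q = _terms(question)
    ranked = sorted(snippets, key=lambda name: len(q & _terms(snippets[name])), reverse=True)
    return [name for name in ranked[:k] if q & _terms(snippets[name])]

def grounded_prompt(question: str, snippets: Dict[str, str],
                    example_q: str, example_a: str, k: int = 2) -> str:
    context = "\n".join(f"[{name}] {snippets[name]}"
                        for name in select_snippets(question, snippets, k))
    return (f"Answer using only this context:\n{context}\n\n"
            f"Example question: {example_q}\nExample answer: {example_a}\n\n"
            f"Question: {question}")
```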

    Tip 3 — Iterate with measurement, not vibes. I version prompts, A/B test variants, and log inputs/outputs so I can score quality with lightweight evals (accuracy against known answers, clarity, and actionability). Over time, a small library of “winning” prompts emerges for common AI workflows—activation analysis, retention cohorts, anomaly detection—so the team can move from tinkering to repeatable performance. This is where Agent Analytics practices pay off: we inspect outcomes, not just outputs.
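
    For the accuracy part of Tip 3, here is a minimal Python sketch. Each prompt version is wrapped as a function that runs the agent (stubbed in practice, since the real call depends on your setup), and versions are scored by exact match against questions with known answers. Exact match is a deliberately crude scorer. Clarity and actionability need a rubric or an LLM judge, which this sketch does not cover.

```python
from typing import Callable, Dict

def score_prompt_versions(versions: Dict[str, Callable[[str], str]],
                          gold: Dict[str, str]) -> Dict[str, float]:
    """Exact-match accuracy of each prompt version on questions with known answers."""
    if not gold:
        raise ValueError("need at least one known answer")

    def norm(s: str) -> str:
        return " ".join(s.lower().split())  # ignore case and extra whitespace

    return {
        name: sum(norm(run(q)) == norm(a) for q, a in gold.items()) / len(gold)
        for name, run in versions.items()
    }
```

    Keeping each version's text and score under version control gives the library of winning prompts a paper trail.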

    A practical starter structure I use: Role and Audience; Objective and Success Criteria; Data Context (events, properties, segments, timeframe); Constraints (sources, methods, privacy); Output Format (tables/JSON, fields, length); Examples (one good Q/A); and Fallbacks (what to do when data is insufficient). Even written as plain language, that scaffold reliably steers Amplitude Global Agent to precise, defensible answers.
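
    The starter structure translates directly into a template. This Python sketch takes its section titles from the list above. The function name is illustrative. It refuses to render until every section, including Fallbacks, has content, which is a cheap way to enforce the scaffold across a team.

```python
from typing import Dict

SECTIONS = (
    "Role and Audience",
    "Objective and Success Criteria",
    "Data Context",
    "Constraints",
    "Output Format",
    "Examples",
    "Fallbacks",
)

def render_scaffold(fields: Dict[str, str]) -> str:
    """Render the sections in a fixed order; fail loudly if any section is empty."""
    missing = [s for s in SECTIONS if not fields.get(s, "").strip()]
    if missing:
        raise ValueError(f"scaffold incomplete: {missing}")
    return "\n\n".join(f"{s}:\n{fields[s].strip()}" for s in SECTIONS)
```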

    The emotional arc here is familiar: when the agent nails a complex funnel question in one pass, the team gets that “oh wow” moment; when it meanders, morale dips. Clear prompting turns those spikes of delight into a steady cadence of wins—less rework, faster learning loops, and cleaner handoffs from discovery to delivery. In short, invest in prompt engineering once, and you compound gains across every analysis session.

    If you’re just getting started, pick one critical question (for example, activation or retention), apply the three tips above, and commit to two to three prompt iterations with scoring. Within a single sprint, you’ll have a robust template you can reuse and adapt—helping Amplitude Global Agent deliver trustworthy insights at the speed your product strategy demands.


    Inspired by this post on Amplitude – Perspectives.


  • Beyond Accuracy: How I Evaluate AI Customer Service Agents That Delight and Scale

    When teams evaluate AI Agent options for customer service, I often see the rigor aimed at the wrong subset of criteria. After leading and observing dozens of proof of concept (POC) efforts with our customers and prospects, I understand why performance—accuracy scores, resolution rates, and benchmark tests on curated datasets—soaks up most of the attention. But those indicators alone won’t guarantee success once you leave the sandbox and face real customers.

    If your POC only proves that the AI “works,” you’re missing the bigger picture. Here’s what else I look for to make the best long-term decision.

    How does it handle your real-world setup?

    Performance is table stakes, but it has to reflect the messiness of an actual support environment. The best-performing Agents don’t just get answers right—they exhibit resilient, human-like behavior under pressure. I watch how the Agent behaves when it doesn’t know an answer: does it recover or spiral? Does it stay on track through multi-step requests, and how gracefully does it hand off to human agents? If your knowledge base depends on a retrieval-first pipeline, test cross-source retrieval and grounding—not just single-document lookups.

    When I build evaluation scenarios, I put the Agent through its paces with a broad, realistic mix:

    • Multi-turn queries that require the Agent to carry context across a conversation, not just answer isolated questions.
    • Vague or fragmented inputs, like typos, grammatical errors, and incomplete questions, because that’s how customers actually write.
    • Edge cases and sensitive scenarios, like billing disputes, frustrated customers, and questions that sit at the boundary of what the Agent is trained on.
    • Different phrasings of the same question. An Agent that handles one version well but fails on a rephrasing has a comprehension problem, not a knowledge problem.
    • Queries that require pulling from multiple knowledge sources. Real issues are rarely answered by a single help article, and an Agent that can only handle single-source questions will hit a ceiling fast.
    • Multilingual conversations, if your customer base requires it. Performance can vary significantly across languages and it’s better to discover that in testing than in production.
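
    The paraphrase bullet is easy to automate. The minimal Python sketch below groups rephrasings of the same question, runs each through the Agent, and reports what share of phrasings agree with the group's most common answer. The `agent` callable is a stand-in for however you invoke the Agent under test. Agreement measures consistency only, so pair it with known-correct answers to check accuracy.

```python
from collections import Counter
from typing import Callable, Dict, List

def paraphrase_consistency(agent: Callable[[str], str],
                           groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Per intent: fraction of phrasings whose answer matches the group's most common answer."""
    report = {}
    for intent, phrasings in groups.items():
        if not phrasings:
            continue
        # Normalize case and whitespace so trivial formatting differences don't count.
        answers = [" ".join(agent(p).lower().split()) for p in phrasings]
        _, top_count = Counter(answers).most_common(1)[0]
        report[intent] = top_count / len(answers)
    return report
```

    Groups scoring well below 1.0 point to the phrasings that need better knowledge or guidance coverage.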

    This preparation is worth the effort. Any Agent can look impressive in a demo; what matters is how it holds up as part of your team, serving your customers in production.

    What does it feel like to interact with the Agent?

    Two AI Agents can post the same quantitative scores—resolution rates, containment rate, and more—and still deliver very different customer experiences. Resolution rate tells me whether the Agent finishes conversations; it says nothing about how customers felt during them. I deliberately assess the experience, not just the outcome, because conversation design shapes trust and brand perception.

    Here’s what I look for to ensure the AI Agent is enjoyable to interact with:

    • Is the tone natural and on-brand, or does it feel robotic and generic?
    • Does it build trust early in the conversation, or does it create friction that makes customers want to immediately request a human?
    • When it doesn’t know the answer, does it handle that gracefully?
    • When it hands off to a human, is that transition seamless, or does the customer feel abandoned?

    As George Dilthey at Clay put it when evaluating their AI setup: “Keep what’s important to your business up front and center. For us, that was transparency and control over the customer experience.”

    That framing is exactly right. The Agent represents your brand in every conversation. Customers don’t experience “accuracy,” they experience conversations. An Agent that’s technically accurate but tonally off-brand will erode customer trust over time.

    I make the experience dimension explicit in my POCs. I have people on my team—and when possible, a small cohort of real customers—interact with the Agent under realistic conditions. Then I ask how it felt, not just whether it worked.

    Can you keep improving it after launch?

    This is the dimension most teams don’t evaluate at all, and it’s possibly the most important one. Choosing an Agent that works today and lets you keep improving the customer experience over time requires more than a functional demo. You’re buying a system that must get better every week, not just during the first sprint.

    The feedback loop

    Can your team easily review conversations and identify where the Agent is underperforming? Can you pinpoint specific gaps (missing knowledge, incorrect tone, poor handoff decisions) and act on them quickly? The faster the loop between “something isn’t working” and “we’ve fixed it,” the more value compounds over time. In practice, that means instrumenting conversations, leveraging Agent Analytics, tagging misroutes and tone slips, and running targeted evals on known failure modes.
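
    Tagging only pays off if the tags are counted. This minimal Python sketch assumes each reviewed conversation is a dict with a `tags` list. Tag names like `misroute` or `tone_slip` are illustrative. The sketch ranks failure modes by how many conversations they affect, so the next targeted eval goes where the volume is.

```python
from collections import Counter
from typing import Dict, List, Tuple

def top_failure_modes(reviews: List[Dict], n: int = 3) -> List[Tuple[str, int]]:
    """Count how many reviewed conversations carry each tag (a tag counts once per conversation)."""
    counts = Counter(tag for r in reviews for tag in set(r.get("tags", [])))
    return counts.most_common(n)
```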

    The speed of iteration

    When you identify a gap, how quickly can you address it? This is partly a question of tooling (how easy is it to update knowledge, refine guidance, adjust behavior?) and partly a question of team capability. The teams getting the most out of AI are the ones that have changed how they operate and made continuous improvement a part of their everyday work. They’ve committed to going all-in for the long term, not just the first few weeks when launching their AI Agent. We treat this as eval-driven development: automate evaluations that mirror real tickets, tighten prompt engineering and retrieval settings, and ship small fixes daily.
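
    One way to make “ship small fixes daily” safe is a regression gate over ticket-derived eval cases. In this Python sketch, each case id maps to pass/fail for the last shipped version and for the candidate. The rule here is an assumed policy, not a standard: no previously passing case may fail, and the overall pass rate may not drop.

```python
from typing import Dict, List, Tuple

def eval_gate(candidate: Dict[str, bool], shipped: Dict[str, bool],
              max_regressions: int = 0) -> Tuple[bool, float, List[str]]:
    """Decide whether a candidate change may ship, given per-case pass/fail results."""
    if not candidate:
        raise ValueError("empty eval run")
    pass_rate = sum(candidate.values()) / len(candidate)
    shipped_rate = sum(shipped.values()) / len(shipped) if shipped else 0.0
    # A regression is a case that passed on the shipped version but fails (or is missing) now.
    regressions = sorted(case for case, ok in shipped.items()
                         if ok and not candidate.get(case, False))
    ok_to_ship = len(regressions) <= max_regressions and pass_rate >= shipped_rate
    return ok_to_ship, pass_rate, regressions
```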

    The vendor partnership

    The vendor behind the Agent matters just as much as the solution itself. You’re choosing a partner for transformation that will help you evolve how your business delivers customer experience. Ask:

    • How does customer feedback influence the product roadmap, and can they show you examples?
    • If you have feedback on limitations or weaknesses, do they engage transparently or get defensive?
    • What kind of support will you get post-launch?
    • Are they shaping where AI customer experience is going, or reacting to what others are building?

    How a vendor responds to those questions tells you more about the long-term relationship than any benchmark result.

    What a good POC proves

    If your POC only proves “the AI works,” you haven’t done enough. A strong proof of concept tests performance in realistic conditions, evaluates the experience from the customer’s perspective, and validates the system that will support continuous improvement after launch. Done well, it sets you up for long-term operational success and builds organizational AI readiness—not just a flashy demo.


    Inspired by this post on The Intercom Blog.


  • Stop Losing Users: How a Second Message and Prompt Audit Drive 2–3x Retention

    Default prompts are quietly sabotaging agent retention. I learned this the hard way while reviewing early funnels for our voice and chat agents—engagement looked great at the greeting, but the moment the agent stopped after a single reply, the conversation flatlined. The fix wasn’t a fancy LLM trick; it was a disciplined second message and a rigorous audit of defaults across every entry point.

    When an AI agent opens with a generic, low-friction greeting and then waits, users hesitate. Cognitive load rises, intent stays fuzzy, and drop-off follows. A thoughtful second message—delivered quickly, with clarity and options—reduces ambiguity and gives people a low-effort path to progress. It’s a small behavioral nudge that pays off in outsized retention gains.

    Here’s the pattern that consistently works for me. First, keep the initial default prompt short, confident, and specific to the channel and task domain. Then ship a fast follow-up if the user hesitates for a few seconds. That second message should clarify what the agent can do, present 2–3 concrete choices, and invite free-form input. I’ve repeatedly seen this simple sequence unlock a 2–3x retention lift in early sessions, especially for first-time users.
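
    The greet-then-nudge sequence is a small piece of timer logic. Here is a minimal Python `asyncio` sketch. The `send` coroutine and the reply queue stand in for your channel integration, the four-second default is a placeholder for “a few seconds,” and the second-message copy is illustrative only.

```python
import asyncio
from typing import Awaitable, Callable

GREETING = "Hi! I'm the support assistant."
SECOND_MESSAGE = ("I can check an order, update billing, or connect you with sales. "
                  "Pick one, or just tell me what you need.")  # illustrative copy

async def greet_then_nudge(send: Callable[[str], Awaitable[None]],
                           replies: "asyncio.Queue[str]",
                           hesitation_s: float = 4.0) -> str:
    """Send the greeting; if the user stays silent for hesitation_s seconds, send the second message."""
    await send(GREETING)
    try:
        return await asyncio.wait_for(replies.get(), timeout=hesitation_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending get() on timeout; nudge, then keep waiting.
        await send(SECOND_MESSAGE)
        return await replies.get()
```

    For voice, the same structure applies with a shorter second message and a confirmation question.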

    Auditing default prompts is where the leverage lives. I inventory every ingress—web widget, IVR, SMS, in-app, help center—and catalogue the exact default system, developer, and user-facing prompts. Then I inspect turn-1 and turn-2 transcripts in Agent Analytics to quantify where users stall: time-to-first-intent, clarification rate, option selection rate, and completion. This makes the drop-off visible and turns “vibes” into data we can A/B test.
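
    The four turn-1/turn-2 measures can be computed from per-session summaries. In this Python sketch the `Session` fields are assumptions about what your transcript pipeline extracts, not an Agent Analytics schema.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional

@dataclass
class Session:
    # Hypothetical fields extracted from turn-1/turn-2 transcripts.
    first_intent_s: Optional[float]  # seconds until the user states an intent; None if never
    clarified: bool                  # agent had to ask a clarifying question
    picked_option: bool              # user chose one of the offered options
    completed: bool                  # the task was finished

def early_funnel(sessions: List[Session]) -> Dict[str, float]:
    if not sessions:
        raise ValueError("no sessions")
    n = len(sessions)
    times = [s.first_intent_s for s in sessions if s.first_intent_s is not None]
    return {
        "intent_rate": len(times) / n,
        "median_time_to_first_intent_s": median(times) if times else float("nan"),
        "clarification_rate": sum(s.clarified for s in sessions) / n,
        "option_selection_rate": sum(s.picked_option for s in sessions) / n,
        "completion_rate": sum(s.completed for s in sessions) / n,
    }
```

    Computing the same dict per prompt variant gives the A/B comparison a consistent set of numbers.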

    Designing the second message is a conversation design exercise, not a copy tweak. My recipe: empathize with the user’s likely uncertainty, constrain scope so the agent appears capable, and apply choice architecture. For voice AI agents, I keep it shorter, use confirmation questions, and bias toward read-back for accuracy. For chat, I include tappable options and examples that mirror top intents. The goal is momentum without feeling pushy.

    Operationally, I run controlled A/B tests on default and second-message variants, sized to a realistic minimum detectable effect. I segment by source (ad, organic, support), device, and use case, because the winning prompt for sales qualification rarely matches the one for customer support. With proper instrumentation in our analytics stack, we track retention curves over the first 3–5 sessions, not just single-session reply rates, to avoid optimizing for chatter over outcomes.
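
    Sizing to a minimum detectable effect is standard arithmetic. This Python sketch uses the usual normal-approximation formula for a two-sided comparison of two proportions, which fits rate metrics such as turn-2 continuation or early-session retention. The defaults of alpha 0.05 and power 0.8 are conventional choices, not requirements.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_arm(p_control: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per variant to detect an absolute lift of mde_abs over p_control."""
    p1, p2 = p_control, p_control + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde_abs == 0:
        raise ValueError("rates must stay inside (0, 1) and the MDE must be non-zero")
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / mde_abs ** 2)
```

    Under these defaults, detecting a move from 20% to 25% needs roughly 1,100 users per arm, and halving the MDE roughly quadruples that. This is why segmenting by source, device, and use case has to be planned alongside traffic.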

    Strong prompt engineering underpins the experience. I keep system prompts stable and explicit about persona, tone, and refusal behavior; manage the context window so examples don’t drown live intent; and use a retrieval-first pipeline when domain knowledge matters. The most expensive mistake I see is shipping defaults like “How can I help you?” without guardrails or examples—great for demos, bad for real users.

    If you’re starting fresh, begin with a prompt audit this week: list all defaults, map them to top intents, and pair each with a channel-appropriate second message. Instrument the funnel, launch two variants, and set a crisp success metric (e.g., turn-2 continuation rate to task start, then task completion). This is one of those rare changes that is simple to ship and compounds across onboarding, activation, and long-term retention.

    The takeaway is straightforward: don’t let your best work stall after the first reply. A disciplined second message and a focused default prompt audit will lift engagement, reduce ambiguity, and create the kind of early momentum that sustains retention over time.


    Inspired by this post on Amplitude – Perspectives.


  • Supercharge Core Web Vitals with Amplitude’s Global Agent: Faster Rankings, Happier Users

    I measure product health by a simple equation: speed plus clarity equals trust. That’s why I prioritize Core Web Vitals and search performance together—because the fastest path to better UX and higher rankings is a closed loop between measurement, diagnosis, and action. Standardizing on Amplitude’s Global Agent with Amplitude AI Agents let my teams compress that loop from weeks to hours, and in many cases, to minutes.

    The goal is to track web vitals and page rankings faster with Amplitude AI Agents, then use that signal to improve both user experience and SEO rankings. That sounds ambitious, but with the right instrumentation and analytics workflow, it becomes a repeatable operating rhythm rather than a one-off project.

    Here’s what changed for us with Amplitude’s Global Agent: a single, consistent way to capture performance signals across pages and journeys, unified context for every session, and a lightweight footprint that doesn’t get in the way of speed. By centralizing measurement, we eliminated blind spots and gave product, growth, and engineering one shared truth for Core Web Vitals and behavioral analytics.

    My practical playbook is straightforward:

    1) Establish a performance baseline for Core Web Vitals on key templates and critical user paths.

    2) Segment results by device, location, acquisition channel, and content type to surface where users actually feel the friction.

    3) Connect those vitals to downstream behaviors (scroll depth, engagement, and conversion) so we prioritize fixes that move business outcomes, not just lab scores.

    4) Use feature flags and A/B testing to ship improvements safely and quantify uplift.

    5) Close the loop with Agent Analytics to keep learnings visible and actionable.
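
    The baseline and segmentation steps can be sketched in Python. Google assesses Core Web Vitals at the 75th percentile, with published good/poor boundaries of 2.5 s and 4 s for LCP, 200 ms and 500 ms for INP, and 0.1 and 0.25 for CLS. The sketch applies those boundaries to raw samples. The `(segment, metric, value)` tuple format is an assumption, not how Amplitude stores performance events.

```python
from collections import defaultdict
from math import ceil
from typing import Dict, Iterable, List, Tuple

# "Good" upper bound and "poor" lower bound per metric (LCP and INP in ms, CLS unitless).
THRESHOLDS = {"LCP": (2500, 4000), "INP": (200, 500), "CLS": (0.1, 0.25)}

def p75(values: List[float]) -> float:
    ordered = sorted(values)
    return ordered[ceil(0.75 * len(ordered)) - 1]  # nearest-rank 75th percentile

def rating(metric: str, value: float) -> str:
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    return "needs improvement" if value <= poor else "poor"

def vitals_by_segment(
    samples: Iterable[Tuple[str, str, float]]
) -> Dict[Tuple[str, str], Tuple[float, str]]:
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for segment, metric, value in samples:
        grouped[(segment, metric)].append(value)
    result = {}
    for (segment, metric), values in grouped.items():
        v = p75(values)
        result[(segment, metric)] = (v, rating(metric, v))
    return result
```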

    Operationally, we rely on anomaly detection to flag regressions early, CI/CD guardrails to prevent performance slips at deploy time, and observability plus session replay to accelerate root-cause analysis. This combination reduces mean time to resolution, protects page experience during fast iteration cycles, and helps us avoid trading UX for speed—or vice versa.
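
    Here is a minimal Python sketch of the regression flag. It compares each day's p75 against a trailing window and flags values more than k standard deviations above the window mean. This trailing z-score rule is a simple stand-in assumed for illustration, not the detector Amplitude uses. Higher is treated as worse, which holds for LCP, INP, and CLS.

```python
from statistics import mean, stdev
from typing import List

def flag_regressions(daily_p75: List[float], window: int = 7, k: float = 3.0) -> List[int]:
    """Indices of days whose value is more than k standard deviations above the trailing-window mean."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    flagged = []
    for i in range(window, len(daily_p75)):
        history = daily_p75[i - window:i]
        mu, sd = mean(history), stdev(history)
        # Floor sd so a perfectly flat history still yields a usable threshold.
        if daily_p75[i] > mu + k * max(sd, 1e-9):
            flagged.append(i)
    return flagged
```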

    The strategic benefit is compounding: better Core Web Vitals improve user perception and increase engagement, which strengthens SEO signals and, ultimately, page rankings. With a unified analytics platform in place, we can spotlight the few improvements that create outsized gains, then scale those patterns across the site with confidence.

    If your roadmap includes faster pages, stronger rankings, and happier users, align your teams around this simple loop: measure precisely, diagnose quickly, experiment safely, and learn continuously. Amplitude’s Global Agent and Amplitude AI Agents give you the instrumentation and insight to make that loop your competitive advantage.


    Inspired by this post on Amplitude – Best Practices.


  • Stop Asking AI Anything: The 3 Outcome-Based Prompts That Unlock Real Product Insights

    Too often I watch teams ping a global agent with vague AMAs and then wonder why they get generic summaries instead of decisive guidance. When I lead product reviews, I push the team to treat AI like a partner in decision-making, not a trivia engine. That simple mindset shift transforms how quickly we move from questions to confident action.

    AI isn’t built for AMA (ask me anything). You get the best results from Amplitude AI when you ask outcome-based questions.

    In practice, outcome-based prompting means I don’t ask an agent to “analyze the data.” I ask it to help me reach a specific product decision, grounded in behavioral analytics and tied to OKRs that measure outcomes rather than outputs. To make that concrete, I always frame my prompts around three things.

    First, I state the outcome and metric. I name the business goal and the exact measure in Amplitude analytics that will validate success—activation rate, funnel conversion from A to B, or 8-week retention. I’ll reference the relevant events, segments, or driver trees so the agent has a crisp target. This is where product strategy meets measurement discipline.

    Second, I define the context and constraints. I specify the user cohort, the timeframe, and the surface area I care about—new self-serve signups in the last 30 days, first-session behavior on web only, or EU traffic where data governance rules apply. On a unified analytics platform, this context lets an agentic AI narrow its search to the highest-signal slices of behavioral analytics rather than pattern-matching across noise.

    Third, I declare the decision and deliverable. I tell the agent exactly what I will do next and the format I need to act: a ranked list of levers for an A/B testing plan, a recommended prompt engineering template for in-app guides, or a one-page brief I can hand to the growth team. Clear decisions lead to clear outputs; vague intents lead to vague answers.

    Operationally, I turn these three elements into reusable prompt templates, and I track their performance with Agent Analytics. I review traces to see which inputs drive the best recommendations, and I refine prompts the same way I iterate on product copy. For product managers working with LLMs, this is the craft: small, testable improvements that compound into outsized impact.
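
    Here is one way to capture the three elements as a reusable template, sketched in Python. The class and field names are illustrative, not an Amplitude API. Storing a `version` alongside each rendered prompt is what lets you tie traces back to the template that produced them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutcomePrompt:
    version: str
    outcome: str      # 1) the business goal...
    metric: str       #    ...and the exact measure that validates it
    context: str      # 2) cohort, timeframe, surface area
    constraints: str  #    e.g. governance rules that apply
    decision: str     # 3) what I will do next...
    deliverable: str  #    ...and the format I need to act

    def render(self) -> str:
        return (f"Outcome: {self.outcome}. Success metric: {self.metric}.\n"
                f"Context: {self.context}. Constraints: {self.constraints}.\n"
                f"Decision: {self.decision}. Deliverable: {self.deliverable}.")

# Illustrative instance built from the examples in this post.
activation_prompt = OutcomePrompt(
    version="activation-v2",
    outcome="Lift user activation",
    metric="activation rate",
    context="new self-serve signups in the last 30 days, web only",
    constraints="EU data governance rules apply",
    decision="choose the next A/B tests",
    deliverable="a ranked list of levers",
)
```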

    Here’s a quick example. When I needed to lift user activation, I asked for a prioritized set of friction points blocking first-value within 24 hours for new self-serve accounts, based on last month’s data. I defined activation as completing event X within Y hours, asked the agent to analyze top drop-offs in the funnel, and requested an action plan with two experiment ideas and success thresholds. The response mapped behaviors to interventions, connected to retention analysis, and gave me a prompt engineering snippet for the onboarding nudge we shipped the same week.
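
    The activation definition in that example (completing event X within Y hours) is easy to pin down in code, so the agent and the team compute the same number. In this Python sketch, event X and Y hours stay as parameters. The input shapes are assumptions for illustration: a signup-time map and a list of `(user, event, timestamp)` tuples.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

def activation_rate(signups: Dict[str, datetime],
                    events: List[Tuple[str, str, datetime]],
                    activation_event: str, within_hours: float) -> float:
    """Share of signed-up users who fire activation_event within within_hours of their signup."""
    if not signups:
        return 0.0
    window = timedelta(hours=within_hours)
    activated = {
        user for user, name, ts in events
        if name == activation_event and user in signups
        and signups[user] <= ts <= signups[user] + window
    }
    return len(activated) / len(signups)
```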

    If your AI workflow still starts with “What does the data say?”, you’ll keep getting broad narratives. Start with outcomes, sharpen the context, and specify the decision you will make. That’s how Amplitude analytics, paired with agentic AI, stops being interesting and starts being indispensable.


    Inspired by this post on Amplitude – Perspectives.


  • Proven 3-Step Playbook to Quantify AI Agent ROI: Boost Revenue, Cut Costs, Reduce Risk

    AI agents are only as valuable as the measurable outcomes they deliver. In my role leading product strategy at HighLevel, I’ve learned that the fastest way to earn executive trust is to translate agent performance into clear revenue impact, cost savings, and risk reduction. The challenge isn’t enthusiasm for AI; it’s creating a disciplined, repeatable way to prove business value.

    Here’s the three-step playbook my teams and I use to quantify the value of agentic AI, align stakeholders, and scale what works.

    Step 1 — Define value outcomes and success criteria. Start with a driver tree that ties agent outcomes to company-level goals. For revenue, target conversion lift, average order value, and expansion (e.g., trial-to-paid, self-serve upsell). For cost, focus on containment/deflection rate, reduced handle time, and lower cost to serve. For risk, measure error rates, hallucinations, security/policy violations, and customer complaint rate. Convert these into OKRs that track outcomes rather than outputs, set baselines, and pre-commit to thresholds for launch, scale, or rollback. This ensures the team is accountable to business KPIs, not vanity metrics.
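
    Pre-committing to thresholds works best when the rule is written down before the data arrives. This Python sketch encodes one possible rule: breaching any guardrail band means rollback, meeting every target means scale, and anything else means hold at the current rollout. The metric names and the three-way rule are illustrative assumptions.

```python
from typing import Dict, Tuple

def release_decision(metrics: Dict[str, float],
                     guardrails: Dict[str, Tuple[float, float]],
                     targets: Dict[str, float]) -> str:
    """'rollback' if any guardrail is breached, 'scale' if every target is met, else 'hold'."""
    for name, (low, high) in guardrails.items():
        if not low <= metrics[name] <= high:
            return "rollback"
    if all(metrics[name] >= minimum for name, minimum in targets.items()):
        return "scale"
    return "hold"
```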

    Step 2 — Instrument comprehensively and establish baselines. Instrument the full journey: prompts, responses, human-in-the-loop events, escalations, feedback, and downstream conversions. Capture both leading indicators (time-to-first-value, containment rate, self-serve completion) and lagging outcomes (NRR, churn, LTV/CAC). Use behavioral analytics, session replay, product tours, and in-app guides to contextualize what users do before and after agent interactions. Baselines matter—freeze a control period so improvements are truly incremental.

    Step 3 — Experiment, attribute, and risk-adjust. Treat every agent capability like a hypothesis. Run A/B tests or holdouts with a precomputed minimum detectable effect so you can ship confidently. Attribute outcomes to the agent by linking events to conversions and support deflection. Calculate net value as incremental revenue plus cost avoided minus total operating cost (including model/API, labeling, and oversight), and ROI as that net value divided by total operating cost. Apply AI risk management by tracking false positives/negatives, escalation rate, and policy breaches; adjust ROI with a risk score so the “cheapest” agent isn’t inadvertently the riskiest. This is eval-driven development in practice: define success, measure, iterate.
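
    The Step 3 arithmetic can be sketched in Python. Net value follows the text: incremental revenue plus cost avoided, minus total operating cost. Dividing by operating cost to get a ratio is the conventional ROI definition. Applying the risk score as a haircut on gross benefit is an assumption, and the composite risk score itself is hypothetical.

```python
from typing import Tuple

def agent_roi(incremental_revenue: float, cost_avoided: float,
              operating_cost: float, risk_score: float = 0.0) -> Tuple[float, float]:
    """Return (net value, ROI) after an assumed risk haircut on gross benefit.

    operating_cost should include model/API, labeling, and oversight.
    risk_score in [0, 1] is a hypothetical composite of error, escalation, and breach signals.
    """
    if operating_cost <= 0:
        raise ValueError("operating_cost must be positive")
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    benefit = (incremental_revenue + cost_avoided) * (1.0 - risk_score)
    net_value = benefit - operating_cost
    return net_value, net_value / operating_cost
```

    With illustrative numbers: given the same 200,000 gross benefit, an agent costing 50,000 with a risk score of 0.5 nets 50,000 (ROI 1.0). One costing 70,000 with a risk score of 0.1 nets 110,000 (ROI about 1.57). The cheaper agent loses once risk is priced in.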

    Operationalizing the playbook requires crisp reporting. Stand up Agent Analytics dashboards in your unified analytics platform that roll up per-agent KPIs, funnel performance, cohort trends, and experiment results. Review them in QBRs and with frontline teams to connect numbers to lived customer experience. When metrics improve, amplify with product-led growth motions—targeted in-app guides and lifecycle nudges to get more users into high-value agent flows.

    What does this look like in the real world? Early on, we celebrated “tickets deflected” and missed that some conversations quietly increased churn risk. After we adopted this three-step approach, we saw the full picture: a modest dip in deflection quality was offset by a larger lift in expansion revenue and a meaningful drop in time-to-resolution. The risk-adjusted ROI was unambiguous, and the CFO greenlit broader rollout.

    If you’re building or scaling AI agents, anchor on outcomes, instrument ruthlessly, and insist on experimentation. With the right measurement discipline, you’ll know exactly which agents deserve more investment, which need redesign, and which should be retired. The result is a portfolio of agents that reliably drive adoption, engagement, and durable business value.


    Inspired by this post on Pendo – Best Practices.


  • No More Accidental Agents: How We Engineered Global Agent’s Helpful, Curious Personality

    Most teams ship AI agent personalities by accident—emergent quirks, brittle prompts, and uneven behavior. We refused to let that happen. From day one, we treated personality as a first-class product surface, one that should be designed, instrumented, and iterated with the same rigor as any core capability.

    Here is how we designed Global Agent’s personality and tuned its inquisitiveness and helpfulness using Agent Analytics.

    In my role leading product at HighLevel, Inc., I framed our approach around agentic AI and conversation design: personality is not “flavor text”; it is the control system for how an agent interprets context, asks questions, and decides when to act. Our product strategy prioritized clarity, empathy, and consistency—so the agent would be curious enough to resolve ambiguity without becoming interrogatory, and helpful enough to move work forward without overstepping.

    We made that intent measurable. Using behavioral analytics, we defined operational signals such as clarification-question rate, resolution-path efficiency, and escalation quality. We combined eval-driven development with targeted A/B testing to compare prompt patterns and tool strategies, ensuring each change had a clear hypothesis and measurable outcome.

    To calibrate inquisitiveness, we mapped decision points where the agent should ask follow-ups versus proceed autonomously. Prompt engineering codified those thresholds, while a retrieval-first pipeline reduced unnecessary questions by improving context completeness up front. When the agent did ask, we constrained tone and cadence to keep queries concise, respectful, and progress-oriented.
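
    The ask-versus-proceed thresholds can be expressed as a small decision function. In this Python sketch, the slot names, the 0.6 and 0.9 confidence cutoffs, and the extra confirmation for irreversible actions are all hypothetical. The point is that retrieval fills `known` first, so the agent only asks about what is still missing.

```python
from typing import Dict, List

def next_step(required: List[str], known: Dict[str, str], intent_confidence: float,
              irreversible: bool = False, ask_below: float = 0.6,
              confirm_below: float = 0.9) -> str:
    """Return 'act' or a single concise follow-up question."""
    missing = [slot for slot in required if not known.get(slot)]
    if missing:
        return f"ask: {missing[0]}"  # one question at a time keeps the cadence short
    if intent_confidence < ask_below:
        return "ask: confirm intent"
    if irreversible and intent_confidence < confirm_below:
        return "ask: confirm before acting"
    return "act"
```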

    To enhance helpfulness, we prioritized precise action-taking and unambiguous guidance. Context window management preserved relevant facts without diluting intent, and guardrails aligned with AI risk management principles ensured the agent stayed within policy, privacy, and compliance boundaries. The result was an assistant that resolved more tasks end-to-end, with fewer stalls and clearer handoffs when human help was warranted.

    Agent Analytics became our nervous system. We instrumented every dialog turn to attribute outcomes to design choices, then used driver trees to connect micro-behaviors to macro results like time-to-resolution and customer satisfaction. This closed-loop view let us ship confidently, knowing which levers improved helpfulness, which sharpened curiosity, and which merely added noise.

    Process mattered as much as tooling. Product trios ran continuous discovery with customers to surface edge cases—ambiguous intents, multi-intent turns, and sensitive scenarios—while our engineering partners operationalized experiments with clean rollback paths. We favored small, testable changes over sweeping rewrites, building momentum and trust with each iteration.

    The payoff is a personality that feels consistent across use cases: curious when clarity is missing, decisive when action is obvious, and transparent when limits are reached. Users experience fewer dead ends, faster resolutions, and a brand voice that shows up the same way every time—because it was defined, measured, and improved on purpose.

    If you’re building agentic AI, don’t leave personality to chance. Treat it like a product: set clear outcomes, instrument deeply with Agent Analytics, and iterate with eval-driven development and A/B testing. That’s how curiosity becomes a feature, helpfulness becomes a habit, and your agent becomes reliably, intentionally excellent.


    Inspired by this post on Amplitude – Best Practices.

