Tag: Net Recurring Revenue (NRR)

  • AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook

    AI Evals for Product Managers: How I Measure Agent Quality—A Beginner’s Playbook

    I’ve led multiple AI agent launches, and the single most reliable way I’ve found to ship with confidence is to treat evaluations as a product capability, not a side project. When we make AI quality measurable, predictable, and comparable over time, we move faster, reduce risk, and build trust with customers and stakeholders.

    Learn how product managers use AI evaluations to measure agent quality. Covers traces, LLM judges, offline evals, online evals, and how to connect evals to product outcomes.

    Why does this matter so much in product management? Because agent quality is only meaningful when it drives adoption, satisfaction, and revenue. I use eval-driven development to align the day-to-day iteration of prompts, policies, and workflows with business outcomes like activation, retention, and Net Recurring Revenue (NRR). That alignment turns AI quality from an abstract notion into a roadmap lever.

    First, traces. Traces are the spine of evaluation for agentic AI: they capture inputs, intermediate steps, tools invoked, and final responses. I instrument traces to make reasoning visible—what the agent tried, where it hesitated, and why it chose a path. With that visibility, I can compare prompts, policies, and tools, and I can teach the team to fix the root cause instead of patching symptoms. This is also where Agent Analytics becomes real: we move from anecdotes to observable behavior trends across cohorts and use cases.

    Next, LLM judges. I use model-as-judge to score qualities like helpfulness, coherence, or adherence to brand and policy. The trick is calibration. I pair LLM judges with a small, high-quality human-labeled set to ground the scale, then monitor drift as models, prompts, or data shift. LLM judges help me evaluate at speed, but I still spot-check edge cases and highly regulated flows to balance efficiency with risk controls.

    Offline evals come first. Before I expose users to changes, I run fixed test suites representing core scenarios, failure modes, and edge cases. I include golden examples, adversarial prompts, and domain-specific queries. Metrics cover task success, factuality, safety, latency, and cost. This is where prompt engineering and retrieval quality are tuned; if I’m using a retrieval-first pipeline, I evaluate evidence quality separately from generation so improvements are attributable and reproducible.

    Online evals follow to validate real-world performance. I roll changes out behind feature flags and use A/B testing to compare variants under production conditions. I track conversation outcomes, tool success rates, fallbacks to human support, and user satisfaction. These online signals close the loop on whether an offline improvement actually compounds value in the product—critical for product-led growth.

    Connecting evals to product outcomes is non-negotiable. I map quality signals to a driver tree: from per-turn scores (helpfulness, safety, latency) up to session-level outcomes (task completion, deflection, revenue intent), and finally to product KPIs (activation, retention, NRR). With this structure, I can set thresholds for launch gates, prioritize roadmap items that move the biggest levers, and build dashboards that leadership understands at a glance.

    A few lessons learned. Start with a minimal but durable test set and grow it as you discover new failure modes. Version everything—prompts, tools, and datasets—so you can reproduce wins. Beware metric drift when you swap models or update prompts. Blend human review where the cost of error is high. Above all, make evaluations part of your AI workflows and sprint rituals so quality improves continuously, not sporadically.

    If you’re just getting started, begin with traces and a small offline suite, add LLM judges for scale, then prove impact with a focused online experiment. Within a few cycles, you’ll have a living evaluation system that guides decisions, accelerates delivery, and gives your team—and your customers—confidence in every AI release.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Stop Chasing Churn: How Behavioral Analytics Powers Proactive Retention in SaaS

    Stop Chasing Churn: How Behavioral Analytics Powers Proactive Retention in SaaS

    Churn is a lagging indicator—and by the time I see it in a dashboard, the moment to change a customer’s mind has usually passed. At HighLevel, I’ve learned that durable retention starts long before a cancellation ticket, with product-led growth habits, customer success partnerships, and a clear view of user behavior that flags risk early and often.

    Stop chasing SaaS churn after it happens. Learn how proactive product and service experiences, powered by behavioral analytics, help reduce churn before users leave.

    My operating model is simple: treat retention as a design problem, not a rescue mission. I anchor our strategy in behavioral analytics and retention analysis, translating leading indicators—activation milestones, time-to-first-value, depth of feature adoption, and expansion intent—into outcomes like Net Recurring Revenue (NRR) and cohort-based retention. When these inputs move in the right direction, churn becomes the exception, not the trend.

    To get there, I start with rigorous journey mapping and continuous discovery. We define the exact “aha” moments that signal value realization, instrument events across the funnel, and segment cohorts by persona, plan, and use case. Tools in a unified analytics platform (e.g., Amplitude analytics or Pendo) help us pinpoint where engagement decays, which features predict stickiness, and which friction points block activation. This evidence replaces hunches and lets us prioritize the highest-leverage work.

    From those signals, I build a transparent risk score that anyone can use. It blends usage momentum (DAU/WAU), core feature frequency, anomaly detection on key behaviors, billing and payment health, and support sentiment. When the score crosses a threshold, we trigger plays—inside the product and through customer success—so we’re helping users before they drift, not pleading after they’ve left.

    On the product side, I favor lightweight, contextual interventions: in-app guides tailored to stalled tasks, checklists that shorten time-to-value, adaptive product tours, and tooltip design that clarifies the next best action. We A/B test these experiences with a clear minimum detectable effect (MDE), watching both local metrics (feature completion, error rate) and global metrics (activation, retention). The goal is precision—right nudge, right user, right moment—without adding cognitive load.

    On the service side, we run consultative support and customer success plays keyed to the same behavioral triggers. A sudden drop in core usage may prompt a quick diagnostic call; repeated failed integrations can route to solutions engineering; stalled accounts get value reviews or QBRs focused on outcomes, not feature checklists. Because product and service draw from the same data, customers experience a single, coherent journey.

    Proactive retention also depends on smart packaging and pricing. When value metrics mirror how customers win, plan boundaries reinforce the right behaviors and reduce “silent churn” caused by misaligned tiers. Outcome-based pricing and clear upgrade paths can turn potential risk into expansion rather than attrition.

    Operationally, I keep a weekly retention review with product trios and customer success leaders. We walk driver trees from inputs (activation, engagement depth, support friction) to outputs (NRR, churn), review session replay where confusion spikes, and commit to small, measurable experiments. This cadence compounds learning and keeps us honest about what’s moving the needle.

    If you’re starting fresh, begin with four moves: define an activation milestone tied to value; instrument the few events that prove users are on track; build a basic risk score from those events; and craft three plays—one in-product, one lifecycle message, one success outreach—triggered by that score. You’ll create a flywheel where insights power interventions, and interventions feed better insights.

    Churn will always exist, but it doesn’t have to be a cliff. With behavioral analytics guiding both product and service experiences, we can make retention the natural outcome of how we build, communicate, and support—long before a customer ever thinks about leaving.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Stop Silent Churn: The 8 Best SaaS Prediction Tools for 2026 (Features + Use Cases)

    Stop Silent Churn: The 8 Best SaaS Prediction Tools for 2026 (Features + Use Cases)

    Churn isn’t just a retention problem—it’s a product, go-to-market, and strategy signal that shows up everywhere in the customer journey. Over the past few years, I’ve evaluated and implemented churn prediction tools across high-growth SaaS environments, and the difference between reactive firefighting and proactive, data-driven retention is night and day.

    Compare the top 8 churn prediction tools for SaaS teams. Features, use cases, and how each stacks up, so you can act before customers quietly leave.

    When I assess churn prediction tools for product-led growth, I start with a simple question: will this help my team see risk early enough—and clearly enough—to intervene with precision? The best platforms combine behavioral analytics, retention analysis, and anomaly detection to surface leading indicators before Net Recurring Revenue (NRR) takes a hit.

    First, signal coverage matters. Strong churn models draw from product usage events, CRM integration, support tickets, billing health, and even session replay to capture real-world behavior. I look for native connectors to systems like Intercom, Pendo, and Amplitude analytics, plus flexible ingestion for custom events. Without comprehensive signals, even the smartest models will miss critical moments such as stalled onboarding, shrinking active seats, or feature disengagement.

    Second, I require transparent risk scoring and clear drivers. Black-box scores erode trust with Customer Success and Product teams; explainability builds alignment. Tools that expose driver trees, cohort-based retention analysis, and segment lift help me translate insights into prioritized experiments. When possible, I tie predicted churn segments to A/B testing with a thoughtful minimum detectable effect (MDE) so we can quantify impact quickly and avoid overfitting to noise.

    Third, actionability is non-negotiable. Predictions must trigger targeted AI workflows, in-app guides, and product tours—not just dashboards. My ideal setup routes high-risk cohorts to tailored journeys (e.g., an onboarding rescue path) while notifying the right owner in CRM and Customer Success. Playbooks should be easy to operationalize, measurable, and reversible if the signals change.

    Fourth, I evaluate platform scalability, data governance, and privacy-by-design. Enterprise readiness means clear role-based access, auditability, robust SLAs, and an architecture that can evolve into a unified analytics platform as the product and data footprint grows. I also weigh total cost of ownership, implementation time, and maintenance burden against expected gains in NRR and expansion.

    In my experience, the winning tools are the ones that make it simple to connect predictions to outcomes: reduce onboarding drop-off, increase user activation, prevent seat contraction, and accelerate expansion. They align Product, Customer Success, and Growth around shared metrics, shorten time-to-value, and make proactive retention part of the operating rhythm—not a last-ditch effort at renewal.

    In this 2026 comparison, I’ll outline how each tool handles data breadth, model quality, explainability, and workflow automation. I’ll also share implementation checklists and decision criteria so you can choose the right fit for your stage, stack, and motion—whether you’re primarily product-led growth, sales-led, or hybrid.

    If you’ve ever felt like customers “quietly leave” despite solid top-of-funnel metrics, this guide will help you turn churn signals into concrete actions—and convert at-risk accounts into durable advocates.


    Inspired by this post on Pendo – Perspectives.


    Book a consult png image
  • The Surprising Eval Signal That Tripled Retention: How I Connected AI Evals to Product KPIs

    The Surprising Eval Signal That Tripled Retention: How I Connected AI Evals to Product KPIs

    Our retention curve had flattened even as activation ticked up, and that disconnect told me we were missing a leading indicator buried in our AI agent telemetry. I set out to connect our AI evals directly to product retention, not as an academic exercise, but as the basis for focused roadmap bets and stronger product-led growth.

    "Learn how we used Agent Analytics to discover an eval signal that predicts 3X higher user retention."

    Connecting AI evals to retention analysis is deceptively hard. Evals often live in ad-hoc notebooks while behavioral analytics and cohort retention live elsewhere. IDs drift. Signals are noisy. Teams gravitate to fast output over outcome clarity. I leaned into eval-driven development to close that gap and make our AI workflows accountable to business results.

    We began with crisp hypotheses: for example, that higher semantic accuracy and lower escalation rates would correlate with repeat usage. We enumerated a concise eval taxonomy—accuracy, containment, safety, latency, and UX friction—and used Agent Analytics to compute per-user and per-tenant features on a daily cadence. That gave us a reliable, unified analytics platform for AI-specific signal generation.

    Next, we joined those features to our product telemetry in Amplitude analytics using clean user and account identifiers. With that foundation, we created weekly and monthly cohorts, ran retention analysis, and used driver trees alongside simple logistic models to control for plan type, segment, region, and acquisition channel. The goal wasn’t perfection—it was directional clarity strong enough to inform product strategy.

    One eval metric separated itself from the pack. When users hit a specific threshold early in their journey, the model predicted 3X higher user retention compared to peers who didn’t. I still remember overlaying that signal on our cohort chart—the lift was impossible to unsee, and it immediately reframed our activation and onboarding priorities.

    From there, we operationalized. We built in-app guides that nudged new users toward the eval threshold, added a health score to customer success workflows, and put feature flags on model changes until they improved the eval. We validated the effect size with A/B testing and set up anomaly detection to catch regressions before they touched real users.

    If you want a repeatable playbook: define your north-star retention window, shortlist 3–5 eval candidates tied to real user value, ensure rock-solid identifiers across systems, compute daily features in Agent Analytics, model uplift against retention cohorts in Amplitude analytics, then translate the winning signal into onboarding nudges, product tours, and success playbooks. Track second-order outcomes too—support tickets, NPS, and Net Recurring Revenue (NRR)—so you don’t optimize a proxy at the expense of experience.

    I also learned what to avoid. Watch for sample-size traps and label leakage, and remember that segment mix can masquerade as model improvement. Use minimum detectable effect (MDE) calculations to size experiments, add risk scoring to gate launches, and keep a tight feedback loop between product, data science, and customer success.

    The payoff is far more than a tidy dashboard. By grounding our AI strategy in behavioral analytics and measurable retention lift, we turned an abstract eval into a concrete growth lever—and gave our product teams the confidence to move faster with clarity.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Net Recurring Revenue Mastery: How Elite CS Teams Drive Expansion, Retention, and Growth

    Net Recurring Revenue Mastery: How Elite CS Teams Drive Expansion, Retention, and Growth

    Net Recurring Revenue (NRR) is the clearest signal of whether our product, pricing, and customer success motions are compounding value or quietly leaking it. When I review our dashboard, NRR tells me—in one number—how well we retain, expand, and engage customers. It’s the difference between linear progress and durable, compounding growth.

    At its core, NRR answers a simple question: did revenue from our existing customers grow or shrink this period? The standard way I frame it is: NRR = (Starting MRR + Expansion – Contraction – Churn) / Starting MRR. Expansion reflects upsells, cross-sells, and increased usage; contraction and churn capture downgrades and departures. Great teams don’t just watch this number—they engineer it.

    The teams that consistently outperform treat NRR as an outcome of intentional design across the entire customer journey. They align product-led growth with customer success, weaving onboarding, user activation, in-app guides, and lifecycle messaging into one coherent system. They make adoption the star of the show, not an afterthought tucked beneath quarterly targets.

    To scale that system efficiently, I lean on platforms that streamline in-app guidance and rich behavioral analytics. The promise is crisp and concrete: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” When the experience is instrumented end to end, expansion opportunities show up as patterns, not surprises.

    Retention analysis is where the signal gets sharp. I segment cohorts by plan, size, and use case; map their journey; and run driver trees that connect leading indicators (activation depth, feature breadth, time-to-value) to the lagging outcome (NRR). This turns hunches into hypotheses and gives customer success managers a prioritized playbook, not a long wish list.

    Onboarding is the first and most powerful NRR lever. The faster a customer experiences their first win, the more likely they are to adopt core features, invite teammates, and expand. I use in-app guides, product tours, and contextual tooltips to pave the path to value—always grounded in clear jobs-to-be-done, not generic walkthroughs. The goal is simple: remove friction, celebrate progress, and make the next best action obvious.

    Operating cadence matters as much as tooling. I separate the rhythms: QBRs for strategic alignment and expansion planning; OKRs for cross-functional execution and accountability. QBRs anchor the conversation in outcomes and value realized; OKRs ensure product, marketing, and CS move in lockstep to close the gaps those QBRs reveal.

    Pricing and packaging complete the loop. When the value proposition is clear and plans are aligned to outcomes customers care about, expansion feels natural—more capability for more value. Usage insights guide which features to gate, which to bundle, and where to price to maximize retention while unlocking healthy upsell paths.

    None of this works without tight product–CS collaboration. My teams practice continuous discovery—customer interviews, win/loss insights, and in-product feedback—so we improve the experience where it truly matters. Journey mapping turns those insights into experiments, and experiments turn into polished features once the data speaks.

    I build an NRR driver tree into our weekly reviews. Each branch (activation, adoption, multi-seat expansion, downgrade prevention, reactivation) has a clear owner, a measurable hypothesis, and a time-bound experiment. A/B testing guides what we ship broadly, and we define success upfront to avoid moving goalposts after the fact.

    I’ve seen NRR climb meaningfully in a single quarter when we pair rigorous retention analysis with targeted onboarding improvements and value-based packaging. The lift rarely comes from one big bet; it’s the compounding effect of many small, well-instrumented decisions.

    Here’s the 90-day play I return to: first, baseline NRR by segment and identify the top three drivers of expansion and the top three causes of contraction. Next, streamline onboarding with in-app guides and product tours that accelerate time-to-value and drive user activation. Then, craft expansion plays aligned to real outcomes (additional seats, advanced workflows, new use cases), and operationalize them via QBRs. Finally, preempt downgrades with early-warning alerts, targeted education, and a clear path from “stuck” to “successful.”

    NRR is a team sport. When product, customer success, and go-to-market align around adoption and outcomes, growth compounds, risk declines, and every customer interaction becomes a chance to create more value—today and in every renewal to come.


    Inspired by this post on Pendo – Perspectives.


    Book a consult png image