Over the past quarter, I’ve been obsessed with a simple question: how do real people actually prompt AI agents when the stakes are high and the clock is ticking? We analyzed 27K sessions with Amplitude's Global Agent using our Agent Analytics tool. Here's what we found out about how real users are prompting our agent. That single line belies months of careful instrumenting, qualitative review, and product debates—and it forever changed how I design agent experiences.
The clearest pattern I saw: users don’t craft “perfect” prompts—they co-create with the agent. Most sessions began with a broad intent, then tightened through rapid, iterative turns. The winning structure emerged as context, command, and constraints. When our agent acknowledged context first, clarified the command, and reflected constraints back, users responded with noticeably more confidence. It reinforced what great prompt engineering already teaches, but grounded in lived behavior across thousands of journeys.
Trust was the next breakthrough. People wanted transparency on capabilities, a concise first answer, and an easy path to deeper detail and sources. They frequently asked the agent to show its work, summarize trade-offs, or restate assumptions in plain language. Instrumenting observability into the agent’s reasoning artifacts—without overwhelming the user—proved foundational for building credibility session by session.
On task complexity, users fared best when the agent orchestrated a few small, verifiable steps rather than one heroic leap. Retrieval-first pipeline patterns consistently reduced confusion and rework, especially when paired with strong context window management. The more the agent proactively chunked the problem, validated intermediate outputs, and offered next-best actions, the smoother the journey—and the more reusable the prompts became.
UX nudges mattered as much as model quality. Inline examples (“Try this”), one-click refinements (“Shorter,” “Add a table,” “Cite sources”), and lightweight guardrails kept momentum high without boxing users in. When the agent made uncertainty explicit and offered safe fallbacks, abandonment dropped and users explored more ambitiously. The experience felt less like “querying a model” and more like collaborating with a capable teammate.
From a product management lens, these insights shape how I prioritize agentic AI. I’m doubling down on: scaffolded prompts that lead with context and constraints; transparent citations and assumptions; multi-step plans that the user can edit; and evaluation loops that A/B test prompt templates, tool strategies, and response formats. I’m also investing in analytics that connect session patterns to activation, speed-to-value, and retention so we can run eval-driven development, not opinion-driven debates.
If you’re building agents into a core product workflow, start by designing for iterative co-creation, not one-shot brilliance. Offer progressive disclosure, keep the first answer tight, and make verification effortless. Shape the model with retrieval-first strategies, manage your context window like a scarce resource, and treat observability as a feature, not a debug tool. Most of all, let real usage guide your roadmap—these 27K sessions reminded me that the best agent UX is learned alongside our users, not imagined in isolation.
Inspired by this post on Amplitude – Perspectives.
MCP is the acronym I keep hearing in every product conversation—and for good reason. When teams like Miro and Atlassian lean in, it signals a real shift in how we design, ship, and scale value. From my vantage point leading product at HighLevel, I see MCP less as a feature and more as an operating advantage: a way to align strategy, execution, and governance so product teams move faster with higher confidence.
When I evaluate a platform like MCP, I start with three questions. First, does it advance our product strategy and sharpen competitive differentiation? Second, does it strengthen product-led growth by improving activation, onboarding, and retention? Third, does it help us drive outcomes vs output OKRs so we consistently measure what matters, not just what ships?
Execution discipline makes or breaks any MCP investment. I design measurement upfront: instrument A/B testing, define activation milestones, and monitor retention cohorts. In parallel, I use Pendo for in-app guides and product tours to accelerate adoption and reduce time-to-value, then connect this data back to roadmap decisions so each release compounds learning instead of creating noise.
On the operating model, I apply a rigorous build vs buy lens and stress-test platform scalability, reliability, and integration surfaces. Stakeholder management is critical—security, SRE, and solutions engineering must be partners from day one. I anchor teams in product trios and continuous discovery so we learn with customers in the loop, not after the fact.
At Pendomonium 2026, Pendo CPO Rahul Jain brought together four product leaders who are building with MCP. Read or watch their conversation to learn more.
My practical playbook for MCP: choose one high-signal use case, define clear success metrics, and run a tightly scoped pilot with visible executive sponsorship. Treat governance and data hygiene as first-class requirements. Close the loop weekly with qualitative insights from customer interviews and quantitative telemetry from experiments. Only then scale to adjacent workflows, keeping a steady focus on measurable customer value and repeatable delivery.
Whether you’re an emerging startup or an established enterprise, the opportunity is the same: turn MCP curiosity into durable capability. With disciplined measurement, thoughtful stakeholder alignment, and a relentless outcomes mindset, MCP can become a lever for product management leadership—not just another acronym in the stack.
I spend a meaningful portion of my week helping teams operationalize AI workflows, and one theme comes up over and over: how to share context files and skills seamlessly across devices and with colleagues. Hosting Claude Code office hours has only reinforced it—sharing context and skills is the single biggest blocker to reliable, repeatable outcomes.
I hear from leaders driving AI adoption who have built robust, high-signal context systems and carefully crafted skills. Their challenge isn’t creating value—it’s distributing it. They need a way to make the same trusted workflows available to teammates and to keep everything in sync across laptops, desktops, and phones.
I hit the same wall myself. I work across multiple devices (a Mac Mini for day-to-day, a MacBook Air on the road, and an iPhone) and I collaborate with a full-time admin. I wanted my context and skills to be consistent everywhere, for both of us. In this piece, I’ll share my setup—what I store where, how I share it across devices and with my team, the trade-offs of each option, and how I keep everything current. We’ll cover four different syncing services: git/GitHub, Obsidian Sync, Dropbox and iCloud.
If you’re new to this series, this is the eighth installment. Earlier pieces provide foundational context: Claude Code: What It Is, How It's Different, and Why Non-Technical People Should Use It; Stop Repeating Yourself: Give Claude Code a Memory; How to Use Claude Code Safely: A Non-Technical Guide to Managing Risk; How to Choose Which Tasks to Automate with AI (+50 Real Examples); How to Build AI Workflows with Claude Code (Even If You're Not Technical); How to Use Claude Code: A Guide to Slash Commands, Agents, Skills, and Plug-ins; and Context Rot: Why AI Gets Worse the Longer You Chat (And How to Fix It).
The day it really hit me was right before my interview with Claire Vo on How I AI. I was staying in an AirBnB with only my laptop, and I planned to demo my /today command along with my context file structure. Minutes before the session, I realized the latest version of my /today command wasn’t on that machine. I was able to remote into my Mac Mini and grab it—crisis averted—but it was a wake-up call. I needed a more reliable, shareable approach for syncing context and skills across devices and with my admin.
I started by testing the tools I already used—Dropbox, iCloud, and GitHub—to see what might fit. Each got me partway there, but each also introduced friction that mattered in daily use.
First, absolute file paths don’t travel well. I began with Dropbox but quickly ran into cross-linking headaches. Good context systems rely on rich interlinking—index files point to other context files, and those context files link to each other. When Claude creates a link from one context file to another, it tends to use the full file path: /Users/ttorres/Library/CloudStorage/Dropbox. That worked on my Mac Mini and MacBook (same user name), but not on my phone—and not for my admin. I tried to force relative links (~/Dropbox), but couldn’t get Claude to do it consistently, which led to broken links. This isn’t unique to Dropbox; Claude prefers full paths because they’re reliable on a single machine, but they’re brittle across devices and useless when sharing with colleagues. Claude is trained to use relative file paths when working within a git repository, but I struggled to get it to work reliably in Dropbox.
Second, skills live in a user directory by default. By default, skills live in ~/.claude/skills. Most sync services aren’t designed to share your ~/ folder. iCloud is the exception, but then you’re limited to Apple devices—no Windows or Android. There is a workaround: set up a claude folder in Dropbox and create a symlink from ~/.claude to your synced claude folder, so all skills, commands, and settings live in Dropbox. Then, on each device (yours or a colleague’s), you set up a symlink to that folder so Claude can find the files. This works, but I was running into another limitation that made Dropbox a poor fit.
Third, Obsidian on iOS doesn’t sync cleanly with Dropbox. I rely on Obsidian’s file browser alongside my notes to navigate context quickly. Storing vaults in Dropbox gave me parity across my Mac Mini and MacBook Air, but I couldn’t get the iOS Obsidian app to reliably load my Dropbox vaults. That friction was a dealbreaker for on-the-go work.
At that point, I explored git/GitHub. GitHub is cloud storage for git repositories. A git repository is a folder of shared files used so engineers can collaborate on the same code base. Each person clones a local copy, works locally, then pushes changes back to the hosted repo on GitHub; others pull to update. Git’s merge and conflict tooling is excellent. Git is the powerhouse of file syncing and version control. It easily handles syncing context and skills, Claude behaves better with relative links in a git repo, and I can open the repo in my IDE with a clean file browser. For me, that checked all the boxes—until I factored in my admin. Git has a learning curve, requires manual pull/push hygiene, and often assumes an IDE workflow. That overhead was too heavy for a non-technical collaborator.
The turning point was Obsidian Sync. A colleague suggested it, and it ended up being the sweet spot. Obsidian is a markdown reader; files are stored locally in a normal folder you can open in Finder or File Explorer. There’s no proprietary format—you can read files with any text editor, and Claude can access them via bash commands. Obsidian Sync is simpler than git: open a note and it syncs in the background. I can access the same vaults across my Mac Mini, MacBook Air, and iPhone, and I can share a vault with my admin so we can both create and access notes.
Because we’re in different time zones and rarely edit the same note simultaneously, limited conflict handling hasn’t been an issue. Obsidian’s internal link notation also means one note can link to another and those links just work across devices. Claude can follow these links, so the brittle file path problem disappears.
Here’s where I landed. After a lot of trial and error, I have a setup that works across my devices and for my admin, who uses both a Windows desktop and a Mac laptop. I keep my core context in Obsidian vaults synced with Obsidian Sync, which preserves portability, link integrity, and ease of use. For skills, I avoid scattering files in machine-specific locations and instead centralize what Claude needs to reference in shared, human-readable folders. If you require advanced version control with branching and reviews, git/GitHub is excellent. If your priority is low-friction, cross-device access for non-technical teammates, Obsidian Sync is a practical, reliable choice. And if you must use Dropbox or iCloud, consider symlinks and be vigilant about relative paths—just know that absolute paths won’t travel well.
Lately, I keep hearing a familiar question: with AI making it so easy to generate ideas and build products, do we still need product managers? My answer is unequivocal—yes. Tools accelerate delivery, but they don’t build trust, reconcile competing incentives, or create the shared understanding teams need to ship outcomes. Product work is relationship work.
I recently listened to “Product Work Is Relationship Work – All Things Product with Teresa & Petra,” and it echoed what I see every day in high-performing product organizations. If you prefer to watch, here’s the episode on YouTube: https://www.youtube.com/embed/d-0f8uAfc8w?feature=oembed
Listen to this episode on: Spotify | Apple Podcasts
While AI can help build things faster, it can’t replace the relationship work required to align stakeholders, navigate competing priorities, and create shared understanding across teams. That’s the hard, human part of product management—and it’s not going away.
In my experience, product teams stall when collaboration becomes transactional. We jump to negotiation (“What can you commit by Friday?”) before establishing context (“What problem are we solving and why now?”). When I slow down to get curious—about constraints, incentives, and assumptions—momentum actually increases because we’re rowing in the same direction.
Stakeholder alignment often breaks down when we conflate advocacy with exploration. We argue our viewpoint as if it were the only lens that matters, rather than making space to surface how others see the system. I’ve found the distinction between “dialogue vs. discussion,” rooted in work by Chris Argyris and elaborated in The Fifth Discipline by Peter Senge, to be a powerful reset. Dialogue builds shared understanding; discussion decides. You need both, in the right order.
Language matters in the room. The improv principle “Yes, and” is deceptively simple but transformative. When a designer, engineer, or executive feels heard (“Yes”) and we build on their idea (“and”), we create psychological safety without sacrificing critical thinking. I use “Yes, and” to explore perspectives before we converge on decisions—especially with product trios and senior stakeholders.
Here are the moves I rely on to keep collaboration relational and outcomes-focused. First, we align on outcomes before solutions. I explicitly separate outcomes vs output OKRs so we’re clear on what success looks like, independent of the features we ship. That clarity reduces rework and speeds up decision-making later.
Second, we operationalize curiosity with continuous discovery. I schedule recurring, lightweight touchpoints with customers and internal stakeholders so insights compound. When learning is continuous, debates quiet down—evidence does the heavy lifting.
Third, we invest in relationship rituals. Regular 1:1s with key partners, stakeholder maps that capture motivations, and pre-reads that frame trade-offs all prevent misalignment from surfacing in the last mile. These small habits pay huge dividends in trust and speed.
Fourth, I’m explicit about mode-switching in meetings: are we advocating a position or exploring perspectives? Calling the mode out loud prevents people from mistaking questions for opposition and keeps the conversation productive.
Fifth, we use “Yes, and” to move from possibility to practicality. We explore generously, then converge rigorously—ranking options by impact, effort, and risk so decisions are transparent and fair.
If stakeholder alignment, team dynamics, or product “politics” slow your team down, this conversation offers a practical reframe. You’ll move faster when you build the relational tissue first—because alignment is an accelerant, not a tax.
Resources & Links:
Follow Teresa Torres: https://ProductTalk.org
Follow Petra Wille: https://Petra-Wille.com
Mentioned in this episode:
Petra’s Coaching Packages
Work by Chris Argyris on organizational learning and dialogue vs. discussion
The Fifth Discipline: The Art and Practice of the Learning Organization by Peter Senge
Improv principle “Yes, and”: Saying “Yes, and” — A principle for improv, business & life and Yes, and …
Have thoughts on this episode or examples from your team? Leave a comment below—I’d love to learn what’s working (and what’s not) in your stakeholder landscape.
The best signal often comes from the least scalable work.
I’ve learned this the hard way—and the rewarding way. When I’m closest to customers, rolling up my sleeves with the team, I uncover nuanced, high-signal insights that no dashboard or aggregate report can reveal. Those insights, when treated with rigor and discipline, become the backbone of a durable product strategy and true product management leadership.
At Intercom, that is at the heart of how we operate on “swarms.” Swarms are cross-functional teams of Fin experts focused on ensuring customers succeed when trialing Fin. Each team consists of engineers, data scientists, and a product manager, all focused on optimizing Fin for our customers.
Working in these teams gives us deep insights into the needs of individual customers, but they can also form the foundation of new Fin features. Let me explain.
I frame the journey from insight to impact in three levels: “Level 1: Swarms – where the signal comes from,” “Level 2: Cockpit – where the signal starts to scale,” and “Level 3: Product – where the signal reaches maximum leverage.” This model blends continuous discovery with pragmatic solutions engineering and creates a clear path from hands-on learning to product-led growth.
Level 1: Swarms – where the signal comes from. The goal is simple: help Fin resolve more conversations and help customers understand and use the product. Swarms partner with customers to define their goals and how Fin fits into their workflows. We map out an automation roadmap by analyzing their conversations, determining the APIs and Procedures they need, and the level of automation they can achieve. We then support them in implementing it and reaching that outcome. This involves ongoing analysis to identify optimizations to their configuration and the next best actions for increasing automation levels, such as improving knowledge base content or deploying new APIs.
During a swarm, the feedback loop is fast. We test something, ship something, and quickly see whether the metric moves. That speed and depth is what makes swarms so valuable. It’s also what makes them hard to scale. I’ve felt the thrill of watching a key metric bend within hours—and the constraint of knowing that kind of attention doesn’t scale to every account.
For example, we developed an automation taxonomy to predict the level of automation a customer can achieve. Initially, this analysis was manual and took more than half a day to run, with time required to prep and visualize the data. But the effort was worthwhile. For one customer, we predicted an automation rate of 70% and they achieved exactly that.
By working closely with customers, we learn what drives success, but this work is inherently hands-on and doesn’t scale on its own. So the real challenge is figuring out how to turn what we learn in those high-touch engagements into systems, tools, and product changes that benefit far more customers. That’s the inflection point where AI workflows and product strategy meet.
Level 2: Cockpit – where the signal starts to scale. Not every customer should need swarm-level attention. The way we bridge that gap is by making the swarm analyses repeatable and shareable. Once we can run the same analysis across customers, we can start turning bespoke swarm learnings into reusable signals. This is where Cockpit comes in.
Transform customer signals into action: this dashboard tracks support conversation volume, taxonomy percentages by type, and topic demand across account settings, billing, integration, and more to guide scalable feature bets.
We take patterns learned in swarms and encode them into internal tooling inside our insights web app, Cockpit. Instead of analysis being a bespoke project, it becomes a workflow. For example, we scaled the automation taxonomy and this has enabled us to quickly understand automation potential for all customers.
Now, a customer success manager (CSM) can pick a customer, see their automation potential and current performance, understand the biggest issues, and propose next actions. This is how we scale the impact of swarm learnings through CSMs and Sales. It allows far more customers to benefit from the same patterns we see in high-touch work, without requiring direct data science involvement every time.
Cockpit also functions as a valuable proving ground. It gives us a way to test ideas across a much broader set of customers and see what generalizes before we consider taking anything further. In other words, we transform sharp, local signal into broadly useful guidance—an essential step in any AI Strategy that aims to balance precision with scale.
Level 3: Product – where the signal reaches maximum leverage. The real payoff comes when the patterns we have validated internally become part of the product itself. Instead of helping one customer directly, or helping many customers through internal teams, we deliver a feature directly to customers so they can improve Fin’s performance on their own. Today, the automation taxonomy is a part of Insights and accessible to customers who have this feature.
Another example is CX Score. It started with close work alongside Intercom’s Customer Support team to understand performance with Fin, initially through predicted CSAT and resolution. Over time, this work evolved into CX Score: a scalable way to measure conversation quality across all customers.
The product stage is fundamentally different from Cockpit because of the constraints. Cockpit provides a platform for our customer analyses/tools but it doesn’t need to scale as far as product. What moves into product has to work for every customer, without configuration, at scale, so it has to generalize. That bar is what protects long-term quality while unlocking product-led growth.
That’s why the move from Cockpit to product isn’t automatic. We’re not just asking whether something is useful, but whether it’s broadly useful, robust, and scalable enough to run across the entire customer base. As a product leader, I push for this discipline because it’s where customer success, engineering excellence, and business outcomes converge.
The loop. The model is simple. Swarms generate the best signal, grounded in real customer problems. Cockpit operationalizes that signal so CSMs and Sales can use it across many customers. Product takes the patterns that truly generalize and turn them into scalable features that enhance every customer’s experience.
This loop allows a small swarm data science function to have impact beyond a small set of high-touch accounts, resulting in a stream of continuous improvements across all three levels and an ever-increasing level of automation for our customers. Practically, it’s a repeatable playbook for product management leadership: start with high-signal discovery, prove repeatability, and only then scale through product. Done well, it compounds learning, accelerates time-to-value, and aligns the entire organization around measurable outcomes.
I set out to solve a deceptively simple problem: help our teams ask product questions in plain English and get trustworthy, analysis-grade answers—fast. That required more than a powerful model; it demanded agents that genuinely understand the language of product analytics, from behavioral analytics nuances to the messy reality of event taxonomies, funnels, and cohorts. In this post, I share how we engineered agentic AI that speaks our domain fluently and turns questions into decisions.
The core challenge wasn’t data volume or dashboard sprawl; it was semantics. Different teams said “activation,” “onboarding,” or “first value” and meant overlapping but distinct things. Our PMs, analysts, and engineers navigated a maze of synonyms across Amplitude analytics, Pendo, and our unified analytics platform. Generic LLMs stumbled on these nuances, so we built a shared ontology—driver trees anchored to a clear North Star—with canonical definitions for activation, retention, and conversion, plus consistent event naming and cohort logic.
We started with a rigorous metric catalog: every KPI linked to its drivers, exact formulas, cohorts, and time windows; every event mapped to a product taxonomy; every dashboard and SQL snippet versioned with ownership and lineage. That catalog became the ground truth for agents. We embedded data governance and privacy-by-design from the start—permissioning for fields and queries, PII redaction, and scoped access that reflected how product teams actually work.
Next, we built a retrieval-first pipeline to ground the agents in our corpus before generation. We indexed metric definitions, dashboards, experiment readouts, runbooks, and high-signal Slack threads so the agent could cite relevant artifacts, not just predict plausible text. With careful context window management and prompt engineering, the agent retrieves definitions and prior analyses, then plans multi-step actions: run a query, compare cohorts, check “minimum detectable effect (MDE)” for an A/B test, and summarize findings with references.
Architecturally, we treated this as “Agent Analytics”: an orchestrator that selects tools based on intent—querying Amplitude analytics or Pendo for behavioral paths and funnels, hitting our warehouse for cohort tables, or pulling experiment metadata and anomaly detection alerts. Tool use is permission-aware, auditable, and designed to fail safe. The agent’s outputs include citations back to the exact definitions, dashboards, and SQL used, so reviewers can validate and iterate.
Quality came from eval-driven development, not intuition. We built a gold set of representative product questions (activation inflections, retention analysis by segment, funnel drop-offs after feature launches) and scored the agent on faithfulness to definitions, numerical accuracy, latency, and actionability. We incorporated regression checks to catch drifts after schema changes, and we tuned prompts to reduce overconfident answers and push for clarifying questions when context was missing.
Safety and reliability were non-negotiable. We layered AI risk management with role-based access, guardrails that block destructive queries, and risk scoring for unfamiliar joins or sudden spikes in metric deltas. The agent logs every step—what it retrieved, which tools it called, and why—so analysts can replay and refine the chain of thought with transparent provenance.
The payoff: product teams now self-serve nuanced questions in minutes instead of days, and our analysts spend more time on discovery than report wrangling. Retention analysis improved as the agent standardized cohort logic; conversion investigations accelerated thanks to consistent funnel definitions; and cross-functional decisions aligned around the same driver trees and shared language. Most importantly, the agent turned ambiguous asks into structured analyses that stand up to scrutiny.
For fellow product leaders, my lesson is simple: start with semantics, not models. A crisp ontology, disciplined taxonomy, and clear ownership will outperform a flashy stack riddled with ambiguity. Avoid technology FOMO; favor retrieval-first grounding, small sharp tools, and continuous discovery with your product trios. When your organization speaks a common analytics language, agents can finally think with you, not just for you.
Next, we’re extending the agent’s planning skills to recommend experiment designs, estimate power and “minimum detectable effect (MDE),” and propose driver-tree-informed bet sizing. We’re also tightening feedback loops so every accepted answer, edit, or override strengthens the retrieval corpus and evaluations. The vision: a calm, reliable layer that makes rigorous product analytics feel conversational—and helps teams move from questions to confident action.
Inspired by this post on Amplitude – Best Practices.
Net Recurring Revenue (NRR) is the clearest signal of whether our product, pricing, and customer success motions are compounding value or quietly leaking it. When I review our dashboard, NRR tells me—in one number—how well we retain, expand, and engage customers. It’s the difference between linear progress and durable, compounding growth.
At its core, NRR answers a simple question: did revenue from our existing customers grow or shrink this period? The standard way I frame it is: NRR = (Starting MRR + Expansion – Contraction – Churn) / Starting MRR. Expansion reflects upsells, cross-sells, and increased usage; contraction and churn capture downgrades and departures. Great teams don’t just watch this number—they engineer it.
The teams that consistently outperform treat NRR as an outcome of intentional design across the entire customer journey. They align product-led growth with customer success, weaving onboarding, user activation, in-app guides, and lifecycle messaging into one coherent system. They make adoption the star of the show, not an afterthought tucked beneath quarterly targets.
To scale that system efficiently, I lean on platforms that streamline in-app guidance and rich behavioral analytics. The promise is crisp and concrete: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” When the experience is instrumented end to end, expansion opportunities show up as patterns, not surprises.
Retention analysis is where the signal gets sharp. I segment cohorts by plan, size, and use case; map their journey; and run driver trees that connect leading indicators (activation depth, feature breadth, time-to-value) to the lagging outcome (NRR). This turns hunches into hypotheses and gives customer success managers a prioritized playbook, not a long wish list.
Onboarding is the first and most powerful NRR lever. The faster a customer experiences their first win, the more likely they are to adopt core features, invite teammates, and expand. I use in-app guides, product tours, and contextual tooltips to pave the path to value—always grounded in clear jobs-to-be-done, not generic walkthroughs. The goal is simple: remove friction, celebrate progress, and make the next best action obvious.
Operating cadence matters as much as tooling. I separate the rhythms: QBRs for strategic alignment and expansion planning; OKRs for cross-functional execution and accountability. QBRs anchor the conversation in outcomes and value realized; OKRs ensure product, marketing, and CS move in lockstep to close the gaps those QBRs reveal.
Pricing and packaging complete the loop. When the value proposition is clear and plans are aligned to outcomes customers care about, expansion feels natural—more capability for more value. Usage insights guide which features to gate, which to bundle, and where to price to maximize retention while unlocking healthy upsell paths.
None of this works without tight product–CS collaboration. My teams practice continuous discovery—customer interviews, win/loss insights, and in-product feedback—so we improve the experience where it truly matters. Journey mapping turns those insights into experiments, and experiments turn into polished features once the data speaks.
I build an NRR driver tree into our weekly reviews. Each branch (activation, adoption, multi-seat expansion, downgrade prevention, reactivation) has a clear owner, a measurable hypothesis, and a time-bound experiment. A/B testing guides what we ship broadly, and we define success upfront to avoid moving goalposts after the fact.
I’ve seen NRR climb meaningfully in a single quarter when we pair rigorous retention analysis with targeted onboarding improvements and value-based packaging. The lift rarely comes from one big bet; it’s the compounding effect of many small, well-instrumented decisions.
Here’s the 90-day play I return to: first, baseline NRR by segment and identify the top three drivers of expansion and the top three causes of contraction. Next, streamline onboarding with in-app guides and product tours that accelerate time-to-value and drive user activation. Then, craft expansion plays aligned to real outcomes (additional seats, advanced workflows, new use cases), and operationalize them via QBRs. Finally, preempt downgrades with early-warning alerts, targeted education, and a clear path from “stuck” to “successful.”
NRR is a team sport. When product, customer success, and go-to-market align around adoption and outcomes, growth compounds, risk declines, and every customer interaction becomes a chance to create more value—today and in every renewal to come.
I set out to solve a deceptively simple problem: help our teams ask product questions in plain English and get trustworthy, analysis-grade answers—fast. That required more than a powerful model; it demanded agents that genuinely understand the language of product analytics, from behavioral analytics nuances to the messy reality of event taxonomies, funnels, and cohorts. In this post, I share how we engineered agentic AI that speaks our domain fluently and turns questions into decisions.
The core challenge wasn’t data volume or dashboard sprawl; it was semantics. Different teams said “activation,” “onboarding,” or “first value” and meant overlapping but distinct things. Our PMs, analysts, and engineers navigated a maze of synonyms across Amplitude analytics, Pendo, and our unified analytics platform. Generic LLMs stumbled on these nuances, so we built a shared ontology—driver trees anchored to a clear North Star—with canonical definitions for activation, retention, and conversion, plus consistent event naming and cohort logic.
We started with a rigorous metric catalog: every KPI linked to its drivers, exact formulas, cohorts, and time windows; every event mapped to a product taxonomy; every dashboard and SQL snippet versioned with ownership and lineage. That catalog became the ground truth for agents. We embedded data governance and privacy-by-design from the start—permissioning for fields and queries, PII redaction, and scoped access that reflected how product teams actually work.
Next, we built a retrieval-first pipeline to ground the agents in our corpus before generation. We indexed metric definitions, dashboards, experiment readouts, runbooks, and high-signal Slack threads so the agent could cite relevant artifacts, not just predict plausible text. With careful context window management and prompt engineering, the agent retrieves definitions and prior analyses, then plans multi-step actions: run a query, compare cohorts, check “minimum detectable effect (MDE)” for an A/B test, and summarize findings with references.
Architecturally, we treated this as “Agent Analytics”: an orchestrator that selects tools based on intent—querying Amplitude analytics or Pendo for behavioral paths and funnels, hitting our warehouse for cohort tables, or pulling experiment metadata and anomaly detection alerts. Tool use is permission-aware, auditable, and designed to fail safe. The agent’s outputs include citations back to the exact definitions, dashboards, and SQL used, so reviewers can validate and iterate.
Quality came from eval-driven development, not intuition. We built a gold set of representative product questions (activation inflections, retention analysis by segment, funnel drop-offs after feature launches) and scored the agent on faithfulness to definitions, numerical accuracy, latency, and actionability. We incorporated regression checks to catch drifts after schema changes, and we tuned prompts to reduce overconfident answers and push for clarifying questions when context was missing.
Safety and reliability were non-negotiable. We layered AI risk management with role-based access, guardrails that block destructive queries, and risk scoring for unfamiliar joins or sudden spikes in metric deltas. The agent logs every step—what it retrieved, which tools it called, and why—so analysts can replay and refine the chain of thought with transparent provenance.
The payoff: product teams now self-serve nuanced questions in minutes instead of days, and our analysts spend more time on discovery than report wrangling. Retention analysis improved as the agent standardized cohort logic; conversion investigations accelerated thanks to consistent funnel definitions; and cross-functional decisions aligned around the same driver trees and shared language. Most importantly, the agent turned ambiguous asks into structured analyses that stand up to scrutiny.
For fellow product leaders, my lesson is simple: start with semantics, not models. A crisp ontology, disciplined taxonomy, and clear ownership will outperform a flashy stack riddled with ambiguity. Avoid technology FOMO; favor retrieval-first grounding, small sharp tools, and continuous discovery with your product trios. When your organization speaks a common analytics language, agents can finally think with you, not just for you.
Next, we’re extending the agent’s planning skills to recommend experiment designs, estimate power and “minimum detectable effect (MDE),” and propose driver-tree-informed bet sizing. We’re also tightening feedback loops so every accepted answer, edit, or override strengthens the retrieval corpus and evaluations. The vision: a calm, reliable layer that makes rigorous product analytics feel conversational—and helps teams move from questions to confident action.
Inspired by this post on Amplitude – Best Practices.
Internal Products Are Hard; Commercial Products Are Harder. That line captures years of hard-won lessons from leading both internal platforms and market-facing SaaS at HighLevel. I’ve seen how the two demand different muscles—even when the tech stack, talent, and timelines look the same on paper.
When I talk about internal products, I mean services and solutions that our own employees use to take care of customers—customer-enabling tools and services, agent consoles, fulfillment and billing workflows, operations dashboards, and the underlying platforms that keep them fast, compliant, and resilient. These tools don’t generate revenue directly, but they quietly determine customer experience, gross margin, and how quickly we can ship, resolve issues, and scale.
Commercial products, by contrast, add a second challenge layer. Beyond discovery, usability, and reliability, we must conquer positioning, pricing and packaging, competitive differentiation, sales enablement, procurement hurdles, and ongoing customer success motion. The surface area for failure is bigger, and the time-to-signal on product-market fit is slower and noisier.
Here’s how I decide where to invest. First, I anchor on outcomes, not output. If the business priority is net revenue retention, faster onboarding, or reduced cost-to-serve, internal products often provide the highest-leverage path. If the priority is new revenue, new market entry, or a must-have differentiator, we lean commercial. I make the trade explicit in outcomes vs output OKRs so we can defend the decision when pressure mounts.
Second, I run a clear build vs buy calculus. For internal needs, the default is buy if a mature, configurable solution exists that meets our security, data governance, and integration requirements. I only build when the workflow is core to our differentiation, the TCO of customization is lower than vendor sprawl, or we can capture unique proprietary advantage. For commercial products, I avoid embedding third-party IP in a way that caps differentiation or compresses margins as we scale.
Third, I insist on continuous discovery. Internal audiences are not a captive market—they’re discerning experts with real jobs to do. I treat them like customers, with structured customer interviews, journey mapping, and opportunity solution trees. I rely on empowered product teams and product trios to validate problems and reduce solution risk before we commit engineering time.
Fourth, I frame commercial vs internal work with capacity guardrails. In most planning cycles, I reserve explicit allocation for platform scalability and internal tooling, separate from feature bets. Without this, internal products become backlog filler, which guarantees we’ll pay the interest later in churn, SLA breaches, and slower delivery.
Execution differs too. For internal products, change management is the make-or-break. I plan enablement as a first-class deliverable: clear rollouts, in-app guides, training, and feedback loops with frontline champions. I track adoption, time-to-resolution, error rate, and satisfaction for internal users with the same rigor we apply to external users.
For commercial products, I design the discovery-to-GTM handshake early. Pricing and packaging must reflect value drivers discovered in research, not what’s easiest to meter. Sales and solutions engineering need crisp narratives, objection handling, and proof points. Customer success needs activation plans and health signals tied directly to leading indicators of retention.
Across both, I instrument the product and process. I lean on feature flags and progressive delivery to manage risk, and I protect SLOs with error budgets so teams balance reliability with iteration speed. CI/CD isn’t a badge—it’s how we earn the right to ship continuously without eroding trust.
Common pitfalls recur. Teams skip UX for employee tools because “they have to use it”—which backfires as shadow workflows and rework. Leaders underfund internal platforms, then wonder why velocity stalls. On the commercial side, teams over-index on features and under-invest in positioning and onboarding, leading to poor activation and elongated sales cycles.
What’s the payoff? When we treat internal products as products, we unlock scale: shorter handling times, fewer escalations, clearer accountability, and higher customer satisfaction. When we approach commercial products with the same discovery rigor plus smart GTM, we compress time-to-value and amplify differentiation. The craft is knowing which lever to pull when—and having the discipline to measure what matters.
My rule of thumb is simple. If the goal is operational excellence that compounds across the entire customer journey, invest in internal products with the same intensity you reserve for revenue-generating features. If the goal is market expansion or category leadership, invest in commercial products with a tight discovery-to-GTM loop. In either case, clarity of outcomes, disciplined discovery, and empowered teams win the day.
Every planning cycle, I feel the drumbeat: “Show me the AI ROI—this quarter.” The pressure is real, especially when boards and CFOs expect immediate payback. Yet when I review stalled initiatives across teams and peers, the pattern is consistent: most companies treat AI like a feature to ship, not a system to manage. That mindset almost guarantees we measure the wrong things, declare victory (or failure) too early, and miss the durable value AI can create.
Here’s the core problem I see: we leap to solution and skip the counterfactual. Without a baseline, a clear control, or a defined “what would have happened otherwise,” we’re guessing. We also fixate on lagging, financial KPIs that move slowly (revenue, cost, risk), then use outputs—not outcomes—as OKRs. If we don’t align on outcomes vs output OKRs upfront, the best team in the world can still optimize for activity over impact.
My AI Strategy starts from a simple truth: value shows up along three vectors—revenue, cost, and risk—on different timelines. In the near term, we must validate leading indicators (adoption, engagement, activation) that ladder to those vectors through a transparent driver tree. Over time, those drivers compound into the lagging KPIs finance cares about. When we make the driver tree explicit, everyone can see how model precision, response time, and workflow integration roll up to conversion lift, case deflection, time-to-resolution, or reduced exposure.
To make this rigorous, I run a five-step playbook. First, define the decision and business outcome in plain terms. Second, instrument the baseline with behavioral analytics on a unified analytics platform—tools like Amplitude analytics or Pendo help expose friction points we’ll later target. Third, create a counterfactual using A/B testing and specify a minimum detectable effect (MDE) so we know how long to run and how much traffic we need. Fourth, quantify costs (training, inference, integration, change management) and include AI risk management, privacy-by-design, and data governance up front. Fifth, lock a measurement plan that connects leading indicators to lagging ROI through the driver tree.
Most AI initiatives don’t fail on model quality—they fail on adoption. If the workflow isn’t smoother, trust isn’t earned, or value isn’t obvious, users revert. That’s why I invest early in onboarding, in-app guides, product tours, and thoughtful tooltip design to reduce the time-to-first-value. Then I watch user activation, retention analysis, and task completion to ensure the assistive experience is not just novel—it’s habit-forming.
For generative use cases, eval-driven development is non-negotiable. I maintain offline evaluations for accuracy and safety, and online evaluations for business impact. Retrieval-first pipeline health, context window management, and prompt engineering affect reliability; so do latency and grounding quality. We ship behind feature flags, measure guardrail effectiveness, and tighten feedback loops from human-in-the-loop reviews into model updates—continuously.
On the business side, I avoid “AI theater” by structuring benefits like a CFO. Revenue: increased conversion or expansion driven by better recommendations, faster sales cycles, or higher trial activation. Cost: case deflection, agent time saved, fewer escalations, and lower rework. Risk: reduced exposure via automated checks, anomaly detection, and consistent policy application. If any claim can’t be tied to measured deltas—via A/B testing or strong quasi-experiments—it doesn’t go in the deck.
Build vs buy deserves the same discipline. I map platform scalability, governance requirements, and total cost of ownership against time-to-impact. Teams often underestimate integration and maintenance drag; a pragmatic mix of bought components with thin custom layers can accelerate outcomes while keeping options open. The goal isn’t to own every layer—it’s to own the learning loop and the differentiated experience.
I also remind teams that tooling should serve the strategy, not replace it. I’ve seen concise, effective messaging that captures the point: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” The words are compelling because they reflect the three-vector value model and the adoption imperative. The same standard should apply to any AI initiative we propose.
If you’re under pressure to prove ROI, shift the conversation: lead with the driver tree, specify your counterfactual, and anchor on leading indicators you can move in weeks—not quarters. Then connect those to the lagging KPIs finance expects over time. When we manage AI like a product—grounded in evidence, experimentation, and user-centered adoption—we don’t have to force ROI. We compound it.
Product roadmaps should not be promises etched in stone; they are portfolios of bets made under uncertainty. When I build a roadmap, I’m not predicting the future—I’m designing a system that helps the team learn faster than the market changes, allocate capital wisely, and create alignment across engineering, design, go-to-market, and leadership.
The best roadmaps I’ve seen and shipped anchor on outcomes rather than features. “Outcomes vs output OKRs” is more than a slogan; it’s how we translate strategy into measurable impact. I start by defining a small set of outcome metrics that matter—such as activation rate, time-to-first-value, or expansion revenue—and attach clear key results and guardrails to each theme. This reframes prioritization from “what can we build?” to “what must change in customer behavior?” and gives empowered product teams real autonomy.
I organize the roadmap into time horizons—Now, Next, Later—with explicit confidence levels. Near-term items have higher confidence and more specificity; mid- and long-term bets are thematic with wider time windows. This approach reduces false precision and builds trust because stakeholders can see both the intent and the uncertainty. When dates matter, I use windows and service level expectations rather than single deadlines, and I pair each initiative with a lightweight risk scoring so we can discuss uncertainty explicitly rather than implicitly.
Continuous discovery keeps the roadmap honest. I partner in tight “product trios” across product, design, and engineering to run rapid customer interviews, opportunity sizing, and assumption tests before we commit significant delivery capacity. The opportunity solution tree is my favorite artifact here; it visualizes the path from outcomes to opportunities to experiments and solutions, making trade-offs and sequencing transparent. By the time something moves into sprint planning, we’ve already reduced key uncertainties and clarified the narrowest viable slice we can ship.
Uncertainty demands options. I plan initiatives as options with stage gates and explicit kill criteria rather than as single monolithic projects. For every significant theme, I outline base, best, and worst-case scenarios with pre-decided triggers for when we escalate, pivot, or stop. This practice prevents sunk-cost fallacy and keeps the team focused on evidence. We treat scope as a knob, not a switch, and we bias toward small, sequential bets that compound learning.
Capacity is strategy. I routinely reserve a discovery buffer—typically 10–20%—and a contingency buffer for integration, security, and performance risks that always show up late. I ruthlessly control work-in-progress to limit thrash and protect the team’s ability to respond when new information arrives. When we must navigate dependencies, I use thin vertical slices and decouple via contracts or feature flags so discovery momentum doesn’t stall while platforms evolve underneath.
Prioritization under uncertainty benefits from explicit models. I combine value, effort, and confidence with risk scoring to surface where the unknowns are hiding. Driver trees help us connect top-level outcomes to leading indicators, so we can place bets where they have the highest causal leverage. I also lean on the Kano Model and qualitative signals to avoid over-investing in performance attributes while neglecting excitement features that unlock differentiation and word-of-mouth.
The most effective stakeholder management is narrative-first. For executives, I present a one-page outcomes roadmap that shows themes, expected shifts in key results, and the learning plan. For teams, I provide a more detailed plan that links discovery insights, assumptions-to-test, and decision points. I make room for a “what we’re not doing” section to reduce noise and prevent shadow backlogs from reappearing in every meeting. Most importantly, I socialize change before it happens, explaining the evidence and the trade-offs so adjustments feel like progress, not whiplash.
Measurement closes the loop. We instrument experiments and releases with leading indicators tied to the driver tree and review them on a predictable cadence. If movement stalls, we diagnose whether we have a targeting problem (wrong audience), a value problem (weak proposition), or a friction problem (broken journey). That discipline lets us iterate with purpose instead of chasing vanity metrics or isolated anecdotes.
Here’s a concrete example of roadmapping through uncertainty. Suppose our Q3 objective is to “Increase user activation” with key results to raise the Week-1 activation rate from 32% to 45% and cut time-to-first-value by 30%. In discovery, customer interviews reveal confusion in the first-run setup and a missing integration that advanced users expect. We map an opportunity solution tree and identify two high-leverage opportunities: simplifying the first 10 minutes and offering a guided setup for the integration. We then shape two minimal bets: an in-app guide to streamline the first three tasks and an integration wizard behind a feature flag. Each bet has an explicit decision rule and a two-sprint runway. We ship the guide first, confirm a statistically significant lift via A/B testing, then expand scope. The integration wizard underperforms initial expectations, so we pause, revisit the assumptions, and re-allocate buffer to the stronger path. The roadmap updates in real time, and everyone understands why.
When uncertainty spikes—new competitor, pricing shock, platform deprecation—I shift the roadmap cadence to rolling-wave planning. We shorten planning horizons, increase the frequency of readouts, and elevate discovery allocations temporarily. We also create thematic “containment zones” where we explore multiple options in parallel with small budgets until one path justifies scale. This allows us to stay responsive without abandoning strategy.
Good governance accelerates, it doesn’t slow. A lightweight product council that reviews outcomes, risks, and cross-functional dependencies prevents surprise escalations and ensures we keep shipping what matters. We avoid death-by-approval by agreeing in advance on decision rights and thresholds—for example, a product trio can pivot a bet within a theme up to a certain budget or timeline impact without additional approval, as long as it improves the outcome likelihood.
If you’re evolving your roadmap practice, start with three moves. First, reframe your plan in outcomes and publish a driver tree that connects those outcomes to the few leading indicators you believe move them. Second, stand up a continuous discovery cadence with a visible opportunity solution tree and an assumptions-to-test backlog. Third, implement time windows and confidence levels for all mid- and long-term items, and pair each major initiative with explicit kill criteria. You’ll feel the difference in a single quarter: clearer trade-offs, faster learning, and more predictable delivery—despite uncertainty.
In the end, a roadmap that thrives in uncertainty is an agreement about how we learn and decide together. It aligns the organization on outcomes, it funds options—not fantasies—and it gives empowered product teams room to maneuver. That’s how top product teams plan for uncertainty and still deliver with confidence.
Lately, it feels like every morning brings a new AI launch, a dazzling demo, or a must-try tool. I love the pace of innovation, but the constant stream can trigger counterproductive FOMO if I’m not intentional. As a product leader, I’ve learned to turn that anxiety into a disciplined learning system—one that keeps me curious without letting novelty hijack my focus.
That’s exactly why this conversation with Petra Wille and Teresa Torres resonated with me. They explore how to stay experimental in the AI era without chasing every shiny object. Their perspective aligns closely with my own operating cadence: start with real problems, go deep on a small set of tools, and create explicit boundaries between work, learning, and play.
Listen to this episode on: Spotify | Apple Podcasts
Here’s the mindset I apply. I don’t start with tools—I start with problems. When I encounter concrete friction in a workflow or see a credible opportunity to improve an outcome, that’s my trigger to explore a new capability. This mirrors the continuous discovery habit of prioritizing opportunities over solutions, and it’s how I avoid performing “innovation theater.”
To keep exploration healthy, I time-box my learning. I block recurring windows specifically for experiments, reading, and hands-on trials so they don’t overrun my core product work. During these blocks, I’ll set a clear question, run a tight test, and capture what I learned. No rabbit holes, no endless tinkering.
I also separate “interesting” from “actionable.” Plenty of inputs are worth awareness, but very few deserve immediate action. I bookmark the rest for later. This simple filter reduces cognitive load and keeps my backlog—from ideas to proofs of concept—well-governed.
Social media can amplify technology hype cycles, so I establish boundaries. I batch consumption, mute low-signal channels, and prioritize practitioner communities over performative threads. The goal isn’t to be first; it’s to be right for my customers, my team, and our strategy.
When choosing what to try next, I use a practical rubric. Does the tool target a real friction I’ve seen in discovery or delivery? Can it plug cleanly into our AI workflows without unsustainable glue work? Do we have a safe, compliant way to test it? Is there a plausible path from trial to compounding value? If the answer isn’t a confident yes to most of these, I wait.
Depth beats breadth. I’d rather take one promising tool into a real use case, instrument it, and measure outcomes than skim ten trending demos. That tighter loop produces sharper intuition, clearer product bets, and better partner decisions. A quick opportunity solution tree helps me connect user pain to outcomes before I let any solution onto the field.
In the episode, Petra Wille and Teresa Torres talk candidly about managing FOMO, deciding which tools to explore, and designing intentional learning systems. They discuss why starting with a problem is more valuable than starting with a tool, how social media amplifies technology FOMO, and why going deeper with fewer tools can lead to better learning. If you’ve ever felt like you’re falling behind because you haven’t tried the latest AI tool yet, this conversation will help you rethink how you approach learning and experimentation.
If you’re curious about what came up, here are some of the tools and communities mentioned: Claude Code, OpenClaw (formerly Clawdbot, Moltbot), NotebookLM, Product Talk, ElevenLabs, Lenny’s Newsletter Community, and even a nod to Bridgerton for a touch of levity.
My takeaway is simple but powerful: curiosity doesn’t require constant experimentation. The best product managers cultivate a balanced system—grounded in product discovery, energized by focused experiments, and protected by clear boundaries—so we can learn faster while staying pointed at outcomes that matter.
Discussion Question: How do you decide which new tools or technologies are worth exploring—and which ones you can safely ignore?