AI headlines are everywhere, and many claim to know exactly what’s coming next. As a product manager, I’m often asked to make single-point predictions about generative AI and LLMs. I resist that temptation because confident forecasts are seductive and usually wrong.
Listening to Teresa Torres and Petra Wille unpack why certainty fails reinforced what I practice with my product trios: scenario planning. Instead of betting on one future, I explore several plausible ones, define the signals that would confirm or disconfirm each, and translate those insights into a product strategy, roadmap, and sprint plans that we can adapt as evidence evolves.
Their argument mirrors what I see with customers and stakeholders: people are bad at predicting the future, and overconfidence creates fragility. Early adopters don’t represent everyone, so when we extrapolate from enthusiasts to the mainstream, we waste time and erode trust by building the wrong things.
Here’s how I apply this to avoid technology FOMO and make sharper AI Strategy decisions. I treat every bold claim as one possible future, then ask, “what else could happen?” I push extremes—AI everywhere vs. AI as invisible utility; GUIs vanish vs. GUIs evolve; centralized vs. edge compute—and hunt for the needs that stay true across scenarios. Those invariants anchor empowered product teams to outcomes, not outputs, and they help us stage bets responsibly.
My key takeaways:
Confident predictions are often wrong.
Early adopters don’t represent everyone.
Treat predictions as one possible future.
Scenario planning > trying to be right.
Focus on patterns, not hype.
In short: We’re in a period of change—but no one can predict exactly how it plays out. Strong predictions often ignore uncertainty.
A better approach in practice:
Treat every prediction as a scenario.
Ask: what else could happen?
Use multiple futures to guide decisions.
As you evaluate roadmaps, watch for traps like “My experience = everyone’s future” thinking, over-indexing on early adopters, and ignoring real-world constraints like budgets, compliance, and change management.
Tactically, we run quick scenario exercises, push ideas to extremes to explore implications, and extract the underlying insight (not the exact prediction). This complements continuous discovery and helps us write outcome-focused OKRs that stay resilient under uncertainty.
00:00 – The problem with future predictions
04:00 – Why experts get it wrong
06:00 – Scenario planning explained
12:00 – Early adopters vs. reality
20:00 – AI, GUIs, and extreme takes
27:00 – Using scenarios in product work
34:00 – Final thoughts
Resources & Links:
Follow Teresa Torres: https://ProductTalk.org
Follow Petra Wille: https://Petra-Wille.com
Mentioned in this episode:
Claude Code
What did I miss—or what scenarios are you considering for your team? Leave a comment below and let’s compare notes.
I’m fascinated by how fast truly AI-native companies can move when the problem is urgent, the founders have deep domain credibility, and the culture is built around customer obsession from day one. Artemis, an AI-native security platform, just emerged from stealth with $70M in combined seed and Series A funding, assembled a 30-person team in seven months, and made a bold promise to “stay on a texting basis with every customer, even at scale.” As a product leader, I see this as a masterclass in AI Strategy, go-to-market focus, and disciplined execution in cybersecurity.
At its core, Artemis is operating in what I’d call an “AI vs AI” security war: increasingly, we’re defending against adversaries who leverage models just as aggressively as we do. That shifts the job from rule-writing to intelligence orchestration, threat detection and response at machine speed, and continuous evaluation. It also explains why AI-native companies are outperforming their AI-enabled counterparts—when intelligence is the product, the org must be built around model quality, data pipelines, and rapid iteration, not as a bolt-on.
Founder-market fit is the early signal I look for, and here it’s unmistakable. Shachar Hirshberg’s “AWS and Palo Alto” playbook and Dan Shiebler’s path “From Twitter to Abnormal” create a rare combination: deep infrastructure and enterprise security know-how paired with production-grade machine learning at scale. When those experiences intersect, you get crisp problem statements, faster learning loops, and credibility with the exact ICP that feels the pain first.
Timing the leap to build is more art than science, but I listen for three cues: customers describing the problem in quantified terms, a wedge that can deliver value within one buying cycle, and a data advantage that compounds. Artemis clearly identified a high-urgency buyer and ignored adjacent segments that would dilute focus—an underrated act of courage that accelerates product-market fit.
Hiring for AI fluency is a different exercise than hiring for traditional software roles. I don’t just screen for model familiarity; I screen for product thinking under uncertainty, a bias for eval-driven development, and the ability to explain tradeoffs to security teams. Practical prompts help: “How would you diagnose precision/recall tradeoffs under evolving threat patterns?” or “Show me how you’d design a red/blue evaluation harness for a new detection.” The best candidates can translate model metrics into business outcomes and customer trust.
Building a 30-person AI-native team in stealth requires ruthless clarity on the handful of roles that compound: forward deployed engineers who can ship with customers, solutions engineering that feeds learning back into the model, and product managers who treat data as the primary surface area. Culture-wise, I anchor on two rituals: weekly customer debriefs with actual artifacts (alerts, misclassifications, escalations) and a written log of hypotheses, evals, and next bets—so the entire team can reason from the same evidence.
AI implementation reshapes the dashboard. Beyond the usual business KPIs, I watch a second layer: model precision/recall by scenario, alert fatigue reduction, time-to-first-signal on emerging threats, drift and data freshness, and latency under load. When these improve, downstream product metrics—activation, expansion, NRR—almost always follow. Observability isn’t an afterthought; it’s the control center for trust in AI-driven cybersecurity.
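As a rough illustration of that second layer of metrics, the sketch below computes precision and recall per scenario from labeled detections. The record layout (scenario, predicted_threat, actual_threat) and the scenario names are assumptions made for this example, not a description of any vendor’s schema.

```python
from collections import defaultdict


def precision_recall_by_scenario(records):
    """Compute precision and recall for each scenario.

    records: iterable of (scenario, predicted_threat, actual_threat) tuples,
    where the last two items are booleans. This layout is hypothetical.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for scenario, predicted, actual in records:
        c = counts[scenario]  # touch the key so every scenario is reported
        if predicted and actual:
            c["tp"] += 1      # real threat, correctly flagged
        elif predicted and not actual:
            c["fp"] += 1      # false alarm (drives alert fatigue)
        elif actual and not predicted:
            c["fn"] += 1      # missed threat

    report = {}
    for scenario, c in counts.items():
        flagged = c["tp"] + c["fp"]
        real = c["tp"] + c["fn"]
        report[scenario] = {
            # None means "undefined": nothing flagged / no real threats seen
            "precision": c["tp"] / flagged if flagged else None,
            "recall": c["tp"] / real if real else None,
        }
    return report


sample = [
    ("phishing", True, True),
    ("phishing", True, True),
    ("phishing", True, False),
    ("phishing", False, True),
    ("lateral_movement", False, False),
]
# phishing: 2 true positives, 1 false alarm, 1 miss -> precision 2/3, recall 2/3
print(precision_recall_by_scenario(sample))
```

Splitting by scenario matters because a healthy aggregate number can hide one scenario where recall has quietly collapsed.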
ICP discipline is non-negotiable. Artemis focused on the segment with the highest urgency-to-adopt and the clearest data pathways, and deliberately ignored a seemingly attractive adjacent ICP that would slow learning. I’ve made that trade myself: it feels painful in the short term but pays off in faster cycles, cleaner roadmap decisions, and better founder-led GTM.
Closing the first customers is where the magic happens—and where the most surprising signals of early product-market fit emerge. It’s rarely about feature breadth. It’s about whether customers escalate, volunteer data, and invite your team into their workflows. In founder-led sales, the most valuable insights come from the objections you lose on. I document every “no,” cluster them by root cause, and turn the top two into experiments within a sprint.
I also believe the first product should make founders a little uncomfortable—just enough to prove the thesis in the messiest, fastest path possible. In AI security, that often means prioritizing the smallest end-to-end loop that can stop or downgrade a real threat, even if the initial UX is rough. If the loop works, you’ll earn the right to harden it.
Co-founder dynamics matter as much as the roadmap. I liked the question “Should we be arguing more?” because it reframes conflict as a system. My rule: disagree in writing with a time box, escalate only the principle in dispute (not the plan), and commit to the decision with a pre-agreed review point. This keeps speed without calcifying bad calls.
On structure, I’m convinced AI-native beats AI-enabled for this market. Organize around data, evaluations, and deployment rather than traditional feature teams. Blend product, research, and solutions into durable, customer-facing units. Consider forward deployed engineers who can ship safely in live environments and bring back the sharpest, most actionable learning. It’s the only way to keep pace with adversaries that iterate as fast as you do.
The broader landscape provides context and competition. I benchmark capabilities and go-to-market motions against players like Abnormal, CrowdStrike, and Palo Alto Networks, with respect for the automation lineage from Demisto (now Cortex XSOAR). Cloud scale and data gravity from Amazon Web Services (AWS) matter, while model innovations from OpenAI and Anthropic raise the offensive and defensive bar. And Artemis is staking a claim in that intersection—where security outcomes, model excellence, and frontline customer intimacy meet.
If you care about AI risk management, threat detection and response, and building empowered product teams that can win in this “AI vs AI” environment, the lessons here are clear: hire for AI fluency, not just titles; instrument the model like a business; let founder-led GTM shape your roadmap; and keep the customer close enough that you can text them—because that’s how you outlearn the market.
Scaling a real-world marketplace from scrappy to dominant takes a different kind of product leadership. Reflecting on Christopher Payne’s decade leading DoorDash as President and COO — growing from roughly 70 employees to the dominant food delivery platform in the US — I’m struck by how much of that success hinged on mastering an atoms-based business while still operating with software-level rigor. As a VP of Product Management, I see the same patterns in my own work: relentless clarity on inputs, a bias for builder-executives, and a cadence that keeps leaders close to product details without becoming bottlenecks.
Running an atoms-based business versus a pure software company forces you to obsess over operational physics: unit economics, quality control, on-time reliability, and dense local liquidity. It’s precisely where traditional “bits” executives can stumble. What’s worked for me is a simple “plate spinning” framework for executive attention: identify the five or six plates that must never stop — customer experience, marketplace health, quality and safety, product velocity, platform reliability, and P&L — then schedule recurring deep dives to keep those plates spinning. If a plate wobbles, I drop in, fix the root cause, re-instrument the inputs, and zoom back out.
Hiring at hypergrowth speed only works when you bias toward a “builder mentality.” I look for executives who run toward fuzzy problems, write clearly, and can prove they’ve shipped value with incomplete information. Prior industry experience can be a liability when you’re reinventing the market; first-principles thinkers outlearn domain experts who try to port yesterday’s playbooks. In executive hiring, I’ve found structured work samples and narrative memos far more predictive than marathon interview loops — companies routinely spend too much time on job interviews and too little time evaluating how candidates think and execute.
Great executives never outgrow the details. Staying close doesn’t mean micromanaging — it means sampling the customer journey and instrumenting the system so you can feel where it hurts. In my own practice, I rotate through frontline touchpoints weekly: support transcripts, NPS verbatims, failed checkout sessions, and reliability dashboards. Small signals often reveal systemic issues. A single ciabatta bread moment — the kind of edge-case substitution that seems trivial — can expose broken handoffs, unclear policies, and misaligned incentives across the marketplace.
Top-down goal setting beats bottom-up when you’re aiming for category leadership. Bottom-up targets tend to regress to comfort; they calibrate to today’s constraints, not tomorrow’s possibilities. I set ambitious, top-down outcomes (not outputs), frame the non-negotiables, and map driver trees to clarify the input metrics that matter. Then I ask empowered product teams to pressure-test the plan, propose approaches, and own the how. This preserves ambition while unlocking creativity: a practical balance of clarity and autonomy that outcome-focused OKRs were designed to achieve.
One-size-fits-all management is a myth. Early-stage teams need hands-on coaching and fast decisions; later-stage teams need mechanisms that scale: crisp PRDs, pre-mortems, and operating cadences that separate strategy, planning, and execution. The mark of a high-functioning executive team is not uniform style — it’s high candor, fast escalation paths, and visible commitment after debate. In tough moments, a little charisma goes a long way; in practice, that’s not theatrics, it’s steady optimism, simple language, and consistent follow-through that keeps people moving forward.
The hypergrowth skill stack for executives is surprisingly learnable: ruthless prioritization under uncertainty, narrative writing that aligns cross-functionally, structured delegation with clear “inspection points,” and a weekly rhythm that protects maker time. I leverage a cadence of business reviews (inputs > outputs), customer-scent checks, and decision logs so we can move fast without losing the thread. CEO and executive time management is the ultimate forcing function — if we can’t show where our attention maps to goals, the team won’t either.
Some of my enduring lessons echo the best of Amazon and eBay: customer obsession beats competitor obsession, input metrics beat lagging vanity metrics, and simple mechanisms beat heroics. From Jeff Bezos’s playbook I borrow the insistence on written narratives, single-threaded ownership, and clarity on what will not change. Those principles remain the backbone of platform scalability and resilient product strategy, especially when markets get noisy.
AI is about to flatten organizations. With agentic AI, retrieval-first pipelines, and AI workflows embedded into product development, managers can widen their span of control without losing fidelity. I see LLMs helping product managers accelerate discovery, PRD drafting, and experiment analysis while raising the bar on decision quality. The implication for leadership: fewer layers, more transparency, and even greater pressure to define sharp, top-down outcomes that teams can autonomously pursue.
If I had to compress this into a playbook, it’s this: set audacious, top-down goals; keep your “plate spinning” calendar sacred; write more than you talk; hire builders, not resume archetypes; sample the customer journey every week; and build mechanisms that make the right thing easier than the heroic thing. That’s how you scale product management leadership from dozens to thousands — in atoms, in bits, and in the messy, exhilarating space where they meet.
Product teams rarely fail because they don’t ship enough features; they fail because they don’t learn fast enough. That’s the core tension I manage every day: when to build to learn and when to build to earn. Navigating that balance is how we protect focus, accelerate time-to-value, and ultimately deliver durable business impact.
Over the years, I’ve seen at least two major ways to develop product: build to learn and build to earn. The first is discovery-led and evidence-seeking; the second is delivery-led and value-capturing. Both are essential. The real craft is knowing which mode to be in, when to switch, and how to keep stakeholders aligned around outcomes instead of output.
The project model remains the default in many organizations—even in the age of AI—and it’s all about output. Stakeholders or executives assemble a prioritized roadmap of features and projects, and teams ship against it. This can create momentum, but without clear outcome metrics and customer validation, it’s easy to drift into a feature factory that looks productive while missing the mark on user value and business results.
When I build to learn, I emphasize continuous discovery. That means using customer interviews to surface unmet needs, running lightweight prototypes to test desirability and usability, and deploying A/B testing to quantify impact. I map assumptions, risks, and opportunities with an opportunity solution tree, and I timebox experiments so we learn fast and cheap. The standard is evidence, not opinions—especially my own. The goal is simple: reduce uncertainty before we scale.
When I build to earn, the objective shifts to capturing value with confidence. Here I align teams to outcome-focused OKRs, commit to clear acceptance criteria, and ensure that roadmaps and sprint plans reflect the highest-leverage bets we validated in discovery. Delivery excellence matters: crisp definition, reliable release trains, observability, and a strong feedback loop to confirm we’re moving activation, conversion, or retention in the intended direction.
Deciding when to transition from learning to earning is all about thresholds of evidence. I look for leading indicators that our solution reliably solves the target problem, shows a measurable lift in key behaviors, and can be delivered with acceptable risk. If we can’t articulate the expected outcome and how we’ll measure it, we’re not ready to scale. If we can, we invest, monitor impact, and keep guardrails in place to avoid scope drift.
The operating model that makes this sustainable is simple and disciplined. I rely on empowered product teams organized as product trios (product, design, engineering) to run dual tracks of discovery and delivery. We share what we learn with stakeholders early and often to build trust and keep them aligned. We elevate strategy by linking every roadmap item to a problem statement, a testable hypothesis, and a quantified outcome: no orphan features, no vanity launches.
In the AI era, speed can tempt us back into shipping-by-idea. I use generative AI for prototyping and insight synthesis, and I lean on LLMs to accelerate discovery work, without treating AI as a shortcut to validation. Our AI strategy clarifies where AI augments discovery, where it powers the product, and how we evaluate risk, so we move faster without compromising rigor or ethics.
My rule of thumb: spend just enough time building to learn to achieve conviction, then shift decisively to building to earn—while preserving a small discovery cadence to keep learning alive. This rhythm protects focus, compounds insight, and makes growth more predictable. It’s how we avoid the output trap, deliver meaningful outcomes, and create products that customers love and the business celebrates.
Turning a rambling stream of consciousness into a clean task list while someone is still talking has been a longtime product dream of mine. With Ramble, Todoist brought that dream to life by using live audio AI to capture tasks in real time—no transcription step required. The result is a voice-to-task flow that feels natural, fast, and surprisingly disciplined.
As I listened to the Doist team—Ernesto Garcia (Front-end Product Engineer), Thomas Jost (Backend Software Engineer), and Hugo Fauquenoi (Product Manager)—walk through their approach, I heard a blueprint for building pragmatic GenAI features. What began as a two-to-three month AI exploration became one of their most technically deliberate releases: a “Gemini-powered pipeline that makes tool calls while the user is still speaking, surfacing tasks on screen in real time without any text output from the model.”
The breakthrough started with user research. People weren’t merely dictating tasks; they were doing a “brain dump” first—often into pen and paper or even ChatGPT voice—and only then committing items to Todoist. Meeting users where they already are reframed the problem: don’t force structure upfront; capture fluid thought and translate it into actionable tasks instantly.
That insight led to a bold architectural choice: skip transcription entirely and process raw audio directly with a Gemini live audio model. By removing the brittle middleman of text, the team reduced latency and kept the model focused on one job—turning intent into structured actions. It’s a crisp example of AI workflows designed for reliability over novelty.
The real magic is in the real-time “tool calls.” As the user speaks, the model triggers add task, edit task, and delete task operations immediately. For high-friction contexts like driving, they paired visual task cards with subtle sound effects as confirmation cues. It’s thoughtful conversation design that respects attention and safety without sacrificing speed.
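To make the tool-call pattern concrete, here is a minimal sketch of a dispatcher that applies tool calls to an in-memory task list as they arrive. The call shape ({"name": ..., "arguments": ...}), the TaskList class, and the argument names are hypothetical; the episode does not describe Doist’s actual schema or the payloads returned by the Gemini API.

```python
class TaskList:
    """In-memory stand-in for the user's task list (hypothetical)."""

    def __init__(self):
        self.tasks = {}
        self._next_id = 1

    def add_task(self, content):
        task_id = self._next_id
        self._next_id += 1
        self.tasks[task_id] = content
        return task_id

    def edit_task(self, task_id, content):
        if task_id not in self.tasks:
            raise KeyError(f"unknown task id: {task_id}")
        self.tasks[task_id] = content
        return task_id

    def delete_task(self, task_id):
        self.tasks.pop(task_id)  # raises KeyError if the id is unknown
        return task_id


def apply_tool_call(task_list, call):
    """Route one tool call to the matching operation.

    Only the three operations named in the text are allowed, which keeps
    the model in "capture" mode rather than letting it do arbitrary work.
    """
    handlers = {
        "add_task": task_list.add_task,
        "edit_task": task_list.edit_task,
        "delete_task": task_list.delete_task,
    }
    name = call["name"]
    if name not in handlers:
        raise ValueError(f"unsupported tool: {name}")
    return handlers[name](**call["arguments"])


# Simulated stream of calls emitted while the user is still speaking.
stream = [
    {"name": "add_task", "arguments": {"content": "Buy milk"}},
    {"name": "add_task", "arguments": {"content": "Call dentist"}},
    {"name": "edit_task", "arguments": {"task_id": 1, "content": "Buy oat milk"}},
    {"name": "delete_task", "arguments": {"task_id": 2}},
]

tasks = TaskList()
for call in stream:
    apply_tool_call(tasks, call)  # in a real UI, refresh the task cards here

print(tasks.tasks)  # {1: 'Buy oat milk'}
```

Rejecting unknown tool names at the dispatcher is one simple way to enforce a narrow action surface in code instead of relying on the prompt alone.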
Teaching the model to capture tasks literally, without over-interpreting or trying to complete the work, required careful voice-specific prompt engineering and temperature tuning. Drawing a bright line between “capture versus do” kept the experience trustworthy. In my own AI strategy work, I’ve found that establishing explicit agentic guardrails early prevents unintended autonomy later.
Dates were the sleeper challenge. The team had to inject the current date, normalize to days vs. months, and always output dates in English for the natural language parser—while preserving the user’s original language for everything else. If you’ve ever shipped date handling across locales, you’ll appreciate how many edge cases hide in “Taming Dates and Time.”
Quality didn’t hinge on intuition alone. They built an LLM-judge eval system using real employee recordings from 100+ people across 35 countries in 20+ languages to catch prompt regressions. That’s eval-driven development done right: representative data, repeatable scoring, and tight feedback loops as models and prompts evolve.
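A regression check of this kind can be sketched in a few lines. Everything named here is an assumption for illustration: capture_fn stands in for the audio-to-tasks pipeline, judge_fn stands in for the LLM judge (the stub below uses exact matching instead of a model), and the baseline and tolerance values are arbitrary.

```python
def run_regression_eval(cases, capture_fn, judge_fn,
                        baseline_pass_rate, tolerance=0.02):
    """Score a prompt version against recorded cases and flag regressions.

    cases: list of dicts with "recording" and "expected_tasks" keys.
    capture_fn: recording -> list of captured tasks (the system under test).
    judge_fn: (expected, produced) -> truthy if the capture is acceptable.
    """
    if not cases:
        raise ValueError("need at least one eval case")
    verdicts = [
        bool(judge_fn(case["expected_tasks"], capture_fn(case["recording"])))
        for case in cases
    ]
    pass_rate = sum(verdicts) / len(verdicts)
    return {
        "pass_rate": pass_rate,
        # Regressed if we fall below the baseline by more than the tolerance.
        "regressed": pass_rate < baseline_pass_rate - tolerance,
    }


# --- Stubs so the sketch runs without any model access ---------------------
def stub_capture(recording):
    # Pretend the pipeline returns whatever tasks were spoken.
    return recording["spoken_tasks"]


def exact_match_judge(expected, produced):
    # A real LLM judge would tolerate paraphrase; this stub does not.
    return sorted(expected) == sorted(produced)


cases = [
    {"recording": {"spoken_tasks": ["buy milk"]}, "expected_tasks": ["buy milk"]},
    {"recording": {"spoken_tasks": ["call mom"]}, "expected_tasks": ["call mum"]},
]
report = run_regression_eval(cases, stub_capture, exact_match_judge,
                             baseline_pass_rate=0.9)
print(report)  # {'pass_rate': 0.5, 'regressed': True}
```

Running the same cases against every prompt or model change is what turns a one-off evaluation into a regression gate.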
For project and label matching, they chose direct context injection over RAG. Instead of building a retrieval pipeline, they injected the full project/label list into the system prompt. With smart context window management and a sharply constrained task schema, this was both simpler and more accurate. Sometimes the fastest path to product-market fit is removing moving parts, not adding them.
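For illustration, direct context injection can be as simple as serializing the user’s lists into the system prompt. The function below is a hedged sketch: the prompt wording, the JSON layout, and the function name are assumptions, not Doist’s actual prompt. It also folds in the current-date injection and English-dates rule described earlier.

```python
import json
from datetime import date


def build_system_prompt(projects, labels, today):
    """Build a system prompt with the full project/label list inlined.

    No retrieval step: the model sees every valid name on every request.
    This is only reasonable while the lists fit comfortably in the
    context window.
    """
    context = {
        "current_date": today.isoformat(),  # lets the model resolve "tomorrow"
        "projects": sorted(projects),
        "labels": sorted(labels),
    }
    rules = [
        "Capture tasks exactly as spoken; do not try to complete them.",
        "Use only project and label names listed in CONTEXT.",
        "Write due dates in English; keep all other text in the user's language.",
    ]
    return (
        "\n".join(rules)
        + "\nCONTEXT:\n"
        # ensure_ascii=False keeps non-English names readable in the prompt
        + json.dumps(context, ensure_ascii=False, indent=2)
    )


prompt = build_system_prompt(
    projects=["Work", "Einkäufe"],
    labels=["urgent", "errand"],
    today=date(2025, 1, 15),
)
print(prompt)
```

The trade-off is prompt size versus moving parts: with a short, bounded list there is nothing to index or rank, and no retrieval miss can hide a valid project from the model.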
One product principle stood out: easy correction beats perfect first-time accuracy. Natural language interfaces earn trust when users can fix misfires in a tap or two. That bias toward quick recovery over false precision is how you ship AI that feels useful from day one.
Looking ahead, the roadmap is compelling: multimodal task capture from images and text blobs, Apple Watch support, and automation integrations. As voice AI agent patterns mature, this “tool-only architecture” sets a solid foundation for going from capture to coordinated execution—without losing the simplicity that makes Ramble shine.
If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. It’s a masterclass in building focused GenAI features that trade cleverness for clarity—and still delight.
Resources & Links: Todoist • Doist • Google Vertex AI (Gemini)
Over the past quarter, I’ve been obsessed with a simple question: how do real people actually prompt AI agents when the stakes are high and the clock is ticking? “We analyzed 27K sessions with Amplitude’s Global Agent using our Agent Analytics tool. Here’s what we found out about how real users are prompting our agent.” That short summary belies months of careful instrumentation, qualitative review, and product debates, and it forever changed how I design agent experiences.
The clearest pattern I saw: users don’t craft “perfect” prompts—they co-create with the agent. Most sessions began with a broad intent, then tightened through rapid, iterative turns. The winning structure emerged as context, command, and constraints. When our agent acknowledged context first, clarified the command, and reflected constraints back, users responded with noticeably more confidence. It reinforced what great prompt engineering already teaches, but grounded in lived behavior across thousands of journeys.
Trust was the next breakthrough. People wanted transparency on capabilities, a concise first answer, and an easy path to deeper detail and sources. They frequently asked the agent to show its work, summarize trade-offs, or restate assumptions in plain language. Instrumenting observability into the agent’s reasoning artifacts—without overwhelming the user—proved foundational for building credibility session by session.
On task complexity, users fared best when the agent orchestrated a few small, verifiable steps rather than one heroic leap. Retrieval-first pipeline patterns consistently reduced confusion and rework, especially when paired with strong context window management. The more the agent proactively chunked the problem, validated intermediate outputs, and offered next-best actions, the smoother the journey—and the more reusable the prompts became.
UX nudges mattered as much as model quality. Inline examples (“Try this”), one-click refinements (“Shorter,” “Add a table,” “Cite sources”), and lightweight guardrails kept momentum high without boxing users in. When the agent made uncertainty explicit and offered safe fallbacks, abandonment dropped and users explored more ambitiously. The experience felt less like “querying a model” and more like collaborating with a capable teammate.
From a product management lens, these insights shape how I prioritize agentic AI. I’m doubling down on: scaffolded prompts that lead with context and constraints; transparent citations and assumptions; multi-step plans that the user can edit; and evaluation loops that A/B test prompt templates, tool strategies, and response formats. I’m also investing in analytics that connect session patterns to activation, speed-to-value, and retention so we can run eval-driven development, not opinion-driven debates.
If you’re building agents into a core product workflow, start by designing for iterative co-creation, not one-shot brilliance. Offer progressive disclosure, keep the first answer tight, and make verification effortless. Shape the model with retrieval-first strategies, manage your context window like a scarce resource, and treat observability as a feature, not a debug tool. Most of all, let real usage guide your roadmap—these 27K sessions reminded me that the best agent UX is learned alongside our users, not imagined in isolation.
Inspired by this post on Amplitude – Perspectives.
MCP (Model Context Protocol) is the acronym I keep hearing in every product conversation, and for good reason. When teams like Miro and Atlassian lean in, it signals a real shift in how we design, ship, and scale value. From my vantage point leading product at HighLevel, I see MCP less as a feature and more as an operating advantage: a way to align strategy, execution, and governance so product teams move faster with higher confidence.
When I evaluate a technology like MCP, I start with three questions. First, does it advance our product strategy and sharpen competitive differentiation? Second, does it strengthen product-led growth by improving activation, onboarding, and retention? Third, does it support outcome-focused OKRs, so we consistently measure what matters, not just what ships?
Execution discipline makes or breaks any MCP investment. I design measurement upfront: instrument A/B testing, define activation milestones, and monitor retention cohorts. In parallel, I use Pendo for in-app guides and product tours to accelerate adoption and reduce time-to-value, then connect this data back to roadmap decisions so each release compounds learning instead of creating noise.
On the operating model, I apply a rigorous build vs buy lens and stress-test platform scalability, reliability, and integration surfaces. Stakeholder management is critical—security, SRE, and solutions engineering must be partners from day one. I anchor teams in product trios and continuous discovery so we learn with customers in the loop, not after the fact.
At Pendomonium 2026, Pendo CPO Rahul Jain brought together four product leaders who are building with MCP. Read or watch their conversation to learn more.
My practical playbook for MCP: choose one high-signal use case, define clear success metrics, and run a tightly scoped pilot with visible executive sponsorship. Treat governance and data hygiene as first-class requirements. Close the loop weekly with qualitative insights from customer interviews and quantitative telemetry from experiments. Only then scale to adjacent workflows, keeping a steady focus on measurable customer value and repeatable delivery.
Whether you’re an emerging startup or an established enterprise, the opportunity is the same: turn MCP curiosity into durable capability. With disciplined measurement, thoughtful stakeholder alignment, and a relentless outcomes mindset, MCP can become a lever for product management leadership—not just another acronym in the stack.
I’ve learned that the smallest slice of your support queue often dictates the majority of your operating cost, customer memory, and automation ceiling. In product reviews and CX ops deep-dives, I see the same pattern: the “easy” tickets pad your resolution counts, but the complex, multi-step queries quietly own your handle time and your brand trust. If you care about compounding impact, your customer support AI strategy has to target that hardest percentage first.
Complex queries are a small percentage of your queue, but they consume a disproportionate share of your team’s time.
Take a typical queue: password resets outnumber refund disputes ten to one, but a reset takes five minutes and a dispute takes thirty. The “rare” query accounts for over a third of total handling time. The same pattern holds for account investigations, subscription changes, and billing disputes.
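The arithmetic behind that claim is easy to check. The short sketch below uses the ratios from the example (ten resets per dispute, five minutes versus thirty); the function name and dictionary layout are just for illustration.

```python
def share_of_handling_time(queue):
    """Return each query type's share of total handling time.

    queue maps query type -> (ticket_count, minutes_per_ticket).
    """
    minutes = {kind: count * per_ticket
               for kind, (count, per_ticket) in queue.items()}
    total = sum(minutes.values())
    return {kind: spent / total for kind, spent in minutes.items()}


# Ten password resets for every refund dispute, as in the example above.
queue = {
    "password_reset": (10, 5),   # 10 tickets x 5 min  = 50 min
    "refund_dispute": (1, 30),   # 1 ticket  x 30 min  = 30 min
}
shares = share_of_handling_time(queue)
print(shares["refund_dispute"])  # 0.375, i.e. 30 of 80 minutes
```

So a query type that makes up about 9% of ticket volume (1 in 11) accounts for 37.5% of handling time.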
How you handle complex queries is also what customers actually remember about their support experience. When someone is dealing with a damaged order or a billing dispute, the stakes are higher, and a fast, good resolution is what separates a forgettable interaction from one that builds lasting trust.
Most AI Agents automate the easy, informational queries well. The question for your automation rate is whether they can handle the hard ones. That’s where agentic AI and robust AI workflows make or break your outcomes.
We’ve gotten really good at informational queries – the hard part is what comes next. I’ve seen teams invest deeply here, and for good reason: it lifts containment quickly and cheaply. But to break through the plateau, you have to execute actions across systems, not just answer with text.
We’ve invested deeply in informational Q&A. We built Apex, a specialized customer service model trained on billions of support interactions, as Fin’s core answering engine. Beneath that sits a custom retrieval model, a purpose-built reranker, and a unified RAG pipeline, all trained specifically for customer service. Fin resolves issues at a higher rate than general-purpose frontier models, with fewer hallucinations and at lower cost.
But informational Q&A only covers queries where text is the answer. Most Agents can handle that. Far fewer let you configure complex, multi-step actions without a forward-deployed engineer setting them up for you, and that is where the gap lies.
Every query your team handles falls into one of three categories:
Informational: “Can you ship transatlantic by priority next day?” Answered with text from your knowledge base.
Personalized: “Where is my order?” Requires data unique to that user.
Action-led: “My order arrived damaged, I need a refund.” Requires doing something: checking a return window, cross-referencing transaction data, making a judgment call – reading from multiple systems and acting across them.
[Figure: trend from Jan to Apr 2026, rising steadily with a brief pause before a sharp late surge.]
These complex queries, the ones that require multi-step processes across systems, aren’t edge cases; they’re the reason your support team exists. This is the gap Fin Procedures was built to close.
It works in practice, and the trajectory matters for product strategy and ops planning.
Procedures is live, it’s scaling, and the results are clear. Since launching in managed availability, Procedures has handled over 1.5 million conversations, and volume is doubling month over month across hundreds of apps in fintech, e-commerce, gaming, healthcare, and SaaS.
When customers hit complex, multi-step queries, the experience is dramatically better when Fin can do the work end-to-end. We tested this with a randomized 5% holdout – conversations where Procedures would normally run, but didn’t. CSAT was 28.93% higher when Procedures ran, a statistically significant result.
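If you want to run a similar holdout comparison on your own data, here is a minimal sketch. It assumes CSAT is measured as the share of positive ratings and uses a standard two-proportion z-test, which may not be the test used for the result above. The counts are made-up placeholders, not Intercom's data, and the function names are hypothetical.

```python
import math


def relative_lift(treated_rate: float, holdout_rate: float) -> float:
    """Relative improvement of the treated group over the holdout group."""
    return (treated_rate - holdout_rate) / holdout_rate


def two_proportion_z_test(pos_a: int, n_a: int, pos_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Placeholder counts: positive ratings / rated conversations in each group.
treated_pos, treated_n = 7_600, 9_500   # the procedure ran
holdout_pos, holdout_n = 320, 500       # randomized holdout, procedure withheld

lift = relative_lift(treated_pos / treated_n, holdout_pos / holdout_n)
z, p = two_proportion_z_test(treated_pos, treated_n, holdout_pos, holdout_n)
print(f"relative lift: {lift:.1%}, z = {z:.2f}, p = {p:.4f}")
```

Because the holdout is small by design, it is worth checking the p-value rather than the lift alone; a 5% holdout on low volume can produce large swings that are only noise.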
A product, not a services engagement. I’ve sat through too many “automation” projects that were really solutions engineering gigs: workshops, custom scripts, then a queue of change requests when policies shift. It’s fragile and slow.
The B2B AI industry has a consultingware problem. It’s not databases being forked anymore, it’s prompts. The economics of maintaining bespoke setups per customer don’t work. Either the application falls behind new models, or the vendor changes the model and quality degrades invisibly.
In my view, an agentic AI platform should be a product your team owns end to end: a natural language editor – literally paste your existing SOPs – branching logic, data connectors, and AI-powered simulations for testing. Your CX ops team configures this, iterates on it, owns it. If you need help, a forward-deployed team can assist, but they’re optional, not a dependency. You always have control.
And because it’s a unified product, improvement compounds. When the vendor optimizes a prompt, every customer’s Procedures get better. When they upgrade the model, they can A/B test across the entire customer base and know it’s better before rolling out. You can’t do that when every customer has a bespoke prompt. The consulting model isn’t just expensive, it’s structurally unable to compound.
Today, Fin Procedures is available to every Intercom customer – no waitlist or managed rollout, ready for all 8,000+ customers.
We’re iterating fast based on real customer feedback. Here’s what’s landed since the last major update, and why it matters for reliability and governance:
AI-powered Procedure review: Flags broken logic, missing references, and unreachable conditions before you deploy.
Procedure failure reporting: A new reporting dimension that lets you drill into conversations where Procedures failed, so you can diagnose and fix.
Version history with rollback: Track every change, compare versions, roll back if needed.
Data connector health monitoring: See at a glance if your integrations are healthy, degraded, or failing.
Optional data connector parameters: Fin only asks customers for information when it’s actually needed, instead of prompting for every field.
Email Simulation support: Test how your Procedures behave across chat and email before going live.
Agent in the Loop (Beta) unlocks the next tranche of automation. Even with Procedures, two things hold teams back from automating their most complex queries: missing integrations and policies that require a human sign-off on sensitive decisions.
“Agent in the Loop” is built for both. Need Fin to check your internal admin tools but haven’t built a data connector yet? Put a human checkpoint at that step. Fin handles the conversation, gathers context, and pauses, surfacing a structured summary for a human agent to verify or act on, then resumes. You get automation on the 80% that doesn’t need the integration.
For compliance – identity verification, high-value refunds – Fin does the legwork, a human makes the final call and then hands it back to Fin. This works natively in the Intercom Inbox and via Slack. Some competitors don’t have an inbox-native variant at all, meaning humans need to leave their primary workspace to review AI actions.
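The pause-and-resume pattern described here can be sketched as a small state machine. This is an illustration of the general idea under my own assumptions, not how Fin implements it: `Step`, `ProcedureRun`, and `advance` are hypothetical names, and the "structured summary" is reduced to a plain string.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], None]   # mutates the shared conversation context
    needs_human: bool = False     # pause for sign-off before this step runs


@dataclass
class ProcedureRun:
    steps: list[Step]
    context: dict = field(default_factory=dict)
    position: int = 0             # index of the next step to execute

    def advance(self, human_approved: bool = False) -> str:
        """Run steps until the end or until a step needs human sign-off."""
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if step.needs_human and not human_approved:
                return f"paused before '{step.name}' with context {self.context}"
            step.run(self.context)
            human_approved = False  # an approval covers one checkpoint only
            self.position += 1
        return "completed"


run = ProcedureRun(steps=[
    Step("gather order details", lambda ctx: ctx.update(order_id="A-1", amount=250)),
    Step("issue refund", lambda ctx: ctx.update(refunded=True), needs_human=True),
    Step("confirm with customer", lambda ctx: ctx.update(notified=True)),
])
first = run.advance()                      # stops at the refund checkpoint
second = run.advance(human_approved=True)  # human signed off; the run resumes
```

The design choice worth noting is that the run keeps its position and context across the pause, so the human reviews a summary and the automation picks up exactly where it stopped instead of restarting the conversation.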
Procedures are also built to let you collaborate with all your teammates – both human agents and AI Agents. Fin can work with them directly inside a Procedure, using APIs and webhooks to loop in another teammate mid-flow, hand off context, and pick back up once they’re done.
Making it easier, faster. Procedures is already self-serve, but the next step is making Procedure creation, testing, and maintenance significantly more streamlined, with less manual editing and more AI-assisted building and debugging. There’s a lot coming in this space over the next few months, and it aligns with a retrieval-first pipeline and stronger governance at scale.
The hardest percentages matter the most. The biggest unlock for your automation rate won’t be answering more FAQs, it will be handling the complex, multi-step queries that consume your team’s time and define what customers remember about their experience with you.
That means working with an Agent that goes beyond answering questions and executes processes. A product your team owns and configures, not a service you buy and hope gets maintained. And a platform where every improvement compounds across every customer. That’s Procedures. Available now, for everyone.
Lately, I keep hearing a familiar question: with AI making it so easy to generate ideas and build products, do we still need product managers? My answer is unequivocal—yes. Tools accelerate delivery, but they don’t build trust, reconcile competing incentives, or create the shared understanding teams need to ship outcomes. Product work is relationship work.
I recently listened to “Product Work Is Relationship Work – All Things Product with Teresa & Petra,” and it echoed what I see every day in high-performing product organizations. If you prefer to watch, here’s the episode on YouTube: https://www.youtube.com/embed/d-0f8uAfc8w?feature=oembed
While AI can help build things faster, it can’t replace the relationship work required to align stakeholders, navigate competing priorities, and create shared understanding across teams. That’s the hard, human part of product management—and it’s not going away.
In my experience, product teams stall when collaboration becomes transactional. We jump to negotiation (“What can you commit by Friday?”) before establishing context (“What problem are we solving and why now?”). When I slow down to get curious—about constraints, incentives, and assumptions—momentum actually increases because we’re rowing in the same direction.
Stakeholder alignment often breaks down when we conflate advocacy with exploration. We argue our viewpoint as if it were the only lens that matters, rather than making space to surface how others see the system. I’ve found the distinction between “dialogue vs. discussion,” rooted in work by Chris Argyris and elaborated in The Fifth Discipline by Peter Senge, to be a powerful reset. Dialogue builds shared understanding; discussion decides. You need both, in the right order.
Language matters in the room. The improv principle “Yes, and” is deceptively simple but transformative. When a designer, engineer, or executive feels heard (“Yes”) and we build on their idea (“and”), we create psychological safety without sacrificing critical thinking. I use “Yes, and” to explore perspectives before we converge on decisions—especially with product trios and senior stakeholders.
Here are the moves I rely on to keep collaboration relational and outcomes-focused. First, we align on outcomes before solutions. I explicitly separate outcome OKRs from output OKRs so we’re clear on what success looks like, independent of the features we ship. That clarity reduces rework and speeds up decision-making later.
Second, we operationalize curiosity with continuous discovery. I schedule recurring, lightweight touchpoints with customers and internal stakeholders so insights compound. When learning is continuous, debates quiet down—evidence does the heavy lifting.
Third, we invest in relationship rituals. Regular 1:1s with key partners, stakeholder maps that capture motivations, and pre-reads that frame trade-offs all prevent misalignment from surfacing in the last mile. These small habits pay huge dividends in trust and speed.
Fourth, I’m explicit about mode-switching in meetings: are we advocating a position or exploring perspectives? Calling the mode out loud prevents people from mistaking questions for opposition and keeps the conversation productive.
Fifth, we use “Yes, and” to move from possibility to practicality. We explore generously, then converge rigorously—ranking options by impact, effort, and risk so decisions are transparent and fair.
If stakeholder alignment, team dynamics, or product “politics” slow your team down, this conversation offers a practical reframe. You’ll move faster when you build the relational tissue first—because alignment is an accelerant, not a tax.
Resources & Links:
Follow Teresa Torres: https://ProductTalk.org
Follow Petra Wille: https://Petra-Wille.com
Mentioned in this episode:
Petra’s Coaching Packages
Work by Chris Argyris on organizational learning and dialogue vs. discussion
The Fifth Discipline: The Art and Practice of the Learning Organization by Peter Senge
Improv principle “Yes, and”: Saying “Yes, and” — A principle for improv, business & life and Yes, and …
Have thoughts on this episode or examples from your team? Leave a comment below—I’d love to learn what’s working (and what’s not) in your stakeholder landscape.
The best signal often comes from the least scalable work.
I’ve learned this the hard way—and the rewarding way. When I’m closest to customers, rolling up my sleeves with the team, I uncover nuanced, high-signal insights that no dashboard or aggregate report can reveal. Those insights, when treated with rigor and discipline, become the backbone of a durable product strategy and true product management leadership.
At Intercom, that principle is at the heart of how we operate in “swarms.” Swarms are cross-functional teams of Fin experts focused on ensuring customers succeed when trialing Fin. Each team consists of engineers, data scientists, and a product manager, all focused on optimizing Fin for our customers.
Working in these teams gives us deep insight into the needs of individual customers, and those insights can also form the foundation of new Fin features. Let me explain.
I frame the journey from insight to impact in three levels: “Level 1: Swarms – where the signal comes from,” “Level 2: Cockpit – where the signal starts to scale,” and “Level 3: Product – where the signal reaches maximum leverage.” This model blends continuous discovery with pragmatic solutions engineering and creates a clear path from hands-on learning to product-led growth.
Level 1: Swarms – where the signal comes from. The goal is simple: help Fin resolve more conversations and help customers understand and use the product. Swarms partner with customers to define their goals and how Fin fits into their workflows. We map out an automation roadmap by analyzing their conversations, determining the APIs and Procedures they need, and the level of automation they can achieve. We then support them in implementing it and reaching that outcome. This involves ongoing analysis to identify optimizations to their configuration and the next best actions for increasing automation levels, such as improving knowledge base content or deploying new APIs.
During a swarm, the feedback loop is fast. We test something, ship something, and quickly see whether the metric moves. That speed and depth is what makes swarms so valuable. It’s also what makes them hard to scale. I’ve felt the thrill of watching a key metric bend within hours—and the constraint of knowing that kind of attention doesn’t scale to every account.
For example, we developed an automation taxonomy to predict the level of automation a customer can achieve. Initially, this analysis was manual and took more than half a day to run, with time required to prep and visualize the data. But the effort was worthwhile. For one customer, we predicted an automation rate of 70% and they achieved exactly that.
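The post does not describe how the taxonomy produces its prediction, so the following is only a guess at the general shape of such an estimate: weight each conversation category's share of volume by the fraction of that category assumed to be automatable. The category names, the numbers, and `predicted_automation_rate` are all hypothetical.

```python
def predicted_automation_rate(
    category_share: dict[str, float],
    automatable_fraction: dict[str, float],
) -> float:
    """Weighted sum: share of volume per category x fraction assumed automatable."""
    total_share = sum(category_share.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("category shares must sum to 1")
    return sum(
        share * automatable_fraction.get(category, 0.0)
        for category, share in category_share.items()
    )


# Made-up inputs for illustration only.
share = {"informational": 0.5, "personalized": 0.3, "action-led": 0.2}
automatable = {"informational": 0.9, "personalized": 0.6, "action-led": 0.25}

rate = predicted_automation_rate(share, automatable)
print(f"predicted automation rate: {rate:.0%}")
```

Even a rough model like this makes the levers visible: the prediction moves either when the mix of conversations shifts or when a category becomes more automatable, for example after a new API or Procedure is deployed.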
By working closely with customers, we learn what drives success, but this work is inherently hands-on and doesn’t scale on its own. So the real challenge is figuring out how to turn what we learn in those high-touch engagements into systems, tools, and product changes that benefit far more customers. That’s the inflection point where AI workflows and product strategy meet.
Level 2: Cockpit – where the signal starts to scale. Not every customer should need swarm-level attention. The way we bridge that gap is by making the swarm analyses repeatable and shareable. Once we can run the same analysis across customers, we can start turning bespoke swarm learnings into reusable signals. This is where Cockpit comes in.
[Figure: dashboard showing support conversation volume, taxonomy percentages by type, and topic demand across account settings, billing, integration, and more.]
We take patterns learned in swarms and encode them into internal tooling inside our insights web app, Cockpit. Instead of analysis being a bespoke project, it becomes a workflow. For example, we scaled the automation taxonomy and this has enabled us to quickly understand automation potential for all customers.
Now, a customer success manager (CSM) can pick a customer, see their automation potential and current performance, understand the biggest issues, and propose next actions. This is how we scale the impact of swarm learnings through CSMs and Sales. It allows far more customers to benefit from the same patterns we see in high-touch work, without requiring direct data science involvement every time.
Cockpit also functions as a valuable proving ground. It gives us a way to test ideas across a much broader set of customers and see what generalizes before we consider taking anything further. In other words, we transform sharp, local signal into broadly useful guidance—an essential step in any AI Strategy that aims to balance precision with scale.
Level 3: Product – where the signal reaches maximum leverage. The real payoff comes when the patterns we have validated internally become part of the product itself. Instead of helping one customer directly, or helping many customers through internal teams, we deliver a feature directly to customers so they can improve Fin’s performance on their own. Today, the automation taxonomy is a part of Insights and accessible to customers who have this feature.
Another example is CX Score. It started with close work alongside Intercom’s Customer Support team to understand performance with Fin, initially through predicted CSAT and resolution. Over time, this work evolved into CX Score: a scalable way to measure conversation quality across all customers.
The product stage is fundamentally different from Cockpit because of the constraints. Cockpit provides a platform for our customer analyses and tools, but it doesn’t need to scale as far as the product does. What moves into the product has to work for every customer, without configuration, at scale, so it has to generalize. That bar is what protects long-term quality while unlocking product-led growth.
That’s why the move from Cockpit to product isn’t automatic. We’re not just asking whether something is useful, but whether it’s broadly useful, robust, and scalable enough to run across the entire customer base. As a product leader, I push for this discipline because it’s where customer success, engineering excellence, and business outcomes converge.
The loop. The model is simple. Swarms generate the best signal, grounded in real customer problems. Cockpit operationalizes that signal so CSMs and Sales can use it across many customers. Product takes the patterns that truly generalize and turns them into scalable features that enhance every customer’s experience.
This loop allows a small swarm data science function to have impact beyond a small set of high-touch accounts, resulting in a stream of continuous improvements across all three levels and an ever-increasing level of automation for our customers. Practically, it’s a repeatable playbook for product management leadership: start with high-signal discovery, prove repeatability, and only then scale through product. Done well, it compounds learning, accelerates time-to-value, and aligns the entire organization around measurable outcomes.
Internal Products Are Hard; Commercial Products Are Harder. That line captures years of hard-won lessons from leading both internal platforms and market-facing SaaS at HighLevel. I’ve seen how the two demand different muscles—even when the tech stack, talent, and timelines look the same on paper.
When I talk about internal products, I mean services and solutions that our own employees use to take care of customers—customer-enabling tools and services, agent consoles, fulfillment and billing workflows, operations dashboards, and the underlying platforms that keep them fast, compliant, and resilient. These tools don’t generate revenue directly, but they quietly determine customer experience, gross margin, and how quickly we can ship, resolve issues, and scale.
Commercial products, by contrast, add a second layer of challenges. Beyond discovery, usability, and reliability, we must conquer positioning, pricing and packaging, competitive differentiation, sales enablement, procurement hurdles, and an ongoing customer success motion. The surface area for failure is bigger, and the time-to-signal on product-market fit is slower and noisier.
Here’s how I decide where to invest. First, I anchor on outcomes, not output. If the business priority is net revenue retention, faster onboarding, or reduced cost-to-serve, internal products often provide the highest-leverage path. If the priority is new revenue, new market entry, or a must-have differentiator, we lean commercial. I make the trade explicit in outcomes vs output OKRs so we can defend the decision when pressure mounts.
Second, I run a clear build vs buy calculus. For internal needs, the default is buy if a mature, configurable solution exists that meets our security, data governance, and integration requirements. I only build when the workflow is core to our differentiation, the TCO of customization is lower than vendor sprawl, or we can capture unique proprietary advantage. For commercial products, I avoid embedding third-party IP in a way that caps differentiation or compresses margins as we scale.
Third, I insist on continuous discovery. Internal audiences are not a captive market—they’re discerning experts with real jobs to do. I treat them like customers, with structured customer interviews, journey mapping, and opportunity solution trees. I rely on empowered product teams and product trios to validate problems and reduce solution risk before we commit engineering time.
Fourth, I frame commercial vs internal work with capacity guardrails. In most planning cycles, I reserve explicit allocation for platform scalability and internal tooling, separate from feature bets. Without this, internal products become backlog filler, which guarantees we’ll pay the interest later in churn, SLA breaches, and slower delivery.
Execution differs too. For internal products, change management is the make-or-break factor. I plan enablement as a first-class deliverable: clear rollouts, in-app guides, training, and feedback loops with frontline champions. I track adoption, time-to-resolution, error rate, and satisfaction for internal users with the same rigor we apply to external users.
For commercial products, I design the discovery-to-GTM handshake early. Pricing and packaging must reflect value drivers discovered in research, not what’s easiest to meter. Sales and solutions engineering need crisp narratives, objection handling, and proof points. Customer success needs activation plans and health signals tied directly to leading indicators of retention.
Across both, I instrument the product and process. I lean on feature flags and progressive delivery to manage risk, and I protect SLOs with error budgets so teams balance reliability with iteration speed. CI/CD isn’t a badge—it’s how we earn the right to ship continuously without eroding trust.
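To show how an error budget turns an SLO into a shipping decision, here is a small worked sketch. The arithmetic is standard (the budget is the unavailability the SLO allows over a window); the 25% caution threshold and the `release_policy` wording are assumptions I made up for illustration.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability the SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


def release_policy(remaining: float, caution_below: float = 0.25) -> str:
    """Hypothetical policy: slow down risky releases when the budget runs low."""
    if remaining <= 0:
        return "freeze: reliability work only"
    if remaining < caution_below:
        return "caution: ship behind feature flags to a small cohort"
    return "normal: continue progressive delivery"


# A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
remaining = budget_remaining(slo=0.999, downtime_minutes=30.0)
print(release_policy(remaining))
```

The point of writing the policy down is that the trade-off between reliability and iteration speed stops being a debate in the moment: the budget says how much risk the team can still afford.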
Common pitfalls recur. Teams skip UX for employee tools because “they have to use it”—which backfires as shadow workflows and rework. Leaders underfund internal platforms, then wonder why velocity stalls. On the commercial side, teams over-index on features and under-invest in positioning and onboarding, leading to poor activation and elongated sales cycles.
What’s the payoff? When we treat internal products as products, we unlock scale: shorter handling times, fewer escalations, clearer accountability, and higher customer satisfaction. When we approach commercial products with the same discovery rigor plus smart GTM, we compress time-to-value and amplify differentiation. The craft is knowing which lever to pull when—and having the discipline to measure what matters.
My rule of thumb is simple. If the goal is operational excellence that compounds across the entire customer journey, invest in internal products with the same intensity you reserve for revenue-generating features. If the goal is market expansion or category leadership, invest in commercial products with a tight discovery-to-GTM loop. In either case, clarity of outcomes, disciplined discovery, and empowered teams win the day.
Product roadmaps should not be promises etched in stone; they are portfolios of bets made under uncertainty. When I build a roadmap, I’m not predicting the future—I’m designing a system that helps the team learn faster than the market changes, allocate capital wisely, and create alignment across engineering, design, go-to-market, and leadership.
The best roadmaps I’ve seen and shipped anchor on outcomes rather than features. “Outcomes vs output OKRs” is more than a slogan; it’s how we translate strategy into measurable impact. I start by defining a small set of outcome metrics that matter—such as activation rate, time-to-first-value, or expansion revenue—and attach clear key results and guardrails to each theme. This reframes prioritization from “what can we build?” to “what must change in customer behavior?” and gives empowered product teams real autonomy.
I organize the roadmap into time horizons (Now, Next, Later) with explicit confidence levels. Near-term items have higher confidence and more specificity; mid- and long-term bets are thematic with wider time windows. This approach reduces false precision and builds trust because stakeholders can see both the intent and the uncertainty. When dates matter, I use windows and service level expectations rather than single deadlines, and I pair each initiative with a lightweight risk score so we can discuss uncertainty explicitly rather than implicitly.
Continuous discovery keeps the roadmap honest. I partner in tight “product trios” across product, design, and engineering to run rapid customer interviews, opportunity sizing, and assumption tests before we commit significant delivery capacity. The opportunity solution tree is my favorite artifact here; it visualizes the path from outcomes to opportunities to experiments and solutions, making trade-offs and sequencing transparent. By the time something moves into sprint planning, we’ve already reduced key uncertainties and clarified the narrowest viable slice we can ship.
Uncertainty demands options. I plan initiatives as options with stage gates and explicit kill criteria rather than as single monolithic projects. For every significant theme, I outline base, best, and worst-case scenarios with pre-decided triggers for when we escalate, pivot, or stop. This practice prevents sunk-cost fallacy and keeps the team focused on evidence. We treat scope as a knob, not a switch, and we bias toward small, sequential bets that compound learning.
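Pre-decided triggers are easiest to honor when they are written down as a rule before the bet starts. The sketch below is one possible encoding; `GateCriteria`, `gate_decision`, and the threshold values are hypothetical and would need to be agreed per bet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCriteria:
    """Thresholds agreed before the bet starts (all values are placeholders)."""
    kill_below: float    # observed lift under this means stop
    scale_above: float   # observed lift at or over this means escalate
    max_sprints: int     # runway before an inconclusive bet must change course


def gate_decision(observed_lift: float, sprints_used: int, c: GateCriteria) -> str:
    """Map the evidence so far to one of: escalate, stop, continue, pivot."""
    if observed_lift >= c.scale_above:
        return "escalate"
    if observed_lift < c.kill_below:
        return "stop"
    # Inconclusive: keep going while runway remains, otherwise re-scope the bet.
    return "continue" if sprints_used < c.max_sprints else "pivot"


criteria = GateCriteria(kill_below=0.01, scale_above=0.05, max_sprints=2)
print(gate_decision(observed_lift=0.03, sprints_used=2, c=criteria))
```

Because the criteria are frozen at the start, the team cannot quietly move the goalposts once effort has been sunk into a bet.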
Capacity is strategy. I routinely reserve a discovery buffer—typically 10–20%—and a contingency buffer for integration, security, and performance risks that always show up late. I ruthlessly control work-in-progress to limit thrash and protect the team’s ability to respond when new information arrives. When we must navigate dependencies, I use thin vertical slices and decouple via contracts or feature flags so discovery momentum doesn’t stall while platforms evolve underneath.
Prioritization under uncertainty benefits from explicit models. I combine value, effort, and confidence with risk scoring to surface where the unknowns are hiding. Driver trees help us connect top-level outcomes to leading indicators, so we can place bets where they have the highest causal leverage. I also lean on the Kano Model and qualitative signals to avoid over-investing in performance attributes while neglecting excitement features that unlock differentiation and word-of-mouth.
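As an illustration of combining value, effort, confidence, and risk into one comparable number, here is a minimal sketch. The formula is an assumption of mine in the spirit of common scoring models, not a standard, and the bets and their scores are invented.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    name: str
    value: float        # expected impact on the target outcome, 1-10
    effort: float       # person-weeks
    confidence: float   # 0-1, how much evidence supports the value estimate
    risk: float         # 0-1, chance that delivery or adoption goes wrong


def score(bet: Bet) -> float:
    """Assumed formula: confidence- and risk-discounted value per unit of effort."""
    return bet.value * bet.confidence * (1.0 - bet.risk) / bet.effort


bets = [
    Bet("guided setup", value=8, effort=4, confidence=0.8, risk=0.2),
    Bet("integration wizard", value=9, effort=6, confidence=0.5, risk=0.4),
    Bet("new pricing page", value=5, effort=2, confidence=0.6, risk=0.1),
]

ranked = sorted(bets, key=score, reverse=True)
for bet in ranked:
    print(f"{bet.name}: {score(bet):.2f}")
```

The ranking matters less than the conversation it forces: a bet with high value but low confidence is usually a candidate for more discovery, not for the bottom of the list.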
The most effective stakeholder management is narrative-first. For executives, I present a one-page outcomes roadmap that shows themes, expected shifts in key results, and the learning plan. For teams, I provide a more detailed plan that links discovery insights, assumptions-to-test, and decision points. I make room for a “what we’re not doing” section to reduce noise and prevent shadow backlogs from reappearing in every meeting. Most importantly, I socialize change before it happens, explaining the evidence and the trade-offs so adjustments feel like progress, not whiplash.
Measurement closes the loop. We instrument experiments and releases with leading indicators tied to the driver tree and review them on a predictable cadence. If movement stalls, we diagnose whether we have a targeting problem (wrong audience), a value problem (weak proposition), or a friction problem (broken journey). That discipline lets us iterate with purpose instead of chasing vanity metrics or isolated anecdotes.
Here’s a concrete example of roadmapping through uncertainty. Suppose our Q3 objective is to “Increase user activation” with key results to raise the Week-1 activation rate from 32% to 45% and cut time-to-first-value by 30%. In discovery, customer interviews reveal confusion in the first-run setup and a missing integration that advanced users expect. We map an opportunity solution tree and identify two high-leverage opportunities: simplifying the first 10 minutes and offering a guided setup for the integration. We then shape two minimal bets: an in-app guide to streamline the first three tasks and an integration wizard behind a feature flag. Each bet has an explicit decision rule and a two-sprint runway. We ship the guide first, confirm a statistically significant lift via A/B testing, then expand scope. The integration wizard underperforms initial expectations, so we pause, revisit the assumptions, and re-allocate buffer to the stronger path. The roadmap updates in real time, and everyone understands why.
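The key results in this example imply some simple arithmetic that is worth making explicit. The sketch below computes the relative lift the activation target requires and how much of the gap a mid-quarter reading has closed; the 38% reading is a hypothetical value, and the function names are mine.

```python
def required_relative_lift(baseline: float, target: float) -> float:
    """Relative improvement needed to move a rate from baseline to target."""
    return (target - baseline) / baseline


def gap_closed(baseline: float, target: float, current: float) -> float:
    """Share of the baseline-to-target gap that the current value has closed."""
    return (current - baseline) / (target - baseline)


# Key results from the example above.
activation_lift = required_relative_lift(0.32, 0.45)  # 0.13 / 0.32, about 41% relative
ttfv_target_factor = 1.0 - 0.30                       # time-to-first-value at 70% of today

# Hypothetical mid-quarter reading after the in-app guide ships.
progress = gap_closed(baseline=0.32, target=0.45, current=0.38)
print(f"lift needed: {activation_lift:.1%}, gap closed so far: {progress:.0%}")
```

Seeing that a 13-point move is roughly a 41% relative lift helps calibrate expectations: it is unlikely to come from a single small bet, which is why the example stages two.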
When uncertainty spikes—new competitor, pricing shock, platform deprecation—I shift the roadmap cadence to rolling-wave planning. We shorten planning horizons, increase the frequency of readouts, and elevate discovery allocations temporarily. We also create thematic “containment zones” where we explore multiple options in parallel with small budgets until one path justifies scale. This allows us to stay responsive without abandoning strategy.
Good governance accelerates delivery; it doesn’t slow it down. A lightweight product council that reviews outcomes, risks, and cross-functional dependencies prevents surprise escalations and ensures we keep shipping what matters. We avoid death-by-approval by agreeing in advance on decision rights and thresholds. For example, a product trio can pivot a bet within a theme up to a certain budget or timeline impact without additional approval, as long as it improves the outcome likelihood.
If you’re evolving your roadmap practice, start with three moves. First, reframe your plan in outcomes and publish a driver tree that connects those outcomes to the few leading indicators you believe move them. Second, stand up a continuous discovery cadence with a visible opportunity solution tree and an assumptions-to-test backlog. Third, implement time windows and confidence levels for all mid- and long-term items, and pair each major initiative with explicit kill criteria. You’ll feel the difference in a single quarter: clearer trade-offs, faster learning, and more predictable delivery—despite uncertainty.
In the end, a roadmap that thrives in uncertainty is an agreement about how we learn and decide together. It aligns the organization on outcomes, it funds options—not fantasies—and it gives empowered product teams room to maneuver. That’s how top product teams plan for uncertainty and still deliver with confidence.