I’m sharing a focused set of insights on analytics, experimentation, and personalization designed to help teams ship smarter, reduce risk, and accelerate outcomes. Drawing on years of leading product teams, I translate complex data practices into practical playbooks you can apply immediately to improve user activation, conversion, and retention.
My approach starts with a strong measurement foundation. I lean on a unified analytics platform—often powered by tools like Amplitude analytics—to centralize product, marketing, and customer success signals. With clear event taxonomies, consistent governance, and trustworthy dashboards, teams gain a single source of truth to prioritize the right problems and sequence roadmap bets with confidence.
Experimentation turns insight into evidence. I emphasize A/B testing discipline, including minimum detectable effect (MDE), guardrail metrics, and pre-registered hypotheses. This repeatable system lifts decision quality, shortens feedback loops, and aligns cross-functional partners around what actually moves the needle, not what merely sounds promising.
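To make the MDE idea concrete, here is a minimal Python sketch of how a minimum detectable effect translates into a required sample size. It assumes a two-sided two-proportion z-test with a normal approximation and equal-sized arms. The function name `sample_size_per_arm` and its defaults (5% significance, 80% power) are illustrative choices of mine, not a prescribed standard.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed in EACH arm of an A/B test.

    baseline_rate: current conversion rate, e.g. 0.10
    mde_abs: smallest absolute lift worth detecting, e.g. 0.02
    Assumes a two-sided two-proportion z-test (normal approximation).
    """
    treated_rate = baseline_rate + mde_abs
    if not 0 < baseline_rate < 1 or not 0 < treated_rate < 1:
        raise ValueError("rates must fall strictly between 0 and 1")
    if mde_abs == 0:
        raise ValueError("mde_abs must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    # Sum of the Bernoulli variances of the two arms.
    variance = baseline_rate * (1 - baseline_rate) + treated_rate * (1 - treated_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Example: 10% baseline conversion, two-point absolute MDE.
print(sample_size_per_arm(0.10, 0.02))
```

Under these assumptions, a two-point lift on a 10% baseline needs several thousand users per arm, and halving the MDE roughly quadruples that number. That is why I ask teams to agree on the MDE before the test starts rather than after the results come in.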
Personalization compounds the value of experimentation by delivering the right value to the right segment at the right moment. Thoughtful in-app guides and product tours—rooted in behavioral signals—nudge users through friction points and increase the likelihood of early wins. The result is a more intuitive path to first value, stronger user activation, and healthier long-term engagement.
Retention is the ultimate scoreboard. I rely on retention analysis, cohorting, and leading-indicator metrics to connect feature usage to durable outcomes. When paired with product-led growth motions, teams can identify activation thresholds, build habit loops, and scale what works without overextending sales or support capacity.
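As a rough illustration of cohorting, the sketch below groups users by the ISO week of their first activity and reports the share of each cohort that was active in later weeks. The event shape (`user_id`, `activity_date`) and the function name `retention_by_cohort` are assumptions for the example. A real analytics platform would do this for you, with far more nuance around time zones and event definitions.

```python
from collections import defaultdict
from datetime import date


def retention_by_cohort(events, period_days=7):
    """events: iterable of (user_id, activity_date) pairs.

    Returns {cohort: {period_offset: share_of_cohort_active}}, where the
    cohort is the (ISO year, ISO week) of a user's first activity and the
    offset counts whole periods since that user's own first activity.
    """
    events = list(events)

    # Pass 1: find each user's first activity date.
    first_seen = {}
    for user, day in events:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    # Pass 2: bucket every event by cohort and by periods since first activity.
    cohort_users = defaultdict(set)
    active = defaultdict(set)
    for user, day in events:
        start = first_seen[user]
        cohort = tuple(start.isocalendar()[:2])
        offset = (day - start).days // period_days
        cohort_users[cohort].add(user)
        active[(cohort, offset)].add(user)

    return {
        cohort: {
            offset: len(active[(c, offset)]) / len(users)
            for (c, offset) in sorted(active)
            if c == cohort
        }
        for cohort, users in cohort_users.items()
    }


sample_events = [
    ("a", date(2024, 1, 1)), ("a", date(2024, 1, 9)),
    ("b", date(2024, 1, 2)),
    ("c", date(2024, 1, 8)), ("c", date(2024, 1, 16)),
]
print(retention_by_cohort(sample_events))
```

Reading across the offsets for one cohort gives you its retention curve. Comparing the same offset across cohorts shows whether newer users are sticking better than older ones.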
If you’re getting started, begin with a crisp instrumentation plan, shared definitions, and a lightweight review ritual. Use continuous discovery practices, opportunity solution tree mapping, and driver trees to tie data signals to real user problems. From there, iterate: test small, learn fast, and scale what is proven. Over time, this system becomes a flywheel for product strategy—fewer debates, more evidence, better products.
In this series, I distill the frameworks, templates, and real-world lessons that have consistently improved outcomes for product teams: how to structure experiment backlogs, how to read funnel breakpoints, how to detect false positives quickly, and how to operationalize analytics for day-to-day decisions. Expect practical guidance you can copy, adapt, and run with immediately.
Inspired by this post on Amplitude – Perspectives.
For years, I’ve watched high-performing product teams run into the same wall: the gap between insight and action. Dashboards multiply, yet decisions stall. That final mile—where we interpret trends, prioritize tradeoffs, and ship changes—remains the last bottleneck. It’s not a data problem; it’s a bandwidth and focus problem.
From my seat leading product at HighLevel, I see “agentic AI” as a structural upgrade to the product operating system. Instead of waiting on human cycles to discover anomalies, craft hypotheses, and trigger the next experiment, Agent Analytics can continuously investigate user behavior, monitor mission-critical metrics, and initiate actions—closing the loop from observation to outcome. That shift transforms analytics from a passive reference layer into an active, decision-making teammate.
Practically, this matters because empowered product teams win on speed and focus, not on the volume of reports. When agents surface the most material opportunities—say, a sudden drop in activation for a high-value cohort or a retention dip tied to a recent release—we compress time-to-insight and, more importantly, time-to-action. The result is fewer context switches, fewer meetings, and more cycles invested in building meaningful value.
The most compelling use cases are those that compound: continuous discovery that highlights friction in onboarding flows, proactive retention analysis on at-risk segments, automated experiment prioritization aligned to outcome-based rather than output-based OKRs, and closed-loop alerts that trigger workflows in your CRM or in-app guides to accelerate product-led growth. With a unified analytics platform feeding these agents, we can move from reactive analytics to anticipatory product strategy.
Of course, leverage requires guardrails. I anchor adoption in three pillars: clear decision rights for agents (what they can autonomously act on vs. recommend), transparency in reasoning (so PMs can audit how conclusions were reached), and explicit alignment to key outcomes (activation, retention, expansion). Done right, this is not a replacement for product judgment—it’s an amplifier for it.
If I were rolling this out today, I’d set up a success dashboard that tracks time-to-insight, time-to-action, the share of initiatives started by agents, impact on North Star metrics, and the reduction in manual analysis hours. I’d also implement lightweight prompts and playbooks for product managers working with LLMs, to standardize how we ask better questions and interpret agent outputs.
The promise here is simple but profound: eliminate the last bottleneck by giving your teams a partner that never sleeps, never tires, and never loses the plot. When agents investigate, monitor, and act, we spend less time arguing about the data and more time building the right things, faster.
Inspired by this post on Amplitude – Best Practices.
Most product teams—and especially well-run product trios—know they should be interviewing customers. More teams than ever are actually doing it. That’s the good news.
The bad news? Many teams still struggle with what comes next. Turning raw recordings into a structured opportunity space that truly guides product discovery can feel overwhelming.
In my experience, interview synthesis is cognitively demanding work. You have to extract the key moments from each conversation, translate those moments into clear opportunities, and then organize those opportunities into a coherent view of your opportunity space. It’s no surprise I hear teams say, "We need to stop interviewing so we can catch up on what we’ve already learned." Too often, they pause—and never start again.
Recordings pile up. Maybe there are scattered notes. But nothing gets turned into an opportunity solution tree. The team hasn’t synthesized what they’ve learned, so the research isn’t actionable. That’s the gap I want to help close.
What if you could go from 3 interviews to a draft OST in minutes?
My AI goals are straightforward: 1) build tools that help you learn discovery and 2) build tools that help you do discovery. The learning tools are coming through on-demand courses. Today, I’m excited to share the first big step on the "do" side.
I’m excited to see an expanded partnership with Vistaly—the opportunity solution tree tool many of you already use—to bring AI-powered discovery tools directly into their platform.
Great synthesis happens in two steps: first, you synthesize each interview separately; then you synthesize across interviews. Most AI tools skip the first step and jump straight to cross-interview analysis—exactly how teams lose the nuance and context that make research actionable.
This approach does both. You upload three interviews for the same product outcome. The AI extracts the key moments and opportunities from each one separately. Then it synthesizes across those interviews and generates a first draft of your opportunity solution tree for you. Three interviews in. A draft OST out.
Here’s what this is—and what it isn’t. You’ve probably heard criticism of tools that promise "one-click opportunity solution trees." Those tools ask you to describe your market, click a button, and get a tree. The point of an opportunity solution tree is not to have one—it’s to synthesize what you’re learning from real customers so your team can align on the best path forward. A one-click tree built from made-up data is useless.
This approach is fundamentally different. It starts with your real customer interviews. The AI does the heavy lifting of extracting key moments and opportunities from those conversations and organizing them into a draft opportunity solution tree. But it’s a draft—you review it, refine it, and reorganize it. You bring your judgment and context to the work.
My vision for AI-aided cross-interview synthesis is simple: AI identifies common opportunities across interviews, suggests a tree structure, and facilitates the team’s review. Historically, it’s been hard to give AI access to an opportunity solution tree in a way that preserves structure and context. The integration with Vistaly solves that problem by building this capability directly into the tool where your tree already lives.
In my own experiments using Claude, the AI surfaced opportunities I missed, and I caught things it missed. The highest-quality synthesis came from combining both perspectives. Research backs this up: experts working with AI outperform both experts working alone and AI working alone. That’s the model we’re building toward: AI generates the draft, and you bring the expertise.
I have mixed feelings about AI doing discovery work for us because there is real value in doing the synthesis yourself. But I also know that a draft OST you actually refine is better than a perfect process you never get to. This is about raising the floor—helping more teams get to a structured opportunity space, even if they aren’t doing every step manually.
We’re looking for a small group of alpha partners to help shape this product. To apply, sign up for a free Vistaly account and upload three customer interviews for the same outcome or product space.
We’ll select alpha partners from the applicants. We want a range of interview styles, experience levels, and product spaces. Selected partners will get access to the AI-powered synthesis tools and will work closely with the team to shape the product. Even if you aren’t selected for the alpha, your application puts you at the front of the line when we enter beta.
A few things to know as you apply: Your three interviews should be for the same outcome, goal, or product space, so the tool can generate a meaningful OST. You don’t need to be a Vistaly user today—the account is free. You don’t need to be an expert interviewer either; we’re looking for a range of experience levels, though we’re particularly interested in story-based customer interviews.
This is just the beginning. The vision is a full AI-powered discovery suite inside Vistaly—from interview analysis to complete interview snapshots to opportunity solution trees and beyond. We’ll learn alongside our alpha partners and share what we discover as we go.
If you’ve been looking to bridge the gap between your customer interviews and your opportunity space, this is your chance to help shape how that works. Apply for the alpha today.
Every week, product and data leaders ask me the same question: can AI agents truly shoulder enterprise analytics without sacrificing trust, governance, or speed? I’ve spent the past year putting agentic AI through its paces in real product workflows, and I’ve distilled what works into a practical, task-driven evaluation approach you can adopt immediately.
When I say “enterprise analytics,” I’m talking about far more than chatty dashboards. The bar includes consistent metric definitions, privacy-by-design, RBAC and data governance, audit trails, low-latency decision support, and repeatable outcomes across retention analysis, funnels, cohorts, A/B testing, instrumentation planning, and anomaly detection—ideally within a unified analytics platform.
My task-based framework evaluates eight capability pillars I expect from an enterprise-ready Agent Analytics solution: task coverage and depth across common product analytics workflows; data fidelity and governance (lineage, access controls, PII handling); instruction-following and reasoning transparency; evaluation rigor and reliability (repeatability, error modes, regressions); security and compliance posture; latency and cost efficiency; integration into existing product strategy workflows (e.g., CRM integration, CI/CD-linked instrumentation, experiment platforms); and human-in-the-loop controls for approvals and guardrails.
Operationally, I define canonical tasks that reflect day-to-day product management: codify a North Star metric; perform retention analysis by cohort; generate and explain a funnel with drop-off drivers; recommend an event taxonomy and tracking plan; analyze an A/B test with minimum detectable effect (MDE) considerations; and propose a driver tree that maps inputs to outcomes. Each task comes with ground-truth datasets, acceptance criteria, and edge cases to stress the agent—an eval-driven development practice I’ve found indispensable.
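To show what one canonical task can look like in practice, here is a small Python sketch of a funnel task with a tiny ground-truth dataset and an acceptance check. Everything named here (`funnel_report`, `biggest_drop`, the step names, the golden users) is hypothetical and exists only for illustration. The funnel logic is deliberately simplified: it checks that a user completed each step and all earlier ones, but it ignores timestamps and ordering.

```python
def funnel_report(step_names, completed_steps):
    """Count users who completed each step AND every step before it.

    completed_steps: {user_id: set of step names the user completed}.
    Simplification: event order and timing are ignored.
    """
    remaining = set(completed_steps)
    report = []
    previous_count = None
    for step in step_names:
        remaining = {user for user in remaining if step in completed_steps[user]}
        count = len(remaining)
        # No conversion rate for the first step, or when nobody reached the prior one.
        conversion = count / previous_count if previous_count else None
        report.append({"step": step, "users": count, "conversion_from_prior": conversion})
        previous_count = count
    return report


def biggest_drop(report):
    """Return the step with the lowest conversion from the step before it."""
    later_steps = [row for row in report if row["conversion_from_prior"] is not None]
    if not later_steps:
        return None
    return min(later_steps, key=lambda row: row["conversion_from_prior"])["step"]


# Hypothetical golden dataset with a hand-checked expected answer.
STEPS = ["signup", "activate", "purchase"]
GOLDEN_USERS = {
    "u1": {"signup", "activate", "purchase"},
    "u2": {"signup", "activate"},
    "u3": {"signup"},
    "u4": {"signup"},
    "u5": {"activate", "purchase"},  # skipped signup, so excluded from the funnel
    "u6": {"signup", "activate"},
}
EXPECTED_COUNTS = [5, 3, 1]

report = funnel_report(STEPS, GOLDEN_USERS)
# Acceptance criterion: the computed counts must match the hand-checked ones.
assert [row["users"] for row in report] == EXPECTED_COUNTS
print(report, biggest_drop(report))
```

The edge case (user `u5` entering mid-funnel) is the sort of thing I put in a golden dataset on purpose, because it reveals whether an agent's funnel definition matches yours.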
I then score maturity across four levels. L0: a pure chat UI that summarizes existing charts. L1: a retrieval-first pipeline that grounds responses in your analytics catalog and metric store. L2: a tool-using agent that is schema-aware, can write safe SQL, and reconciles results to canonical definitions. L3: a governance-aware autonomous workflow that executes analytics tasks end-to-end with approvals, audit logs, feature flags, and rollback plans. Most teams discover they’re between L1 and L2; reaching L3 requires serious investment in data governance and eval automation.
Risk management is non-negotiable. I require strict data governance and privacy-by-design controls, including scoped credentials, PII redaction, policy-aware retrieval, and comprehensive observability (query traces, prompt/response logs, lineage). Feature flags and approval gates prevent unintended metric redefinitions. Red-teaming tasks expose prompt injection, schema drift, and hallucination failure modes before they hit production stakeholders.
Where do agents shine today? Rapid exploration, SQL generation from schema context, summarizing experimentation results, and turning natural-language questions into actionable charts. Where do they struggle? Ambiguous metric semantics, under-specified experiment designs, and edge-case-heavy analyses where ground truth depends on organizational nuance. The cure is disciplined product management: codify definitions, maintain a living analytics taxonomy, and continuously harden your eval suite.
In the context of product analytics stacks, Amplitude analytics is a common anchor for product teams, and many are evaluating “Amplitude’s Global Agent” to accelerate insight generation. In my framework, I look for how well it grounds to canonical metrics, handles retention and funnel tasks, explains trade-offs, and respects governance boundaries—before I consider expanded autonomy. I share the full task matrix and scoring rubric so you can replicate the assessment in your environment.
If you’re getting started, pick your top ten high-frequency analytics tasks and define crisp success metrics for each (accuracy, explainability, latency, and reusability). Build a small eval harness with golden datasets, assertions, and regression tests. Favor a retrieval-first pipeline tied to your taxonomy and metric store, add human-in-the-loop approvals for sensitive actions, then pilot with a cross-functional tiger team. Measure time-to-insight, analyst hours saved, and stakeholder trust—then iterate.
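A small eval harness along these lines can start out very simple. The Python sketch below is one possible shape, under stated assumptions: the agent is any callable that takes a question and returns a number, golden cases carry an expected value and a tolerance, and a regression is any task whose pass rate falls below a stored baseline. The names `GoldenCase`, `run_evals`, `regressions`, and the stub agent are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    task: str          # e.g. "retention" or "funnel"
    question: str      # prompt sent to the agent
    expected: float    # hand-verified answer from the golden dataset
    tolerance: float = 0.0


def run_evals(agent: Callable[[str], float], cases):
    """Return the pass rate per task. An agent error counts as a failure."""
    outcomes = {}
    for case in cases:
        try:
            answer = agent(case.question)
            passed = abs(answer - case.expected) <= case.tolerance
        except Exception:
            passed = False
        outcomes.setdefault(case.task, []).append(passed)
    return {task: sum(flags) / len(flags) for task, flags in outcomes.items()}


def regressions(current, baseline, slack=0.0):
    """List tasks whose pass rate dropped below the baseline (minus slack)."""
    return sorted(
        task for task, score in baseline.items()
        if current.get(task, 0.0) < score - slack
    )


# Stub agent standing in for a real one; the answers are made up for the demo.
CANNED_ANSWERS = {"week-4 retention?": 0.425, "checkout conversion?": 0.5}


def stub_agent(question):
    return CANNED_ANSWERS[question]


cases = [
    GoldenCase("retention", "week-4 retention?", expected=0.42, tolerance=0.01),
    GoldenCase("funnel", "checkout conversion?", expected=0.30, tolerance=0.01),
]
scores = run_evals(stub_agent, cases)
print(scores, regressions(scores, baseline={"retention": 1.0, "funnel": 1.0}))
```

Running the harness on every prompt, model, or schema change gives you a simple gate: if `regressions` returns anything, the change does not ship until someone has looked at it.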
Enterprise analytics isn’t a single feature; it’s a system of definitions, workflows, and governance. With a task-based, eval-driven approach, agentic AI can become a reliable partner—not just a novel interface. If you’re evaluating options, apply this framework first, then expand scope as reliability and trust climb.
Inspired by this post on Amplitude – Best Practices.
I write from a place many product leaders know well: the moment when the data you need to make decisions simply doesn’t exist, and you have to build the capability from the ground up. That firsthand experience with gaps in analytics shaped how I think about product strategy, product discovery, and the pursuit of product-market fit.
In my work, I lean on continuous discovery to surface the most meaningful problems, then translate those insights into outcome-based rather than output-based OKRs that keep teams focused on impact. When we anchor roadmaps to real user behavior and business results, we avoid vanity metrics and create a durable plan that compounds learning over time.
Execution matters just as much as insight. I rely on rigorous A/B testing, clear minimum detectable effect (MDE) thresholds, and retention analysis to separate signal from noise. This discipline ensures that every iteration—whether it’s a small UX nudge or a bold bet—moves us closer to measurable value for customers and the business.
None of this works without empowered product teams. I build around product trios that partner tightly across design, engineering, and product, and I foster a product-led growth mindset so we earn activation, engagement, and expansion through the experience itself. The goal is to create a system where learning is fast, ownership is clear, and the user’s job-to-be-done stays front and center.
On the tooling side, I favor a unified analytics platform so insights are consistent from discovery to deployment. Whether I’m instrumenting funnels with Amplitude analytics or stitching together qualitative and quantitative inputs, the principle is the same: give teams trustworthy, real-time visibility so they can make better decisions, faster.
If you’re looking to operationalize these practices, you’ll find practical playbooks, decision frameworks, and real-world examples here—built for leaders who want clarity, speed, and confidence in how they discover, ship, and scale products.
Inspired by this post on Amplitude – Best Practices.
I hear the same refrain from product leadership peers everywhere: we’re overwhelmed. Shrinking headcount, constant AI disruption, economic uncertainty, and relentless context switching make it feel like we’re carrying two jobs—setting strategy while shielding our teams. I recently listened to an episode of All Things Product that zeroes in on what a real support system for product leaders looks like, and it resonated deeply with my day-to-day.
Want to listen to the conversation yourself? Find it on Spotify or Apple Podcasts.
Here’s the core tension I see (and felt early in my own leadership journey): product leaders tend to underinvest in themselves. We hold onto work because it feels faster, safer, or “just easier if I do it.” But that pattern quietly taxes strategy, slows learning, and caps team throughput. The hidden cost of “doing it all yourself” is real.
Early in my tenure leading product, I tried to keep every plate spinning—roadmap reviews, stakeholder prep, user research, executive updates—while protecting my team’s focus. I was busy and useful, but not maximally valuable. The turning point came when I started building a lightweight support stack: a few hours of executive assistant help each week, targeted research support for bet sizing, and a personal cadence with a leadership coach. The result wasn’t just more time; it was better time.
One provocative point that landed hard: product leaders rarely have executive assistants—and that’s a problem. If your calendar is your operating system, an EA is an extension of your leverage. Mine now handles scheduling, meeting hygiene, prep packets, and post-meeting artifacts. That shift moved me from “calendar triage” to “strategic curation.” It also reinforced a core principle: delegation is a leadership skill, not a weakness. When I delegate outcomes (not just tasks), my team learns, ownership grows, and we ship decisions faster.
Support for strategy work shouldn’t stop at the calendar. Research and data enable better bets. Lightweight research ops, access to product analytics, and brief synthesis sprints keep me anchored in evidence without drowning in artifacts. Paired with a strong community of practice, I get a steady stream of comparative patterns—how other leaders delegate, scope advisory boards, or run decision reviews—which short-circuits trial-and-error.
Coaches were framed as shortcuts for clarity, accountability, and skill-building—and I agree. A good coach compresses cycles, sharpens decision quality, and holds the mirror up when you drift into doer mode. Two quotes captured the mindset perfectly: “You are a pro athlete. It makes sense to think about how you scale your impact without adding more to your calendar.” — Petra Wille. “As you get busier, it becomes more important to focus on the value only you can bring.” — Teresa Torres.
There’s also a helpful nudge to let go of perfectionism: “80% done by someone else is 100% awesome.” — Dan Martell (quoted). In practice, that means I accept great drafts from others, then add the 10–20% only I can contribute—context, narrative, and the sharp edges of the decision.
What about AI? The conversation hits a practical middle ground I share: use AI where it compounds leverage, such as meeting summaries, research synthesis starters, doc outlines, and backlog triage. But keep humans where judgment, alignment, and context truly matter: strategy framing, stakeholder management, and the final decision-making loops. In other words, apply an AI strategy that respects product leadership’s uniquely human work.
Key themes I took away: why product leaders struggle to scale themselves; the true cost of “doing it all yourself”; why not having executive assistants limits impact; delegation as a core leadership capability; how to identify and protect the work only you can uniquely do; using research and data to inform strategy; coaches as accelerators for clarity and accountability; communities of practice as a force multiplier; adopting a “professional athlete” mindset; when AI helps—and when humans still matter; and the liberating mantra that “80% done by someone else is 100% awesome.”
If you’re wondering where to begin, start small and practical. Audit your time: what work truly requires you? Experiment with small amounts of support (even a few hours a week). Delegate outcomes, not just tasks. Keep the hands-on work you love—but be intentional. Use peers, coaches, and communities to learn how others delegate. Don’t wait until burnout to build your support system.
Resources mentioned if you want to go deeper: Follow Teresa Torres: https://ProductTalk.org. Follow Petra Wille: https://Petra-Wille.com. Petra’s Coaching for Product Leaders: https://www.petra-wille.com/coaching-packages. Dan Martell’s book Buy Back Your Time: https://www.buybackyourtime.com.
I’m curious: what’s one outcome you’ll delegate this week, and what support would make it stick? Share your thoughts in the comments—your playbook might be exactly what another product leader needs right now.
I keep a simple mantra front and center: Figma is not the source of truth. The customer is. In practice, that means the only thing that truly counts is what we ship, how it performs, and whether users come back for more. Mockups are hypotheses; production usage is evidence. When my teams adopt this lens, velocity improves, judgment sharpens, and quality rises where it matters most.
So what does design actually do in a software company? At its best, design builds leverage for the whole system—engineering, product, and marketing—by clarifying problems, raising the quality bar, and making complex decisions legible. The standard I hold is ancient and still essential: products must be useful, usable, and desirable — and above all, used. When we calibrate around “used,” debates about pixels give way to outcomes, and cross-functional partners feel the difference.
I often trace the roots of our craft back well beyond the digital era. The lineage from industrial design to software is real; constraints, ergonomics, affordances, and systems thinking didn’t start with screens. If you’ve ever sorted features into must-haves, performance attributes, and delighters with a Kano model, you’ve touched this lineage. The translation to software is simple: design the full journey, not just the interface, and prioritize what improves time-to-value, reduces cognitive load, and earns habitual use.
One lesson I’ve learned the hard way: design leaders who stop designing stop leading. I still sketch flows, write UX copy, and prototype when it unblocks the team or sets a decisive quality bar. The altitude changes constantly: one hour I’m in a strategic roadmap review, the next I’m in a critique or poking at a prototype. Great design leaders jump up and down in altitude to connect vision to details without becoming a bottleneck.
Over time, I’ve come to rely on four pillars every design manager must master: craft (raising taste and execution), product strategy (clarifying choices and trade-offs), people leadership (coaching, feedback, and hiring), and systems (processes, rituals, and design ops that scale). Neglect any one of these and quality, speed, or team health will eventually falter.
Perfectionism is a double-edged sword. Over-indexing on quality can paralyze decision-making, but lowering the bar indiscriminately is worse. I’ve seen moments where relaxing standards to “go faster” actually cost the business—rework piled up, trust eroded, and customer value stalled. The answer is principled delegation: I define what “must be true” at each milestone, delegate ownership with clear guardrails, and reserve my veto power for moments where product integrity is genuinely at risk.
Measuring success as a design leader starts with OKRs framed around outcomes rather than output. I care about activation, retention, time-to-first-value, NPS verbatims tied to key journeys, and the operational metrics that earn the right to build the next thing. Design output is visible; design outcomes are durable. When trade-offs are needed, I optimize for the smallest shippable surface that still proves the core value proposition, then expand with data.
Scaling judgment is the multiplier. I build it through pattern matching—studying enduring product systems from companies like Airbnb, Amazon, Apple, Asana, Notion, Stripe, Nest, and others—to distinguish where polish compels usage versus where it’s ornamental. Strong opinions matter, but so does being easy to convince with new evidence. I encourage designers to articulate the pattern they’re invoking, why it fits the job-to-be-done, and how we’ll know it worked.
Operating cadence matters. My week is anchored around recruiting, crits, and staff meetings that actually make decisions. In critiques, I use the Do/Try/Consider framework to give actionable direction without micromanaging. On one-on-ones, the question isn’t “Should one-on-ones exist?” but “What are they for right now?”—coaching, performance, or clearing execution blockers. If a meeting doesn’t increase clarity or commitment, it gets redesigned or removed.
Execution-wise, I’ve taken inspiration from Rippling’s operating system, especially its emphasis on speed, precise ownership, and hard commitments. The lesson is timeless: go fast on the right things, make clear promises, and instrument your work so you can see reality quickly. When speed is paired with crisp decision rights and observable outcomes, momentum compounds and trust holds instead of fraying.
Hiring your first design leader? Look for someone who can set standards, scale judgment, and ship. They should be able to zoom from company narrative to interaction copy in a single afternoon, coach product trios, and build rituals that make taste and trade-offs explicit. Above all, they should have a point of view on where quality moves the business and where speed is the quality.
Here’s how my team’s approach differs from many: Figma is not the source of truth. We design in Figma, but we learn from production. We pair designers with engineering early, prototype in code when it reduces risk, and wire telemetry into every critical path. Product trios use discovery to validate “useful, usable, desirable — and used,” then commit to outcomes with clear, testable definitions of success. The result is faster iteration, fewer surprises, and experiences customers actually adopt.
The throughline is simple and demanding: design for reality, not for the board. Keep your standards where they create business value, scale judgment with explicit patterns, and instrument everything so learning never stops. When teams embrace that, the work gets better, customers feel it, and the roadmap starts to pull you forward.
“You don’t have to trust the algorithm; you can see exactly why a conversation earned the score it did.”
We recently shared how we redesigned CX Score to deliver deeper, more actionable insights across every conversation. The most common follow-up from support leaders was simpler and incredibly important: “Can I trust it?” It’s the right question—and it’s the one I use as my own bar for whether a metric is ready for the C‑suite.
CS teams are the subject matter experts on customer experience. They understand the nuance of what customers feel, the context behind every interaction, and the difference between a technically resolved issue and a genuinely satisfied customer. I’ve learned, conversation by conversation, that any metric we ship has to capture that nuance at scale—or it doesn’t deserve to be used.
We built CX Score to give support teams a complete view of how their customers feel across every conversation. It surfaces what’s working, what’s not, and why—so leaders can communicate impact clearly and drive change across support, product, and the wider business.
A CX Score in action: repeated CSV export failures trigger a low score and customer frustration, while the AI agent clarifies next steps and gathers details—turning raw signals into actionable support insights.
Here’s exactly how I approached building a trustworthy metric that support leaders can inspect, explain, and defend.
1) It’s grounded in how support teams define quality. I started with how experienced support professionals actually evaluate conversations—collecting real examples of strong, mixed, and poor interactions across industries, identifying the specific factors that shape overall experience, and writing plain-English rules for each. The result: CX Score applies the same criteria a trained support professional would use, not generic LLM assumptions.
2) It’s aligned with human judgment. We created a dataset of thousands of real customer conversations spanning multiple industries, languages, channels, and agent types. Each was manually reviewed by experienced support professionals—with two reviewers per conversation where possible and disagreement resolution to create stable consensus labels. The result: CX Score is trained and tested to behave like an expert reviewer, not a language model making broad guesses.
A modern CX analytics view shows how conversations flow from chat, email, and mobile into AI assistance, then to resolutions and sentiment outcomes—turning messy support data into a single, defensible CX Score.
3) It’s engineered by AI specialists. CX Score isn’t a prompt attached to an LLM. It’s a production system built by Intercom’s AI Group: 37 ML scientists and 350 engineers whose full-time focus is AI for customer service. The system includes specialized handling for long transcripts, model configuration tailored for support language and subtle sentiment, prompt engineering designed to default to neutral when evidence is weak, and a multi-stage evaluation pipeline that checks for precision, consistency, and reliability. The result: a metric built by a team that understands LLM behavior in production support environments, where accuracy and consistency matter most.
4) It’s validated statistically, not qualitatively. Trust requires measurement, not vibes. We tested CX Score across standard ML metrics: precision (when the model flags a negative experience, how often do humans agree?), recall (how many human-identified issues does it catch?), and F1 score (the harmonic mean of the two). We set an explicit bar: F1 above 0.8, representing high agreement with human judgment. We reran these evaluations on every revision, checking for regressions and biases, and I focused especially on negative experiences, because a false negative hides a real problem. The result: CX Score meets a measurable standard before it ships. It is a statistical requirement, not a gut check.
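To make those three metrics concrete, here is a minimal sketch of how agreement with human reviewers could be computed. It is not the production evaluation pipeline: the function name, the label lists, and the twelve example conversations are all hypothetical, and True simply means "negative experience."

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    precision: float
    recall: float
    f1: float


def score_agreement(human: list[bool], model: list[bool]) -> Agreement:
    """Compare model flags with human labels; True means 'negative experience'."""
    if len(human) != len(model):
        raise ValueError("human and model label lists must be the same length")
    tp = sum(h and m for h, m in zip(human, model))        # both flag it
    fp = sum((not h) and m for h, m in zip(human, model))  # model-only flag
    fn = sum(h and (not m) for h, m in zip(human, model))  # missed by model
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Agreement(precision, recall, f1)


F1_BAR = 0.8  # the explicit bar stated in the text

# Hypothetical labels for twelve conversations (illustrative, not real data).
human_labels = [True] * 6 + [False] * 6
model_labels = [True] * 5 + [False, True] + [False] * 5

result = score_agreement(human_labels, model_labels)
passes_bar = result.f1 > F1_BAR
```

In this toy set the model misses one human-flagged conversation (a false negative) and raises one flag the reviewers did not, so precision and recall are both 5/6. Tracking the false-negative count separately is one way to keep the focus on hidden problems.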
5) It was battle-tested with real customers. Lab accuracy isn’t enough. Customer environments are messy: Varied ticket types, mixed languages, unpredictable edge cases. Before release, we ran a multi-phase field test—shadow-scoring conversations with both old and new models, validating sensible behavior across agent type and conversation length, then rolling out to a controlled customer group who confirmed the scores felt right, reasons were clear, and insights were actionable. The result: CX Score shipped because real teams told us it made sense in practice, not because it passed internal tests.
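A shadow-scoring check of this kind can be sketched in a few lines. The version below is an assumption-laden illustration, not the field-test tooling: the record layout, the label values, and the function name are hypothetical. It only shows the idea of scoring the same conversations with both models and comparing agreement per segment.

```python
from collections import defaultdict

# Hypothetical shadow log: both models score the same conversation.
shadow_log = [
    {"agent_type": "human", "old": "negative", "new": "negative"},
    {"agent_type": "human", "old": "neutral", "new": "negative"},
    {"agent_type": "human", "old": "positive", "new": "positive"},
    {"agent_type": "human", "old": "neutral", "new": "neutral"},
    {"agent_type": "ai", "old": "positive", "new": "positive"},
    {"agent_type": "ai", "old": "negative", "new": "negative"},
]


def agreement_by_segment(rows: list[dict], key: str) -> dict[str, float]:
    """Share of conversations per segment where old and new models agree."""
    totals: dict[str, int] = defaultdict(int)
    matches: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        matches[row[key]] += int(row["old"] == row["new"])
    return {segment: matches[segment] / totals[segment] for segment in totals}


agreement = agreement_by_segment(shadow_log, "agent_type")
```

The same function could be pointed at a conversation-length bucket instead of agent type. Segments where agreement drops are the ones worth reading by hand before a wider rollout.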
From conversation to clarity: this visual maps the drivers behind a CX Score. Explore how policy feedback, answer quality, and effort combine to produce defensible insights support leaders can act on.
The importance of explainability. One of the most critical choices I made was ensuring CX Score isn’t a black box. Every score comes with clear reasons, concrete excerpts, and a short explanation of what influenced the rating. This turns the metric into something you can inspect, audit, and explain to executives. You don’t have to trust the algorithm. You can see exactly why a conversation earned the score it did.
A metric that evolves with your business. Customer expectations shift. Products change. AI improves. A trustworthy metric can’t be static. CX Score evolves with the same commitments that shaped its redesign: Evaluate the real signals that shape customer experience, keep the logic simple and interpretable, and ensure leaders can make clear decisions from it. It’s built to be a durable source of truth across every conversation.
The takeaway. In a world where products look the same and AI can generate any interaction, customer experience is one of the few differentiators that actually matters. Support leaders have built that expertise conversation by conversation. What they’ve lacked is a measurement system that could validate it at scale—one that’s reliable enough to report to the C-suite, explainable enough to defend in strategy meetings, and rigorous enough to drive real decisions. That’s what CX Score is designed to be: A metric that reflects the reality support leaders see every day, backed by the technical rigor to make it credible everywhere else.
Want to see CX Score in your workspace? Ask your admin to enable it for your team, and start using explainable AI insights to improve customer experience and coach with confidence.
I’ve been exploring what I call the next level of vibe coding: orchestrating agentic AI to build complex product artifacts in minutes, not days. The breakthrough comes from ditching linear handoffs and embracing true parallelism—letting specialized agents tackle the work simultaneously while I steer the orchestration. In product management contexts where speed and clarity matter, this shift changes everything.
Building a KPI Driver Tree in two hours becomes possible when you stop building sequentially and start building with parallel agents.
For product leaders, a KPI Driver Tree is the fastest way to make strategy legible. It ties high-level outcomes to the levers we can actually pull (features, channels, pricing, onboarding, activation, and retention mechanics) so we can prioritize with confidence. Done well, it connects outputs to outcome OKRs, clarifies measurement, and aligns the team around a shared, testable model of growth.
Here’s how I operationalize it with agentic AI and AI workflows. I spin up a small team of specialized parallel agents: a Metrics Librarian (taxonomy and definitions), a Data Modeler (event and table design), a Research Synthesizer (voice of customer and causal hypotheses), a UX Prototyper (visualizing the tree and flows), and a QA/Evaluator (logic and consistency checks). An Orchestrator coordinates these agents, resolves conflicts, and composes outputs into a single, production-ready artifact—while I set constraints, review deltas, and decide.
In a typical two-hour sprint, all agents run at once. While the Metrics Librarian finalizes the KPI ontology, the Data Modeler validates instrumentable events and joins, and the UX Prototyper renders an interactive driver tree for a unified analytics platform. Meanwhile, the Synthesizer maps qualitative insights to quantitative levers, and the Evaluator stress-tests assumptions. Because we’re not waiting for sequential handoffs, we converge on a coherent driver tree and its initial measurement plan in one pass.
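The fan-out and compose pattern described above can be sketched with Python's asyncio. This is a minimal illustration under stated assumptions: the agent roles come from the text, but run_agent, the task strings, and the delays are hypothetical stand-ins for real model calls.

```python
import asyncio


async def run_agent(name: str, task: str, delay: float) -> dict:
    """Stand-in for a real agent call; the sleep simulates model latency."""
    await asyncio.sleep(delay)
    return {"agent": name, "output": f"{task} (draft)"}


async def orchestrate() -> dict:
    # Roles are from the text; tasks and delays are made up for the demo.
    agents = [
        ("Metrics Librarian", "KPI ontology", 0.03),
        ("Data Modeler", "event and table design", 0.02),
        ("Research Synthesizer", "causal hypotheses", 0.03),
        ("UX Prototyper", "interactive driver tree", 0.02),
        ("QA/Evaluator", "consistency checks", 0.01),
    ]
    # gather() runs every agent concurrently and keeps input order.
    results = await asyncio.gather(
        *(run_agent(name, task, delay) for name, task, delay in agents)
    )
    # Compose step: merge outputs into one artifact keyed by agent name.
    return {r["agent"]: r["output"] for r in results}


artifact = asyncio.run(orchestrate())
```

Because the agents run concurrently, the wall-clock time is close to the slowest agent, not the sum of all five. A real orchestrator would also need conflict resolution and a human review step, which this sketch leaves out.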
The payoff isn’t just speed; it’s higher-quality decisions. Parallel agents reduce context loss, expose trade-offs earlier, and allow me to compare multiple viable paths side-by-side. This accelerates continuous discovery, aligns with product strategy, and gives product managers a clear, living map of how inputs roll up to outcomes. It’s the closest I’ve found to running a product trio at machine speed.
Guardrails matter. I pair this approach with strong data governance, privacy-by-design, and eval-driven development so every agent’s output is testable and auditable. Clear prompts, scoped corpora, and consistent acceptance criteria keep the Orchestrator honest, while lightweight Agent Analytics helps me see where reasoning falters and where to improve the system.
If your team is still tackling analytics artifacts sequentially—requirements, then instrumentation, then visualization—consider switching mental models. Treat the driver tree as the backbone, empower parallel agents to co-create around it, and reserve human judgment for the critical calls. This is vibe coding for product management: creative, fast, and grounded in measurable outcomes.
AI adoption is everywhere. I see more teams every quarter moving from pilots to production—and increasing their budgets accordingly. But the gap between “using AI” and truly transforming with it is widening fast. Launching an AI Agent is easy; building a mature, AI-powered support operation is where the real work—and the real value—lives.
According to the new research, the "2026 Customer Service Transformation Report," the difference comes down to depth of deployment. It’s not enough to dabble. Teams that design their operations around AI are pulling away from those that treat AI like a bolt-on feature.
This article kicks off part one of my five-part deep dive into the research. I’ll unpack the data, share what I’ve learned leading product and AI strategy, and translate it into practical steps you can apply now. If you’d like to go straight to the source, you can download the report here.
First, the macro picture: 2,470 global support professionals across industries were surveyed to understand current AI usage, challenges, and the 2026 opportunities. The headline is clear—AI investment is now table stakes. Eighty-two percent of senior leaders say their teams invested in AI in the past year and 87% say they plan to invest in 2026. Those investments are already paying off: Over three-quarters of CS teams (77%) say AI is meeting or exceeding expectations, delivering faster response and resolution times, always-on coverage, cost savings, increased capacity, and multilingual support that scales globally.
And yet, only 10% of organizations say they have reached a "mature" level of deployment, where AI is fully integrated into operations and working at scale. That’s the tell: most teams are skimming the surface and leaving meaningful performance gains on the table.
Most service teams are still early in AI adoption. Only 10% report mature deployment, while 26% are scaling, 35% are in initial rollout, and 26% remain in exploration, with 3% unsure.
When I map the data to what I’ve seen in the field, the maturity difference shows up immediately in outcomes. Teams at mature deployment don’t just automate repetitive tasks; they build AI into critical workflows, give it real responsibility, and iterate continuously. Beyond automating the bulk of their manual work, they’re using AI to proactively engage customers and perform tasks on their behalf.
The results follow. Of the teams that have reached mature deployment, 43% report higher quality and consistency across support—nearly double the rate of those still in the initial deployment stage. That quality shift is how support evolves from a cost center to a value driver. Great experiences don’t just prevent churn; they create advocacy and become a reason customers choose you. The more you trust your AI Agent with meaningful work, the more it creates the conditions for higher-quality, more consistent support.
One example I point to often: Lightspeed. They operate a complex product across regions and languages, with tens of thousands of monthly requests. When they adopted Fin in early 2023, they needed a solution that could scale with that complexity—and they treated the transition like a first-class change program.
They leveraged foundational training and built custom, in-house modules aligned to their processes. They supported their team post-launch and worked closely with leadership to align on the goals and benefits of AI. In a large, distributed org, that executive alignment created ownership and momentum. Their VP of Information Systems, Yamine Gluchow, put it perfectly: "It’s not magic. If you invest in understanding, adoption, and great content, AI performance takes off."
Mature AI Agent rollouts deliver bigger gains in customer service—outperforming initial deployments in automation, proactive engagement, and task completion (63% vs 52%, 51% vs 41%, 45% vs 28%)—showing how depth drives measurable impact.
Their outcomes reflect that depth: an 88% involvement rate, 72% of Fin conversations resolved without human intervention, 43,000+ customer requests resolved monthly, service in 12+ languages across 100+ countries, and stable CSAT, with improvement in some markets.
What impressed me most was the complexity Fin now resolves. A merchant in France asked about tax invoices—normally a long phone call to check back-end data and explain rules step by step. Instead, Fin handled the conversation in French, provided an accurate end-to-end explanation, and earned positive CSAT. That’s what mature deployment looks like: a system that absorbs complexity and delivers correct, efficient results at scale.
So how do we build toward that level of maturity? In my experience, this journey requires a mindset shift and operational rigor—not just a bigger AI budget.
Rethink how you approach support. If you were building from scratch today, you’d design around AI from day one. As Grant Lee, CEO of Gamma, puts it: "If you want to unlock the real value of AI, you have to design for it, not retrofit around it." Treat AI as infrastructure, not a feature. That shift impacts your org design, workflows, and what “good” looks like.
Secure executive sponsorship early. You won’t scale without C-suite backing. AI reshapes how support works, how teams are structured, how performance is measured, and how cost and value flow. Align your CFO on ROI, your CCO on journey design, and your CEO on customer experience as a strategic advantage. Early wins are great—but the compounding gains only come when leadership backs AI as infrastructure, not a one-off cost save.
Assign clear ownership for AI performance. One common failure mode: no one owns the AI. Stand up an AI operations lead or support ops specialist to review resolution trends and handoffs, tune content and configuration, coordinate on systemic issues, and drive a prioritized improvement roadmap. Without this role, feedback loops break and performance plateaus.
Treat content as critical infrastructure. Your AI Agent is only as good as the knowledge it can access. Ensure coverage for the topics it must handle, keep information accurate and current, and structure content so it’s easy for AI to consume. Make maintenance part of BAU, not a quarterly fire drill. A clean, governed, retrieval-first pipeline dramatically increases autonomous resolution.
Build a continuous improvement system. AI performance isn’t static. Train your AI Agent by expanding its knowledge, refining behavior, and connecting new data sources to handle more scenarios autonomously. Validate changes against real scenarios before they ship. Roll out updates in a controlled way across channels and segments. Use performance data to find patterns—frequent handoffs, low-resolution topics—and decide what to improve next. I often point to the Fin Flywheel (Train → Test → Deploy → Analyze) as a practical example of turning performance data into action.
The big takeaway from the "2026 Customer Service Transformation Report" is encouraging: investment is widespread, and early returns are real. The bigger opportunity is to turn those early wins into durable transformation. Teams leaning into AI as infrastructure—supported by executive alignment, clear ownership, strong content, and a continuous improvement loop—are already separating from the pack.
Next up in this series, I’ll dig into how leading teams measure success. Beyond simple cost savings, mature deployments tie AI to clear ROI and strategic impact—shifting more work into value-adding, revenue-generating territory. Follow along here, or subscribe on LinkedIn to get the next installment in your feed.
I care about meetings only insofar as they create momentum and outcomes. What if your meetings could actually produce the artifacts you need—specs, tickets, slides—before the call even ends?
I recently listened to an episode of Just Now Possible where Teresa Torres talks with Mark Barbir (CEO) and Sanden Gocka, the co-founders of Earmark, about building a productivity suite that turns unstructured conversations into finished work in real time. As a product leader, this premise hits the sweet spot of agentic AI, real-time AI workflows, and ruthless focus on outcomes over output.
Unlike generic AI notetakers that produce summaries nobody reads, Earmark runs multiple agents in parallel during your meetings—translating engineering jargon, drafting product specs, even spinning up prototypes in Cursor or V0 while you're still talking. That’s the bar I want from AI in the room: finished work, not notes.
What impressed me most was the clarity of their pivot. They moved from an Apple Vision Pro presentation coaching tool to a web-based meeting assistant. I’ve made similar calls: when the distribution path and daily workflow are obvious, you follow the user’s gravity. This shift unlocked a broader surface area—PMs, engineers, design partners—and made agentic workflows useful where work actually happens.
They also turned a technical constraint into a commercial advantage. Their ephemeral (no-storage) architecture became a feature for enterprise sales. I’ve seen this repeatedly in AI risk management: privacy-by-design and clear data governance reduce friction with security reviewers and accelerate procurement. For many enterprises, “we don’t store your data” is the win condition.
Cost discipline was another standout. They tackled the hard problem of making real-time AI affordable—from $70 per meeting down to under a dollar through prompt caching. That’s not just optimization; it’s product strategy. Choices like model selection, context window management, and retrieval-first pipeline design determine whether a feature can scale to every meeting or remains a demo.
On capability design, the team leaned into templates and simulated stakeholders to ship value fast. Their template-based agents include Engineering Translator, Make Me Look Smart, and Acronym Explainer, and their personas simulate absent team members (security architect, legal, accessibility). This is exactly how I frame early AI workflows: remove friction for the product trio, anticipate blockers, and let the agent do the tedious, error-prone first pass.
They were refreshingly pragmatic about models. Their finding that GPT-4.1 still beats newer models for prose quality in their use case is a reminder that “best” is contextual. When the job-to-be-done is precise prose and production-grade artifacts, consistent quality trumps leaderboard buzz. Of course, they also invest in guardrails to ensure quality and manage hallucinations, another non-negotiable for enterprise adoption.
Search and analysis across time is where many AI products stumble. They explained the limits of vector search for analysis questions across meetings and how they’re building agentic search with multiple retrieval tools (RAG, BM25, metadata queries, bespoke summaries). I couldn’t agree more: analysis requires reasoning over structure, time, and purpose—not just semantic proximity. Layered retrieval with stateful agents beats a single embedding call.
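To show what layering looks like in practice, here is a small sketch that chains two of the retrieval tools mentioned above: a metadata query followed by BM25 lexical ranking. It is not Earmark's implementation. The meeting records, field names, and the search function are hypothetical, and the BM25 scorer is a plain textbook version with whitespace tokenization.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical meeting index (illustrative data only).
meetings = [
    {"date": "2025-01-10", "team": "payments",
     "text": "checkout latency regression discussed with engineering"},
    {"date": "2025-02-14", "team": "payments",
     "text": "pricing page copy review and launch checklist"},
    {"date": "2025-03-03", "team": "growth",
     "text": "checkout latency experiment results for onboarding"},
]


def search(query: str, team: str, since: str) -> list[dict]:
    # Stage 1: metadata query narrows by structure and time.
    candidates = [m for m in meetings
                  if m["team"] == team and m["date"] >= since]
    # Stage 2: lexical ranking over the surviving candidates.
    scores = bm25_scores(query, [m["text"] for m in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [m for s, m in ranked if s > 0]


hits = search("checkout latency", team="payments", since="2025-01-01")
```

The growth-team meeting mentions the same terms but never reaches the ranker, because the metadata stage removed it first. That ordering is the point: structure and time constrain the question before any similarity score is computed. An agentic version would let the model choose which tool to call, and an embedding-based retriever could be added as a third stage.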
They also articulated a crisp user thesis: design for product managers as the extreme user to solve for everyone. In my experience, if you satisfy the PM’s bar for clarity, traceability, and actionability, engineers, designers, and go-to-market teams benefit immediately. That’s how you earn daily active use, not once-a-week novelty.
For builders curious about the stack and comparables, they discuss services and tools like AssemblyAI for speech-to-text and the OpenAI API with prompt caching support, along with the integrations they built with Cursor and V0 by Vercel. They also reference Granola as a comparison point and nod to ProductPlan, where both founders previously worked. If you want to try the product, here’s Earmark, a productivity suite where the work completes itself.
If you're a PM drowning in follow-up work or a builder curious about real-time AI architectures, this conversation offers a detailed look at what it takes to ship an AI product that people can't imagine working without. Personally, I see this as a credible path toward an AI chief of staff—their vision goes beyond automating deliverables to orchestrating judgment, compliance signals, and cross-functional readiness.
The episode covers the founder backstory, what Earmark does, comparisons to competitors, unique features, templates and personas, technical decisions, early versions and challenges, optimizing transcript summarization, managing multiple tools and costs, challenges with context and reasoning models, innovative search and retrieval techniques, creating actionable artifacts from meetings, ensuring quality and managing hallucinations, and the future vision for an AI chief of staff. It’s a full-spectrum look at building with agentic AI, not just talking about it.
Executive function, for me, is the art and discipline of building systems that make high-quality decisions without my constant involvement. The real unlock isn’t personal heroics; it’s institutionalizing judgment. When I do my job well, teams move faster, ambiguity shrinks, and the organization compounds learning even when I’m not in the room.
Operating simultaneously at 30,000 feet and ground level is the defining muscle of executive leadership. I deliberately switch altitudes. At 30,000 feet, I obsess over strategy, architecture, and resourcing. On the ground, I validate core assumptions with firsthand data, listen for weak signals, and spot process cracks before they widen. Altitude changes are not random; they’re triggered by variance from plan, critical customer moments, or leading indicators that deviate from expected ranges.
The leap from frontline manager to manager of managers is where many rising leaders stall. As a manager of managers, my primary value shifts from personal execution to system design. I move from answering questions to installing mechanisms that ensure questions get answered well by others. This includes clear decision rights, shared metrics, and repeatable, lightweight rituals that scale across teams.
What is an executive actually accountable for? Outcomes over output, talent density, and the clarity of the operating system. That means defining strategy, aligning resources, creating a cadence of review that exposes truth, and ensuring incentives reward the behaviors we want. My barometer: if I step away, do priorities hold, do metrics behave as expected, and do tradeoffs land where I would have landed?
Knowing when to dive deep versus when to step back is a craft. I dive deep when risks are existential, when metrics have no credible owner, or when narrative and numbers diverge. I step back when leaders demonstrate consistent judgment, metrics sit inside control limits, and learnings are documented. The principle I return to again and again: context is everything. Senior leaders operate on context, not control.
To scale judgment, I teach people how I think. I externalize my mental models: how I construct decision trees, how I stress-test assumptions, and how I weigh time horizons. I rely heavily on driver trees for metrics because they force causal clarity. If we can’t map how a top-line goal decomposes into controllable levers, we’re managing by hope, not design.
Creating a shared language across the business is a force multiplier. I standardize definitions for our core metrics, codify what “good” looks like, and make it easy to repeat the system. We align around outcomes versus output, and we use cadences like MBRs and QBRs to unify narrative and numbers. Shared language makes decisions legible across functions and reduces rework.
My COO playbook emphasizes owning the full customer experience end to end. When marketing rolls up under a COO in certain stages, the upside is coherence: one narrative from awareness to activation to expansion, one set of metrics, one growth engine. The point isn’t org charts; it’s removing seams customers can feel.
Demanding and supportive is not a contradiction. I set ambitious, unambiguous bars and back them with coaching, resourcing, and fast feedback. The combination builds trust: expectations are clear, and help is immediate. I expect leaders to bring problems paired with proposed solutions and to escalate early, not perfectly.
Inside my executive interview process, I’m assessing altitude agility, operating cadence, and taste in metrics. I use structured interviews and live case workshops to see how candidates frame ambiguous problems, build driver trees, and prioritize tradeoffs. The best prompts are simple and revealing: design the operating system for a 3x scale scenario; diagnose a broken funnel with incomplete data; align two teams with conflicting incentives. The most revealing workshop prompts surface thinking speed, humility, and the instinct to make context legible.
The common thread in failed executive hires is a mismatch between the company’s operating system and the leader’s default mode. Some leaders can’t stop doing the work themselves. Others stay too abstract and never build mechanisms. I look for demonstrated ability to change systems, not just run them—leaders who can both author and evolve the playbook.
On metrics, I practice the driver tree philosophy. I begin with the North Star, decompose it into controllable levers, instrument each node, and assign single-threaded owners. We design review cadences where deviations trigger targeted diagnostics, not thrash. Each tree has documented assumptions, data sources, and thresholds that prompt action. This is how teams learn to anticipate, not react.
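A driver tree with owners and action thresholds maps naturally onto a small recursive data structure. The sketch below is one possible shape, assuming hypothetical metric names, owners, values, and bands. It is meant to illustrate the idea of instrumented nodes that flag deviations, not to describe any particular tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    owner: str                      # single-threaded owner for this lever
    value: Optional[float] = None   # latest instrumented reading, if any
    low: Optional[float] = None     # below this, a diagnostic is triggered
    high: Optional[float] = None    # above this, a diagnostic is triggered
    children: list["Node"] = field(default_factory=list)

    def deviations(self) -> list[str]:
        """Walk the tree and list nodes whose value is outside their band."""
        found = []
        if self.value is not None:
            too_low = self.low is not None and self.value < self.low
            too_high = self.high is not None and self.value > self.high
            if too_low or too_high:
                found.append(f"{self.name} (owner: {self.owner})")
        for child in self.children:
            found.extend(child.deviations())
        return found


# Hypothetical tree: names, owners, values, and thresholds are illustrative.
north_star = Node("Weekly active teams", "CPO", children=[
    Node("Activation rate", "Onboarding PM", value=0.31, low=0.35),
    Node("Week-4 retention", "Lifecycle PM", value=0.62, low=0.55),
])

flags = north_star.deviations()
```

Here only the activation node falls below its threshold, so the review can go straight to one owner and one diagnostic. Documented assumptions and data sources could be added as extra fields on each node.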
High-functioning executive teams are visibly collaborative. We clarify decision rights, disagree and commit quickly, and conduct post-decision reviews to harvest learnings without blame. My favorite litmus test is simple: can 30 people operate as one team when it matters? When we get this right, information flows, execution accelerates, and customers feel consistency.
One of the most counterintuitive leadership lessons is working yourself out of a job. If the system cannot run without you, you have a key-man risk, not a leadership strength. I aim to build successors, codify judgment, and design mechanisms that make good decisions the default state. That’s how you create durable, compounding advantage.
And the review feedback you can’t unhear? Mine was brutally honest: my bar was high, but my mechanisms were implicit. Once I wrote them down—how I decide, what I expect, where I dive deep—the organization moved faster, and I actually became less central. If there’s a throughline to extraordinary leadership, it’s this: make your judgment teachable and your systems inevitable.