Customer experience is where strategy, data, and execution converge—and where AI can deliver compounding value when thoughtfully designed. In my work, I’ve seen how the right CX vision becomes a growth engine when it’s operationalized through clear measures, robust analytics, and disciplined product practices.
"Amanda Sime is the Customer Experience Strategy Lead at Amplitude. She shapes CX strategy and partners across orgs to design and scale AI-powered solutions." That concise description captures a model I deeply respect: start with a strong CX strategy, then partner across the organization to make AI real in the day-to-day. It’s not just about new technology; it’s about aligning teams, systems, and incentives to deliver consistent customer value.
Translating that approach into practice requires a rigorous AI Strategy, anchored in measurable outcomes and informed by behavioral analytics. I prioritize journey mapping to expose friction, then connect those insights to AI workflows that enhance customer success and in-product guidance. When cross-functional partners—from solutions engineering to support—operate from a shared driver tree, the roadmap balances speed with sustainability.
Data is the backbone. A unified analytics platform—often centered on Amplitude analytics—helps teams move beyond vanity metrics to track user activation, feature adoption, and retention analysis with precision. With that foundation, we can test responsibly, iterate quickly, and validate impact with product-led growth motions that scale across segments without sacrificing quality.
Operational excellence matters just as much as vision. I’ve learned to treat CX programs like enduring products: build reliable feedback loops, connect customer support AI strategy to clear service-level outcomes, and empower product management leadership to make evidence-based tradeoffs. When teams have clarity on the problem space and access to trustworthy insights, they deliver solutions that feel both intelligent and human.
The real win is cultural: empowering product trios and partner teams to co-own outcomes, not just outputs. That’s how AI moves from a promising experiment to a durable capability—by aligning strategy, analytics, and execution so customers experience value at every touchpoint.
Inspired by this post on Amplitude – Perspectives.
I followed the energy at Fin Labs Paris and immediately zeroed in on the announcement of Monitors. In my view, it’s the missing piece that turns Fin’s powerful automation into an observable, trustworthy system—sitting alongside Insights and Recommendations to form a complete observability suite that gives teams confidence in what Fin is doing.
With Monitors, you define what conversations get reviewed, both Fin and human, and set evaluation criteria using Custom Scorecards. That level of control ensures you’re measuring the metrics that matter most to your business and holding support quality to your bar, not a generic one.
Used in concert with Insights and Recommendations, you can finally see what’s happening across your support operation, evaluate every conversation against your standards, and take targeted action to continuously move toward perfect customer experiences.
As Agents become more powerful, transparency and control become critical. I’ve seen this shift firsthand: AI is advancing fast, and the stakes are no longer theoretical—Agents are resolving real customer issues with real consequences at scale.
Visualizing the AI development flywheel—Train, Test, Deploy, Analyze—this graphic spotlights Analyze in orange to introduce Monitors, turning opaque model behavior into measurable signals and continuous customer service insights.
Fin has almost 8,000 customers, averages a 67% resolution rate, and resolves close to 2 million customer queries every single week, including highly complex queries in regulated industries.
At that scale, observability isn’t a nice-to-have; it’s a necessity. Traditional CSAT and small QA samples weren’t built for Agent-led operations—they miss edge cases, don’t scale, and can’t explain drift. The result is a black box. What teams need most right now is confidence, built on data you can trust and act on.
At Intercom, this is called the Fin Flywheel: Train, Test, Deploy, Analyze.
See inside Intercom's Monitors: a streamlined dashboard with pass‑rate charts and review queues, alongside a panel to define a 'Vulnerable customers' monitor, test it on sample chats, and run continuous checks.
Analyze is the step where you find out what’s actually happening, and it’s where improvement begins.
In my experience, achieving confidence in an AI support operation requires three things: (1) a complete understanding of what Fin, your human team, and your customers are talking about; (2) a way to monitor and score conversations based on the criteria that matter most to your business; and (3) AI-powered recommendations that make it easy to act on what you find. Intercom launched Insights and Recommendations to address the first and third. Now, Monitors completes the system for full observability and opens the black box.
Monitors: know whether every conversation met your standards. Customer sentiment is important, but it’s different from determining whether a conversation was handled correctly. With Monitors, you can do both—and do it at scale.
Monitors is a new QA capability that delivers a structured, repeatable way to define which conversations get reviewed and evaluate them against quality criteria you set. It replaces ad-hoc sampling and spreadsheet-driven QA with a system that scales as your volume grows.
Two components work together: Monitors define what gets reviewed and Custom Scorecards define how each conversation is evaluated. That pairing brings the rigor of Agent Analytics and the discipline of eval-driven development to everyday CX operations.
Random sampling has always been a blunt tool. When AI is handling thousands of conversations a week, a small, arbitrary slice won’t reliably capture your highest-risk edge cases, your most complex escalations, or where quality is starting to drift. I’ve felt that pain in operations reviews—too many unknowns, not enough signal.
Open the AI black box with Monitors: track conversations, triage unreviewed items, and build transparent scorecards with criteria like accuracy, process adherence, and efficiency to lift customer support quality.
With Monitors, you select and evaluate conversations with intent. You can target specific signals of risk or failure, like “the customer showed signs of financial vulnerability” or “Fin looped around with the same answer without resolving the issue.” Or you can create consistent, repeatable samples to benchmark quality over time. Use the existing library of filters (customer data, channel, Fin-specific metrics) or describe nuanced scenarios in natural language. Most teams will do both: home in on the conversations that matter most and maintain a steady, structured QA sample each week.
"When I saw Monitors, my first reaction was — this is exactly what we need. The ability to track quality continuously, instead of relying on spot checks, is a big shift for us." Ineke Oates, Head of Support, Agorapulse
Custom Scorecards make your standards explicit and enforceable. One-size-fits-all rubrics never reflect your brand voice, industry constraints, or customer expectations. With Custom Scorecards, you define what “good” looks like for your business and turn that into a measurable, comparable quality score for every conversation.
You define the criteria that matter, how each should be measured, and how important each one is. Some criteria can be scored automatically by AI, others reviewed by a human, or both, all within the same scorecard. This means you’re not choosing between scale and judgment; you get both in one system.
Each conversation is then evaluated against these criteria, and the system calculates an overall quality score based on your configuration. You can weight what matters most, or mark certain criteria as critical, so a single failure can fail the entire evaluation when needed.
The result is a single, consistent quality score that reflects your standards—not a generic metric, and not a collection of disconnected checks. That’s what makes quality measurable over time and comparable across AI and human support.
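To make the weighting and critical-criterion logic concrete, here is a minimal Python sketch. It is an illustration under my own assumptions, not Intercom's implementation: the `Criterion` class, the `score_conversation` function, the pass/fail results format, and the `PASS_THRESHOLD` of 80 are all hypothetical. The criterion names are borrowed from the examples mentioned earlier (accuracy, process adherence, efficiency).

```python
from dataclasses import dataclass

# Hypothetical pass mark for this sketch; not a documented product default.
PASS_THRESHOLD = 80.0


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float           # relative importance; any positive number
    critical: bool = False  # a failed critical criterion fails the whole review


def score_conversation(criteria, results):
    """Return (score, passed) for one conversation.

    `results` maps criterion name -> True (met) or False (not met).
    The score is the weighted share of criteria that were met, from 0 to 100.
    """
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("scorecard needs at least one positively weighted criterion")

    earned = sum(c.weight for c in criteria if results[c.name])
    score = 100.0 * earned / total_weight

    # A single failed critical criterion overrides an otherwise good score.
    critical_failure = any(c.critical and not results[c.name] for c in criteria)
    passed = (not critical_failure) and score >= PASS_THRESHOLD
    return round(score, 1), passed


scorecard = [
    Criterion("accuracy", weight=3, critical=True),
    Criterion("process_adherence", weight=2),
    Criterion("efficiency", weight=1),
]
review = {"accuracy": True, "process_adherence": True, "efficiency": False}
print(score_conversation(scorecard, review))
```

With these weights, the sample review earns 5 of 6 weighted points, so it scores 83.3 and passes. Flip `accuracy` to `False` and the review fails regardless of the other criteria, which is the behavior a critical flag is meant to give you.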
Monitors helps open the AI black box by turning model outputs into trackable reviews. This clean queue groups customers, monitor types, scores, and actions—with AI auto-review—so teams improve quality faster.
There’s an important distinction here: CX Score tells you how customers felt about a conversation. Custom Scorecards tell you whether it met your standards. You need both.
"We looked at dedicated QA tools, but what's compelling about Monitors is that it lives where our conversations already happen. We don't need another system — we can run QA across Fin and our human team in one place." Jared Ellis, Senior Director, Global Product Support, Culture Amp
When a conversation meets your criteria for review, Monitors routes it into a Review Queue. Each conversation is assigned to the right reviewer with its scorecard attached and status tracked end to end: Not reviewed, Reviewed, Needs a fix, Fix complete. Reviewers work directly in Intercom, capture what went wrong, and propose concrete fixes—like updating documentation or refining a workflow—so quality loops end in action, not just scores.
Monitors turn AI performance from opaque to measurable. The Fin quality view summarizes review score, pass rate, and review counts while a time‑series chart tracks escalation ease, clarification, and efficiency—delivering fast, actionable CX insights.
Reporting turns QA into a continuous signal rather than a one-off audit. You can track review scores over time across Monitors and Scorecards, and compare them directly to CX Score, resolution rate, and other performance metrics. Patterns that were previously invisible become clear: a topic consistently underperforming, a quality dip correlated with a recent knowledge base change, or a team whose scores are improving week over week. This is observability applied to CX—evidence you can act on.
Monitors for Fin conversations is live today, and the roadmap goes further. Human agent QA will bring the same structured evaluation to your human team’s conversations, creating one consistent quality system across your entire support operation.
Real-time alerts will notify you the moment a conversation crosses a threshold you’ve defined—before the issue reaches more customers and risks compounding negative sentiment.
Knowledge base evaluation will connect AI scoring directly to your content so conversations are assessed against your latest policies and documentation, catching inaccurate or outdated responses and providing clear rationale linked to the relevant source.
Creating perfect customer experiences with AI requires transparency. You need to understand how the system is performing if you want to maintain and improve quality over time. With Insights, Monitors, and Recommendations, this is now possible: a complete analysis suite that lets you see what’s happening across every conversation, ensure it meets your standards, and pinpoint improvement opportunities when they matter most.
I’ve long advocated for a retrieval-first, eval-driven approach to AI Strategy because it makes risk visible and manageable. Monitors operationalizes that philosophy for CX leaders: you get continuous signal, shared definitions of quality, and a direct path from flags to fixes. If you’re scaling AI support, this is how you replace uncertainty with control—and turn the black box into a competitive advantage.
Are you an AI product manager, or do you want to become one? This guide cuts through the noise and shows where the PM role is really heading with AI.
I’ve spent the last few years scaling AI initiatives across complex SaaS products, and I’ve learned that “AI product manager” isn’t a vanity title—it’s a capability set. The role evolves traditional product management with new responsibilities across data, model behavior, risk, and continuous learning systems. My goal here is to demystify what matters, so you can lead with clarity, build with confidence, and deliver measurable outcomes.
First, let’s separate hype from reality. An effective AI Strategy starts with the customer problem, not the model. I anchor roadmaps around clear use cases, then evaluate whether we need a retrieval-first pipeline, agentic AI, or conventional automation. “Build vs buy” is no longer a procurement question; it’s a lifecycle question about iteration speed, quality control, data governance, and long-term unit economics.
Discovery also looks different. I still run continuous discovery and customer interviews, but I augment them with behavioral analytics and targeted experiments to validate feasibility, risk, and value. I practice privacy-by-design and AI risk management from day one, and I define guardrails for acceptable model behavior alongside success metrics. When high stakes are involved, I document data provenance and align with regulatory compliance standards to protect customers and the business.
Execution shifts from shipping static features to operating learning systems. In product roadmapping and sprint planning, I account for context window management, prompt engineering, and the realities of LLMs for product managers: latency, cost, drift, and failure modes. I use feature flags, A/B testing, and eval-driven development to move from offline model evals to online impact with a minimum detectable effect (MDE) worth the release risk. Observability, anomaly detection, and incident management aren’t optional—they’re how we earn trust.
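The MDE point is easier to reason about with numbers. The sketch below uses the standard normal-approximation formula for a two-sided, two-proportion z-test to estimate how many users each arm of an A/B test needs. The function name, the 60% baseline, and the 2-point MDE are illustrative assumptions of mine, not figures from any real experiment.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of `mde_abs`.

    Uses the normal approximation for a two-sided, two-proportion z-test:
    n = (z_alpha + z_power)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde_abs == 0:
        raise ValueError("rates must lie strictly between 0 and 1, and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Hypothetical example: 60% task completion today, and we care about a 2-point lift.
print(sample_size_per_arm(baseline_rate=0.60, mde_abs=0.02))
```

For this hypothetical case the answer lands a little above nine thousand users per arm. Halving the MDE roughly quadruples the requirement, which is why I settle the MDE before sprint planning rather than after launch.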
Collaboration expands beyond engineering and design. I work closely with data science on evaluation frameworks, with solutions engineering to de-risk complex enterprise deployments, and with customer success to close the loop on model performance in the wild. Our outcomes vs output OKRs emphasize activation, time-to-value, and sustained retention over vanity accuracy metrics.
Tooling is now a strategic advantage. My AI product toolbox includes prompt libraries with versioning, synthetic data generation where appropriate, and a disciplined approach to model and prompt regression tests. I standardize AI workflows (intake, evaluation, deployment, and monitoring) so teams can ship faster without cutting corners. This is how empowered product teams scale safely.
Career-wise, I look for—and coach—PMs who can frame trade-offs crisply: explain when to fine-tune vs use retrieval, when to embed agents, and when not to use AI at all. Show me driver trees that connect model metrics to business outcomes, a clear risk register, and a plan for continuous discovery. If you can tell a compelling story backed by transparent evaluation and customer value, you’re already ahead.
Here’s the bottom line: the “AI product manager” that matters in 2026 is a product leader who can turn uncertainty into systematized learning. If you focus on real customer problems, rigorous evaluation, responsible design, and iterative delivery, you won’t just carry the title—you’ll create durable competitive differentiation.
I’ve watched too many AI agent deployments celebrate velocity while overlooking the one thing that determines long-term success: whether real users are actually getting value. Dashboards tend to spotlight model upgrades, prompt tweaks, and launch counts, yet they rarely quantify task completion, trust, or time-to-value. That blind spot isn’t technical—it’s human.
Enterprises are spending 93% of their AI budget building agents and almost none know if those agents are actually working for users. Pendo Agent Analytics closes the gap.
In my product reviews, I look for evidence that agentic AI is improving outcomes across the customer journey, not just the demo path. Without behavioral analytics and observability, teams optimize for throughput instead of resolution, for novelty instead of reliability. This is where eval-driven development, A/B testing, and rigorous cohort analysis become non-negotiable: they translate agent performance into user impact we can measure and improve.
Here’s the pattern that works for me: define user-centric success metrics first, then let the AI follow. I prioritize signals like successful task completion, low-friction activation, reduced escalations, and sentiment lift—tied directly to product-led growth indicators such as retention and expansion. When these metrics move in the right direction, I know the agent is creating compounding value, not just answering faster.
Practically, I operationalize this with an analytics spine that captures end-to-end agent interactions: intents, prompts, responses, clarifying turns, handoffs, and final outcomes. I segment by persona, journey stage, and account tier to uncover where agents delight and where they degrade trust. With this foundation, I can run controlled experiments, spot anomalies early, and connect improvements in agent behavior to improvements in business performance.
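As a rough illustration of that analytics spine, the sketch below models one agent session as a flat record and computes task completion by any segment. The `AgentSession` fields and the `completion_rate_by` helper are hypothetical names chosen for this example; they do not mirror any vendor's event schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSession:
    # Hypothetical schema: one row per end-to-end agent interaction.
    session_id: str
    persona: str
    journey_stage: str
    account_tier: str
    intent: str
    clarifying_turns: int
    handed_off: bool
    task_completed: bool


def completion_rate_by(sessions, dimension):
    """Return {segment value: share of sessions that completed the task}."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for session in sessions:
        key = getattr(session, dimension)
        totals[key] += 1
        completed[key] += int(session.task_completed)
    return {key: completed[key] / totals[key] for key in totals}


sessions = [
    AgentSession("s1", "admin", "onboarding", "enterprise", "reset_password", 0, False, True),
    AgentSession("s2", "admin", "onboarding", "enterprise", "export_data", 2, True, False),
    AgentSession("s3", "analyst", "adoption", "growth", "build_chart", 1, False, True),
]
print(completion_rate_by(sessions, "persona"))
```

The same helper works for `journey_stage` or `account_tier`, which is the point of keeping the segment fields on every record: you can cut the data a new way without re-instrumenting.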
Pendo Agent Analytics closes the loop by making these user outcomes visible and actionable. Instead of guessing whether an agent helped or hindered, I can analyze where users stall, which prompts or skills drive completion, and how interventions like in-app guides or product tours change behavior. That visibility lets me tune models and experiences in days, not quarters—and gives stakeholders confidence that our AI investments are paying off for customers.
If you’re scaling agents today, start small but instrument deeply: map top user intents, define offline and online evals, A/B test prompts and policies, monitor regressions, and tie every improvement to activation, adoption, and retention. The result is a durable feedback loop that keeps agents aligned with user value as your surface area grows.
AI agents are not a destination—they’re a capability. When we anchor that capability to clear user outcomes and measure it with the right analytics, we stop flying blind and start compounding advantage. That’s how we turn promising demos into dependable products.
For years, I’ve watched product, growth, and data teams burn cycles stitching together manual dashboards and reports, then slogging through replay review just to validate a hunch. That overhead slows discovery and delays decisions. The promise here is different: "Discover how Amplitude AI Agents help product, growth, and data teams turn questions into action without manual dashboards, reports, or replay review." As someone obsessed with decision velocity and evidence-based product strategy, that shift is exactly what I’ve been waiting for.
In practice, I think about "Amplitude AI Agents" as always-on data analysts embedded in our workflow. Instead of queuing requests or context-switching into tooling, I can ask targeted questions, get synthesized insights, and move directly to action. This is a powerful example of agentic AI meeting behavioral analytics in a unified analytics platform—removing friction between inquiry and impact while keeping teams focused on outcomes, not artifacts.
What changes for my day-to-day? I can interrogate customer behavior in real time, pressure-test hypotheses from discovery interviews, and quickly understand whether activation, retention, or monetization is the current constraint. If I’m probing a driver tree for activation or a retention analysis for a specific cohort, I can get to a decision faster—without waiting on someone to build a bespoke dashboard. That means more cycles spent shaping product strategy and fewer sunk into report wrangling.
This matters beyond speed. When product, growth, and data leaders anchor discussions in the same source of truth, we shorten the distance from signal to decision. That alignment is the backbone of product-led growth and continuous discovery: shared context, faster feedback loops, and clearer trade-offs. It also reduces the long tail of analytics debt—those one-off reports and stale views that quietly accumulate across teams.
Of course, adopting any AI workflow in analytics demands governance. I hold these systems to the same bar I set for my teams: clarity of assumptions, consistent metric definitions, and auditable reasoning. Pairing "Amplitude analytics" with strong data governance, CI/CD for analytics definitions, and lightweight evals helps ensure the recommendations we act on are reliable, reproducible, and explainable. AI should accelerate our judgment, not replace it.
The strategic shift is simple and profound: move from building dashboards to making decisions. With always-on analysis, we can spend less time instrumenting analytics theater and more time delivering customer value. That is how we translate insights into impact—and why I’m excited to operationalize this capability across our product trios and go-to-market partners.
Inspired by this post on Amplitude – Best Practices.
Every week, I coach product and documentation teams on a simple truth I keep pinned above my desk: "AI is reading your documentation! Learn tips from the Amplitude docs team about how to structure your documentation for both human and AI audiences." That line captures the shift we’re all living through—our docs must now serve customers, support engineers, and increasingly, LLMs powering chat, search, and in‑product help.
My AI strategy for documentation starts with intent. I map the core questions users ask at activation, onboarding, escalation, and renewal, then shape information architecture to reduce ambiguity. This helps humans find answers faster and helps LLMs retrieve the right chunks with higher precision—a win for UX writing, product-led growth, and support deflection.
Structure beats style when AI is in the loop. I rely on semantic headings (H1–H3), consistent slugs, stable anchors, and one‑topic pages that can stand alone. Short paragraphs, scannable summaries, and canonical references reduce duplication and improve retrieval quality. Treat docs-as-code with CI/CD so changes are reviewed, versioned, and shipped reliably—documentation deserves the same rigor as product releases.
Chunking matters for LLMs. I design content for context window management: one concept per section, tight procedures with numbered steps, and FAQs that mirror real queries. Glossaries define canonical terms and accepted synonyms so retrieval-first pipelines match user language without fragmenting meaning. Error messages and parameter names appear verbatim to strengthen search and grounding.
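Here is a minimal sketch of heading-based chunking, assuming the docs are written in Markdown with `#`-style headings. The `chunk_by_heading` function and the chunk dictionary shape are my own illustration, not the Amplitude docs team's pipeline. It is deliberately naive: text before the first heading is dropped, and a `#` line inside a fenced code block would be misread as a heading.

```python
import re

# Matches H1 to H3 headings such as "## Install the SDK".
HEADING = re.compile(r"^(#{1,3})\s+(.*)$")


def chunk_by_heading(markdown_text):
    """Split a Markdown page into one chunk per H1-H3 section.

    Each chunk carries its full heading trail so it can stand alone at retrieval time.
    """
    chunks = []
    path = []  # current heading trail, e.g. ["Setup", "Install the SDK"]
    body = []  # lines collected for the section currently being read

    def flush():
        text = "\n".join(body).strip()
        if path and text:
            chunks.append({"heading_path": " > ".join(path), "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


page = (
    "# Setup\nIntro text.\n"
    "## Install the SDK\n1. Run the installer.\n2. Restart.\n"
    "## Configure\nSet API_KEY.\n"
)
for chunk in chunk_by_heading(page):
    print(chunk["heading_path"])
```

Carrying the heading trail in each chunk is a small design choice that pays off: a retrieved chunk like "Setup > Install the SDK" still tells the model where it sits in the page, even when the rest of the page is out of the context window.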
Metadata is a multiplier. I add clear titles, descriptions, last‑updated dates, product area tags, and audience labels (admin, developer, analyst) to boost SEO and machine readability. Stable IDs for components, examples, and API objects improve deep linking and evaluation. Where appropriate, I include structured examples that align with prompt engineering best practices so AI assistants can extract inputs, outputs, and constraints cleanly.
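One way to keep that metadata honest is a small check that runs on every page. The sketch below assumes each page's metadata has already been parsed into a dictionary of strings; the field names and the `metadata_problems` function are hypothetical, chosen to match the fields listed above.

```python
from datetime import date

# Assumed field names for this sketch; adapt them to your own front matter.
REQUIRED_FIELDS = ("title", "description", "last_updated", "product_area", "audience")
ALLOWED_AUDIENCES = {"admin", "developer", "analyst"}


def metadata_problems(meta):
    """Return a list of human-readable problems; an empty list means the page passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not meta.get(name)]

    audience = meta.get("audience")
    if audience and audience not in ALLOWED_AUDIENCES:
        problems.append(f"unknown audience: {audience}")

    last_updated = meta.get("last_updated")
    if last_updated:
        try:
            date.fromisoformat(last_updated)  # expects YYYY-MM-DD
        except (TypeError, ValueError):
            problems.append(f"last_updated is not an ISO date: {last_updated}")
    return problems


page_meta = {
    "title": "Install the SDK",
    "description": "Step-by-step installation for web apps.",
    "last_updated": "2025-13-01",  # deliberately invalid month
    "audience": "developer",
}
for problem in metadata_problems(page_meta):
    print(problem)
```

Running a check like this in review catches the quiet failures, such as a page with no product area tag, before they degrade search or retrieval.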
Quality is measured, not hoped for. I pair content audit checklists with analytics to see what’s searched, where users pogo‑stick, and which articles drive successful task completion. Tools like Amplitude analytics reveal gaps and dead‑ends, while lightweight evals (answer accuracy, grounding rate, latency) ensure LLMs retrieve the right doc chunks at the right time.
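A lightweight eval can be as simple as a list of test questions with known answers. The sketch below summarizes the three measures named above. The record format is an assumption of mine, and so is the definition of grounding rate used here: the share of questions where the expected source document appears among the retrieved documents.

```python
from statistics import median


def summarize_evals(records):
    """Summarize answer accuracy, grounding rate, and median latency for one eval run.

    Each record is a dict with the assumed keys:
    answer_correct (bool), expected_doc (str), retrieved_docs (list of str), latency_ms (number).
    """
    if not records:
        raise ValueError("need at least one eval record")
    n = len(records)
    return {
        "answer_accuracy": sum(r["answer_correct"] for r in records) / n,
        "grounding_rate": sum(r["expected_doc"] in r["retrieved_docs"] for r in records) / n,
        "median_latency_ms": median(r["latency_ms"] for r in records),
    }


# Hypothetical eval run: doc slugs and latencies are made up for illustration.
eval_run = [
    {"answer_correct": True, "expected_doc": "install-sdk", "retrieved_docs": ["install-sdk", "faq"], "latency_ms": 100},
    {"answer_correct": True, "expected_doc": "export-data", "retrieved_docs": ["faq"], "latency_ms": 200},
    {"answer_correct": False, "expected_doc": "sso-setup", "retrieved_docs": ["billing"], "latency_ms": 300},
    {"answer_correct": True, "expected_doc": "cohorts", "retrieved_docs": ["cohorts"], "latency_ms": 400},
]
print(summarize_evals(eval_run))
```

Tracking accuracy and grounding separately is useful because they fail differently: a correct answer without the expected source often means the model got lucky or leaned on stale content.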
Consistency is a feature. I standardize terminology across UI, API, and docs, and I avoid synonym sprawl that confuses both readers and LLMs. Page intros state the job-to-be-done; conclusions link to adjacent tasks; and deprecation notes are explicit with forward paths. This coherence lowers cognitive load and improves both RAG performance and human trust.
Governance keeps it scalable. I assign owners per section, define SLAs for updates, and automate checks for broken links, orphaned pages, and outdated screenshots. Redirect rules avoid 404s, and version banners prevent LLMs from mixing deprecated guidance into current answers—small details that cumulatively protect customer experience.
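The broken-link and orphaned-page checks are straightforward to automate once you have a map of internal links. This sketch assumes that map already exists as a dictionary of page slug to link targets; the `audit_links` function and the sample slugs are hypothetical.

```python
def audit_links(pages, root="index"):
    """Find broken internal links and orphaned pages.

    `pages` maps each page slug to the internal slugs it links to.
    Returns (broken, orphaned): broken is a sorted list of (source, target) pairs
    whose target does not exist; orphaned is a sorted list of pages with no
    inbound links from other pages, excluding the root page.
    """
    known = set(pages)
    broken = sorted(
        (source, target)
        for source, links in pages.items()
        for target in links
        if target not in known
    )
    linked_to = {
        target
        for source, links in pages.items()
        for target in links
        if target != source  # a self-link does not rescue a page from being orphaned
    }
    orphaned = sorted(known - linked_to - {root})
    return broken, orphaned


site = {
    "index": ["setup", "faq"],
    "setup": ["index", "install"],  # "install" does not exist
    "faq": [],
    "legacy-api": ["setup"],        # nothing links here
}
print(audit_links(site))
```

In this sample, the link from `setup` to `install` is reported as broken and `legacy-api` is reported as orphaned. Wiring a check like this into CI turns governance from a quarterly cleanup into a gate on every change.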
If you’re just getting started, begin with three moves: clarify intents, restructure pages into atomic, linkable units, and add metadata that reflects how customers actually search. From there, tighten your retrieval-first pipeline and run regular evals. The payoff is durable: faster time to value for users, lower support load, and AI assistants that answer accurately, confidently, and consistently.
Inspired by this post on Amplitude – Perspectives.
Great products aren’t just shipped; they’re understood. In my product management practice, the difference between a good release and a great one often comes down to disciplined documentation that moves at the speed of delivery. That’s why the docs-as-code approach has become a cornerstone of how I build, lead, and measure product experiences across teams.
As I reflect on leaders who set a high bar in this craft, one description stands out: "With years of experience as Senior Documentation Manager, Jeff leads teams and oversees the end-to-end creation of documentation using docs-as-code methodology." That concise statement captures a model I deeply respect—one that treats documentation as a first-class citizen in the product lifecycle.
In practice, docs-as-code integrates documentation into CI/CD pipelines, version control, and peer review workflows—exactly how we ship software. This elevates quality, enforces consistency, and accelerates responsiveness to change, all while enabling rigorous content audit and UX writing standards. When documentation evolves with code, it becomes discoverable, testable, and measurable—key traits for scalable product management leadership.
The downstream impact is tangible. Users ramp faster through onboarding, in-app guides, and product tours because the narrative aligns with the product’s true state at any given commit. Support tickets drop, developers work with greater clarity, and PMs gain the feedback loops needed for continuous discovery. In a product-led growth motion, this clarity compounds—reducing time-to-value and enabling teams to ship confidently.
Equally important is the leadership pattern behind the methodology: aligning product, engineering, and customer-facing teams around shared truths. I’ve seen empowered product teams operate at their best when documentation is embedded in planning, sprint reviews, and release gates. This creates a single source of truth that scales knowledge, preserves intent, and shortens the path from decision to delivery.
For me, the standard expressed above isn’t just a role description—it’s a blueprint for operational excellence. When we manage documentation with the same rigor as code, we build trust at every touchpoint and create the conditions for sustained product velocity. That’s the level of clarity and execution I strive to foster across every product line.
Inspired by this post on Amplitude – Perspectives.
I remember the exact moment our product crossed the threshold from scripted automation to truly agentic AI. The excitement was real—so was the pit in my stomach when our dashboards went dark. Our trusted analytics and observability stack, which had served us flawlessly for traditional software, suddenly couldn’t explain what the agent was doing, why it made certain choices, or how to reproduce outcomes across runs.
"The moment our product became an AI agent, our entire observability stack became irrelevant, not something you want as an analytics company. Here's what we did."
Why does this happen? Agentic AI doesn’t behave like conventional apps. Instead of deterministic flows and neatly tagged events, we face non-deterministic trajectories, tool-use chains, evolving prompts, context window dynamics, and policy guardrails that influence outcomes in real time. Clicks and pageviews give way to tokens, tool calls, and conversation turns. Without purpose-built observability, you can’t do credible product discovery, measure behavioral analytics, or run eval-driven development with confidence.
That’s why we built Agent Analytics. We needed a unified lens to trace every step of an AI workflow—from user intent to model prompts, function calls, retrievals, tool outputs, and final responses—while capturing latency, cost, guardrail hits, fallbacks, and outcome tags. We instrumented runs end-to-end, added experiment support for prompt engineering and policy variants, and wired in evaluations so we could turn subjective quality into objective signals the team could act on.
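To show what tracing a run end to end can look like in data terms, here is a minimal sketch. It is not the schema of any real Agent Analytics product: the `Step` and `AgentRun` classes, their fields, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    kind: str               # e.g. "prompt", "retrieval", "tool_call", "response"
    latency_ms: float
    cost_usd: float = 0.0
    guardrail_hit: bool = False
    fallback: bool = False


@dataclass
class AgentRun:
    run_id: str
    user_intent: str
    prompt_version: str     # ties the run to a versioned prompt for experiments
    steps: list = field(default_factory=list)
    outcome: str = "unknown"

    def record(self, step):
        """Append one step to the trace in the order it happened."""
        self.steps.append(step)

    def summary(self):
        """Roll step-level telemetry up into run-level numbers."""
        return {
            "run_id": self.run_id,
            "step_count": len(self.steps),
            "total_latency_ms": sum(s.latency_ms for s in self.steps),
            "total_cost_usd": round(sum(s.cost_usd for s in self.steps), 6),
            "guardrail_hits": sum(s.guardrail_hit for s in self.steps),
            "fallbacks": sum(s.fallback for s in self.steps),
            "outcome": self.outcome,
        }


run = AgentRun("run-001", user_intent="refund_status", prompt_version="v12")
run.record(Step("prompt", latency_ms=120.0, cost_usd=0.002))
run.record(Step("retrieval", latency_ms=35.5))
run.record(Step("tool_call", latency_ms=410.0, cost_usd=0.004, guardrail_hit=True))
run.outcome = "resolved"
print(run.summary())
```

Keeping steps as ordered records, rather than a single blob per run, is what makes failure states reproducible: you can see which step was slow, which one tripped a guardrail, and which prompt version was live at the time.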
The impact on product management was immediate. We shortened iteration cycles by making failure states obvious and reproducible, turned ambiguous feedback into structured data, and gave engineers and designers a shared source of truth for conversation design and AI workflows. With visibility into containment, escalation, autonomy ratio, and step-level success, we could ship confidently, roll back safely, and align roadmap bets to measurable outcomes, not anecdotes.
Building this capability demanded more than logging. We invested in data governance and privacy-by-design to mask sensitive content while preserving semantic context, and we separated human-identifiable data from model telemetry. We treated prompts and policies like code—versioned, diffable, and safely rolled out behind feature flags and CI/CD—so we could experiment without risking regressions in production.
What should every team measure? Start with outcome quality (task success, resolution, containment), reliability (tool success rate, guardrail triggers, fallbacks), performance (time-to-first-token, total latency, step-level latency), and efficiency (tokens and cost per successful task). Add groundedness checks for retrieval steps, regression evals for core journeys, and post-release anomaly detection to catch drift before users do. These metrics become your operating system for agent performance and your compass for product strategy.
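A few of these metrics can be computed from very simple run records. The sketch below is an illustration under stated assumptions: the record keys are hypothetical, and cost per successful task is defined here as total spend across all runs (including failed ones) divided by the number of successful tasks, so wasted spend on failures shows up in the number.

```python
def agent_metrics(runs):
    """Compute outcome, reliability, and efficiency metrics from run records.

    Each run is a dict with the assumed keys:
    task_success (bool), tool_calls (int), tool_failures (int),
    fallback (bool), tokens (int), cost_usd (float).
    """
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    successes = sum(r["task_success"] for r in runs)
    tool_calls = sum(r["tool_calls"] for r in runs)
    tool_failures = sum(r["tool_failures"] for r in runs)
    total_tokens = sum(r["tokens"] for r in runs)
    total_cost = sum(r["cost_usd"] for r in runs)

    return {
        "task_success_rate": successes / n,
        "tool_success_rate": 1 - tool_failures / tool_calls if tool_calls else None,
        "fallback_rate": sum(r["fallback"] for r in runs) / n,
        # Failed runs still cost money, so they stay in the numerator.
        "tokens_per_successful_task": total_tokens / successes if successes else None,
        "cost_per_successful_task": total_cost / successes if successes else None,
    }


# Hypothetical week of runs, with made-up numbers.
runs = [
    {"task_success": True, "tool_calls": 2, "tool_failures": 0, "fallback": False, "tokens": 1000, "cost_usd": 0.02},
    {"task_success": True, "tool_calls": 3, "tool_failures": 1, "fallback": False, "tokens": 1500, "cost_usd": 0.03},
    {"task_success": True, "tool_calls": 1, "tool_failures": 0, "fallback": False, "tokens": 500, "cost_usd": 0.01},
    {"task_success": False, "tool_calls": 4, "tool_failures": 1, "fallback": True, "tokens": 3000, "cost_usd": 0.06},
]
print(agent_metrics(runs))
```

In this made-up sample, the single failed run accounts for half of the total spend, which is exactly the kind of signal a per-run average would hide.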
If you’re building or scaling AI agents, you need Agent Analytics before you hit your first incident. It’s the difference between guessing and knowing—between reactive firefighting and proactive iteration. With the right observability, your team can move faster, manage risk intelligently, and translate agent behavior into business outcomes that compound over time.
Inspired by this post on Amplitude – Best Practices.
I’ve spent my career building and scaling product platforms, and I’ve seen firsthand how the right AI Strategy can unlock disproportionate impact. Foundational AI platforms are the engine room of modern analytics: when they’re done well, they compress time-to-insight, improve quality, and empower product teams to deliver outcomes that matter.
Across leading analytics ecosystems, including Amplitude analytics, the winning pattern is consistent: invest in a unified analytics platform that abstracts complexity while enabling rapid iteration. By standardizing data governance and privacy-by-design, teams gain the freedom to experiment confidently without sacrificing compliance or security.
For me, “foundational AI platforms” means pragmatic building blocks that product and engineering can trust: evaluation harnesses for models, retrieval pipelines that surface the right context, feature stores that ensure consistency, and CI/CD with robust observability. When these AI workflows are in place, behavioral analytics, anomaly detection, and A/B testing stop being one-off projects and become repeatable capabilities.
The payoff isn’t just efficiency—it’s strategic differentiation. Internal innovation accelerates when teams can go from idea to live experiment in days, not quarters. That speed shapes the future of AI analytics: richer insights woven directly into product experiences, LLMs for product managers to prototype faster, and analytics that feel conversational, contextual, and deeply actionable.
Execution still makes or breaks the vision. I align product strategy around outcomes vs output OKRs, pair product trios with forward-deployed engineers, and use a clear build vs buy rubric for platform components. The goal is platform scalability without reinventing the wheel—own the parts that differentiate, integrate the rest, and keep your interfaces painfully simple.
If you’re leading this journey, start by mapping your critical use cases to platform capabilities, close gaps in data governance, and stand up an eval-driven development loop. Within one or two quarters, you should see a measurable lift in deployment frequency, a sharper signal on performance, and a culture that ships with confidence. That’s how foundational AI platforms empower internal innovation and help define the future of AI analytics.
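One way to picture an eval-driven development loop is a small regression gate that compares a candidate change against the current baseline on a fixed set of cases. This is a sketch under stated assumptions: the golden cases, the substring check, and the `baseline` and `candidate` stand-in functions are all hypothetical, and a real harness would use richer scoring.

```python
from typing import Callable

# Hypothetical golden set: (input, expected substring) pairs for core journeys.
GOLDEN_CASES = [
    ("reset my password", "reset link"),
    ("cancel my plan", "cancellation"),
    ("export my data", "export"),
]


def pass_rate(system: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Fraction of cases whose output contains the expected substring."""
    passed = sum(expected in system(prompt) for prompt, expected in cases)
    return passed / len(cases)


def gate(candidate_rate: float, baseline_rate: float, tolerance: float = 0.0) -> bool:
    """Allow the release only if the candidate does not regress past tolerance."""
    return candidate_rate >= baseline_rate - tolerance


def baseline(prompt: str) -> str:
    # Stand-in for the currently deployed system.
    answers = {
        "reset my password": "We sent a reset link to your email.",
        "cancel my plan": "Your cancellation is confirmed.",
        "export my data": "Sorry, I can't help with that.",
    }
    return answers[prompt]


def candidate(prompt: str) -> str:
    # Stand-in for the change under review; it fixes the export journey.
    if prompt == "export my data":
        return "Your export is ready to download."
    return baseline(prompt)


baseline_rate = pass_rate(baseline, GOLDEN_CASES)
candidate_rate = pass_rate(candidate, GOLDEN_CASES)
ship = gate(candidate_rate, baseline_rate)
```

Running the same cases on every change is what turns evaluation from a one-off project into a repeatable capability.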
I keep a running list of product wisdom that sounds great on a slide but quietly sabotages execution. Recently, I revisited that list after a deep conversation with a seasoned CPO from a leading security and compliance platform and reflected on how these lessons show up in my own operating rhythm. What follows is my practical playbook for scaling product organizations without losing speed, quality, or the soul of the product.
Most big-tech veterans struggle when they leap into startups because the safety net of process disappears. At a startup, the buck truly stops with you—there’s no committee to shield a decision and no process to rescue a weak plan. The mindset shift is simple to say and hard to do: own outcomes end to end, reduce your reliance on institutional scaffolding, and make decisions with incomplete information while keeping standards high.
“Great product leaders stay in the details.” I sample artifacts every week—PRDs, design flows, user research notes, postmortems—and I read customer threads to calibrate my intuition. To maintain shipping velocity as headcount grows, I instrument a few critical indicators (deployment frequency, change failure rate) and favor outcomes over output. Data guides my attention; it never replaces judgment.
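For readers who want to instrument those two indicators, here is a minimal sketch. It assumes a simple deployment log in which each entry carries a date and a flag for whether the change caused a production failure; the `Deployment` record and both helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    day: date
    caused_incident: bool  # hypothetical flag: did this change need a rollback or hotfix?


def deployment_frequency(deploys: list[Deployment], start: date, end: date) -> float:
    """Average deployments per week over the inclusive window [start, end]."""
    days = (end - start).days + 1
    in_window = [d for d in deploys if start <= d.day <= end]
    return len(in_window) / (days / 7)


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Share of deployments that led to a failure in production."""
    if not deploys:
        return 0.0
    return sum(d.caused_incident for d in deploys) / len(deploys)


deploys = [
    Deployment(date(2024, 1, 1), False),
    Deployment(date(2024, 1, 3), True),
    Deployment(date(2024, 1, 8), False),
    Deployment(date(2024, 1, 10), False),
    Deployment(date(2024, 1, 12), False),
    Deployment(date(2024, 1, 14), False),
]
per_week = deployment_frequency(deploys, date(2024, 1, 1), date(2024, 1, 14))
failure_rate = change_failure_rate(deploys)
```

Tracking the two together matters: frequency alone can rise while quality quietly falls.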
As teams scale, I use a blunt rule to keep speed high: small autonomous teams, small batch sizes, short feedback loops. One clear owner, one prioritized backlog, and weekly demos to customers. We ship thin slices, not big bangs. I also hold to the maxim “Great CPOs should avoid comfort metrics,” meaning the easy dashboards that rise when nothing meaningful is moving. I push for outcome-centric OKRs tied to customer value, not vanity charts.
Rigid hierarchies derail quality decision-making. They slow signal, encourage escalation theater, and suppress the truth from the edges. I shorten paths between PMs, engineers, designers, research, and go-to-market leads, and I strip out stage gates that don’t add learning. Above all, I follow the rule “Stop making your team fetch rocks”: I refuse to send randomized executive requests without context. Instead, I frame clear problem statements, explicit constraints, and observable success criteria.
Revenue and product can feel at odds, but they don’t have to be. The key to a quality CPO and CRO relationship is a shared operating model: one customer narrative, a joint pipeline of problems worth solving, and a common scorecard. We meet weekly, review the same signals, and align on sequencing: what we solve now for impact, what we stage for scale, and what we sunset to reduce complexity. When trade-offs get tough, we anchor on customer value and long-term defensibility.
Who ultimately oversees the quality bar? I do—and I do it through clarity, exemplars, and consistent feedback loops, not micromanagement. When I leave feedback, I make it actionable and specific: name the user scenario, note the friction, propose a sharper decision frame, and suggest a smaller, testable slice. I expect narrative memos and crisp acceptance criteria; I offer rapid, detailed responses so momentum never stalls.
Open office hours are my forcing function for transparency and speed. Anyone can bring a thorny escalation, a design in progress, or a customer insight. Pair that with weekly 1:1s—non-negotiable for developing leaders and unblocking work—and the organization learns to surface issues early, make faster decisions, and self-correct without drama.
Here’s a glimpse into my working week: Mondays set priorities and confirm the few decisions that matter; midweek is for deep reviews across roadmap, research, and engineering readiness; Thursdays I’m with customers and partners; Fridays I write and synthesize. I leave space for unscripted time with individual contributors—because ICs are the unsung heroes of a company—and I celebrate excellent craft out loud.
The hardest leadership skill is knowing when to push and when to give space. I push on clarity, sequencing, and quality; I give space on solutions and implementation paths. I reject comfort metrics, reinforce outcomes vs. output, and keep the organization close to customers and details. If you’re stepping from big tech into a startup or scaling your product org through rapid growth, these practices will help you ship faster, decide better, and raise the quality bar without burning out your team.
“Outcomes over outputs” is the right mantra—and one I’ve championed across product teams—but turning it into daily practice is where most teams stumble.
It’s simple in theory: focus on the impact of what we build, not just shipping features. In reality, it’s rarely black and white because most teams are asked to do both—hit outcomes and deliver specific outputs—at the same time.
In a benchmark survey, 20% of product teams claim to be outcome-focused, nearly half describe themselves as working in a mix of outcomes and outputs, and about 30% are still primarily working with outputs. I’ve seen versions of this in my own org: we aspire to outcomes, but our rituals, roadmaps, and reporting still reward shipping.
Here’s how I draw the line clearly, coach my teams to avoid common traps, and negotiate better, more actionable outcomes that unlock genuine product discovery and business results.
Simple definitions we live by
An output is something you build or produce—a feature, a project, an initiative. It’s something your team ships.
An outcome is the impact of that output—a change in customer behavior or a business result.
Josh Seiden puts it well in his book Outcomes Over Output: “An outcome is a change in human behavior that drives business results.”
[Figure: Outputs vs. outcomes. Value emerges between deliverables and impact, when features change customer behavior and move business results.]
I distinguish business outcomes from product outcomes. Business outcomes are typically financial metrics that measure the health of the business (e.g. increase revenue or reduce costs) while product outcomes measure a customer behavior in the product or a sentiment about the product.
Here’s a simple example I’ve used with platform teams. Many B2B companies support a number of integrations. Integrations are outputs. Having integrations alone doesn’t create value. Customers using and finding value in those integrations—that’s an outcome. If those customers retain their subscriptions longer because of the integrations—that’s also an outcome.
Building something isn’t the same as creating value. That’s the core of this distinction, and it’s what separates empowered product teams from feature factories.
Why this distinction matters for empowered product teams
When we task teams with delivering outputs, they’re done when the software ships. When we task teams with delivering outcomes, they aren’t done until the software ships and has the expected impact.
That small shift changes almost everything about how a team works: what we measure (impact, not just delivery), how we know we’re done (measurable behavior change, not release notes), the autonomy we grant (told what to achieve, not what to build), and the planning artifacts we use (an opportunity solution tree beats a feature roadmap when we’re exploring the best path to an outcome).
When I assign outcomes, I’m giving the team latitude—and responsibility—to figure out the best path to success. That’s what opens the door for real product discovery and continuous discovery habits.
[Figure: Side-by-side comparison of output-driven and outcome-driven teams: how they measure success, how much autonomy they get, how they define “done,” and how they plan (opportunity solution tree).]
Examples: spotting outputs disguised as outcomes
Clear-cut example: “Our outcome is to deliver an Android app.” An Android app is something we build and ship. It’s clearly an output.
To get to an outcome, I ask, “What’s the value of having an Android app?” or “How will we know the Android app is successful?”
We might answer: “Having an Android app will allow us to engage more users. We’ll know it’s successful when people engage with the app on a regular basis.”
This answer uncovers the hidden outcome: engage more people. Now we can set the right scope: increase the percentage of engaged users across any platform; increase the percentage of engaged mobile users; or increase the percentage of engaged Android users.
Any of these outcomes gives us more room to explore than a fixed output. Maybe we don’t need a native app at all. We could deliver the same engagement through a mobile web experience, notifications, or email. And we’re not done when we ship—we’re done when the right people are actually engaged.
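The three scopes in the Android example can be expressed as one metric with a platform filter. This is a sketch only: the `User` record, the 28-day window, and the eight-day threshold for “engaged on a regular basis” are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    platform: str         # "android", "ios", or "web" (illustrative values)
    active_days_28: int   # days with at least one session in the last 28


ENGAGED_MIN_DAYS = 8  # hypothetical threshold for "engaged on a regular basis"
MOBILE = {"android", "ios"}


def engaged_share(users: list[User], platforms: Optional[set[str]] = None) -> float:
    """Share (0..1) of engaged users, optionally scoped to a set of platforms."""
    scoped = [u for u in users if platforms is None or u.platform in platforms]
    if not scoped:
        return 0.0
    engaged = sum(u.active_days_28 >= ENGAGED_MIN_DAYS for u in scoped)
    return engaged / len(scoped)


users = [
    User("android", 12), User("android", 2),
    User("ios", 9), User("ios", 10),
    User("web", 1), User("web", 20), User("web", 3), User("web", 0),
]
all_platforms = engaged_share(users)               # widest scope
mobile_only = engaged_share(users, MOBILE)         # narrower scope
android_only = engaged_share(users, {"android"})   # narrowest scope
```

Note that none of the three calls mentions a native app. The scope constrains who should be engaged, not what gets built.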
Tricky example 1: measure the value creation moment (hires, not applicants)
[Figure: The path from “build an Android app” to the real goal, “increase engaged users”: ask why, define value, and own results.]
When setting outcomes, it’s tempting to choose the easiest-to-measure metric. But a good outcome measures the customer’s value creation moment.
I worked at a company that helped new college grads find their first job. When I started working there, the primary outcome was “increase job applications.” This technically is an outcome—it measures a specific behavior in the product.
But it doesn’t measure the value creation moment. A job seeker doesn’t get value when they apply for a job. They only get value when they get the job. Similarly, employers don’t get value from just any job applicant; they get value when the right job applicant applies.
Many job boards try to measure qualified applicants—instead of counting any applicant, they compare the credentials of the applicant to the job description and only count qualified applicants. This is better. But it still doesn’t measure the value creation moment. Both the job seeker and the employer get value when an open job is successfully filled. The right metric is hires.
Yes, “hires” can be hard to instrument because the hire happens off-platform and the incentives to report it are misaligned. Measure it anyway, even with proxies. The easy metric isn’t always the right outcome.
Tricky example 2: measure impact, not user-generated output (the course reviews trap)
I worked with a team that helped students choose university courses. They set their outcome as: “Increase the number of course reviews on our platform.”
[Figure: Four common outcome traps: measuring at the wrong moment, mistaking outputs for outcomes, chasing adoption, and relying on sentiment alone.]
Sounds like an outcome, right? It’s a metric. You can measure it. It’s an action users take on the site—writing a review. But it’s actually an output in disguise.
Reviews are valuable when they help a student evaluate a course. They don’t create any value if a student never sees them. More reviews aren’t always better, especially if they’re clustered where nobody looks.
A better outcome is “Increase the number of course views that include reviews.” Now we’re measuring impact on the decision moment, not just the production of content.
If you can hit your metric without helping customers, you’re tracking an output, not an outcome.
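The difference between the two course-review metrics is easy to see with toy data. In this sketch the course codes and event lists are invented for illustration, and `output_metric` and `outcome_metric` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical event data: one entry per review written and per course page view.
reviews_written = ["CS101", "CS101", "CS101", "CS101", "ART200"]
course_views = ["MATH150", "MATH150", "BIO110", "ART200", "MATH150", "CS101"]


def output_metric(reviews: list[str]) -> int:
    """Output in disguise: how many reviews exist, regardless of who sees them."""
    return len(reviews)


def outcome_metric(views: list[str], reviews: list[str]) -> int:
    """Outcome: course views where the student could see at least one review."""
    review_counts = Counter(reviews)
    return sum(1 for course in views if review_counts[course] > 0)


total_reviews = output_metric(reviews_written)
views_with_reviews = outcome_metric(course_views, reviews_written)

# Ten more reviews on an already-covered course inflate the output metric only.
clustered = reviews_written + ["CS101"] * 10
# One review on the most-viewed uncovered course moves the outcome.
targeted = reviews_written + ["MATH150"]
```

With this data, the clustered reviews triple the review count while the number of views that include reviews stays the same, whereas a single targeted review more than doubles it.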
Tricky example 3: measure success, not just adoption (the traction metric trap)
“Increase the percentage of users who viewed the performance report.”
This looks like a good outcome. It measures a specific behavior in the product. It’s within the team’s control. But it’s what I call a traction metric—it measures adoption of a single feature, not value to the customer.
[Figure: Why teams get trapped in shipping features. A vicious trust cycle fuels micromanagement, while performance-linked outcomes push safe targets.]
Two problems arise. First, people can view the report and still not find what they need. Second, we might have perfectly happy customers who don’t need the report at all. Driving usage of an unneeded feature wastes time and erodes trust.
Measure the value creation moment, not just feature adoption.
Tricky example 4: pair sentiment with behavior
I define a product outcome as a metric that measures either 1. a specific behavior in the product or 2. a sentiment about the product. But sentiment metrics—like CSAT or NPS—can be tricky on their own.
Sentiment metrics are outcomes, but they aren’t directional. They don’t tell us where to explore or set guardrails for what to avoid. So I pair a behavior with a sentiment, for example: “Increase engagement without negatively impacting satisfaction.” I use sentiment as a counterweight.
Facebook and Instagram illustrate why this matters. Meta is exceptional at driving engagement—but to a fault. Many of us don’t like these addictive products. Pairing engagement with a satisfaction guardrail prevents “engagement at all costs.”
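Pairing a behavior metric with a sentiment guardrail can be written down as a simple decision rule. This is a sketch, not a recommended threshold: the `Snapshot` fields, the 1-to-5 CSAT scale, and the `max_csat_drop` default are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    engagement_rate: float  # share of users engaged in the period
    csat: float             # average satisfaction score, assumed 1-5 scale


def evaluate(before: Snapshot, after: Snapshot, max_csat_drop: float = 0.1) -> str:
    """Judge a change on the behavior metric, with sentiment as a counterweight."""
    if before.csat - after.csat > max_csat_drop:
        # Satisfaction fell too far, whatever engagement did.
        return "guardrail_breached"
    if after.engagement_rate > before.engagement_rate:
        return "outcome_improved"
    return "no_improvement"


healthy = evaluate(Snapshot(0.40, 4.2), Snapshot(0.46, 4.2))
at_all_costs = evaluate(Snapshot(0.40, 4.2), Snapshot(0.55, 3.8))
flat = evaluate(Snapshot(0.40, 4.2), Snapshot(0.39, 4.2))
```

The guardrail is checked first on purpose, so a large engagement gain can never mask a drop in satisfaction.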
Why getting this right is hard (and how I counter it)
[Figure: Five practical moves for turning outputs into measurable outcomes: translate metrics, partner with teams, iterate, avoid traps, and dig deeper.]
The trust cycle. Managers don’t trust that teams can reach outcomes on their own. So managers micromanage the outputs. Teams, in turn, don’t communicate their progress toward outcomes—they communicate their progress on features. This reinforces the manager’s belief that they need to stay involved in the details. It’s a vicious cycle.
I break it by asking teams to show their work—share assumptions, research, opportunity solution trees, and evidence behind choices—and by giving feedback on the thinking, not just the solutions.
The accountability trap. When performance reviews are tied to hitting outcomes, teams play it safe. They sandbag their targets. They disguise outputs as outcomes to guarantee “success.”
I treat outcomes as learning opportunities first. When we start on a new outcome, I set a learning goal—“learn what moves the needle on this metric”—before a performance goal—“increase X by Y%.” This creates space to explore without fear.
How I get teams started with better outcomes
Translate business outcomes to product outcomes. Business outcomes like revenue, retention, and market share are lagging indicators—by the time you see them, it’s too late to act. Product outcomes measure behavior changes within the product that lead to those business results. They’re leading indicators within the team’s control.
Negotiate outcomes with your team. Outcome-setting should be a two-way conversation. Leadership brings the cross-company context. The team brings customer insight and technical realities. Neither side dictates; we co-own the target and the constraints.
[Figure: Feature factory mindset vs. true product team: track impact, not output, and define success by outcomes.]
Expect to iterate on your metrics. Your first outcome metric probably won’t be right. That’s normal. Sonja at tails.com went through four iterations—from 90-day retention to 30-day to 5-day to behavior-based metrics—before landing on something actionable. Thomas at Bluestone Analytics iterated three or four times before finding the right metric. Iteration is the work.
Watch for common mistakes. Outputs disguised as outcomes. Traction metrics masquerading as product outcomes. Sentiment metrics without direction. Business outcomes assigned directly to product teams without translating to behavior change.
Use the right artifacts. Replace feature roadmaps with an opportunity solution tree to explore multiple paths, test assumptions, and sequence bets explicitly against a clear outcome.
Align OKRs with outcomes. If your company uses OKRs, make sure the “KR”s are true product outcomes (behavior change and value creation), not a list of features to ship.
The bottom line
When we shift from an output-first mindset to an outcome-first mindset, it doesn’t mean that outputs stop mattering. Product teams will always ship features, and the ability to do so quickly and with quality still matters. This shift simply ensures those features achieve the intended impact. We aren’t done when we ship—we’re done when what we shipped has the intended impact.
Measure success by the impact of what you ship and you’ll build a product team that learns, adapts, and creates real value. Measure success by what you ship and you’ll get a feature factory.
Quick self-check: is your “outcome” really an outcome?
Ask yourself: 1) Does it measure a behavior change or a sentiment tied to value creation? 2) Could we hit it without helping customers? 3) Is it adoption of a single feature (a traction metric) or a result that customers and the business care about? 4) Do we have a counter-metric to prevent unintended harm? If the answer to 1 or 4 is no, the answer to 2 is yes, or it turns out to be a traction metric, refine it before you commit.
The world can feel like it’s spinning, and as a product leader, I feel that pressure acutely—juggling customer needs, stakeholder expectations, and the relentless news cycle. I recently listened to a powerful conversation with Teresa Torres and Petra Wille about staying grounded when everything feels “bonkers,” and it offered a practical, human way to keep showing up without losing yourself.
What resonated most was the invitation to live my values through small, consistent actions. Rather than waiting for grand gestures or perfect solutions, I’m leaning into the mindset of “Something is better than nothing.” It’s the same spirit we bring to continuous improvement in product: make a change, evaluate impact, iterate.
“Create the world you want to live in” has become a daily prompt for me. I’m applying it to how I spend my attention, time, and platform—three scarce resources for any product management leader. I’m not going to do everything perfectly, but I can make better trade-offs this week than I did last week, and I can keep improving.
Practically, that looks like reconsidering which speaking invites I accept, especially when representation is skewed. If a stage is heavily male, I now ask organizers about their plan for balance before committing. I also question travel expectations for short talks when a high-quality virtual experience is possible—good for sustainability, budgets, and energy. These choices compound, just like product roadmapping and sprint planning decisions.
Petra’s “under-complexity” lens was a wake-up call. In product, oversimplified narratives—whether a single KPI, a vanity metric, or a forced binary—usually increase fear and bad decisions. The same is true in civic discourse. To counter that, I’m seeking more nuance on purpose: reading multiple sources on the same story, listening for who’s not in the room, and noticing how the same facts can carry different meanings depending on who’s telling it.
One simple habit helps: I’ll read The New York Times and The Wall Street Journal on a headline, then follow up with Tangle by Isaac Saul, which lays out “what the left says / what the right says / editor’s take,” sometimes including perspectives from affected communities. It’s a lightweight form of personal knowledge management that improves my product judgment and my citizenship.
Another idea that stuck with me is swapping media proxies for human connection. In product, we don’t ship based on secondhand opinions—we run customer interviews, co-create with users, and build empowered product teams. The same principle applies in community: talk to someone directly affected, ask real questions, and stay curious. When conversations get heated, I try to build bridges, reduce proxies, and look people in the eye.
I’m also reflecting on platform responsibility. Even a “small” platform can snowball through weak ties inside a company or community. I’m asking: When should I speak up? Where should I draw lines? And when is “staying in your lane” actually a way to avoid necessary leadership? These are the same stakeholder management questions we navigate in product strategy—assess impact, clarify intent, and act with integrity.
Local grounding matters, too. I’ve found energy and clarity in community-level action: voting, attending public protests when it feels right, mentoring, and supporting nonprofits like World Pulse. I love the framing of “don’t mess with my neighbors”—it keeps me focused on tangible care when the internet starts to feel like reality. I’ve also seen leaders use angel investing in agriculture-related efforts as a counterbalance to “internet reality,” channeling resources into durable, real-world outcomes.
If you want to experiment this week, pick one small lever you control: where you spend money, time, attention, or your platform. Add nuance by reading at least two different perspectives before reacting. Replace proxies with people by talking to someone with lived experience. Reduce polarization by asking, “what shaped that view?” before judging it. And go local—connect with neighbors or a community group and let small actions compound.
If you’d like to hear the full conversation that inspired these reflections, you can listen on Spotify or Apple Podcasts. Here are the direct links: Spotify: https://open.spotify.com/episode/1sxEFquu73ZB9fL9gGk6Om and Apple Podcasts: https://podcasts.apple.com/kh/podcast/staying-sane/id1794203808?i=1000755696295
Resources I’m exploring and recommend: World Pulse (https://www.worldpulse.org/), The New York Times (https://www.nytimes.com/), The Wall Street Journal (https://www.wsj.com/), and Tangle by Isaac Saul (https://www.readtangle.com/ and https://www.readtangle.com/author/isaac-saul/). For builders and writers, I also appreciate Ghost (https://ghost.org/) as an open-source publishing platform. If you work in or with the MENA ecosystem, take a look at MENA Product Summit ’26 (https://www.prdkt.plus/summit26). Colleagues like Jeff Merrell (https://jeffdmerrell.com/) and grassroots efforts such as No Kings Protest (https://www.nokings.org/) offer additional perspectives and ways to get involved.
If this resonates, share it with a teammate who’s been feeling the weight of the world. I’d love to hear one small, values-aligned action you’re taking this month—what “something” will you try next?