Tag: outcomes vs output OKRs

From 70 Employees to Dominance: My Playbook for Hypergrowth, Focus, and Top-Down Goals

Scaling a real-world marketplace from scrappy to dominant takes a different kind of product leadership. Reflecting on Christopher Payne’s decade leading DoorDash as President and COO — growing from roughly 70 employees to the dominant food delivery platform in the US — I’m struck by how much of that success hinged on mastering an atoms-based business while still operating with software-level rigor. As a VP of Product Management, I see the same patterns in my own work: relentless clarity on inputs, a bias for builder-executives, and a cadence that keeps leaders close to product details without becoming bottlenecks.

Running an atoms-based business versus a pure software company forces you to obsess over operational physics: unit economics, quality control, on-time reliability, and dense local liquidity. It’s precisely where traditional “bits” executives can stumble. What’s worked for me is a simple “plate spinning” framework for executive attention: identify the five or six plates that must never stop — customer experience, marketplace health, quality and safety, product velocity, platform reliability, and P&L — then schedule recurring deep dives to keep those plates spinning. If a plate wobbles, I drop in, fix the root cause, re-instrument the inputs, and zoom back out.

Hiring at hypergrowth speed only works when you bias toward a “builder mentality.” I look for executives who run toward fuzzy problems, write clearly, and can prove they’ve shipped value with incomplete information. Prior industry experience can be a liability when you’re reinventing the market; first-principles thinkers outlearn domain experts who try to port yesterday’s playbooks. In executive hiring, I’ve found structured work samples and narrative memos far more predictive than marathon interview loops — companies routinely spend too much time on job interviews and too little time evaluating how candidates think and execute.

Great executives never outgrow the details. Staying close doesn’t mean micromanaging — it means sampling the customer journey and instrumenting the system so you can feel where it hurts. In my own practice, I rotate through frontline touchpoints weekly: support transcripts, NPS verbatims, failed checkout sessions, and reliability dashboards. Small signals often reveal systemic issues. A single ciabatta bread moment — the kind of edge-case substitution that seems trivial — can expose broken handoffs, unclear policies, and misaligned incentives across the marketplace.

Top-down goal setting beats bottom-up when you’re aiming for category leadership. Bottom-up targets tend to regress to comfort; they calibrate to today’s constraints, not tomorrow’s possibilities. I set ambitious, top-down outcomes (not outputs), frame the non-negotiables, and map driver trees to clarify the input metrics that matter. Then I ask empowered product teams to pressure-test the plan, propose approaches, and own the how. This preserves ambition while unlocking creativity: the practical balance of clarity and autonomy that outcome-based OKRs are designed to achieve.

One-size-fits-all management is a myth. Early-stage teams need hands-on coaching and fast decisions; later-stage teams need mechanisms that scale: crisp PRDs, pre-mortems, and operating cadences that separate strategy, planning, and execution. The mark of a high-functioning executive team is not uniform style — it’s high candor, fast escalation paths, and visible commitment after debate. In tough moments, a little charisma goes a long way; in practice, that’s not theatrics, it’s steady optimism, simple language, and consistent follow-through that keeps people moving forward.

The hypergrowth skill stack for executives is surprisingly learnable: ruthless prioritization under uncertainty, narrative writing that aligns cross-functionally, structured delegation with clear “inspection points,” and a weekly rhythm that protects maker time. I use a cadence of business reviews (inputs over outputs), customer-scent checks, and decision logs so we can move fast without losing the thread. CEO and executive time management is the ultimate forcing function: if we can’t show where our attention maps to goals, the team won’t either.

Some of my enduring lessons echo the best of Amazon and eBay: customer obsession beats competitor obsession, input metrics beat lagging vanity metrics, and simple mechanisms beat heroics. From Jeff Bezos’s playbook I borrow the insistence on written narratives, single-threaded ownership, and clarity on what will not change. Those principles remain the backbone of platform scalability and resilient product strategy, especially when markets get noisy.

AI is about to flatten organizations. With agentic AI, retrieval-first pipelines, and AI workflows embedded into product development, managers can widen their span without losing fidelity. I see LLMs helping product managers accelerate discovery, PRD drafting, and experiment analysis while raising the bar on decision quality. The implication for leadership: fewer layers, more transparency, and even greater pressure to define sharp, top-down outcomes that teams can autonomously pursue.

If I had to compress this into a playbook, it’s this: set audacious, top-down goals; keep your “plate spinning” calendar sacred; write more than you talk; hire builders, not resume archetypes; sample the customer journey every week; and build mechanisms that make the right thing easier than the heroic thing. That’s how you scale product management leadership from dozens to thousands — in atoms, in bits, and in the messy, exhilarating space where they meet.

April 17, 2026
Outcome-Driven Product Development: A Practical Operating Model

Your team has hit its release dates, the roadmap is moving, and the launch calendar is full. Then someone asks the question that exposes the problem: what changed for the customer or the business? If the answer is a list of shipped features, activity is visible but progress is still uncertain.

Outcome-driven product development closes that gap. It gives a team a measurable problem to solve, enough freedom to change its solution as evidence changes, and a clear point at which exploration should become committed delivery. The result is not less shipping. It is less investment in work that has never earned the right to scale.

Replace the feature request with an outcome contract

A feature describes what the team will produce. An outcome describes the change the team intends to cause. That distinction sounds simple, but it changes planning, discovery, measurement, and accountability.

Suppose the roadmap says, “Launch an AI onboarding assistant.” The statement has a deliverable, but it leaves the important questions unanswered. Which customers are struggling? What behavior needs to change? How will the assistant cause that change? What evidence would justify continued investment? What must not get worse?

Rewrite the request as an outcome contract before discussing scope. A useful contract contains:
- Target customer and context: the specific user or account segment experiencing the problem, plus the situation in which it occurs.
- Problem evidence: the interviews, behavioral data, support patterns, or other observations showing that the problem is real.
- Behavioral outcome: the customer action that should become more frequent, successful, or efficient if the problem is solved.
- Business connection: the reason that behavior matters to activation, conversion, retention, revenue, cost, or another strategic result.
- Baseline, target, and measurement window: the current state, the intended direction or threshold, and the period over which the change will be assessed.
- Guardrails: the customer, operational, financial, ethical, or reliability measures that must not deteriorate while the primary metric improves.
- Decision point: the evidence that will trigger scaling, iteration, another experiment, or stopping the work.

The rewritten AI onboarding bet might be: “For new workspace administrators who struggle to complete setup, increase the share who reach their first useful workflow during onboarding, without increasing support demand or incorrect automations.” The assistant is now one possible solution, not the definition of success.

That wording gives the team room to discover that a checklist, better defaults, a guided setup flow, clearer copy, or a narrower AI capability solves the problem more effectively. It also prevents a common failure mode: launching the requested feature and retroactively selecting whichever metric happened to move.
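
One way to keep the contract honest is to treat it as a record whose blank parts are visible at a glance. The sketch below is only an illustration: the class name, the field names, and the `missing_parts` helper are hypothetical, and the baseline and target are left empty because the example bet above does not state them.

```python
from dataclasses import dataclass, field, fields


@dataclass
class OutcomeContract:
    """One roadmap bet, described by the change it should cause (hypothetical shape)."""
    target_customer: str = ""
    problem_evidence: list[str] = field(default_factory=list)
    behavioral_outcome: str = ""
    business_connection: str = ""
    baseline: str = ""
    target: str = ""
    measurement_window: str = ""
    guardrails: list[str] = field(default_factory=list)
    decision_point: str = ""

    def missing_parts(self) -> list[str]:
        """Return the names of contract parts that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Draft of the onboarding bet; anything the text does not state stays blank.
onboarding_bet = OutcomeContract(
    target_customer="New workspace administrators who struggle to complete setup",
    behavioral_outcome="Reach a first useful workflow during onboarding",
    guardrails=["support demand", "incorrect automations"],
)

print("Still blank:", onboarding_bet.missing_parts())
```

A draft with blank parts is not a failure; it is a list of the learning that has to be funded before scope is discussed.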

Use three tests before accepting an outcome:
- If the proposed feature disappeared, would the stated customer and business result still matter?
- Can the team observe the target behavior with enough precision to distinguish exposure, use, and successful completion?
- Does the team have permission to change the solution while preserving the outcome and agreed constraints?

If any answer is no, you probably have a feature commitment decorated with outcome language. Fix that before the roadmap item absorbs a delivery team.

Use an evidence gate between learning and earning

Outcome-driven teams still build. The important choice is what they are building for at each moment.

In build-to-learn mode, the objective is to reduce uncertainty cheaply. The team is trying to understand the problem, test whether a solution is desirable and usable, and expose delivery or business risks before making a large commitment. Customer interviews, lightweight prototypes, assumption mapping, an opportunity solution tree, and narrowly scoped experiments belong here.

In build-to-earn mode, the objective changes to dependable value capture. The team has enough evidence to invest in production quality, integration, operational readiness, adoption, and scale. Acceptance criteria, sprint planning, release discipline, observability, and post-launch measurement become central.

These are modes of investment, not separate departments. A product trio can move between them as confidence changes. The practical goal is to learn only until the evidence supports conviction, then move decisively into value capture while keeping discovery alive.

Make every learning activity answer a decision

Discovery becomes theater when teams collect feedback without specifying what the feedback will change. Give each experiment a compact brief:
- Hypothesis: what the team currently believes about the customer, problem, solution, or expected behavior.
- Riskiest assumption: the belief most likely to invalidate the bet if it is wrong.
- Method: the cheapest credible way to test that assumption.
- Evidence: the observable result that would strengthen or weaken confidence.
- Timebox: the boundary that prevents exploration from continuing without a decision.
- Next action: scale, revise, run a different test, or stop.

Match the method to the question. Interviews can reveal whether a problem exists and how customers describe it, but they do not prove that a shipped solution will change behavior. A prototype can expose comprehension and usability problems, but it does not establish durable adoption. An A/B test can quantify incremental impact when exposure, instrumentation, and analysis are suitable, but it cannot rescue a weak problem definition.

An opportunity solution tree helps keep those questions connected. Start with the outcome, map the customer opportunities that could influence it, attach possible solutions to those opportunities, and place experiments under the assumptions they test. This makes it easier to notice when a favored solution has become detached from the original problem.

Define the evidence gate before enthusiasm takes over

There is no universal discovery duration or confidence score. The appropriate threshold depends on the cost, reversibility, operational risk, and strategic importance of the decision. A narrow, reversible change should not face the same gate as a platform migration or a customer-facing AI system with meaningful risk.

The team is usually ready to enter build-to-earn mode when it can answer yes to these questions:
- Is the target problem supported by evidence from the intended customer segment?
- Does the proposed solution reliably address that problem in the contexts tested so far?
- Has the team observed a credible leading signal connected to the desired behavior?
- Can the outcome, baseline, primary measure, and guardrails be instrumented?
- Are delivery, operational, and business risks understood well enough to make a commitment?
- Is the expected value sufficient to justify production investment and opportunity cost?

Do not wait for certainty; product decisions never get it. But do not confuse executive sponsorship, customer enthusiasm, a polished prototype, or engineering progress with evidence that the bet will produce the intended result. If the gate is not met, fund the next learning step rather than pretending the full solution is ready to scale.
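
The gate questions can be written down as a checklist that names the next learning step instead of returning a bare pass or fail. This is a minimal sketch under my own assumptions: the question keys and the `evaluate_gate` function are hypothetical, and requiring every answer to be yes is a simplification, since the right threshold depends on the cost and reversibility of the decision.

```python
# Hypothetical keys, one per gate question, in the order the questions are asked.
GATE_QUESTIONS = (
    "problem_supported_by_segment_evidence",
    "solution_addresses_problem_in_tested_contexts",
    "credible_leading_signal_observed",
    "outcome_and_guardrails_instrumentable",
    "risks_understood_for_commitment",
    "expected_value_justifies_investment",
)


def evaluate_gate(answers: dict[str, bool]) -> tuple[str, list[str]]:
    """Return the investment mode and the questions that are not yet a yes.

    Assumption: an unanswered question counts as a no.
    """
    open_questions = [q for q in GATE_QUESTIONS if not answers.get(q, False)]
    mode = "build-to-earn" if not open_questions else "build-to-learn"
    return mode, open_questions


# Illustrative answers for a bet that has a validated problem but no leading signal yet.
answers = {q: True for q in GATE_QUESTIONS}
answers["credible_leading_signal_observed"] = False
answers["expected_value_justifies_investment"] = False

next_mode, open_questions = evaluate_gate(answers)
print(next_mode)
for question in open_questions:
    print("fund learning on:", question)
```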

The transition does not have to happen all at once. A team can productionize a narrow use case while continuing to test adjacent opportunities. What matters is that learning work and scaling work are identified honestly, funded deliberately, and judged by different standards.

Run discovery and delivery as one product system

An outcome will not survive if strategy, discovery, delivery, and stakeholder management operate as separate handoffs. It needs an operating model in which the same people retain context from the problem through the impact review.

Organize the work around an empowered product trio: product management, design, and engineering jointly own the outcome and the evidence behind the solution. That does not erase specialist responsibilities. It removes the pattern in which product writes requirements, design decorates them, and engineering discovers the hard constraints after the decision has already been presented as final.

Discovery and delivery should run as coordinated tracks. While validated work moves through implementation, release, and adoption, the trio continues testing the assumptions behind upcoming bets and watching what released behavior teaches them. This preserves learning without making committed delivery wait for every future question to be answered.

Maintain a compact bet record for every meaningful investment. It should show:
- the customer problem and strategic outcome;
- the current solution hypothesis;
- the evidence gathered so far;
- the assumptions that remain unresolved;
- the current mode: learning or earning;
- the next decision and the evidence required to make it;
- the primary metric and guardrails;
- the delivery constraints and dependencies that materially affect the bet.

This record should travel into roadmap and sprint decisions. A roadmap then becomes a sequence of outcome bets, ordered by expected leverage, evidence, dependencies, and strategic fit. Sprint work remains concrete, but each significant task can be traced to the hypothesis it supports. That traceability exposes orphan features: work with no defined problem, no testable belief, and no measurable result.
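
Traceability of this kind is easy to check mechanically once tasks carry a reference to a bet. The sketch below is an assumption-laden illustration: `Bet`, `Task`, and `find_orphans` are hypothetical names, the record is trimmed to a few fields, and I use a stricter rule than the definition above by flagging a task when its bet lacks any one of problem, hypothesis, or metric.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bet:
    bet_id: str
    customer_problem: str
    hypothesis: str
    primary_metric: str
    mode: str  # "learning" or "earning"


@dataclass
class Task:
    title: str
    bet_id: Optional[str] = None  # None means the task was never linked to a bet


def find_orphans(tasks: list[Task], bets: list[Bet]) -> list[Task]:
    """Return tasks that cannot be traced to a bet with a problem, belief, and measure."""
    traceable = {
        b.bet_id
        for b in bets
        if b.customer_problem and b.hypothesis and b.primary_metric
    }
    return [t for t in tasks if t.bet_id not in traceable]


# Illustrative data only.
bets = [
    Bet("B1", "Admins stall during setup", "A guided flow raises setup completion",
        "setup completion rate", "learning"),
    Bet("B2", "", "", "", "earning"),  # a bet record that was never filled in
]
tasks = [
    Task("Prototype guided setup", "B1"),
    Task("Add dark mode"),
    Task("Rebuild export", "B2"),
]

orphans = find_orphans(tasks, bets)
print([t.title for t in orphans])
```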

Turn stakeholder reviews into decision reviews

Stakeholders usually ask for feature certainty because feature status is what the operating model gives them. Change the review format and the conversation changes with it.

For each bet, report the outcome, the evidence gained since the previous review, the decision that evidence supports, the largest remaining risk, and the delivery forecast when the work is in earn mode. Then ask:
- Has the customer problem or strategic priority changed?
- Did the latest evidence increase or reduce confidence?
- Is the team still testing the highest-risk assumption?
- Has new scope appeared without a corresponding outcome or risk reduction?
- Does the next investment buy learning, value capture, or neither?

This is also how you control scope without turning every discussion into a negotiation. New work must improve the expected outcome, address a documented risk, or satisfy an explicit constraint. If it does none of those things, it belongs outside the current bet.

A stakeholder can still impose a solution for regulatory, contractual, architectural, or strategic reasons. Record that constraint plainly. Then preserve the team’s responsibility for validating the problem, minimizing unnecessary scope, instrumenting the result, and reporting whether the mandated solution actually changed behavior.

Instrument the decision loop, not just the launch dashboard

A release is evidence that the team produced something. It is not evidence that customers encountered it, understood it, used it successfully, or changed their behavior because of it.

Build a metric chain before production work becomes expensive:
- Business result: the strategic effect the company ultimately cares about.
- Customer behavior: the action expected to contribute to that result.
- Product signal: the observable event or state that indicates the behavior occurred successfully.
- Capability: the shipped solution intended to influence the signal.

If retention is the business result, successful setup or repeated use might be an earlier behavioral signal. That relationship is still a hypothesis in your product. Do not assume a leading indicator matters merely because it is easy to move. Validate its connection to the downstream result over an appropriate measurement window.

Before release, define who is eligible, how exposure will be recorded, what successful use means, which segment will be analyzed, what baseline or comparison will be used, and which guardrails could stop expansion. Verify the instrumentation itself. Otherwise, a missing event can be mistaken for missing adoption, while duplicate or ambiguous events can create fictional progress.

Where traffic and exposure permit, an A/B test can estimate whether the capability caused an incremental change. Where randomization is not practical, a staged release, cohort comparison, or carefully monitored pilot can still provide directional evidence, provided its limitations are stated. In either case, observability should cover both customer behavior and the operational health of the released system.
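
For a completion-style metric, one common way to estimate incremental change from a randomized test is a pooled two-proportion z-test. The sketch below shows that standard technique, not a method prescribed by this post; the function name and the counts are made up for illustration, and it assumes independent users and samples large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(control_success: int, control_n: int,
                        treat_success: int, treat_n: int) -> tuple[float, float, float]:
    """Return (absolute lift, z statistic, two-sided p-value) for treatment vs control."""
    p_control = control_success / control_n
    p_treat = treat_success / treat_n
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (control_success + treat_success) / (control_n + treat_n)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_treat - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_treat - p_control, z, p_value


# Made-up counts: users who completed setup out of users exposed in each group.
lift, z_stat, p_value = two_proportion_test(400, 2000, 460, 2000)
print(f"lift={lift:.3f} z={z_stat:.2f} p={p_value:.3f}")
```

A small p-value says the difference is unlikely to be noise; it says nothing about guardrails, which need their own comparison.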

Use the post-launch review to make a decision, not to celebrate a dashboard. Work through the causal chain in order:
- No meaningful exposure: investigate eligibility, distribution, rollout, onboarding, or instrumentation before judging the solution.
- Exposure without use: examine relevance, comprehension, trust, discoverability, and usability.
- Use without the intended behavior: challenge the solution mechanism and the definition of successful use.
- Behavior change without the business result: reassess the assumed link, segmentation, and measurement window.
- Primary improvement with guardrail damage: pause expansion and address the tradeoff rather than averaging the harm away.

These are diagnostic starting points, not automatic conclusions. Their value is that they direct the next investigation. The review must end with an explicit choice: expand, continue observing, revise the solution, test a different opportunity, or stop investing.
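
The ordering of that review can be sketched as a function that finds the first link in the chain that did not hold. Everything here is hypothetical naming on my part (`LaunchReading`, `diagnose`), and the yes/no readings are assumed to come from thresholds the team agreed before release. The output points at an investigation, not a verdict.

```python
from dataclasses import dataclass


@dataclass
class LaunchReading:
    """Yes/no readings from a post-launch review (thresholds are the team's own)."""
    meaningful_exposure: bool
    meaningful_use: bool
    behavior_changed: bool
    business_result_moved: bool
    guardrails_healthy: bool


def diagnose(reading: LaunchReading) -> tuple[str, str]:
    """Return (where the chain first broke, what to investigate next)."""
    if not reading.meaningful_exposure:
        return ("exposure", "eligibility, distribution, rollout, onboarding, instrumentation")
    if not reading.meaningful_use:
        return ("use", "relevance, comprehension, trust, discoverability, usability")
    if not reading.behavior_changed:
        return ("behavior", "solution mechanism and the definition of successful use")
    if not reading.business_result_moved:
        return ("business result", "assumed link, segmentation, measurement window")
    if not reading.guardrails_healthy:
        return ("guardrail", "pause expansion and address the tradeoff")
    return ("none", "chain held; decide whether to expand or keep observing")


# Illustrative review: customers saw and used the capability, but behavior did not change.
reading = LaunchReading(
    meaningful_exposure=True,
    meaningful_use=True,
    behavior_changed=False,
    business_result_moved=False,
    guardrails_healthy=True,
)
stage, next_step = diagnose(reading)
print(f"chain broke at: {stage}; investigate: {next_step}")
```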

AI changes the cost of learning, not the evidence standard

Generative AI can make prototypes, interface variants, and qualitative synthesis much faster. That is useful because it lowers the cost of testing assumptions. It also makes it easier to produce a convincing solution before anyone has established that the problem deserves investment.

Treat an AI-generated prototype as a learning instrument, not customer validation. A plausible interface or polished response does not show that customers will trust it, incorporate it into their workflow, or achieve a better result. Those questions still require evidence from the intended users and context.

For an AI-powered feature, separate system quality from product outcome. Prompt changes, model changes, response quality, and task-level evaluations can explain how the capability behaves. The customer outcome tells you whether that capability improved the workflow that matters. A better model-level result can coexist with a flat customer outcome if the feature addresses the wrong problem, appears at the wrong moment, or creates too much friction around the generated output.

Keep risk guardrails beside the primary outcome from the beginning. The relevant guardrails depend on the use case, but they should capture the ways an apparently successful AI feature could create unacceptable customer, operational, ethical, or business consequences. Faster experimentation is valuable only when the decision loop can detect both value and harm.

Outcome-driven product development FAQ

What if an executive has already specified the feature?

Treat the feature as a proposed or mandated solution, then ask what result makes it worth building. Document any non-negotiable constraint and write the outcome contract around it. Offer alternatives when discovery shows a cheaper or more effective route, but do not hide the original decision. The team should still instrument the feature and report whether it delivered the intended behavior.

How long should discovery take?

Long enough to cross the evidence gate for the decision being made, and no longer. Timebox each experiment, not the truth of the entire opportunity. A reversible, limited bet may need modest evidence; an expensive or risky commitment warrants more. If discovery continues without changing confidence or informing a decision, narrow the question or stop the activity.

Can an outcome-driven team still commit to a delivery date?

Yes, once the work is in build-to-earn mode and the important delivery unknowns are bounded. Keep the delivery forecast separate from outcome confidence. A team may be highly confident about when a capability will ship while remaining uncertain about the behavior it will cause. Reporting both prevents schedule confidence from being mistaken for product confidence.

What should happen when the outcome does not move?

Start with exposure, then use, then behavior, then the business connection. Identify where the chain broke before adding scope. If the evidence weakens the underlying hypothesis, revise or stop the bet. Outcome accountability means changing course when the mechanism fails, not punishing a team for refusing to manufacture a favorable story.

At your next roadmap review, take the highest-investment item and replace its feature description with an outcome contract. If the problem evidence, behavioral measure, guardrails, or decision rule is blank, fund the missing learning before you fund scale. That single change will reveal whether the roadmap is managing customer and business progress or merely scheduling production.

References
- Shivam.Consulting Blog – Build to Learn vs. Build to Earn: My Proven Playbook for Outcomes Over Output in the AI Era
April 16, 2026
Product Work Is Relationship Work: How I Align Stakeholders Faster and Cut Team Politics

Lately, I keep hearing a familiar question: with AI making it so easy to generate ideas and build products, do we still need product managers? My answer is unequivocal—yes. Tools accelerate delivery, but they don’t build trust, reconcile competing incentives, or create the shared understanding teams need to ship outcomes. Product work is relationship work.

I recently listened to “Product Work Is Relationship Work – All Things Product with Teresa & Petra,” and it echoed what I see every day in high-performing product organizations. If you prefer to watch, here’s the episode on YouTube: https://www.youtube.com/embed/d-0f8uAfc8w?feature=oembed

While AI can help build things faster, it can’t replace the relationship work required to align stakeholders, navigate competing priorities, and create shared understanding across teams. That’s the hard, human part of product management—and it’s not going away.

In my experience, product teams stall when collaboration becomes transactional. We jump to negotiation (“What can you commit by Friday?”) before establishing context (“What problem are we solving and why now?”). When I slow down to get curious—about constraints, incentives, and assumptions—momentum actually increases because we’re rowing in the same direction.

Stakeholder alignment often breaks down when we conflate advocacy with exploration. We argue our viewpoint as if it were the only lens that matters, rather than making space to surface how others see the system. I’ve found the distinction between “dialogue vs. discussion,” rooted in work by Chris Argyris and elaborated in The Fifth Discipline by Peter Senge, to be a powerful reset. Dialogue builds shared understanding; discussion decides. You need both, in the right order.

Language matters in the room. The improv principle “Yes, and” is deceptively simple but transformative. When a designer, engineer, or executive feels heard (“Yes”) and we build on their idea (“and”), we create psychological safety without sacrificing critical thinking. I use “Yes, and” to explore perspectives before we converge on decisions—especially with product trios and senior stakeholders.

Here are the moves I rely on to keep collaboration relational and outcomes-focused. First, we align on outcomes before solutions. I explicitly separate outcomes from outputs in our OKRs so we’re clear on what success looks like, independent of the features we ship. That clarity reduces rework and speeds up decision-making later.

Second, we operationalize curiosity with continuous discovery. I schedule recurring, lightweight touchpoints with customers and internal stakeholders so insights compound. When learning is continuous, debates quiet down—evidence does the heavy lifting.

Third, we invest in relationship rituals. Regular 1:1s with key partners, stakeholder maps that capture motivations, and pre-reads that frame trade-offs all prevent misalignment from surfacing in the last mile. These small habits pay huge dividends in trust and speed.

Fourth, I’m explicit about mode-switching in meetings: are we advocating a position or exploring perspectives? Calling the mode out loud prevents people from mistaking questions for opposition and keeps the conversation productive.

Fifth, we use “Yes, and” to move from possibility to practicality. We explore generously, then converge rigorously—ranking options by impact, effort, and risk so decisions are transparent and fair.

If stakeholder alignment, team dynamics, or product “politics” slow your team down, this conversation offers a practical reframe. You’ll move faster when you build the relational tissue first—because alignment is an accelerant, not a tax.

Resources & Links:

Follow Teresa Torres: https://ProductTalk.org

Follow Petra Wille: https://Petra-Wille.com

Mentioned in this episode:

Petra’s Coaching Packages

Work by Chris Argyris on organizational learning and dialogue vs. discussion

The Fifth Discipline: The Art and Practice of the Learning Organization by Peter Senge

Improv principle “Yes, and”: Saying “Yes, and” — A principle for improv, business & life and Yes, and …

Have thoughts on this episode or examples from your team? Leave a comment below—I’d love to learn what’s working (and what’s not) in your stakeholder landscape.

Inspired by this post on Product Talk.

April 14, 2026
Stop Drowning in Tasks: How AI Marketing Agents Restore Focus and Maximize Impact

Every week I meet marketers who are working harder than ever—more campaigns, more content, more dashboards—yet seeing less movement on metrics that matter. The surge of AI tooling has amplified activity, not necessarily impact. That’s the focus problem: we confuse motion with momentum, and our backlogs look great while our outcomes stall.

Learn how AI agents for marketing can help you prioritize impact so you can do important work, instead of just more work.

In my role leading product and growth teams, I’ve learned that AI only compounds value when it is pointed squarely at outcomes. If we don’t define what “good” looks like, agentic AI will simply scale busywork. The antidote is a disciplined operating model that connects strategy to execution and instruments agents with clear success criteria.

First, anchor your program with outcome-based OKRs rather than output-based ones. Choose one or two measurable business outcomes (such as qualified pipeline, conversion rate, or activation) and make everything else subordinate. This provides the compass agents need to make effective trade-offs when speed and volume tempt you to do “one more thing.”

Second, map a driver tree from the target outcome down to the controllable levers: audience segments, offers, channels, messaging, and experience friction. This traceability shows where agents can move the needle fastest—whether that’s accelerating research, sharpening positioning, or eliminating handoffs that slow experimentation.
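
A driver tree is small enough to hold in a nested structure, which makes the path from outcome to lever explicit. In this sketch the `Driver` class and `levers` function are hypothetical, the leaf levers come from the list above, and the intermediate nodes ("Reach", "Conversion") and the uncontrollable "Market seasonality" node are placeholders I added for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Driver:
    name: str
    controllable: bool = False
    children: list["Driver"] = field(default_factory=list)


def levers(node: Driver, path: tuple[str, ...] = ()) -> list[tuple[str, ...]]:
    """Return the path from the outcome to every controllable leaf driver."""
    here = path + (node.name,)
    if not node.children:
        return [here] if node.controllable else []
    found: list[tuple[str, ...]] = []
    for child in node.children:
        found.extend(levers(child, here))
    return found


# Illustrative tree; the grouping is an assumption, not a recommended taxonomy.
tree = Driver("Qualified pipeline", children=[
    Driver("Reach", children=[
        Driver("Audience segments", True),
        Driver("Channels", True),
    ]),
    Driver("Conversion", children=[
        Driver("Offers", True),
        Driver("Messaging", True),
        Driver("Experience friction", True),
    ]),
    Driver("Market seasonality"),  # influences the outcome but is not a lever
])

lever_paths = levers(tree)
for lever_path in lever_paths:
    print(" > ".join(lever_path))
```

Each printed path is a candidate assignment for an agent; anything that does not appear on a path has no traced route to the outcome.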

Third, design a small, agentic AI workforce aligned to those levers. For example: a Research Agent that synthesizes market insights and past performance; a Copy Agent that generates on-brief, on-brand variants; a Distribution Agent that adapts content to each channel and schedules posts; and an Analytics Agent that runs A/B tests, summarizes results, and flags anomalies. Keep human oversight where judgment matters most—strategy, brand voice, and high-stakes decisions.

Fourth, instrument rigor from day one with Agent Analytics and eval-driven development. Define offline evals for brand consistency, factuality, safety, and response time; pair them with online experiments that quantify lift on your target outcomes. Set a minimum detectable effect (MDE) before each experiment so the test is sized to detect a lift worth acting on, and you stop running tests that cannot plausibly show one.
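
To see what a minimum detectable effect implies in practice, it helps to turn it into a required sample size. The sketch below uses the common normal-approximation formula for comparing two proportions; the function name, the 4% baseline, and the one-point MDE are illustrative assumptions, and the 0.05 significance level and 0.8 power are conventional defaults rather than recommendations from this post.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1 and mde_abs > 0):
        raise ValueError("rates must lie in (0, 1) and the MDE must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Illustrative: a 4% conversion baseline and a one-point absolute MDE.
required = sample_size_per_arm(baseline_rate=0.04, mde_abs=0.01)
print(f"users needed per arm: {required}")
```

If a channel cannot supply that many users within the measurement window, the honest options are a larger MDE, a longer window, or a different test.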

Fifth, operationalize your AI workflows. Standardize prompts, inputs, and handoffs; templatize briefs and acceptance criteria; and keep a change log so improvements compound rather than reset. Use short, frequent feedback loops to prune low-impact work and double down on what demonstrably advances your objectives.

I’ve seen teams reclaim focus and momentum when they treat agents as teammates, not toys. The magic isn’t in producing more assets—it’s in consistently choosing the next best action in service of a clear outcome. When you combine outcome clarity, a driver tree, targeted agents, and tight evals, AI becomes a force multiplier for marketing impact.

If you’re feeling overwhelmed by AI’s possibilities, start small: commit to one outcome, one driver you believe is material, and one agent designed for that job. Prove lift, codify the workflow, then scale. Velocity is only valuable when it’s pointed in the right direction.

Inspired by this post on Amplitude – Best Practices.

April 10, 2026
Commercial vs. Internal Products: Hard Truths, High Leverage, and How I Make the Call

Internal Products Are Hard; Commercial Products Are Harder. That line captures years of hard-won lessons from leading both internal platforms and market-facing SaaS at HighLevel. I’ve seen how the two demand different muscles—even when the tech stack, talent, and timelines look the same on paper.

When I talk about internal products, I mean services and solutions that our own employees use to take care of customers—customer-enabling tools and services, agent consoles, fulfillment and billing workflows, operations dashboards, and the underlying platforms that keep them fast, compliant, and resilient. These tools don’t generate revenue directly, but they quietly determine customer experience, gross margin, and how quickly we can ship, resolve issues, and scale.

Commercial products, by contrast, add a second layer of challenge. Beyond discovery, usability, and reliability, we must conquer positioning, pricing and packaging, competitive differentiation, sales enablement, procurement hurdles, and an ongoing customer success motion. The surface area for failure is bigger, and the time-to-signal on product-market fit is slower and noisier.

Here’s how I decide where to invest. First, I anchor on outcomes, not output. If the business priority is net revenue retention, faster onboarding, or reduced cost-to-serve, internal products often provide the highest-leverage path. If the priority is new revenue, new market entry, or a must-have differentiator, we lean commercial. I make the trade-off explicit in outcomes vs output OKRs so we can defend the decision when pressure mounts.

Second, I run a clear build vs buy calculus. For internal needs, the default is buy if a mature, configurable solution exists that meets our security, data governance, and integration requirements. I only build when the workflow is core to our differentiation, the TCO of customization is lower than vendor sprawl, or we can capture a unique proprietary advantage. For commercial products, I avoid embedding third-party IP in a way that caps differentiation or compresses margins as we scale.

Third, I insist on continuous discovery. Internal audiences are not a captive market—they’re discerning experts with real jobs to do. I treat them like customers, with structured customer interviews, journey mapping, and opportunity solution trees. I rely on empowered product teams and product trios to validate problems and reduce solution risk before we commit engineering time.

Fourth, I frame commercial vs internal work with capacity guardrails. In most planning cycles, I reserve explicit allocation for platform scalability and internal tooling, separate from feature bets. Without this, internal products become backlog filler, which guarantees we’ll pay the interest later in churn, SLA breaches, and slower delivery.

Execution differs too. For internal products, change management is make-or-break. I plan enablement as a first-class deliverable: clear rollouts, in-app guides, training, and feedback loops with frontline champions. I track adoption, time-to-resolution, error rate, and satisfaction for internal users with the same rigor we apply to external users.

For commercial products, I design the discovery-to-GTM handshake early. Pricing and packaging must reflect value drivers discovered in research, not what’s easiest to meter. Sales and solutions engineering need crisp narratives, objection handling, and proof points. Customer success needs activation plans and health signals tied directly to leading indicators of retention.

Across both, I instrument the product and process. I lean on feature flags and progressive delivery to manage risk, and I protect SLOs with error budgets so teams balance reliability with iteration speed. CI/CD isn’t a badge—it’s how we earn the right to ship continuously without eroding trust.
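The error-budget idea can be made concrete with a little arithmetic. What follows is a minimal sketch, not a description of any particular team's tooling: the 99.9% target, the 30-day window, and the 25% reserve policy are all illustrative assumptions.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO target."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


def can_ship_risky_change(slo_target: float, downtime_minutes: float,
                          window_days: int = 30, reserve: float = 0.25) -> bool:
    # Hypothetical policy: keep iterating while more than `reserve` of the
    # budget is left; otherwise prioritize reliability work.
    return budget_remaining(slo_target, downtime_minutes, window_days) > reserve


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))   # budget for a 30-day window
    print(can_ship_risky_change(0.999, downtime_minutes=10))
```

The useful property is that the same number tells a team when to speed up and when to slow down, so the reliability versus iteration debate becomes a lookup instead of a negotiation.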

Common pitfalls recur. Teams skip UX for employee tools because “they have to use it”—which backfires as shadow workflows and rework. Leaders underfund internal platforms, then wonder why velocity stalls. On the commercial side, teams over-index on features and under-invest in positioning and onboarding, leading to poor activation and elongated sales cycles.

What’s the payoff? When we treat internal products as products, we unlock scale: shorter handling times, fewer escalations, clearer accountability, and higher customer satisfaction. When we approach commercial products with the same discovery rigor plus smart GTM, we compress time-to-value and amplify differentiation. The craft is knowing which lever to pull when—and having the discipline to measure what matters.

My rule of thumb is simple. If the goal is operational excellence that compounds across the entire customer journey, invest in internal products with the same intensity you reserve for revenue-generating features. If the goal is market expansion or category leadership, invest in commercial products with a tight discovery-to-GTM loop. In either case, clarity of outcomes, disciplined discovery, and empowered teams win the day.

Inspired by this post on SVPG.

April 9, 2026
Stop Forcing AI to Prove ROI: A Product Leader’s Playbook to Measure Real Business Value

Every planning cycle, I feel the drumbeat: “Show me the AI ROI—this quarter.” The pressure is real, especially when boards and CFOs expect immediate payback. Yet when I review stalled initiatives across teams and peers, the pattern is consistent: most companies treat AI like a feature to ship, not a system to manage. That mindset almost guarantees we measure the wrong things, declare victory (or failure) too early, and miss the durable value AI can create.

Here’s the core problem I see: we leap to a solution and skip the counterfactual. Without a baseline, a clear control, or a defined “what would have happened otherwise,” we’re guessing. We also fixate on lagging financial KPIs that move slowly (revenue, cost, risk), then use outputs, not outcomes, as OKRs. If we don’t align on outcomes vs output OKRs upfront, the best team in the world can still optimize for activity over impact.

My AI strategy starts from a simple truth: value shows up along three vectors (revenue, cost, and risk) on different timelines. In the near term, we must validate leading indicators (adoption, engagement, activation) that ladder to those vectors through a transparent driver tree. Over time, those drivers compound into the lagging KPIs finance cares about. When we make the driver tree explicit, everyone can see how model precision, response time, and workflow integration roll up to conversion lift, case deflection, time-to-resolution, or reduced exposure.
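A driver tree is easy to represent as data, which makes the roll-up inspectable. The sketch below is one possible shape, assuming a simple parent/child structure; the specific mapping of leaf drivers to "cost to serve" is a hypothetical example, not a claim about how these metrics relate in any real product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Driver:
    name: str
    children: list[Driver] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Leading indicators: the drivers with nothing beneath them."""
        if not self.children:
            return [self.name]
        names: list[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names

    def path_to(self, target: str) -> list[str]:
        """Chain from this node down to `target`, or [] if it is not in the tree."""
        if self.name == target:
            return [self.name]
        for child in self.children:
            sub_path = child.path_to(target)
            if sub_path:
                return [self.name] + sub_path
        return []


# Hypothetical tree for the cost vector.
tree = Driver("cost to serve", [
    Driver("case deflection", [Driver("model precision"),
                               Driver("workflow integration")]),
    Driver("time-to-resolution", [Driver("response time")]),
])

if __name__ == "__main__":
    print(" -> ".join(reversed(tree.path_to("response time"))))
```

Printing the path for a leaf gives the one-line story a stakeholder needs: which lagging KPI a given engineering metric is supposed to move, and through what.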

To make this rigorous, I run a five-step playbook. First, define the decision and business outcome in plain terms. Second, instrument the baseline with behavioral analytics on a unified analytics platform—tools like Amplitude analytics or Pendo help expose friction points we’ll later target. Third, create a counterfactual using A/B testing and specify a minimum detectable effect (MDE) so we know how long to run and how much traffic we need. Fourth, quantify costs (training, inference, integration, change management) and include AI risk management, privacy-by-design, and data governance up front. Fifth, lock a measurement plan that connects leading indicators to lagging ROI through the driver tree.
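For the third step, the minimum detectable effect translates directly into traffic and run time. This is a rough sketch using the standard normal-approximation formula for comparing two proportions; the 10% baseline, 2-point MDE, and weekly traffic figure are illustrative assumptions, and a real test plan may call for a different method.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute lift of `mde_abs`."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde_abs == 0:
        raise ValueError("rates must stay strictly between 0 and 1, with a nonzero MDE")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def weeks_to_run(n_per_arm: int, weekly_eligible_users: int) -> int:
    """Weeks needed to fill both arms at the given traffic level."""
    return math.ceil(2 * n_per_arm / weekly_eligible_users)


if __name__ == "__main__":
    n = sample_size_per_arm(baseline_rate=0.10, mde_abs=0.02)
    print(n, weeks_to_run(n, weekly_eligible_users=2000))
```

Halving the MDE roughly quadruples the required sample, which is usually the fastest way to show a stakeholder why "detect any lift at all" is not a workable test design.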

Most AI initiatives don’t fail on model quality—they fail on adoption. If the workflow isn’t smoother, trust isn’t earned, or value isn’t obvious, users revert. That’s why I invest early in onboarding, in-app guides, product tours, and thoughtful tooltip design to reduce the time-to-first-value. Then I watch user activation, retention analysis, and task completion to ensure the assistive experience is not just novel—it’s habit-forming.

For generative use cases, eval-driven development is non-negotiable. I maintain offline evaluations for accuracy and safety, and online evaluations for business impact. Retrieval-first pipeline health, context window management, and prompt engineering affect reliability; so do latency and grounding quality. We ship behind feature flags, measure guardrail effectiveness, and tighten feedback loops from human-in-the-loop reviews into model updates—continuously.

On the business side, I avoid “AI theater” by structuring benefits like a CFO. Revenue: increased conversion or expansion driven by better recommendations, faster sales cycles, or higher trial activation. Cost: case deflection, agent time saved, fewer escalations, and lower rework. Risk: reduced exposure via automated checks, anomaly detection, and consistent policy application. If any claim can’t be tied to measured deltas—via A/B testing or strong quasi-experiments—it doesn’t go in the deck.

Build vs buy deserves the same discipline. I map platform scalability, governance requirements, and total cost of ownership against time-to-impact. Teams often underestimate integration and maintenance drag; a pragmatic mix of bought components with thin custom layers can accelerate outcomes while keeping options open. The goal isn’t to own every layer—it’s to own the learning loop and the differentiated experience.

I also remind teams that tooling should serve the strategy, not replace it. I’ve seen concise, effective messaging that captures the point: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” The words are compelling because they reflect the three-vector value model and the adoption imperative. The same standard should apply to any AI initiative we propose.

If you’re under pressure to prove ROI, shift the conversation: lead with the driver tree, specify your counterfactual, and anchor on leading indicators you can move in weeks—not quarters. Then connect those to the lagging KPIs finance expects over time. When we manage AI like a product—grounded in evidence, experimentation, and user-centered adoption—we don’t have to force ROI. We compound it.

Inspired by this post on Pendo – Perspectives.

April 8, 2026
How Top Product Teams Roadmap Through Uncertainty: Align Faster, Adapt Smarter, Deliver

Product roadmaps should not be promises etched in stone; they are portfolios of bets made under uncertainty. When I build a roadmap, I’m not predicting the future—I’m designing a system that helps the team learn faster than the market changes, allocate capital wisely, and create alignment across engineering, design, go-to-market, and leadership.

The best roadmaps I’ve seen and shipped anchor on outcomes rather than features. “Outcomes vs output OKRs” is more than a slogan; it’s how we translate strategy into measurable impact. I start by defining a small set of outcome metrics that matter—such as activation rate, time-to-first-value, or expansion revenue—and attach clear key results and guardrails to each theme. This reframes prioritization from “what can we build?” to “what must change in customer behavior?” and gives empowered product teams real autonomy.

I organize the roadmap into time horizons (Now, Next, Later) with explicit confidence levels. Near-term items have higher confidence and more specificity; mid- and long-term bets are thematic with wider time windows. This approach reduces false precision and builds trust because stakeholders can see both the intent and the uncertainty. When dates matter, I use windows and service level expectations rather than single deadlines, and I pair each initiative with a lightweight risk score so we can discuss uncertainty explicitly rather than implicitly.

Continuous discovery keeps the roadmap honest. I partner in tight “product trios” across product, design, and engineering to run rapid customer interviews, opportunity sizing, and assumption tests before we commit significant delivery capacity. The opportunity solution tree is my favorite artifact here; it visualizes the path from outcomes to opportunities to experiments and solutions, making trade-offs and sequencing transparent. By the time something moves into sprint planning, we’ve already reduced key uncertainties and clarified the narrowest viable slice we can ship.

Uncertainty demands options. I plan initiatives as options with stage gates and explicit kill criteria rather than as single monolithic projects. For every significant theme, I outline base, best, and worst-case scenarios with pre-decided triggers for when we escalate, pivot, or stop. This practice prevents sunk-cost fallacy and keeps the team focused on evidence. We treat scope as a knob, not a switch, and we bias toward small, sequential bets that compound learning.

Capacity is strategy. I routinely reserve a discovery buffer—typically 10–20%—and a contingency buffer for integration, security, and performance risks that always show up late. I ruthlessly control work-in-progress to limit thrash and protect the team’s ability to respond when new information arrives. When we must navigate dependencies, I use thin vertical slices and decouple via contracts or feature flags so discovery momentum doesn’t stall while platforms evolve underneath.

Prioritization under uncertainty benefits from explicit models. I combine value, effort, and confidence with risk scoring to surface where the unknowns are hiding. Driver trees help us connect top-level outcomes to leading indicators, so we can place bets where they have the highest causal leverage. I also lean on the Kano Model and qualitative signals to avoid over-investing in performance attributes while neglecting excitement features that unlock differentiation and word-of-mouth.
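One way to make such a model explicit is a small scoring function. The formula below (value times confidence, discounted by risk, divided by effort) is an assumption chosen for illustration, in the spirit of ICE/RICE-style scoring; the scales and the two example bets are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bet:
    name: str
    value: float       # expected impact on the outcome, 1-10 (assumed scale)
    effort: float      # rough cost in sprints
    confidence: float  # 0-1, how well evidence supports the value estimate
    risk: float        # 0-1, chance that a known unknown derails delivery

    def score(self) -> float:
        # Illustrative formula, not a standard: discount value by confidence
        # and risk, then normalize by effort.
        return self.value * self.confidence * (1 - self.risk) / self.effort


def rank(bets: list[Bet]) -> list[Bet]:
    """Highest score first."""
    return sorted(bets, key=lambda bet: bet.score(), reverse=True)


bets = [
    Bet("integration wizard", value=9, effort=5, confidence=0.5, risk=0.4),
    Bet("in-app setup guide", value=8, effort=2, confidence=0.8, risk=0.2),
]

if __name__ == "__main__":
    for bet in rank(bets):
        print(f"{bet.name}: {bet.score():.2f}")
```

The score matters less than the inputs it forces into the open: a bet that ranks low mainly because of confidence is a discovery candidate, while one that ranks low because of effort is a slicing problem.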

The most effective stakeholder management is narrative-first. For executives, I present a one-page outcomes roadmap that shows themes, expected shifts in key results, and the learning plan. For teams, I provide a more detailed plan that links discovery insights, assumptions-to-test, and decision points. I make room for a “what we’re not doing” section to reduce noise and prevent shadow backlogs from reappearing in every meeting. Most importantly, I socialize change before it happens, explaining the evidence and the trade-offs so adjustments feel like progress, not whiplash.

Measurement closes the loop. We instrument experiments and releases with leading indicators tied to the driver tree and review them on a predictable cadence. If movement stalls, we diagnose whether we have a targeting problem (wrong audience), a value problem (weak proposition), or a friction problem (broken journey). That discipline lets us iterate with purpose instead of chasing vanity metrics or isolated anecdotes.

Here’s a concrete example of roadmapping through uncertainty. Suppose our Q3 objective is to “Increase user activation” with key results to raise the Week-1 activation rate from 32% to 45% and cut time-to-first-value by 30%. In discovery, customer interviews reveal confusion in the first-run setup and a missing integration that advanced users expect. We map an opportunity solution tree and identify two high-leverage opportunities: simplifying the first 10 minutes and offering a guided setup for the integration. We then shape two minimal bets: an in-app guide to streamline the first three tasks and an integration wizard behind a feature flag. Each bet has an explicit decision rule and a two-sprint runway. We ship the guide first, confirm a statistically significant lift via A/B testing, then expand scope. The integration wizard underperforms initial expectations, so we pause, revisit the assumptions, and re-allocate buffer to the stronger path. The roadmap updates in real time, and everyone understands why.
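The "explicit decision rule" in this example can be written down before the test runs. Here is a minimal sketch using a pooled two-proportion z-test; the user counts, the 3-point minimum lift, and the 0.05 threshold are hypothetical values, not figures from the example above.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(control_conv: int, control_n: int,
                          treat_conv: int, treat_n: int) -> tuple[float, float, float]:
    """Return (absolute lift, z statistic, two-sided p-value)."""
    p_control = control_conv / control_n
    p_treat = treat_conv / treat_n
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = (p_treat - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_treat - p_control, z, p_value


def decide(lift: float, p_value: float,
           min_lift: float = 0.03, alpha: float = 0.05) -> str:
    # Hypothetical rule agreed before launch: expand only on a significant
    # lift that is also large enough to matter.
    if p_value < alpha and lift >= min_lift:
        return "expand"
    return "pause and revisit assumptions"


if __name__ == "__main__":
    # Assumed counts: 320 of 1,000 control users activate vs 390 of 1,000 with the guide.
    lift, z, p_value = two_proportion_z_test(320, 1000, 390, 1000)
    print(decide(lift, p_value))
```

Writing the rule as code is mostly a forcing function: the team has to agree on the minimum worthwhile lift before seeing the data, which is what keeps the later pause-or-expand call from feeling arbitrary.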

When uncertainty spikes—new competitor, pricing shock, platform deprecation—I shift the roadmap cadence to rolling-wave planning. We shorten planning horizons, increase the frequency of readouts, and elevate discovery allocations temporarily. We also create thematic “containment zones” where we explore multiple options in parallel with small budgets until one path justifies scale. This allows us to stay responsive without abandoning strategy.

Good governance accelerates; it doesn’t slow. A lightweight product council that reviews outcomes, risks, and cross-functional dependencies prevents surprise escalations and ensures we keep shipping what matters. We avoid death-by-approval by agreeing in advance on decision rights and thresholds. For example, a product trio can pivot a bet within a theme up to a certain budget or timeline impact without additional approval, as long as it improves the outcome likelihood.

If you’re evolving your roadmap practice, start with three moves. First, reframe your plan in outcomes and publish a driver tree that connects those outcomes to the few leading indicators you believe move them. Second, stand up a continuous discovery cadence with a visible opportunity solution tree and an assumptions-to-test backlog. Third, implement time windows and confidence levels for all mid- and long-term items, and pair each major initiative with explicit kill criteria. You’ll feel the difference in a single quarter: clearer trade-offs, faster learning, and more predictable delivery—despite uncertainty.

In the end, a roadmap that thrives in uncertainty is an agreement about how we learn and decide together. It aligns the organization on outcomes, it funds options—not fantasies—and it gives empowered product teams room to maneuver. That’s how top product teams plan for uncertainty and still deliver with confidence.

Inspired by this post on Product Talk.

April 8, 2026
A Product Leadership System for Faster, Clearer Execution

Your roadmap is full, every function has a planning ritual, and experienced people are working hard. Yet decisions still wait, priorities keep reopening, and substantial work reaches customers later than anyone expected. Adding another process layer will not solve that problem.

You need an execution system: explicit ownership, small batches, a dependable decision cadence, direct customer feedback, and a scorecard that distinguishes progress from activity. When those elements reinforce one another, your teams can move faster without lowering the quality bar or routing every judgment through you.

Give each team an operating contract, not just a roadmap

A roadmap identifies intended destinations. It rarely tells a team how to make the decisions required to reach them. That gap is where autonomy turns into ambiguity: product believes it owns the sequence, engineering waits for scope to stabilize, design explores a wider problem, and an executive assumes a requested feature is already committed.

Before an initiative becomes active work, give the team a short operating contract. It should fit on one page and answer these questions:
- Whose problem are you solving, and in what specific scenario does it occur?
- What observable customer or business outcome should change if the work succeeds?
- Who is accountable for the initiative and its sequence?
- Which constraints are fixed, and which assumptions remain open?
- What is explicitly outside the current scope?
- What is the smallest end-to-end slice that can produce useful evidence?
- What evidence will support the next decision?
- When will that decision be made, and who has the right to make it?

The owner is not the person who approves every task. The owner keeps the problem, outcome, sequence, and unresolved decisions coherent. Engineering, design, research, and product still make solution decisions together inside the stated boundaries.

This contract also protects the team from executive requests that arrive as solutions without context. When someone asks for a feature, do not turn the request directly into a backlog item. Translate it into a problem entry first: the affected customer, the workflow that breaks, the evidence behind the request, the relevant constraint, and the result the requester expects. A commercially important request can remain urgent after that translation, but the team can now evaluate it rather than merely obey it.

Set escalation boundaries at the same time. A team should escalate when decision rights are unclear, two constraints conflict, a priority change affects another team, or the work crosses an agreed risk boundary. It should not need escalation merely because a solution choice is consequential. If every consequential choice travels upward, the team is not autonomous; it is a queue feeding a senior leader.

Finally, maintain one prioritized backlog for the team. Separate executive, product, engineering, and sales backlogs create hidden competition. The operating contract establishes the logic, and the single backlog makes the resulting sequence visible.

Run a weekly loop around decisions and customer learning

Many product cadences organize meetings while leaving decisions to happen unpredictably. A useful cadence does the opposite. Every recurring touchpoint should help the organization choose, learn, or remove a constraint.

A workable leadership week looks like this:
- Monday: Confirm the few priorities that matter, identify decisions that could block progress, and resolve changes in sequencing. Do not reread the entire roadmap.
- Midweek: Review selected product requirements, design flows, research findings, and engineering readiness. Concentrate on ambiguity, batch size, and untested assumptions.
- Thursday: Spend time with customers and partners. Put working slices in front of them when possible, and bring the resulting evidence back to the team.
- Friday: Write down what changed in your understanding. Update the backlog, decision log, and operating contracts where the evidence warrants it.

The sequence matters. Monday establishes intent. Midweek exposes execution risk while there is still time to change course. Customer contact tests the team’s reasoning. Friday turns scattered observations into organizational memory. Without the synthesis step, customer conversations can become interesting anecdotes that never alter a decision.

Make the weekly demo the heartbeat of the team. A good demo starts with the user scenario and the intended outcome, shows the smallest working behavior, states what the team learned, and ends with the next decision. A tour of completed tickets is not a substitute. For platform or infrastructure work, demonstrate working behavior, operational evidence, or a retired technical risk rather than manufacturing a customer-facing screen.

When a team repeatedly has nothing meaningful to demonstrate, inspect the system before questioning effort. The batch may be too large. A dependency may lack an owner. Decisions may be waiting in an approval queue. The team may be building several disconnected components before completing one testable path. The correction is usually to narrow the slice, clarify the decision, or remove the dependency.

A thin slice is not an arbitrary reduction in scope. It must preserve one coherent scenario, reach a state where someone can evaluate it, and create evidence for a consequential next choice. Backend, frontend, and enablement tasks can all be necessary, but completing them separately does not create a feedback loop.

Put product and revenue in the same operating loop

Product and revenue drift apart when they maintain different versions of the customer. Product sees research themes and usage behavior. Revenue sees active deals, objections, urgency, and willingness to pay. Neither view is sufficient on its own.

Use one customer narrative, one shared pipeline of problems worth solving, and one scorecard. Review them together every week. Each proposed problem should carry the customer segment, affected workflow, available evidence, commercial context, expected outcome, and complexity the solution could add.

Then make the sequencing decision explicit:
- Solve now: The problem is important enough, supported well enough, and compatible with the current strategy.
- Stage for scale: The need is credible, but the team must first validate the pattern, build a reusable foundation, or resolve a dependency.
- Do not add: The request is too narrow, conflicts with the product direction, or creates complexity that its value does not justify.
- Sunset: Existing functionality consumes attention without contributing enough customer value or strategic leverage.

This turns product-versus-sales conflict into a visible portfolio decision. Revenue contributes evidence and urgency. Product protects coherence and long-term defensibility. Both functions see why an item moved, waited, or stopped.

Measure outcomes, flow, and quality as separate signals

A team can ship frequently without improving a customer outcome. It can also improve an outcome temporarily while accumulating quality problems that make the pace unsustainable. Your scorecard needs to keep those conditions separate.

For each important bet, review three signal groups:
- Outcome: The observable customer or business result the team is trying to change, supported by current evidence rather than a list of releases.
- Flow: Deployment frequency and the age or state of the current thin slice. These signals reveal whether value and learning can move through the system.
- Quality: Change failure rate and the recurring friction exposed in customer feedback, support conversations, or postmortems.

Use the scorecard to direct attention, not to automate judgment. If deployment frequency is healthy but the intended outcome is not moving, inspect the hypothesis, target customer, and value proposition. More releases may simply deliver the wrong idea faster. If deployment frequency falls, examine batch size, dependencies, and delayed decisions. If change failure rate worsens, narrow the slice and strengthen readiness or recovery before asking the team to accelerate.
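The three reading rules above can be captured as a small lookup that suggests where to look, leaving the judgment to people. This is only a sketch: the `Trend` values and the choice to treat "healthy" deployment frequency as "not worsening" are assumptions made for the example.

```python
from enum import Enum


class Trend(Enum):
    IMPROVING = "improving"
    FLAT = "flat"
    WORSENING = "worsening"


def diagnose(outcome: Trend, deploy_frequency: Trend,
             change_failure_rate: Trend) -> list[str]:
    """Return prompts for the review; an empty list means nothing stands out."""
    prompts: list[str] = []
    if change_failure_rate is Trend.WORSENING:
        prompts.append("Narrow the slice and strengthen readiness or recovery "
                       "before accelerating.")
    if deploy_frequency is Trend.WORSENING:
        prompts.append("Examine batch size, dependencies, and delayed decisions.")
    # Assumption: "healthy" flow means deployment frequency is not worsening.
    if deploy_frequency is not Trend.WORSENING and outcome is not Trend.IMPROVING:
        prompts.append("Inspect the hypothesis, target customer, and value proposition.")
    return prompts


if __name__ == "__main__":
    for prompt in diagnose(Trend.FLAT, Trend.IMPROVING, Trend.FLAT):
        print(prompt)
```

Note that the function returns questions, not verdicts, which keeps the scorecard in its intended role of directing attention.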

Do not rank unlike teams by raw deployment counts. Use trends within the relevant product and technical context. The point is to find constraints and make decisions, not to turn a diagnostic signal into a performance contest.

Write outcome-focused OKRs with enough precision to guide a trade-off. A useful structure is: for a named user and scenario, improve an observable result from its current baseline toward an agreed target by the review point, without damaging a stated guardrail. Establish the baseline before debating the target. If the team cannot observe the result, say that plainly and make instrumentation or customer evidence part of the initial slice.
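That structure maps naturally onto a small template that refuses to hide a missing baseline or guardrail. The field names and the example values below are hypothetical; treat this as one possible encoding of the sentence structure, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutcomeOKR:
    user: str
    scenario: str
    result: str
    baseline: Optional[float]   # None means the result is not yet observable
    target: float
    review_point: str
    guardrail: str

    def gaps(self) -> list[str]:
        """Problems to resolve before debating the target."""
        found: list[str] = []
        if self.baseline is None:
            found.append("No baseline yet: make instrumentation or customer "
                         "evidence part of the initial slice.")
        elif self.baseline == self.target:
            found.append("Target equals baseline: nothing would change.")
        if not self.guardrail.strip():
            found.append("No guardrail stated.")
        return found

    def statement(self) -> str:
        baseline = "an unmeasured baseline" if self.baseline is None else f"{self.baseline:g}"
        return (f"For {self.user} during {self.scenario}, move {self.result} "
                f"from {baseline} to {self.target:g} by {self.review_point}, "
                f"without damaging {self.guardrail}.")


if __name__ == "__main__":
    okr = OutcomeOKR("new workspace admins", "first-run setup",
                     "Week-1 activation rate (%)", 32, 45,
                     "the end of Q3", "support ticket volume")
    print(okr.statement())
    print(okr.gaps())
```

An OKR with a non-empty `gaps()` list is still usable; it simply states plainly what has to be learned first.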

Feature count, roadmap completion, tickets closed, and activity volume can help with local planning. They are not proof of customer value. Treat them as operational context, not as the headline definition of success.

Keep the executive view compact. Each team should be able to present its intended outcome, current evidence, deployment-frequency trend, change-failure trend, most important customer learning, and next unresolved decision. If a metric never changes a question, a priority, or an intervention, remove it from the review.

Stay close to the work without taking the work away

Product leaders lose judgment when they only consume summaries. They become bottlenecks when they join every working session. The useful middle ground is deliberate sampling: inspect enough real work to calibrate your view, then give feedback that strengthens the team’s next decision.

Each week, sample a rotating set of artifacts such as a product requirements document, a design flow, customer research notes, a postmortem, or a customer thread. You are not trying to approve every artifact. You are checking whether the operating system is producing clear thinking.

Use questions that reveal decision quality:
- Does the requirement name a user scenario and a problem, or does it begin with a predetermined feature?
- Does the design expose a complete path that can be tested, or only polished fragments?
- Do the research notes separate what customers did and said from the team’s interpretation?
- Does the postmortem change an operating mechanism, or merely remind people to be careful?
- Does the customer thread reveal a pattern, an important exception, or one loud request?
- Can the team state the next decision this artifact is meant to support?

Feedback should create motion. Name the user scenario, identify the friction or ambiguity, state the decision principle, propose a smaller testable slice when appropriate, and clarify the next decision. A vague comment such as “make this more strategic” forces the team to guess what you mean and then wait for another review.

I use a simple leadership boundary: push hard on problem clarity, sequencing, and the quality bar; leave room on solution design and implementation. That boundary keeps accountability with leadership without converting senior judgment into remote-control product management.

Exemplars make this boundary easier to scale. Keep a small, current library of strong problem statements, concise narrative memos, useful research syntheses, clear acceptance criteria, and honest postmortems. Show why each example is effective. Teams learn a quality bar faster from visible work than from an expanding rulebook.

Create short paths for decisions and uncomfortable information

Open office hours give anyone a direct route for a difficult escalation, unfinished design, customer insight, or cross-team conflict. Run them as a decision forum, not an extra status meeting. Capture the decision, owner, rationale, and follow-up so people who were not present can still act consistently.

Keep weekly one-to-ones with your leaders as well. Office hours expose work across the organization; one-to-ones develop judgment, surface recurring constraints, and help a leader notice when someone is absorbing ambiguity on behalf of the system.

Fast feedback from leadership matters because waiting expands batches. When teams expect a long approval cycle, they tend to gather more material and seek approval for more decisions at once. Publish clear decision rights and a dependable response path. If you do not need to make the decision, say so immediately and return it to the named owner.

Spend unscripted time with individual contributors, too. Formal reporting lines filter information. Direct exposure to the people building, researching, designing, supporting, and selling the product helps you hear where the written process and actual work have diverged.

Install the system without reorganizing first

You do not need a company-wide transformation program to test this operating model. Start with one important initiative that is moving slowly or generating repeated disagreement. Keep the current reporting structure and change the mechanics around the work.
1. Capture the current friction. Identify where the initiative waits, where priorities conflict, which decisions keep reopening, and where work returns for avoidable clarification.
2. Write the operating contract. Name the problem, outcome, owner, constraints, non-goals, initial thin slice, required evidence, and next decision.
3. Collapse the work into one sequence. Bring product, engineering, executive, and commercial requests into one prioritized backlog. Preserve their context rather than preserving separate queues.
4. Run the weekly loop. Set priorities on Monday, inspect selected artifacts midweek, expose work and assumptions to customers, and synthesize the learning on Friday.
5. Publish the compact scorecard. Show the intended outcome, deployment frequency, change failure rate, newest customer evidence, and next decision. Do not wait for a perfect dashboard.
6. Inspect the mechanism after a full loop. Remove one gate that added waiting without adding learning, divide one oversized batch, and clarify one decision right that caused an escalation.

During the review, ask concrete questions: What waited for a decision? What was redone because the original problem was unclear? Which customer signal changed the plan? Which metric caused an intervention? Which request arrived without enough context? Where did leadership provide useful boundaries, and where did it take ownership away from the team?

Expand the model only after the team can explain how it changed actual work. Copying the ceremonies without the decision rights, customer exposure, and scorecard will create more meetings, not a stronger execution system.

Key takeaways
- Start each important initiative with a one-page operating contract that connects a real customer problem to an owner, outcome, constraints, thin slice, and next decision.
- Protect autonomy with explicit boundaries. Escalate conflicting priorities and constraints, not every consequential solution choice.
- Organize the week around decisions, working evidence, customer contact, and written synthesis rather than status reporting.
- Read outcomes, deployment frequency, and change failure rate together. No single signal can tell you whether the team is delivering sustainable value.
- Sample real artifacts and give specific feedback, while leaving solution and implementation ownership with the team.
- Give product and revenue one customer narrative, one problem pipeline, and one scorecard so trade-offs become visible sequencing decisions.

At your next Monday priority review, choose one live initiative and write its operating contract before discussing another roadmap change. The missing answers will show you where execution is actually slowing down. Fix that mechanism, run the loop, and let evidence determine where the system should expand next.

March 19, 2026
Outcomes vs Outputs: How I Stopped the Feature Factory and Drove Real Product Impact

“Outcomes over outputs” is the right mantra—and one I’ve championed across product teams—but turning it into daily practice is where most teams stumble.

It’s simple in theory: focus on the impact of what we build, not just shipping features. In reality, it’s rarely black and white because most teams are asked to do both—hit outcomes and deliver specific outputs—at the same time.

In a benchmark survey, 20% of product teams claim to be outcome-focused, nearly half describe themselves as working in a mix of outcomes and outputs, and about 30% are still primarily working with outputs. I’ve seen versions of this in my own org: we aspire to outcomes, but our rituals, roadmaps, and reporting still reward shipping.

Here’s how I draw the line clearly, coach my teams to avoid common traps, and negotiate better, more actionable outcomes that unlock genuine product discovery and business results.

Simple definitions we live by

An output is something you build or produce—a feature, a project, an initiative. It’s something your team ships.

An outcome is the impact of that output—a change in customer behavior or a business result.

Josh Seiden puts it well in his book Outcomes Over Output: “An outcome is a change in human behavior that drives business results.”

I distinguish business outcomes from product outcomes. Business outcomes are typically financial metrics that measure the health of the business (e.g. increase revenue or reduce costs) while product outcomes measure a customer behavior in the product or a sentiment about the product.

Here’s a simple example I’ve used with platform teams. Many B2B companies support a number of integrations. Integrations are outputs. Having integrations alone doesn’t create value. Customers using and finding value in those integrations—that’s an outcome. If those customers retain their subscriptions longer because of the integrations—that’s also an outcome.

Building something isn’t the same as creating value. That’s the core of this distinction, and it’s what separates empowered product teams from feature factories.

Why this distinction matters for empowered product teams

When we task teams with delivering outputs, they’re done when the software ships. When we task teams with delivering outcomes, they aren’t done until the software ships and has the expected impact.

That small shift changes almost everything about how a team works: what we measure (impact, not just delivery), how we know we’re done (measurable behavior change, not release notes), the autonomy we grant (told what to achieve, not what to build), and the planning artifacts we use (an opportunity solution tree beats a feature roadmap when we’re exploring the best path to an outcome).

When I assign outcomes, I’m giving the team latitude—and responsibility—to figure out the best path to success. That’s what opens the door for real product discovery and continuous discovery habits.

Examples: spotting outputs disguised as outcomes

Clear-cut example: “Our outcome is to deliver an Android app.” An Android app is something we build and ship. It’s clearly an output.

To get to an outcome, I ask, “What’s the value of having an Android app?” or “How will we know the Android app is successful?”

We might answer: “Having an Android app will allow us to engage more users. We’ll know it’s successful when people engage with the app on a regular basis.”

This answer uncovers the hidden outcome: engage more people. Now we can set the right scope: increase the percentage of engaged users across any platform; increase the percentage of engaged mobile users; or increase the percentage of engaged Android users.

Any of these outcomes gives us more room to explore than a fixed output. Maybe we don’t need a native app at all. We could deliver the same engagement through a mobile web experience, notifications, or email. And we’re not done when we ship—we’re done when the right people are actually engaged.

Tricky example 1: measure the value creation moment (hires, not applicants)

When setting outcomes, it’s tempting to choose the easiest-to-measure metric. But a good outcome measures the customer’s value creation moment.

I worked at a company that helped new college grads find their first job. When I started working there, the primary outcome was “increase job applications.” This technically is an outcome—it measures a specific behavior in the product.

But it doesn’t measure the value creation moment. A job seeker doesn’t get value when they apply for a job. They only get value when they get the job. Similarly, employers don’t get value from just any job applicant; they get value when the right applicant applies.

Many job boards try to measure qualified applicants—instead of counting any applicant, they compare the credentials of the applicant to the job description and only count qualified applicants. This is better. But it still doesn’t measure the value creation moment. Both the job seeker and the employer get value when an open job is successfully filled. The right metric is hires.

Yes, “hires” can be hard to instrument because hiring happens off-platform and incentives are misaligned. Measure it anyway, even with proxies. The easy metric isn’t always the right outcome.

Tricky example 2: measure impact, not user-generated output (the course reviews trap)

I worked with a team that helped students choose university courses. They set their outcome as: “Increase the number of course reviews on our platform.”

Sounds like an outcome, right? It’s a metric. You can measure it. It’s an action users take on the site—writing a review. But it’s actually an output in disguise.

Reviews are valuable when they help a student evaluate a course. They don’t create any value if a student never sees them. More reviews aren’t always better, especially if they’re clustered where nobody looks.

A better outcome is “Increase the number of course views that include reviews.” Now we’re measuring impact on the decision moment, not just the production of content.
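
To make the difference concrete, here is a minimal sketch that computes both metrics from the same data. The course IDs, counts, and function names are invented for illustration; the only assumption is that you can log which course each page view was for:

```python
def review_count(reviews_per_course):
    """Output-style metric: total reviews written, wherever they sit."""
    return sum(reviews_per_course.values())


def views_with_reviews(course_views, reviews_per_course):
    """Outcome-style metric: course views where at least one review was available."""
    return sum(1 for course_id in course_views
               if reviews_per_course.get(course_id, 0) > 0)


# Hypothetical data: reviews are clustered on a course almost nobody looks at.
reviews = {"niche-401": 40, "intro-101": 0}
views = ["intro-101"] * 9 + ["niche-401"]

baseline_total = review_count(reviews)               # 40 reviews
baseline_seen = views_with_reviews(views, reviews)   # 1 of 10 views had reviews

# Ten more reviews on the niche course raise the count but help no one new.
clustered = {"niche-401": 50, "intro-101": 0}
clustered_seen = views_with_reviews(views, clustered)  # still 1

# A single review on the popular course changes the decision moment.
targeted = {"niche-401": 40, "intro-101": 1}
targeted_seen = views_with_reviews(views, targeted)    # 10 of 10
```

The first metric can go up while the second stays flat, which is the sign of an output in disguise.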

If you can hit your metric without helping customers, you’re tracking an output, not an outcome.

Tricky example 3: measure success, not just adoption (the traction metric trap)

“Increase the percentage of users who viewed the performance report.”

This looks like a good outcome. It measures a specific behavior in the product. It’s within the team’s control. But it’s what I call a traction metric—it measures adoption of a single feature, not value to the customer.

Two problems arise. First, people can view the report and still not find what they need. Second, we might have perfectly happy customers who don’t need the report at all. Driving usage of an unneeded feature wastes time and erodes trust.

Measure the value creation moment, not just feature adoption.

Tricky example 4: pair sentiment with behavior

I define a product outcome as a metric that measures either (1) a specific behavior in the product or (2) a sentiment about the product. But sentiment metrics, like CSAT or NPS, can be tricky on their own.

Sentiment metrics are outcomes, but they aren’t directional. They don’t tell us where to explore or set guardrails for what to avoid. So I pair a behavior with a sentiment, for example: “Increase engagement without negatively impacting satisfaction.” I use sentiment as a counterweight.

Facebook and Instagram illustrate why this matters. Meta is exceptional at driving engagement—but to a fault. Many of us don’t like these addictive products. Pairing engagement with a satisfaction guardrail prevents “engagement at all costs.”
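
One way to express “increase engagement without negatively impacting satisfaction” is as a paired check. This is a minimal sketch under assumptions: the function name, the labels it returns, and the `max_csat_drop` tolerance are all hypothetical, and a real team would pick its own thresholds:

```python
def evaluate_paired_outcome(engagement_before, engagement_after,
                            csat_before, csat_after, max_csat_drop=0.0):
    """Judge a behavior metric against a sentiment guardrail.

    max_csat_drop is an assumed tolerance: how far satisfaction may fall
    before the guardrail counts as breached.
    """
    engagement_up = engagement_after > engagement_before
    satisfaction_held = (csat_before - csat_after) <= max_csat_drop

    if engagement_up and satisfaction_held:
        return "success"
    if engagement_up:
        # Engagement rose, but at the cost of satisfaction.
        return "guardrail breached"
    return "no improvement"
```

For example, `evaluate_paired_outcome(0.30, 0.40, 4.2, 3.9)` reports a breached guardrail even though engagement rose, which is exactly the case the counterweight exists to catch.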

Why getting this right is hard (and how I counter it)

The trust cycle. Managers don’t trust that teams can reach outcomes on their own. So managers micromanage the outputs. Teams, in turn, don’t communicate their progress toward outcomes—they communicate their progress on features. This reinforces the manager’s belief that they need to stay involved in the details. It’s a vicious cycle.

I break it by asking teams to show their work—share assumptions, research, opportunity solution trees, and evidence behind choices—and by giving feedback on the thinking, not just the solutions.

The accountability trap. When performance reviews are tied to hitting outcomes, teams play it safe. They sandbag their targets. They disguise outputs as outcomes to guarantee “success.”

I treat outcomes as learning opportunities first. When we start on a new outcome, I set a learning goal—“learn what moves the needle on this metric”—before a performance goal—“increase X by Y%.” This creates space to explore without fear.

How I get teams started with better outcomes

Translate business outcomes to product outcomes. Business outcomes like revenue, retention, and market share are lagging indicators—by the time you see them, it’s too late to act. Product outcomes measure behavior changes within the product that lead to those business results. They’re leading indicators within the team’s control.

Negotiate outcomes with your team. Outcome-setting should be a two-way conversation. Leadership brings the cross-company context. The team brings customer insight and technical realities. Neither side dictates; we co-own the target and the constraints.

Expect to iterate on your metrics. Your first outcome metric probably won’t be right. That’s normal. Sonja at tails.com went through four iterations—from 90-day retention to 30-day to 5-day to behavior-based metrics—before landing on something actionable. Thomas at Bluestone Analytics iterated three or four times before finding the right metric. Iteration is the work.

Watch for common mistakes. Outputs disguised as outcomes. Traction metrics masquerading as product outcomes. Sentiment metrics without direction. Business outcomes assigned directly to product teams without translating to behavior change.

Use the right artifacts. Replace feature roadmaps with an opportunity solution tree to explore multiple paths, test assumptions, and sequence bets explicitly against a clear outcome.

Align OKRs with outcomes. If your company uses OKRs, make sure the key results are true product outcomes (behavior change and value creation), not a list of features to ship.

The bottom line

When we shift from an output-first mindset to an outcome-first mindset, it doesn’t mean that outputs stop mattering. Product teams will always ship features, and the ability to do so quickly and with quality still matters. This shift simply ensures those features achieve the intended impact. We aren’t done when we ship—we’re done when what we shipped has the intended impact.

Measure success by the impact of what you ship and you’ll build a product team that learns, adapts, and creates real value. Measure success by what you ship and you’ll get a feature factory.

Quick self-check: is your “outcome” really an outcome?

Ask yourself:
1. Does it measure a behavior change or a sentiment tied to value creation?
2. Could we hit it without helping customers?
3. Is it adoption of a single feature (a traction metric) or a result that customers and the business care about?
4. Do we have a counter-metric to prevent unintended harm?
If you stumble on any of these, refine it before you commit.

Inspired by this post on Product Talk.

March 18, 2026
From Resolutions to Outcomes: How We Price AI Agents Fairly and Amplify Customer Value

I’ve long believed a simple truth about AI in customer support: if AI is going to earn trust, pricing has to be aligned with value. That principle has guided my product decisions and the way I hold our teams accountable for measurable outcomes, not activity.

When we shared our perspective on pricing AI Agents in 2023, we made exactly that argument. At the time, for Fin, the value was clear. You pay when the AI resolves a customer’s problem. If it doesn’t, you don’t. That’s fair, easy to understand, and grounded in results, not activity. We were the first to introduce this pricing model because we believed that pricing and value should be inherently linked.

That belief hasn’t changed; it has grown stronger over time. What’s changed is what Fin can do. As we expanded capabilities and pushed deeper into complex workflows, it became clear that measuring value solely by end-to-end resolutions no longer captured the full picture of impact.

Resolutions were the right place to start. Historically, we measured value based on whether Fin fully resolved a conversation on its own. These are known as resolutions and they gave support teams a clear way to measure ROI, easily comparing the cost of AI versus human support. They also aligned our incentives with our customers, as our revenue was directly tied to Fin’s performance.

That clarity worked. Today, more than 7,000 teams use Fin. Our average resolution rate across customers has increased every month and now stands at 67%, even as Fin increasingly handles more complex queries. That progress came from building an Agent that could take on harder problems and still deliver.

But as Fin got more powerful, “success” stopped being binary. I saw this first-hand in customer design sessions where policy, risk, and compliance needs rightly demanded human-in-the-loop confirmation. We weren’t failing to deliver value; we were delivering it differently.

Over the last couple of years, we invested heavily to ensure Fin could handle the most complex parts of support. As Fin’s capabilities expanded, customers began pushing what Fin can do for them by deploying Fin deeper into their workflows to handle the toughest queries.

In some cases, this required Fin to work in tandem with a human agent because that’s what customer policies and oversight needs dictated. Subscription changes, transaction disputes, billing issues, and other multi-step support scenarios can often require Fin to gather context, read and write to external systems, and execute actions before handing off to a human agent for confirmation.

Fin is still doing what it was configured for: intentionally handing off after doing more of the heavy lifting, which saves time for support teams and shortens time to serve for their customers. But our pricing metric only recognized value when the conversation ended in a full “AI resolution” (i.e. a human was never involved).

That’s why we’re evolving Fin’s pricing metric from resolutions to outcomes. This shift reflects how customers now define value: not just in full automation, but in safe, efficient progress toward the right result across complex, multi-step, and policy-constrained workflows.

An outcome represents when Fin successfully completes the action it was configured to perform, as part of a conversation. Resolutions are still one type of outcome Fin can deliver, where it handles the issue end-to-end. Another type of outcome can be a Procedure where Fin gathers context, takes action, and hands the conversation off when that’s what customers configured it to do.

Increasing end-to-end AI resolutions is still a core component of scaling Agents, but they are no longer the only measure of Fin’s success and utility, especially as Fin takes on more complex work. Moving to outcomes recognizes that solving a customer problem with full automation isn’t always appropriate. It’s about getting to the right result, safely and efficiently.

As Fin’s capabilities expand, teams should feel empowered to use it in more nuanced, collaborative work. Outcomes support that by allowing customers to design workflows that meet compliance requirements and include a human agent when necessary. From a product management standpoint, this is how we align incentives, keep risk controls intact, and still accelerate time-to-value.

Fin is becoming even more powerful at handling complex, multi-step support queries. With outcomes, we can support that growth without constantly reinventing how value is measured. And this change gives us a strong pricing foundation that can scale as Fin continues to grow and take on more roles beyond service. This aligns with our vision of Fin becoming a “Customer Agent,” capable of handling the entire customer experience.

What this means for pricing is intentionally straightforward. An outcome will be counted when Fin successfully completes an action it was configured to perform, as part of a conversation. That keeps the model predictable for finance leaders while staying transparent for operators and product teams managing AI workflows.
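
To illustrate the difference between the two metrics, here is a minimal sketch and not Fin’s actual billing logic. The `Conversation` record and its fields are hypothetical, and it assumes at most one counted outcome per conversation, which the definition above does not spell out:

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    configured_action: str  # hypothetical labels: "resolution" or "procedure"
    action_completed: bool  # the agent finished what it was configured to do
    human_involved: bool    # a human agent took part at any point


def count_resolutions(conversations):
    """Old metric: only conversations fully handled without a human."""
    return sum(1 for c in conversations
               if c.action_completed and not c.human_involved)


def count_outcomes(conversations):
    """New metric: any conversation where the configured action completed,
    including procedures that end in a deliberate handoff."""
    return sum(1 for c in conversations if c.action_completed)


sample = [
    Conversation("resolution", True, False),  # end-to-end resolution
    Conversation("procedure", True, True),    # gathered context, acted, handed off
    Conversation("procedure", False, True),   # configured action did not complete
]
resolutions = count_resolutions(sample)  # 1
outcomes = count_outcomes(sample)        # 2
```

The second conversation is the case the old metric missed: the work was done as configured, but the handoff meant it never counted.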

The pricing model stays simple and the definition of value becomes more accurate. In other words, we’re doubling down on fairness, predictability, and competitiveness—core tenets for any consumption SaaS pricing strategy tied to real business impact.

When we first wrote about outcome-based pricing, we said that trust is the currency of AI. That’s still true. Trust is earned when customers see pricing move in lockstep with utility and risk posture, especially as gen AI and agentic AI take on higher-stakes tasks.

Pricing has to feel fair, it has to be predictable, and it has to stay competitive. Evolving from resolutions to outcomes isn’t a departure from that belief. It’s the natural maturation of how we measure value as AI moves from simple Q&A into complex procedures and human-in-the-loop collaboration.

Fin has grown more powerful because customers asked more of it. Outcomes are how we reflect that progress honestly, while staying true to the same principles that guided us from the start. This is product strategy in action: align incentives, measure what matters, and scale what works.

And as Fin continues to get stronger, we’ll keep holding ourselves to the same standard: price based on the value delivered. That’s how we build durable trust, sustainable ROI, and a better customer experience at scale.

Inspired by this post on The Intercom Blog.

March 12, 2026
Inside Zipline’s Wild Pivot: My Take on Hiring Heat-Seekers and Scaling to 5,000 Hospitals

I’m consistently drawn to stories where product strategy and operational grit collide to change real lives. Zipline, the world’s largest commercial autonomous delivery system, is one of those rare cases. Serving 5,000 hospitals across multiple countries and saving an estimated 17,000 lives per year, it embodies the kind of mission-driven execution I try to model in product management. The arc—from a near-dead home robot startup to a scrappy bet on drone blood delivery in Rwanda, to 135 million autonomous miles flown—offers some of the clearest lessons I’ve seen on hiring, leadership, and product-market fit under extreme constraints.

One principle that immediately resonated with me: why Zipline doesn’t hire for experience. The idea behind “Why Zipline hires teenagers over PhDs” isn’t a dismissal of expertise; it’s a commitment to learning velocity, ownership, and unteachable hunger. The best startup employees, as described here, are “heat-seeking missiles for pain”—people who chase the hardest problems, not the shiniest projects. In my org, I look for the same signal: candidates who can move from ambiguity to action, who find the bottleneck without being asked, and who care more about outcomes than optics.

I also appreciated the unapologetic stance that “blind references are a non-negotiable.” In high-stakes builds—especially in regulated or safety-critical categories—the cost of a mis-hire compounds. I routinely validate for two traits during references: intellectual humility and accountability. “Can candidates admit when they screwed up?” is a powerful filter. If someone can’t name a hard mistake and how they specifically changed as a result, they’re unlikely to scale with the organization.

Equally important is clarity about who not to hire. The employees Zipline doesn’t want are those who optimize for status, process theater, or low-friction work. In practice, that means pressure-testing for problem-finding, not just problem-solving. I often design interviews around messy, cross-functional constraints (regulatory, operational, and financial) to see who can integrate tradeoffs, not just ideate features. That’s how we build empowered product teams that ship consequential outcomes, not outputs.

There’s a reference to “Zipline’s secret leadership playbook,” and while the specifics remain private, the spirit is unmistakable: first principles decision making, ruthless focus, and a culture that rewards radical responsibility. Translating that to my product organization, I emphasize five behaviors: orient to the mission under uncertainty, run fast but close the loop with data, communicate constraints early and often, own the long tail of consequences (especially in safety and reliability), and scale judgment by teaching the why, not just the what. That blend of clarity and autonomy is the backbone of product management leadership at any growth stage.

On the other side of the culture coin is “Why you should always fire quickly” and “The brutal firing advice that shaped Keller’s leadership.” I’ve learned (sometimes the hard way) that slow decisions erode trust and team velocity. Moving quickly doesn’t mean being harsh; it means being fair, explicit, and humane—tight feedback loops, role clarity, and decisive action when the gap persists. If your bar is clear and your coaching is consistent, acting fast protects both the mission and the team’s energy.

Strategically, the origin story reads like a masterclass in choosing the right problem. The team moved “from toy robots to drone delivery: Zipline’s pivot,” then partnered deeply with Rwanda, where “How Rwanda’s health minister changed everything” is a pivotal moment. It wasn’t a linear climb: “How Zipline almost died – twice” and “Why Zipline’s launch was a ‘complete disaster’” underline a tough truth, which is that breakthrough products rarely arrive fully formed. What matters is the operating cadence that turns early chaos into repeatable reliability, especially when the stakes are measured in minutes and lives.

Scaling from one hospital to 5,000 required more than product brilliance; it demanded systems thinking across logistics, compliance, safety, and community trust. That’s stakeholder management at its highest level. The product lessons are durable: anchor on outcomes, not artifacts; build reliability as a feature; and practice founder-led GTM where your credibility is on the line with customers and regulators. This is where first principles decision making beats benchmarking, particularly in novel categories where there are no playbooks to copy.

There’s also a hard-nosed operational takeaway in “The 10x hardware cost rule every founder should know.” My read: assume total cost of ownership will balloon once you account for manufacturing variability, support, redundancy, maintenance, and compliance. In product strategy, I treat those multipliers as design inputs, not afterthoughts. If the unit economics can’t survive these realities, the idea isn’t ready—no matter how elegant the prototype looks in a lab.

Across all of this, a few product management patterns stand out for me: build teams around outcomes vs output OKRs; hire for slope, not just intercept; make continuous discovery routine with real users (in this case, clinicians and health systems); and treat operational excellence as a product surface. When a mission is this consequential, culture becomes a safety system—and every leadership decision compounds into either speed with quality or speed with regret.

For leaders building in complex domains, this journey is a blueprint: pick problems that matter, hire “heat-seeking missiles for pain,” keep blind references non-negotiable, lead with first principles, and scale with responsibility. Do that well and even a “complete disaster” launch can become the inflection point of a category-defining company that flies 135 million autonomous miles and saves 17,000 lives per year.

March 12, 2026
How I Used Claude Code to Run a Full Content Audit in Hours—and Uncovered Big SEO Wins

Can an AI agent actually run a credible content audit end to end? I put that to the test. In my role leading product at a high-growth SaaS and as a hands-on content strategist, I’m constantly balancing depth with reach. During a recent office-hours discussion, someone asked me to zoom out and explain when to use Claude Code. That prompt inspired me to launch a running series—Conversations with Claude—showing exactly how I apply it to real product management and SEO problems.

I’m a heavy user and share what works for me. I receive no compensation from Anthropic for this series; if that ever changes, I’ll disclose it. With that out of the way, let’s dive into how I had Claude conduct a full content audit—and why the results exceeded my expectations.

For the first installment, I chose a fairly complex use case: a comprehensive content audit of my site. I expected this to be a slog. Instead, it was refreshingly fast and rigorous once I set Claude up with the right scaffolding.

I kicked off with a simple directive: start by asking clarifying questions, proceed step by step, and capture notes in a shared task file. I also provided deep context—specifically, the CDH Book (15 chapters + intro) and my entire blog archive in markdown—so the model could reason with my actual corpus rather than guessing from sparse prompts.

Claude began with smart clarifying questions that framed the analysis well:
- Scope of keywords: Should it focus strictly on concepts unique to or heavily associated with my work like "opportunity solution tree" and "continuous discovery," or also include broader product management terms such as "product outcomes," "assumption testing," and "customer interviewing"?
- Keyword geography: Start with US-only or include UK/global?
- Blog coverage assessment: What counts as "well covered": dedicated deep dives or credible coverage within broader posts?
- Output format: Add findings to the task file or create a separate deliverable?

I replied: 1. both 2. us only is a good place to start 3. evaluate this based on how well we rank for the keyword, if we rank reasonably well, you might suggest content improvements to rank better, if we don't rank at all, then you might suggest a whole new article 4. add to the task file

From there, Claude read the CDH Book, extracted roughly 100 keywords, ran them through Keywords Everywhere in two batches of 50 to capture search volume, and pulled current domain rankings and traffic metrics. Within minutes, I had a high-signal view of what’s working, where we’re invisible, and how to prioritize fixes.
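
The batching step is simple to reproduce. This is a minimal sketch, assuming the keywords are already extracted into a list. The lookup against the keyword tool is deliberately left out because its API is not described here, and the placeholder keyword names are made up:

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[start:start + batch_size]
            for start in range(0, len(items), batch_size)]


# Placeholder keywords standing in for the roughly 100 extracted from the book.
keywords = [f"keyword-{n}" for n in range(100)]
keyword_batches = make_batches(keywords, 50)  # two batches of 50
```

Each batch would then go to the volume lookup in turn, with the results appended to the task file.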

The good news came first: we own our branded terms—#1 for "product discovery," "opportunity solution tree," "continuous discovery," and "product trio." That brand equity is doing real work for us.

The biggest gaps were in broad topics the CDH Book covers but where there’s no targeted content:
- "Outcomes vs outputs" (1,300/mo): Arguably THE central thesis of CDH, and no ranking. This is the single biggest gap.
- "Product roadmap" (4,400/mo): I have a strong anti-roadmap POV but no content targeting this.
- "Product strategy" (1,900/mo): Ch 7 argues strategy = opportunity selection. Strong differentiator, no ranking.
- "Story mapping" (5,400/mo): I use story maps uniquely (for surfacing assumptions). Huge volume.
- "Stakeholder management" (2,900/mo): Ch 13 is entirely about this. No ranking.
- "Pre-mortem" (4,400/mo): I cover this as a product discovery technique. No ranking.

The trojan horse opportunity: High-volume generic terms like story mapping, pre-mortem, and usability testing could bring in readers who don't know about CDH yet. Write about these broadly-searched topics with my specific product-discovery angle.

In just a few minutes, Claude generated an analysis of what keywords we ranked for and at what position, a ranked set of high-, medium-, and lower-volume (but strategic) keywords where we didn’t rank yet had relevant content, concrete net-new topics to close the gaps, and a list of existing articles to update to lift their SERP positions. It worked far better than I expected.
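
The coverage rule I gave Claude (improve content where we already rank reasonably well, write something new where we don’t rank at all) can be sketched as a small triage function. The cutoff for “reasonably well” is an assumption on my part here, as is the middle bucket, since my instruction left that case open:

```python
from typing import Optional


def triage_keyword(rank: Optional[int], good_rank_cutoff: int = 20) -> str:
    """Map a keyword's current ranking position to a suggested action.

    rank is the search position (1 is best), or None when the site does
    not rank at all. good_rank_cutoff is an assumed threshold.
    """
    if rank is None:
        return "write a new article"
    if rank <= good_rank_cutoff:
        return "improve existing content"
    # Ranking, but poorly: the original instruction did not cover this case.
    return "review manually"
```

Keeping the unspecified case as a manual review avoids pretending the rule says more than it does.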

Here’s how I set it up so the model could deliver: I didn’t simply ask Claude.ai to "audit my site" and hope for the best. I supplied rich, relevant context (my book and all blog posts as markdown) so it could anchor on my language, frameworks, and mental models. I paired that with live data via APIs like Keywords Everywhere to ground recommendations in actual search volume and competitive rankings. With the right inputs, Claude Code behaved like a capable research analyst and an SEO strategist—able to reason, prioritize, and suggest high-leverage actions.

Next, I went deeper and used the findings to draft a long-form article that addresses the biggest gap—"Outcomes vs outputs"—and ties it directly to product roadmapping and sprint planning. I wove in continuous discovery practices, opportunity solution tree techniques, and product trios collaboration to make it actionable for empowered product teams. I’ll share the end-to-end workflow—including files, prompts, and the editorial QA checklist—in a follow-up.

If you’re new to Claude Code and want a practical starting point, replicate the setup above: assemble your canonical sources in markdown, define a clear evaluation rubric, and ground keyword research with reliable volume data. If you want my exact task file, clarifying-question template, and step-by-step audit rubric, tell me which content gap you’d prioritize first and why—I’ll tailor the walkthrough to the highest-interest topic.

Inspired by this post on Product Talk.

March 11, 2026