
AI Product Management Skills: A Practical 12-Month Roadmap
You may know how to prompt a model and still feel unprepared to own an AI product. That gap is real. Producing a plausible response is easy; deciding what should be built, how to evaluate it, when to trust it, and whether it improved the user journey requires a broader product skill set.

The useful roadmap is not a queue of courses or tools. It is a sequence of increasingly consequential work: understand model behavior, turn ideas into testable artifacts, ship a bounded workflow, and then build the operating system that lets more teams do it responsibly.

What you should be able to do after 12 months

An AI product manager does not need to become a machine-learning engineer. You do need enough technical judgment to frame a feasible problem, challenge an architecture, inspect failures, define an evaluation, and make a release decision with engineering and design.

The 12-month progression from foundations to governed scale works because each phase produces evidence needed by the next one. You learn model constraints before promising a user experience. You build evaluations before exposing the system to real customers. You prove one workflow before standardizing it across a product organization.

Key takeaways
- Months 1-3: Learn model behavior, context management, prompting, retrieval, privacy, and data governance. Apply them to product discovery.
- Months 4-6: Build prototypes and an evaluation system. Instrument activation and retention before treating the feature as ready.
- Months 7-9: Ship a bounded AI-enabled workflow with safeguards, monitoring, recovery paths, and clear human control.
- Months 10-12: Standardize evaluation gates, analytics, discovery practices, roadmapping, and outcome-based reporting.
Treat these as capability gates, not calendar milestones. If you cannot explain why a prototype failed in month six, more production infrastructure will not fix the problem. If you cannot show that users received value in month nine, scaling the feature will only distribute uncertainty.

By the end of the roadmap, your portfolio should contain operating artifacts rather than course certificates: an AI product brief, a prompt and retrieval pattern, a reusable evaluation set, an instrumented production workflow, a risk checklist, and a scale playbook. Those artifacts demonstrate that you can move from possibility to accountable product performance.

Months 1-3: Learn enough AI to make sound product decisions

Your first objective is not technical fluency for its own sake. It is learning where model behavior changes a familiar product decision. A deterministic feature is expected to return the same result for the same state. A generative feature can produce different, incomplete, or confidently incorrect outputs. That changes acceptance criteria, testing, interface design, and the meaning of “done.”

Build an operator’s mental model

Work through four capabilities in order:
1. Model behavior and constraints: Learn what the model receives, what it produces, where variability enters, and which failures matter to the user. You should be able to distinguish a capability problem from a context, instruction, or workflow problem.
2. Context window management: Decide which information belongs in the model’s working context, which information is stale, and which information should never be sent. More context is not automatically better context. Irrelevant material can obscure the evidence the task actually requires.
3. Prompting as product specification: Write reusable instructions that state the task, relevant context, constraints, required output, and quality criteria. Save the prompt with examples of both acceptable and unacceptable behavior. A prompt library is useful only when another person can reproduce and assess the result.
4. Retrieval-first design: For tasks that depend on changing or proprietary knowledge, learn the basic pipeline: retrieve relevant approved information, give that information to the model, generate an answer, and preserve enough traceability to investigate failures. This is a product choice as much as an architecture choice because it determines what the experience can reliably know.
Pair these capabilities with privacy-by-design and data governance from the beginning. Before using customer or company information, write down which data classes are permitted, who can access them, where they may be retained, and what must be removed or masked. If those answers are unclear, use synthetic or explicitly approved material until the policy is settled. Avoiding sensitive data at the prototype stage is safer than trying to remove it after it has spread through prompts, logs, and evaluation files.
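
Here is a minimal sketch of the retrieval-first loop with the data-governance check applied before anything reaches the model. It assumes an in-memory corpus, a naive keyword-overlap retriever, and a stub `generate` function in place of a real model call. `Document`, `PERMITTED_CLASSES`, `Trace`, and `answer_with_trace` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field

# Hypothetical policy: the only data classes this prototype may send to a model.
PERMITTED_CLASSES = {"public", "internal-approved"}


@dataclass
class Document:
    doc_id: str
    data_class: str  # e.g. "public", "internal-approved", "customer-pii"
    text: str


@dataclass
class Trace:
    """Enough traceability to investigate a bad answer later."""
    query: str
    retrieved_ids: list = field(default_factory=list)
    blocked_ids: list = field(default_factory=list)
    answer: str = ""


def retrieve(query, corpus, k=2):
    """Rank permitted documents by naive keyword overlap; block everything else."""
    terms = set(query.lower().split())
    scored, blocked = [], []
    for doc in corpus:
        if doc.data_class not in PERMITTED_CLASSES:
            blocked.append(doc.doc_id)  # never passed to the model
            continue
        overlap = len(terms & set(doc.text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]], blocked


def generate(query, context_docs):
    """Stub for a model call: refuses without approved context, otherwise cites sources."""
    if not context_docs:
        return "I don't have approved information to answer that."
    sources = ", ".join(doc.doc_id for doc in context_docs)
    return f"Answer to {query!r} based on: {sources}"


def answer_with_trace(query, corpus):
    docs, blocked = retrieve(query, corpus)
    trace = Trace(query, [doc.doc_id for doc in docs], blocked)
    trace.answer = generate(query, docs)
    return trace


corpus = [
    Document("kb-1", "public", "refund policy allows returns within 30 days"),
    Document("kb-2", "internal-approved", "refund exceptions require manager approval"),
    Document("crm-9", "customer-pii", "jane doe refund request account 1234"),
]
trace = answer_with_trace("what is the refund policy", corpus)
print(trace.retrieved_ids, trace.blocked_ids)
```

The policy filter runs before ranking. As a result, a blocked record cannot leak into the prompt, the logs, or the evaluation files, even when it is the most relevant match.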

Apply the foundations to product discovery

Discovery gives you a low-risk place to practice. Use generative AI to summarize research, cluster feedback, compare recurring needs, or sharpen a value proposition. Keep the model in an assistive role: every synthesized theme should remain traceable to the underlying customer evidence. If you cannot inspect the feedback behind a cluster, you cannot tell whether the model found a pattern or flattened important differences.

Create an AI product brief for one candidate problem. Include:
- The user and the job they are trying to complete.
- The decision or work the model will assist with.
- The inputs the system may use and the inputs it must reject.
- The expected output and the conditions that make it useful.
- The consequence of a wrong, missing, or delayed output.
- The point at which a person reviews, edits, approves, or overrides the result.
- The product signal that would show improved user behavior.
You are ready for the next phase when you can explain the proposed experience without hiding behind model vocabulary. You should be able to identify the necessary context, name the important failure modes, explain whether retrieval is needed, and show how the user remains in control.

Months 4-6: Prototype the experience and build its evaluation system

A prototype is valuable when it tests uncertainty, not when it merely looks polished. Use generative AI to accelerate UX mocks, PRDs, in-app guidance, and alternative interaction flows, but spend the saved time on the questions that determine whether the product deserves to ship.

Prototype the entire decision loop. Show where the user supplies context, how the result is presented, what happens when the answer is weak, how the user corrects it, and whether that correction improves the next step. The error state is part of the primary AI experience; hiding it until engineering integration creates false confidence.

Use evaluation as a development method

Eval-driven development turns a vague judgment such as “the answers seem good” into a repeatable product decision. Build the evaluation alongside the prototype:
1. Define the task boundary. State what the system is expected to do and what remains outside its responsibility.
2. Collect representative cases. Include normal inputs, ambiguous inputs, missing information, adversarial behavior, and cases where the correct response is to stop or ask for clarification.
3. Write a scoring rubric. Assess the properties the user actually needs, such as correctness, relevance, completeness, appropriate tone, traceability, or compliance with a constraint.
4. Record a baseline. Compare the proposed experience with the current workflow or a simpler non-AI alternative. A model output is not valuable merely because it exists.
5. Inspect failure patterns. Separate prompt failures, missing-context failures, retrieval failures, model limitations, interface confusion, and policy violations. Each category points to a different remedy.
6. Set a release gate. Decide which failures block launch, which require human review, and which are tolerable in the intended use case. The gate should reflect the consequence of error, not enthusiasm for the feature.
Keep the evaluation set versioned with the product. When you change the prompt, model, retrieval logic, or available tools, rerun the same cases. Otherwise, an apparent improvement in one example can conceal regressions elsewhere.
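
The steps above fit into a small, versioned evaluation run with a release gate. This minimal sketch assumes that each rubric item can be reduced to a pass/fail check. It uses the previous prompt version as the baseline; a non-AI alternative scored on the same cases would work the same way. `EvalCase`, `run_eval`, `release_gate`, and the two toy "systems" are hypothetical stand-ins for real model calls and graders.

```python
from dataclasses import dataclass
from typing import Callable

EVAL_SET_VERSION = "v3"  # hypothetical tag; version the cases alongside the product


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    prompt: str
    check: Callable[[str], bool]  # rubric reduced to pass/fail for brevity
    severity: str  # consequence of failure: "blocking", "review", or "tolerable"


def run_eval(system, cases):
    """Score one version of the system on every case: {case_id: passed}."""
    return {case.case_id: case.check(system(case.prompt)) for case in cases}


def release_gate(results, cases, baseline):
    """Block on blocking failures; surface regressions and review items explicitly."""
    severity = {case.case_id: case.severity for case in cases}
    failed = [cid for cid, passed in results.items() if not passed]
    blocking = [cid for cid in failed if severity[cid] == "blocking"]
    return {
        "passes_gate": not blocking,
        "blocking": blocking,
        # Cases that passed on the baseline but fail now, so they cannot hide.
        "regressions": [cid for cid in failed if baseline.get(cid)],
        "needs_review": [cid for cid in failed if severity[cid] == "review"],
    }


cases = [
    EvalCase("normal", "summarize ticket 1", lambda out: "summary" in out, "review"),
    EvalCase("missing-info", "summarize", lambda out: "clarify" in out, "blocking"),
    EvalCase("tone", "summarize angry ticket", lambda out: "!" not in out, "tolerable"),
]


def prompt_v1(text):  # current behavior: asks for clarification when input is missing
    return "please clarify which ticket" if text == "summarize" else f"summary of {text}"


def prompt_v2(text):  # hypothetical change that looks fine on the "normal" example
    return f"summary of {text}!"


baseline = run_eval(prompt_v1, cases)
gate = release_gate(run_eval(prompt_v2, cases), cases, baseline)
print(EVAL_SET_VERSION, gate)
```

The second version still passes the "normal" case. Rerunning the full set shows that it broke the missing-information behavior and the tone constraint, which is the kind of concealed regression the versioned set exists to catch.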

Instrument behavior before launch

Quality evaluation and product analytics answer different questions. An evaluation tells you whether the system behaved acceptably on known cases. Behavioral analytics tells you whether customers reached value in the product.

Define the journey in Amplitude or your existing analytics system before exposing the prototype broadly. Capture the moment a user encounters the feature, supplies enough information, receives an output, accepts or edits it, completes the downstream task, returns to use it again, abandons it, or escalates to a person. That sequence gives you activation and retention signals rather than a vanity count of generations.

If you run an A/B test, choose the minimum detectable effect (the smallest change large enough to alter your decision) before launch. The choice matters because an experiment that cannot detect a product-relevant change may produce an inconclusive result even when the dashboard looks busy. Define the primary outcome, guardrail metrics, exposure rule, and analysis plan before looking at the results.
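
A rough sample-size calculation shows why the minimum detectable effect has to be chosen first. This sketch uses the standard normal-approximation formula for comparing two conversion rates. It assumes a two-sided test at α = 0.05, 80% power, equal allocation, and an absolute (not relative) effect. Those defaults and the function names are illustrative choices, not requirements of this roadmap.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline_rate: current rate of the primary outcome, e.g. 0.20
    mde_abs: smallest absolute change worth detecting, e.g. 0.02 (+2 points)
    """
    p1, p2 = baseline_rate, baseline_rate + mde_abs
    if mde_abs <= 0 or not 0 < p1 < 1 or not 0 < p2 < 1:
        raise ValueError("rates must stay between 0 and 1 and the MDE must be positive")
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 for 80% power
    spread = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    )
    return math.ceil(spread**2 / mde_abs**2)


def weeks_needed(baseline_rate, mde_abs, eligible_users_per_week, arms=2):
    """How long the test must run before it can answer the decision."""
    total = sample_size_per_arm(baseline_rate, mde_abs) * arms
    return math.ceil(total / eligible_users_per_week)


print(sample_size_per_arm(0.20, 0.02), weeks_needed(0.20, 0.02, 5_000))
```

With a 20% baseline and a 2-point effect, this comes to roughly 6,500 users per arm. Halving the effect to 1 point roughly quadruples the requirement, which is why a small but meaningful change can be out of reach for a low-traffic workflow.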

Move forward when the prototype solves a defined task, the evaluation catches meaningful failures, the events expose the user journey, and the experiment can answer a decision. A persuasive demo without those four elements is still a demo.

Months 7-9: Ship a bounded workflow, not an open-ended assistant

The production phase is where product judgment becomes visible. Start with a workflow that has a recognizable beginning, end, and owner. Customer-support, CRM, and guided-onboarding workflows are useful patterns because the AI can sit inside an existing user journey rather than asking customers to invent a use case from a blank chat box.

Screen the workflow before committing engineering capacity:
- Is the user’s job clear enough to define a successful completion?
- Does the system have access to approved, relevant context?
- Can you observe whether the user accepted, corrected, ignored, or escalated the output?
- What happens to the customer if the system is wrong?
- Can a consequential action be paused, reviewed, or reversed?
- Is a generative system materially better than a rule, search result, template, or conventional workflow?
Use agentic AI only when the job genuinely requires several connected steps, tool use, or changing plans. Additional autonomy also creates more places for permissions, context, and actions to go wrong. Begin with the narrowest useful boundary, then expand it when production evidence supports the change.

Map the production loop before building it

A product trio should be able to trace the complete workflow on one page:
1. Trigger: What user action or system event begins the workflow?
2. Context: Which profile, conversation, account, or knowledge records are retrieved?
3. Generation or decision: What does the model produce, classify, recommend, or plan?
4. Tool action: Which systems can it read from or write to, and under whose authority?
5. Human checkpoint: Which output can be edited, rejected, or approved before it changes customer data or sends an external message?
6. Recovery: How does the product handle low confidence, missing data, tool failure, timeouts, or a user correction?
7. Learning signal: Which feedback updates the evaluation set, product decision, or workflow design?
Place safeguards at the point of consequence. Restrict the data and tools the workflow can access. Require explicit approval before a high-impact external action. Preserve a record of the inputs, retrieved context, output, action, and user response so a failure can be investigated. If an action cannot be safely reversed, keep it behind human review until the risk has been addressed.
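
The following minimal sketch places the safeguard at the point of consequence. It combines a per-workflow tool allowlist, an approval requirement for high-impact or irreversible actions, and an audit record of inputs, context, output, action, and user response. The tool names, `TOOL_POLICY`, `AuditRecord`, and `gate_action` are hypothetical. A real system would also enforce permissions in the tool layer itself, not only in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical policy for one workflow: the only tools it may call, and their risk.
TOOL_POLICY = {
    "crm.read_account": {"impact": "low", "reversible": True},
    "crm.update_field": {"impact": "high", "reversible": True},
    "email.send_external": {"impact": "high", "reversible": False},
}


@dataclass
class AuditRecord:
    inputs: dict
    retrieved_context: list
    output: str
    action: str
    decision: str
    approved_by: Optional[str] = None
    user_response: Optional[str] = None  # filled in when the user accepts, edits, or rejects
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def gate_action(tool, inputs, context, output, audit_log, approved_by=None):
    """Decide whether a proposed tool action may run, and record the decision.

    Returns "rejected", "held", or "allowed"; the caller performs the action
    only when the result is "allowed".
    """
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        decision = "rejected"  # tool outside this workflow's permissions
    elif (policy["impact"] == "high" or not policy["reversible"]) and approved_by is None:
        decision = "held"  # waits for a person to approve, edit, or reject
    else:
        decision = "allowed"
    audit_log.append(AuditRecord(inputs, context, output, tool, decision, approved_by))
    return decision


log = []
gate_action("crm.read_account", {"account": "A-1"}, [], "lookup", log)
gate_action("email.send_external", {"to": "customer"}, ["kb-1"], "draft reply", log)
gate_action("email.send_external", {"to": "customer"}, ["kb-1"], "draft reply", log, approved_by="agent-7")
gate_action("billing.issue_refund", {"amount": 50}, [], "refund", log)
print([record.decision for record in log])
```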

Threat detection and response also need a product playbook. Define what counts as suspicious input or abnormal behavior, who receives the alert, how the workflow is disabled or contained, what evidence is retained, and how affected users are handled. The escalation path should exist before the first serious incident, not be improvised during it.

Monitor the experience at four levels
- User outcome: Did the customer complete the intended job with less effort or fewer avoidable handoffs?
- AI quality: Are the evaluation scores and failure categories changing after releases?
- Workflow health: Are retrieval, model, and tool steps completing as expected, and can the team locate the failing stage?
- Risk: Are users overriding outputs, escalating cases, encountering policy violations, or triggering suspicious behavior?
Track deployment frequency because a team that can release safely can also learn faster. Do not confuse release frequency with customer value, though. The useful loop connects a deployment to a quality change, a behavior change, and a decision about what to do next.

Months 10-12: Turn one successful product into a repeatable system

Scaling is not copying the same AI feature into every surface. It is making the successful practices reusable while preserving room for different user risks and workflow requirements.

Codify the operating assets that reduced uncertainty during the earlier phases:
- An intake template that starts with the user problem, current workflow, expected outcome, and consequence of error.
- A continuous-discovery practice that keeps generated themes connected to original customer evidence.
- A retrieval-first architecture template for products that depend on approved or changing knowledge.
- A shared prompt library with owners, versions, expected behavior, and known limitations.
- An evaluation gate covering representative cases, blocking failures, human-review requirements, and regression checks.
- A production checklist covering permissions, privacy, observability, recovery, threat response, and user control.
- A monitoring cadence that connects product behavior, AI quality, workflow health, and risk.
Do not impose one universal quality threshold on every AI feature. A low-consequence drafting aid and a workflow that changes a customer account do not carry the same downside. Use the same evaluation process across teams, but set release gates according to the task, affected user, reversibility, and consequence of failure.

Use common analytics without erasing product context

A unified analytics model lets leadership compare lift across products without forcing every team to use an identical funnel. Standardize the basic meanings of exposure, meaningful use, successful task completion, correction, abandonment, escalation, and return usage. Then let each product define the events that represent those states in its own journey.
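
One way to standardize meanings without forcing an identical funnel is to define the shared states once and let each product map its own event names onto them. The state list comes from the paragraph above. The product names, event names, and helper functions are hypothetical.

```python
from enum import Enum


class State(Enum):
    """Shared meanings every product team reports against."""
    EXPOSURE = "exposure"
    MEANINGFUL_USE = "meaningful_use"
    TASK_COMPLETED = "task_completed"
    CORRECTION = "correction"
    ABANDONMENT = "abandonment"
    ESCALATION = "escalation"
    RETURN_USAGE = "return_usage"


# Each product owns the mapping from its own events to the shared states.
EVENT_MAPS = {
    "support_assistant": {
        "Reply Draft Shown": State.EXPOSURE,
        "Reply Draft Edited": State.CORRECTION,
        "Reply Sent": State.TASK_COMPLETED,
        "Handed To Agent": State.ESCALATION,
    },
    "onboarding_guide": {
        "Guide Opened": State.EXPOSURE,
        "Workspace Configured": State.TASK_COMPLETED,
        "Guide Dismissed": State.ABANDONMENT,
    },
}


def users_in_state(product, events, state):
    mapping = EVENT_MAPS[product]
    return {user for user, name in events if mapping.get(name) is state}


def completion_rate(product, events):
    """Share of exposed users who completed the task; comparable across products."""
    exposed = users_in_state(product, events, State.EXPOSURE)
    completed = users_in_state(product, events, State.TASK_COMPLETED)
    return len(exposed & completed) / len(exposed) if exposed else 0.0


support_events = [("u1", "Reply Draft Shown"), ("u1", "Reply Sent"),
                  ("u2", "Reply Draft Shown"), ("u2", "Handed To Agent")]
onboarding_events = [("a", "Guide Opened"), ("a", "Workspace Configured"),
                     ("b", "Guide Opened"), ("c", "Guide Opened"), ("c", "Workspace Configured")]
print(completion_rate("support_assistant", support_events),
      completion_rate("onboarding_guide", onboarding_events))
```

Leadership compares the same ratio across products, while each team keeps events that describe its own journey.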

This is also where roadmapping and sprint planning should move from output commitments to outcome-based decisions. “Ship an AI assistant” is an output. A useful objective describes the customer behavior or business result that should change. The roadmap can then contain competing ways to produce that change, including improvements that do not require AI.

Use a consistent stakeholder narrative:
- What shipped: The workflow or capability placed in users’ hands.
- What moved: The user, product, quality, and risk signals that changed.
- What was learned: The assumptions confirmed, rejected, or still unresolved.
- What happens next: The decision to expand, revise, contain, or stop the work.
That structure prevents activity from masquerading as progress. It also gives executives a clear basis for funding decisions: evidence of value, evidence of control, and a specific next bet.

Start this week with one recurring user decision. Write its AI product brief, run the workflow manually with permitted data, and save the successful and failed cases as the beginning of an evaluation set. If you cannot define a good result or the consequence of a bad one, stay in discovery. If you can, you have a concrete first artifact and a reason to proceed to a prototype.

December 17, 2025

Analytics-Led Product Growth: A Practical Operating System

Your dashboards are busy, the roadmap is full, and every team can produce a chart that supports its preferred priority. Yet when activation changes or retention weakens, nobody can say with confidence which customer behavior moved, why it moved, or what decision should follow.

That is the problem analytics-led product growth should solve. It connects a customer outcome to an observable behavior, a trustworthy measurement, and a product decision. Build that chain well and analytics becomes part of how you choose, test, and scale growth bets – not a reporting layer added after the roadmap is set.

Start with the decision, not the dashboard

A useful metric has a job. It helps you make a defined decision about a defined customer journey. If nobody can explain what would change when the metric rises, falls, or stays flat, the metric is decoration.

Before asking an analyst to build a chart, write the decision you are trying to make. Use this sequence:

1. Name the business outcome. Examples include durable revenue, lower cost-to-serve, or greater adoption of a valuable workflow.
2. Name the customer outcome that must occur first. A customer may need to complete setup, receive an approval, publish something, invite a collaborator, or finish another meaningful job.
3. Identify the observable behavior that proves the customer reached that outcome. A login or button click rarely proves value on its own.
4. Choose the leading metric that will reveal movement soon enough to guide a decision.
5. Add guardrails for consequences you are unwilling to trade away, such as errors, support contacts, failed verification, or degraded retention.
6. State the decision in advance: if the primary metric moves and the guardrails remain healthy, what will you ship, stop, expand, or investigate?

This creates a small driver tree. At the top is the result the business needs. Under it are the customer behaviors capable of producing that result. Beneath those sit the product changes you can test. It keeps the team from mistaking a feature launch for progress.

For example, “launch a new onboarding tour” is an output. “Increase the share of eligible new customers who complete onboarding and reach first value, without increasing support contacts” is an outcome. The second formulation tells you what to measure, which trade-off to protect, and how to judge the work. That is why connecting a north star, outcome-based objectives, and trustworthy instrumentation matters before experimentation begins.

Be equally precise about activation. Activation is not whatever event produces the most convenient chart. It is the earliest behavior that credibly indicates the customer has experienced meaningful value. You should be able to explain why that behavior matters and verify whether customers who complete it behave differently later. A relationship with retention is evidence worth investigating, but it is not proof of causation.

Instrument the journey so the numbers can be trusted

Growth analysis breaks when the event model describes the interface instead of the customer journey. “Button clicked” tells you that an interaction happened. “Application submitted successfully” tells you that the customer completed a meaningful step. Instrument the confirmed outcome whenever the product can observe it.

A usable event taxonomy needs more than consistent names. For each critical event, document:

- The exact behavior represented by the event.
- The condition that causes it to fire, including whether it records an attempt or a confirmed success.
- The properties needed for legitimate analysis, such as customer profile, plan, entry channel, product surface, or journey variant.
- The identity rule that connects anonymous activity, authenticated users, and accounts.
- The event owner and the product change that introduced or modified it.
- Known exclusions, delayed events, retries, and duplicate-event behavior.

The distinction between attempt and success is especially important. If an event fires when a customer selects “Submit,” it can overstate completion when validation, verification, payment, or a server error prevents the operation from finishing. Record the attempt when it helps diagnose friction, but use the confirmed success event to measure conversion.
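
A small illustration shows why the two events must stay separate. It assumes hypothetical event names and a list of `(customer_id, event_name)` pairs exported from the analytics tool; `submission_funnel` is a hypothetical helper.

```python
def submission_funnel(events):
    """Separate attempted, confirmed, and failed submissions among customers who started.

    events: iterable of (customer_id, event_name) pairs; event names are hypothetical.
    """
    events = list(events)  # iterated several times below

    def who(name):
        return {customer for customer, event in events if event == name}

    started = who("Application Started")
    if not started:
        return {}
    attempted = who("Application Submit Attempted") & started
    succeeded = who("Application Submitted Successfully") & started
    failed = who("Application Submit Failed") & started
    return {
        "attempt_rate": len(attempted) / len(started),
        "conversion": len(succeeded) / len(started),  # confirmed outcomes only
        "attempted_not_completed": sorted(attempted - succeeded),
        "saw_error": sorted(failed),
    }


events = [
    ("c1", "Application Started"), ("c1", "Application Submit Attempted"),
    ("c1", "Application Submitted Successfully"),
    ("c2", "Application Started"), ("c2", "Application Submit Attempted"),
    ("c2", "Application Submit Failed"),
    ("c3", "Application Started"), ("c3", "Application Submit Attempted"),
    ("c4", "Application Started"),
]
print(submission_funnel(events))
```

In this sample, three of four customers pressed Submit, but only one confirmed submission occurred. The attempt and failure events remain useful for diagnosing where the other two stalled; they are simply not the conversion metric.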

Test the instrumentation by completing the journey yourself in a controlled environment. Confirm that events appear once, in the expected order, with the expected identity and permitted properties. Then test an error path, a retry, an interrupted session, and a return on another session. A tidy taxonomy document cannot compensate for events that fire inconsistently in the product.
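
The manual test can be backed by a small check over the events captured during a controlled run. This sketch assumes that the captured events are available as dictionaries with `name` and `user_id` keys, in arrival order. The journey, event names, and `check_capture` are hypothetical. The exactly-once rule applies to the happy path only; a retry test would expect the attempt event more than once.

```python
from collections import Counter

# Hypothetical happy-path journey, in the order events should first appear.
EXPECTED_SEQUENCE = [
    "Application Started",
    "Application Submit Attempted",
    "Application Submitted Successfully",
]


def check_capture(captured, expected_user_id):
    """Return the problems found in one controlled happy-path run (empty list = clean)."""
    problems = []
    names = [event["name"] for event in captured]
    for name, count in Counter(names).items():
        if count > 1:
            problems.append(f"duplicate: {name} fired {count} times")
    for name in EXPECTED_SEQUENCE:
        if name not in names:
            problems.append(f"missing: {name}")
    # Compare the order of first appearance with the expected journey order.
    first_seen = [n for n in dict.fromkeys(names) if n in EXPECTED_SEQUENCE]
    expected_order = [n for n in EXPECTED_SEQUENCE if n in first_seen]
    if first_seen != expected_order:
        problems.append(f"out of order: {first_seen}")
    for event in captured:
        if event["user_id"] != expected_user_id:
            problems.append(f"identity mismatch on {event['name']}: {event['user_id']}")
    return problems


clean_run = [{"name": n, "user_id": "u1"} for n in EXPECTED_SEQUENCE]
faulty_run = [
    {"name": "Application Started", "user_id": "u1"},
    {"name": "Application Submitted Successfully", "user_id": "u1"},
    {"name": "Application Submit Attempted", "user_id": "anon-7"},
    {"name": "Application Submitted Successfully", "user_id": "u1"},
]
print(check_capture(clean_run, "u1"))
print(check_capture(faulty_run, "u1"))
```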

Data quality also needs an operating guardrail. Watch for sudden volume changes, missing properties, impossible event sequences, duplicate events, and identity merges that shift historical counts. Assign an owner to investigate those conditions. Otherwise, a tracking defect can enter a roadmap discussion disguised as a customer trend.
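
Two of those guardrails, sudden volume changes and missing properties, are simple enough to sketch. The 50% threshold, the trailing-mean baseline, and the function names are assumptions for illustration. Production checks would usually account for weekday seasonality and known releases before alerting an owner.

```python
from statistics import mean


def volume_alert(daily_counts, threshold=0.5):
    """Flag the latest day when it moves more than `threshold` from the trailing mean.

    daily_counts: event counts per day, oldest first, latest day last (at least two days).
    """
    *history, latest = daily_counts
    baseline = mean(history)
    if baseline == 0:
        return latest > 0, None  # no history to compare against
    change = (latest - baseline) / baseline
    return abs(change) > threshold, round(change, 3)


def missing_property_rate(events, required):
    """Share of events missing at least one required property."""
    if not events:
        return 0.0
    missing = sum(1 for event in events if any(event.get(p) is None for p in required))
    return missing / len(events)


print(volume_alert([1000, 980, 1020, 1000, 400]))
print(missing_property_rate(
    [{"plan": "pro", "entry_channel": "web"}, {"plan": None, "entry_channel": "web"},
     {"entry_channel": "ads"}],
    ["plan", "entry_channel"],
))
```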

In regulated or trust-sensitive journeys, collect only the properties needed for an approved purpose. Do not place sensitive customer values in event names or unrestricted properties. Verification steps, rejection reasons, and error details can be analytically useful, but careless collection can create privacy, access-control, and regulatory exposure. Apply privacy-by-design and data-governance rules before the event reaches the analytics platform, not after it has been copied into dashboards.
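
The following minimal sketch enforces that rule before events reach the analytics platform. It keeps an allowlist of approved properties per event, drops everything else, and refuses unknown events. The event names, property names, and `sanitize` function are hypothetical. In practice the allowlist would come from the governed event taxonomy rather than a hard-coded dictionary.

```python
# Hypothetical approved schema: the only properties each event may carry.
APPROVED_PROPERTIES = {
    "Verification Failed": {"plan", "entry_channel", "rejection_category"},
    "Application Submitted Successfully": {"plan", "entry_channel", "journey_variant"},
}


def sanitize(event_name, properties):
    """Drop unapproved properties before the event leaves the product.

    Returns (kept_properties, dropped_keys). Events without an approved schema
    are refused outright rather than sent with whatever properties they carry.
    """
    allowed = APPROVED_PROPERTIES.get(event_name)
    if allowed is None:
        raise ValueError(f"no approved schema for event {event_name!r}")
    kept = {key: value for key, value in properties.items() if key in allowed}
    dropped = sorted(set(properties) - allowed)
    return kept, dropped


kept, dropped = sanitize(
    "Verification Failed",
    {
        "plan": "pro",
        "rejection_category": "document_unreadable",
        "national_id": "<redacted>",
        "raw_error": "<stack trace>",
    },
)
print(kept, dropped)
```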

This foundation is not analytics housekeeping. A precise event taxonomy with explicit data-quality guardrails determines whether activation, retention, and experiment results are credible enough to guide investment.

Read growth as a sequence of customer behaviors

No single metric can explain growth. Read the journey as a sequence: the customer arrives with intent, reaches first value, returns for value, adopts more of the useful workflow, and does so without creating unsustainable service costs. Each stage answers a different question.

Journey stage	Useful signals	Question to answer	Diagnostic cuts
First value	Activation rate, onboarding completion, time-to-first-value	Are eligible new customers reaching the first meaningful outcome, and how much effort does it require?	Customer profile, entry channel, plan, journey version
Conversion	Step conversion and end-to-end funnel conversion	Where does demonstrated intent fail to become a completed outcome?	Error state, verification path, device or product surface
Retention	D7, D30, and D90 cohort retention	Which customers return at a meaningful interval after starting or activating?	Start cohort, activation behavior, customer profile
Depth	Feature adoption and weekly-to-monthly active ratio	Is recurring value broad and repeated, or concentrated in shallow activity?	Key workflow, account maturity, role or use case
Service economics	Support contact rate and cost-to-serve	Is growth creating a scalable customer experience?	Journey step, error category, customer profile

D7, D30, and D90 are observation points, not universal definitions of healthy retention. Choose intervals that match the product’s natural usage cycle and state the qualifying behavior. “Returned” could mean opening the product, completing the core workflow, or receiving recurring value. Those definitions produce different answers.

Cohorts protect you from another common mistake: combining customers who began at different times. Group customers by a meaningful start event and period, then compare like with like. If a change affected only new customers, an all-user average can hide its impact. If one customer profile improved while another declined, the aggregate can falsely imply stability.
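
The following minimal sketch of cohort retention makes both choices explicit: the qualifying behavior (whatever events you pass in) and the start cohort (the ISO week of the start event). It assumes that "retained at day N" means the qualifying behavior occurred on or after day N, which is one of several reasonable definitions. It also excludes customers whose start is too recent to observe. The function and data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def cohort_retention(starts, qualifying_events, day, as_of):
    """Share of each weekly start cohort that performed the qualifying behavior
    on or after `day` days from their start.

    starts: {customer_id: start_date}
    qualifying_events: (customer_id, event_date) pairs for the qualifying behavior
    as_of: last date covered by the data
    """
    retained = {
        customer
        for customer, when in qualifying_events
        if customer in starts and when <= as_of and (when - starts[customer]).days >= day
    }
    cohorts = defaultdict(lambda: [0, 0])  # cohort label -> [eligible, retained]
    for customer, start in starts.items():
        if (as_of - start).days < day:
            continue  # too recent to observe at this interval
        year, week, _ = start.isocalendar()
        label = f"{year}-W{week:02d}"
        cohorts[label][0] += 1
        cohorts[label][1] += 1 if customer in retained else 0
    return {label: kept / eligible for label, (eligible, kept) in sorted(cohorts.items())}


starts = {
    "a": date(2025, 1, 6),
    "b": date(2025, 1, 7),
    "c": date(2025, 1, 13),
    "d": date(2025, 2, 20),
}
activity = [
    ("a", date(2025, 1, 14)),
    ("b", date(2025, 1, 9)),
    ("c", date(2025, 1, 25)),
    ("d", date(2025, 2, 22)),
]
print(cohort_retention(starts, activity, day=7, as_of=date(2025, 2, 24)))
```

Passing 30 or 90 as `day` reuses the same definition. The eligibility rule keeps a young cohort from looking like a retention drop simply because its customers have not yet had enough time.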

Start diagnosis at the narrowest point where behavior diverges. If onboarding completion falls, inspect the step-level funnel and error states before redesigning the whole experience. If activation rises but D30 retention does not, test whether the activation definition captures real value or merely easier completion. If adoption grows alongside support contacts, inspect whether customers are discovering value or being forced through confusing work.

Benchmarks help you calibrate the baseline and find unusually weak stages, especially when you can compare activation, time-to-first-value, funnel conversion, retention, adoption, and cost-to-serve. They are not targets to copy blindly. Confirm that the compared products use compatible populations, events, intervals, and definitions. Then use the gap to choose where to investigate, not to declare the solution.

Turn a behavioral signal into a disciplined experiment

An unusual funnel drop or cohort difference is a clue. It becomes a product bet only after you identify a plausible mechanism. Move from observation to hypothesis with one sentence: for a defined segment, changing a defined part of the experience should change a defined behavior because of a stated customer problem.

Every experiment brief should contain:

- The observed behavior and the segment in which it occurs.
- The customer problem or mechanism that could explain it.
- The proposed change and the behavior it is intended to influence.
- One primary outcome metric tied to the hypothesis.
- Guardrail metrics covering important downstream or risk consequences.
- The minimum detectable effect, or the smallest difference that would be meaningful enough to change the decision.
- The allocation, eligibility rules, analysis window, and stopping rule defined before results are inspected.
- The action attached to each possible result: ship, revise, stop, investigate, or collect more evidence.

The minimum detectable effect helps determine whether an A/B test can answer a decision responsibly. Setting it after seeing the data defeats its purpose. If the effect you care about requires more eligible traffic or time than the decision can support, narrow the question, choose a larger meaningful change, or use discovery evidence to reduce uncertainty. Do not label an underpowered result as proof that the idea works or does not work.

Not every problem deserves an A/B test. Fix a confirmed tracking defect before interpreting the metric. Fix a severe error or harmful experience when withholding the repair would be irresponsible. Use an experiment when there is genuine uncertainty about how a product change will affect behavior and a valid comparison can resolve that uncertainty.

Read outcomes without spin. If the primary metric improves and the guardrails remain acceptable, the change has earned consideration for rollout. If the primary metric is flat, do not rescue the result with an unrelated secondary metric chosen afterward. If a guardrail deteriorates, investigate the trade-off even when the primary metric wins. If the result is inconclusive, record it as inconclusive and decide whether the remaining uncertainty justifies more investment.
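
Those reading rules can be made mechanical, so the interpretation cannot drift after the data arrives. This sketch assumes that the primary result is summarized as a confidence interval for the absolute lift, and that each guardrail has already been judged healthy or degraded. The interval-based reading and the `read_outcome` name are illustrative choices, not a prescribed statistical method.

```python
def read_outcome(ci_low, ci_high, mde, guardrails):
    """Map a primary-metric interval and guardrail checks to a pre-agreed reading.

    ci_low, ci_high: confidence interval for the primary metric's absolute lift
    mde: minimum detectable effect fixed before launch
    guardrails: {name: True if healthy, False if it deteriorated}
    """
    degraded = sorted(name for name, healthy in guardrails.items() if not healthy)
    if degraded:
        # A winning primary metric does not excuse a damaged guardrail.
        return "investigate trade-off: " + ", ".join(degraded)
    if ci_low > 0:
        return "consider rollout"
    if ci_high < 0:
        return "stop or revise: primary metric worsened"
    if -mde < ci_low and ci_high < mde:
        return "flat: the interval rules out an effect as large as the MDE"
    return "inconclusive: record it and decide whether more evidence is worth the cost"


readings = [
    read_outcome(0.004, 0.021, 0.01, {"support_contacts": True}),
    read_outcome(0.004, 0.021, 0.01, {"support_contacts": False}),
    read_outcome(-0.003, 0.004, 0.01, {"support_contacts": True}),
    read_outcome(-0.015, 0.020, 0.01, {"support_contacts": True}),
]
for reading in readings:
    print(reading)
```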

In-app guidance is a good example of why the outcome matters more than the intervention. Guide views, tooltip clicks, and tour completion describe exposure. The actual question is whether the intended customer reaches value sooner, completes the journey, adopts the useful feature, or needs less assistance. A stack that combines product analytics, in-app guidance, segmentation, and controlled testing can connect those layers, but the tools cannot choose the right success definition for you.

Build an operating cadence that changes the roadmap

Analytics-led growth becomes real when a metric review ends in an owned decision. A recurring meeting that only narrates charts creates reporting work, not product learning. Separate journey diagnosis from portfolio allocation so each conversation has a clear purpose.

Run a weekly journey review

Use the weekly review to inspect one or two critical journeys, not every dashboard. Product, design, engineering, and the relevant data partner should arrive with the same metric definitions. Add risk, operations, support, or go-to-market partners when the journey crosses their responsibilities.

1. Begin with the customer outcome and the eligible population.
2. Review movement in the primary metric, guardrails, and important segments.
3. Separate data-quality issues from actual behavior changes.
4. Identify the earliest journey step where the affected cohort diverges.
5. Choose one decision: repair instrumentation, fix a known defect, continue discovery, launch a test, expand a result, or stop work.
6. Record the owner, next evidence, and decision date.

A short decision log is valuable because it preserves what the team believed before the result was known. Record the observation, hypothesis, metric definition, decision, and eventual outcome. This prevents old ideas from returning without new evidence and makes changes to metric definitions visible.

Use a monthly portfolio review for allocation

The portfolio review should decide where product capacity goes. Compare opportunities using the size of the affected segment, the severity of the broken journey, the connection to a strategic outcome, the strength of the evidence, the cost of learning, and the downside represented by guardrails. This is where benchmarks, discovery, experiment results, and commercial context meet.

Require every material roadmap bet to identify its driver metric and measurement plan. An initiative can still be strategically necessary when immediate experimental proof is unavailable, but that uncertainty should be explicit. Do not disguise a conviction bet as a data-backed conclusion.

Keep objectives focused on outcomes rather than delivery. A roadmap item may be a redesigned verification flow, a product tour, or a new workflow. The objective should describe the customer and business result. The key results should show whether the relevant behavior improved while guardrails remained healthy. That structure gives product, risk, operations, and go-to-market partners a common basis for trade-offs.

Key takeaways

- Begin with a product decision and customer outcome; build the dashboard only after both are clear.
- Define activation as evidence of first value, not as signup, login, or another convenient activity event.
- Instrument confirmed outcomes, attempts, and error states separately so conversion can be diagnosed accurately.
- Read activation, conversion, retention, adoption depth, and service economics as a connected journey.
- Use cohorts and meaningful segments before trusting an aggregate trend.
- Define the primary metric, guardrails, minimum detectable effect, and stopping rule before running an A/B test.
- End every analytics review with a decision, an owner, and the next evidence required.

Choose one growth journey this week. Write down the first valuable customer outcome, audit the events needed to reconstruct it, and identify the one decision your current data should support. That small exercise will show you whether analytics is guiding the product or merely describing it.

December 16, 2025

2026 Support Capacity Playbook: Bold AI Automation, Smarter Staffing, Zero‑Surprise SLAs

Capacity planning has always been a high-stakes exercise in customer service, and when you miss, the signal shows up fast in backlogs and SLAs. I’ve lived that pressure across multiple cycles, and 2026 will reward teams that plan differently. AI fundamentally changes capacity planning because it changes the work. It resolves the bulk of your volume, speeds up execution, and elevates the complexity and value of what humans handle. The consequence is simple: planning models must evolve.

This is the final installment in my 2026 customer service planning series, and I’m focusing on the tension every leader feels right now: be ambitious about automation, but avoid the trap of understaffing if your assumptions don’t hold. My goal is to share how AI changes the logic of capacity planning, what I’ve learned implementing these practices with my team and with customers, and the common traps to avoid.

Traditional planning rests on relatively stable assumptions: volume grows predictably, work types stay consistent, handle times don’t swing dramatically, and productivity improves slowly with better tools and training. In an AI-first model, none of that is guaranteed, and the fundamentals flip:
- The mix of work changes as AI absorbs a growing share of simpler conversations, leaving humans with deeper, more time-consuming issues that demand human-to-human connection.
- Demand can actually increase when you remove friction, so AI can both resolve more and attract more volume.
- Human time splits differently as teammates solve customer problems and also review AI behavior, give feedback, improve content, and support system-level work.
- Performance becomes dynamic, not fixed. Automation rate isn’t a one-time number; it can rise with care and fall with neglect.

If you plan for 2026 using a pre-AI model, assuming similar productivity, a similar work mix, and a linear relationship between volume and headcount, you will underestimate what it now takes to run a high-performing support organization.

There are many metrics you can track, but the one to put at the center is automation rate (AI Agent involvement rate × AI Agent resolution rate). This single construct tells me what share of total volume AI actually resolves, how much work remains for humans, how much additional demand humans can absorb, and how ambitious I can be with headcount. Early in the journey, I prioritize raising involvement: getting the AI involved in more conversations. Once involvement is high, I shift to resolution on the hardest remaining work, where each additional 1% of automation can represent several people’s worth of capacity.

In my 2026 plans, automation rate sits alongside projected inbound volume, average “output” per person for the more complex work that remains, and occupancy: how much time is allocated to customer-facing interactions versus operational and strategic work. Together, those inputs give a realistic picture of how many people you need and where they should spend their time.

First, plan boldly on automation, but match it with investment. I do not cap automation assumptions at 40–50% “because AI is new.” Many teams are already modeling 60%, 70%, even 80%+ for 2026 when they invest in AI ownership and content. The investment is non-negotiable:
- Named ownership for AI performance (AI ops, knowledge management, conversation design).
- Clear automation targets by work type (e.g., informational vs. personalized vs. actions vs. deep troubleshooting).
- Realistic expectations for what’s easy to automate and what’s not.
- A concrete plan to raise automation over time in monthly or quarterly steps rather than a single jump.

To decide where to invest first, I dig into the data. I start with the biggest volume drivers, separate content-led issues from those dependent on data or complex procedures, assume higher resolution potential for content-led topics once the knowledge base is in shape, and set more modest initial resolution expectations for system-dependent flows. Then I stair-step improvements as the systems, data contracts, and workflows mature. In short, bold automation goals only work when paired with the team structure, content, and systems required to reach them, and with the discipline to iterate.

Second, expect human “output” per person to go down. That’s a mindset shift. Historically, we assumed individual productivity would stay flat or tick up as tools improved. In an AI-first model, humans handle fewer conversations but more complex, cross-functional issues, and they create more value despite lower case counts. I model a lower “cases closed per person” than prior-year baselines, explicitly assume the remaining work is more complex and time-consuming, and redefine productivity to include system-level work like AI Agent improvements, content updates, and policy or workflow change management. I also report “capacity created” from automation alongside human outputs, so leadership sees the full picture.

Third, rethink occupancy: more time off the queues, on higher-value work. Traditional occupancy splits time between the inbox and training, meetings, and breaks. Now there’s an expanding “out-of-inbox” portfolio that directly affects AI performance and overall capacity: reviewing AI-handled conversations, improving AI Agent triaging and handovers, contributing to content and procedures, feeding insights to product and engineering, and supporting system changes that reduce future volume.

I set lower inbox occupancy targets than before and make the rationale explicit. People aren’t working less; they’re working differently. In planning, I assume more time spent on improvement and system work, make it visible (for example, X% in the inbox and Y% on AI and system improvement), and treat this as critical, not a “nice to have.” If you don’t proactively allocate it, it won’t happen, and your automation and performance targets will suffer.

Fourth, work with the finance team early, and treat your plan as a set of assumptions. Capacity planning with AI is a set of bets across automation rate, human output, demand growth, occupancy, and where surplus capacity (if any) goes. I bring finance in early, show that the plan is dynamic and directly tied to AI performance, and label every lever as an assumption with ranges. I commit to a quarterly review cadence with finance to compare assumptions versus reality and adjust headcount, targets, and investment as needed.

The risks are real: if automation grows slower than expected and you stop backfilling too early, you’ll be understaffed for months. Hiring and onboarding take time, so course-correcting late creates strain. If you do produce surplus capacity, have a clear strategy to reallocate those teammates to higher-value work (improving systems, feeding insights back to product, supporting new channels, and driving proactive CX) rather than defaulting to reductions.

I also set explicit guardrails. If the automation rate misses its target by five points for two consecutive months, we pause planned reductions and revisit hiring gates. If it over-performs, we shift people into backlog eradication, content upgrades, or proactive outreach, so we bank compounding value.

To set your team up for success in 2026, anchor your plan on automation rate, be honest that humans will handle fewer but harder conversations, and protect time for system improvements. Partner early and often with finance, avoid shrinking too fast, and design a plan for surplus capacity so you’re never caught flat-footed. If AI is going to handle the majority of your customer conversations, your plan has to be designed to help it do that well and to keep your team set up for meaningful, sustainable work. A 2026 plan built on adaptable assumptions, not fixed predictions, will hold up as your work, your systems, and your customers’ expectations continue to change.
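The planning arithmetic in this post can be sketched in a few lines of Python. This is a simplified, hypothetical model rather than the author's actual method: the class and function names, the volume, the 400 cases per person, and the 70% inbox occupancy are illustrative assumptions, and the model ignores shrinkage, seasonality, and hiring lead times. It treats automation rate as involvement × resolution, converts the remaining volume into headcount using output per person and inbox occupancy, reports the capacity created by automation, and encodes the five-point, two-month guardrail.

```python
from dataclasses import dataclass


@dataclass
class PlanAssumptions:
    """Hypothetical planning inputs; each field is an assumption with a range to revisit."""
    monthly_volume: float          # projected inbound conversations per month
    involvement_rate: float        # share of conversations the AI Agent is involved in (0-1)
    resolution_rate: float         # share of involved conversations the AI Agent resolves (0-1)
    cases_per_person_month: float  # cases one person closes per month at 100% inbox time
    inbox_occupancy: float         # share of each person's time planned for the inbox (0-1)


def automation_rate(a: PlanAssumptions) -> float:
    # Automation rate = AI Agent involvement rate x AI Agent resolution rate.
    return a.involvement_rate * a.resolution_rate


def effective_output(a: PlanAssumptions) -> float:
    # Out-of-inbox work (AI review, content, system changes) lowers queue capacity per person.
    return a.cases_per_person_month * a.inbox_occupancy


def required_headcount(a: PlanAssumptions) -> float:
    # Only the volume the AI does not resolve needs human handling.
    human_volume = a.monthly_volume * (1 - automation_rate(a))
    return human_volume / effective_output(a)


def capacity_created(a: PlanAssumptions) -> float:
    # Full-time equivalents of work absorbed by automation, reported next to human output.
    return a.monthly_volume * automation_rate(a) / effective_output(a)


def pause_reductions(targets: list[float], actuals: list[float],
                     miss_points: float = 5.0, months: int = 2) -> bool:
    """Guardrail: True when the automation rate (in percentage points) missed its
    target by at least `miss_points` in each of the last `months` months."""
    recent = list(zip(targets, actuals))[-months:]
    return len(recent) == months and all(
        target - actual >= miss_points for target, actual in recent)


plan = PlanAssumptions(monthly_volume=50_000, involvement_rate=0.8,
                       resolution_rate=0.75, cases_per_person_month=400,
                       inbox_occupancy=0.7)
print(f"automation rate: {automation_rate(plan):.0%}")
print(f"required headcount: {required_headcount(plan):.1f}")
print(f"capacity created: {capacity_created(plan):.1f} FTE")
print(pause_reductions(targets=[60, 62, 64], actuals=[59, 56, 58]))
```

Because every input is an assumption, a natural next step is to run the same functions across low, expected, and high values for each lever before reviewing the plan with finance.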
If you’d like future editions like this, subscribe and stay close—I’ll keep sharing what’s working, what isn’t, and how to tune your customer support AI strategy in real time.

Inspired by this post on The Intercom Blog.

December 16, 2025
Year-End Reflection for Product Leaders: Values, Themes, and the 100‑Wishes Reset

I’ve been closing the year with a deliberate reflection ritual for more than a decade, and this season I found fresh energy for it after listening to an insightful conversation with Teresa Torres and Petra Wille on All Things Product. Their approaches mirror the evolution many product leaders experience: moving from rigid annual goal-setting to values-led themes, longer time horizons, and a healthier respect for spaciousness. In my own practice, that shift has created better focus, less pressure, and far more meaningful outcomes.

Prefer to listen? The episode is available on Spotify and Apple Podcasts. I took notes with my team in mind and translated the discussion into a simple, values-driven framework that any product organization can adopt.

Why does annual reflection matter for product people? Because our work lives at the intersection of ambiguity, trade-offs, and time. If we only measure ourselves by shipped output or quarterly OKRs, we overlook the compounding value of learning, relationships, and judgment. I treat this ritual as a strategic reset: a chance to surface patterns, adjust expectations, and recommit to outcomes over output.

My own reflection habit started scrappy—paper notebooks, messy timelines, and even artful visualizations inspired by Dear Data by Giorgia Lupi & Stefanie Posavec. Like Petra, I’ve found that tactile, analog artifacts unlock insights I miss in a spreadsheet. Over time, I’ve kept the spirit and simplified the mechanics: a “what went well” review, a short list of hard lessons, and a handful of decisions that paid off—or didn’t.

The biggest evolution for me has been moving from rigid annual goals to values and themes. I still run OKRs, but I use them to track progress, not identity. The lens of process vs. outcome goals—reinforced by ideas from Atomic Habits—helped me set fewer, better commitments. For example, instead of “launch X by Y,” I’ll emphasize the cadence of customer discovery, the health of the product trio, and the quality of decisions made along the way.

One exercise that changed my practice is the “100 wishes” list. It’s powerful—and surprisingly difficult. Pushing past 30 or 40 wishes forces me to name latent interests and long-range intentions I rarely say out loud. Combined with decade-level themes, the list helps me balance ambition with patience. I don’t try to do it all next year; I use it to spotlight direction, not deadlines.

I also review patterns across years: Where did over-scheduling create hidden costs? When did I protect focus time and what did that unlock? Paul Graham’s Maker’s Schedule, Manager’s Schedule remains a useful calibration tool here. And when I feel the pull toward constant throughput, I revisit Stefan Sagmeister’s The Power of Time Off (TED Talk) to remind myself why strategically creating space often yields the most valuable ideas.

Of course, not every year follows plan—and that’s normal. Reflection helps me spot unrealistic expectations early and let them go. When setbacks hit, I’ll rewatch Dealing with Setbacks and re-ground in continuous discovery. The question isn’t “Did we do everything?” but “Did we learn fast, protect customer value, and make trade-offs aligned with our values?” That’s how empowered product teams compound impact.

My sharing philosophy has become more nuanced over time. Some reflections are public to invite dialogue and accountability; others stay private so I can process honestly. I’ve found it helpful to publish what I’m saying no to, capture a theme for the year ahead, and keep the rest for myself and my team. This balance preserves motivation while still contributing to the broader product management leadership community.

If you’re designing your own ritual, consider this lightweight flow: review wins and tough calls, write your “100 wishes,” extract a few values-based themes, then translate those into process goals for Q1. Revisit monthly, not just annually. If you like structured prompts, Chris Guillebeau’s How to Conduct Your Own Annual Review from The Art of Nonconformity offers a practical template you can adapt to your context.

For deeper dives and complementary ideas, I bookmarked these as part of my year-end reset: What I’m Saying No to This Year—And Why, Ask Teresa: My Leaders Still Want Roadmaps with Timelines—What Should I Do?, Scaling Impact: A Look at the Year Ahead (2022), Let’s Connect in 2025: A Look at the Year Ahead, The Interview Coach, and Petra’s own year-ahead reflections, including her 2026 version. I also recommend revisiting the prior conversation on leadership and change: Role of Leadership in Transformations.

I’d love to hear how you approach your end-of-year reflection. What questions bring you the most clarity? Which practices help you set an intentional, values-driven path for the next year? Share your process—I’m always looking to learn from other product creators and leaders.

Inspired by this post on Product Talk.

December 16, 2025

A Practical Measurement System for B2B Product-Led Growth

Your dashboard can show more sign-ups, more activated users, and more feature adoption while the business becomes no healthier. In B2B, that usually happens when measurement stops at individual activity and never proves that an account reached repeatable value, stayed engaged, or developed credible expansion potential.

You don’t need a larger metric catalog. You need a connected measurement system that follows one account from eligibility to first value, repeated value, retention, and commercial impact. That system should also tell your team where the journey broke and which decision to make next.

Measure one customer journey at two levels

B2B products create value through people, but the commercial relationship usually exists at the account, organization, or workspace level. This creates a measurement problem: user metrics and account metrics can each look healthy while hiding a different weakness.

Growing active-user counts may only mean that existing customers added more seats. Growing active-account counts can conceal dependence on one enthusiastic user inside each account. Measure both levels, but don’t blend them into an ambiguous active-customer number.

If your billing or value unit isn’t an account, substitute the correct economic entity, such as a workspace or billable organization. The important rule is that every metric names the entity being counted.

Before building a dashboard, write a metric contract for every top-line measure. It should specify:

The business question and decision the metric supports.
The entity being counted: user, account, workspace, or revenue.
The qualifying population and the moment an entity becomes eligible.
The event or event sequence that constitutes success.
The observation window and the period allowed for success.
Exclusions for employee activity, test accounts, duplicate identities, and unusable telemetry.
The segments that must remain available for diagnosis.
The owner responsible for resolving definition or data-quality problems.
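A metric contract can live next to the metric code as a small typed record. The sketch below assumes a Python dataclass; the field names mirror the list above, while the example values (event name, 14-day window, owner team, segment names) are hypothetical. The `problems()` check flags an unknown entity, a non-positive window, or a missing question, eligibility rule, success event, or owner before the metric reaches a dashboard.

```python
from dataclasses import dataclass

ALLOWED_ENTITIES = {"user", "account", "workspace", "revenue"}


@dataclass(frozen=True)
class MetricContract:
    name: str
    question: str                 # business question and the decision it supports
    entity: str                   # user, account, workspace, or revenue
    eligibility: str              # qualifying population and moment of eligibility
    success_event: str            # event or event sequence that counts as success
    window_days: int              # period allowed for success after eligibility
    exclusions: tuple[str, ...] = ()
    diagnostic_segments: tuple[str, ...] = ()
    owner: str = ""

    def problems(self) -> list[str]:
        """Return the gaps that would make the metric ambiguous; empty means complete."""
        issues = []
        if self.entity not in ALLOWED_ENTITIES:
            issues.append(f"unknown entity: {self.entity!r}")
        if self.window_days <= 0:
            issues.append("observation window must be positive")
        required = {"question": self.question, "eligibility": self.eligibility,
                    "success_event": self.success_event, "owner": self.owner}
        issues.extend(f"missing {label}" for label, value in required.items()
                      if not value.strip())
        return issues


# Hypothetical activation contract.
activation = MetricContract(
    name="account_activation_rate",
    question="Did a new account reach meaningful value? Guides onboarding investment.",
    entity="account",
    eligibility="new account or workspace created; members invited into an existing account excluded",
    success_event="shared_workflow_completed",
    window_days=14,
    exclusions=("employee_activity", "test_accounts", "duplicate_identities"),
    diagnostic_segments=("use_case", "acquisition_path", "plan"),
    owner="growth-analytics",
)
print(activation.problems())
```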

This contract prevents a common denominator error. Invited members may create new user registrations, but they aren’t necessarily new accounts. If they enter the activation denominator as though they started a new buying journey, the rate stops answering a coherent question.

Your event model must also resolve each action to the account or workspace in which it occurred. Assigning an event to a user’s current account can corrupt historical reporting when that user belongs to multiple workspaces or changes organizations.
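The distortion is easy to reproduce with a user who moved between workspaces. In this hypothetical sketch, each event stores the workspace in which it occurred; counting by that field keeps history stable, while looking up the user's present workspace silently moves past activity to the new one. The event, user, and workspace names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    user_id: str
    workspace_id: str  # workspace in which the action occurred, captured at event time
    name: str
    day: date


# Hypothetical data: u1 worked in ws_a, then moved to ws_b.
events = [
    Event("u1", "ws_a", "report_shared", date(2025, 3, 2)),
    Event("u1", "ws_b", "report_shared", date(2025, 9, 14)),
]
current_workspace = {"u1": "ws_b"}  # profile lookup as of today


def counts_by_event_workspace(evts: list[Event]) -> Counter:
    # Attributes each action to the workspace where it happened.
    return Counter(e.workspace_id for e in evts)


def counts_by_current_workspace(evts: list[Event], current: dict[str, str]) -> Counter:
    # Error-prone: re-attributes history to the user's present workspace.
    return Counter(current[e.user_id] for e in evts)


print(counts_by_event_workspace(events))
print(counts_by_current_workspace(events, current_workspace))
```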

Decision question	Primary unit	Useful measures	What a weakness helps you locate
Did a new account reach meaningful value?	Account or workspace	Activation rate and time-to-first-value	Acquisition quality, setup friction, or an unclear value path
Is value becoming repeatable?	Account and user role	Recurrence of the core behavior, active accounts, and feature adoption	Shallow adoption, novelty effects, or dependence on one champion
Does usage endure?	Account cohort	Cohort-based product retention	A gap between initial success and durable value
Is product value creating commercial pull?	Account and revenue	Validated expansion intent plus expansion and contraction revenue	A weak commercial signal, packaging mismatch, or failed handoff
Can the experience scale responsibly?	Account and operations	Support deflection, incident signals, and delivery guardrails	Growth that is shifting cost or reliability problems elsewhere

A useful portfolio view therefore combines activation, onboarding completion, time-to-first-value, active accounts, feature adoption, cohort retention, expansion and contraction revenue, and support deflection. These aren’t interchangeable scorecard tiles. Each one answers a different question in the value chain.

Treat activation as a hypothesis about future retention

Activation isn’t whatever happens at the end of your onboarding checklist. It is your current hypothesis about the earliest observable behavior that shows a qualified account has received meaningful product value.

That distinction matters. In a hypothetical collaboration product, inviting a colleague may be necessary setup. Completing a shared workflow may be the first evidence of value. Calling the invitation activation would reward the team for moving people through configuration, even if the product never solves the underlying job.

A credible activation definition should meet several tests:

It represents delivered value, not mere exposure to a screen or feature.
It occurs early enough for product, marketing, and customer-success teams to influence it.
It can be measured consistently for the eligible population.
It respects different use cases when those use cases have materially different value paths.
It is associated with stronger later retention inside comparable cohorts and segments.

The last test is important, but it doesn’t establish causality. Accounts that activate may already have greater intent, better internal sponsorship, or a more suitable use case. Treat the relationship as evidence that improves your hypothesis, then use controlled interventions where practical to learn whether removing a particular barrier changes downstream behavior.

Use the same contract to define the related measures. Activation rate is the share of eligible accounts completing the activation behavior within the agreed window. Time-to-first-value begins at the same eligibility moment and ends at the same success event. Onboarding completion remains a diagnostic measure unless completing onboarding itself delivers the promised outcome.
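Under one contract, activation rate and time-to-first-value share an eligibility timestamp and a success event. This sketch assumes each account already has an eligibility time and the timestamp of its first activation event (hypothetical fields); the `excluded` flag stands in for test and employee accounts, and the 7-day window in the example is illustrative. It returns the whole time-to-first-value distribution rather than a single average, so it can be inspected by segment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class AccountJourney:
    account_id: str
    eligible_at: datetime               # moment the account became eligible
    first_value_at: Optional[datetime]  # first activation event, or None
    excluded: bool = False              # e.g. test or employee account


def activation_rate(journeys: list[AccountJourney], window: timedelta) -> float:
    # Denominator: eligible accounts only; numerator: activated within the window.
    eligible = [j for j in journeys if not j.excluded]
    activated = [j for j in eligible
                 if j.first_value_at is not None
                 and j.first_value_at - j.eligible_at <= window]
    return len(activated) / len(eligible) if eligible else 0.0


def ttfv_days(journeys: list[AccountJourney]) -> list[float]:
    # Same start (eligibility) and end (activation event) as the rate above.
    # Accounts that never reached the event are left out of the distribution.
    return [(j.first_value_at - j.eligible_at) / timedelta(days=1)
            for j in journeys if not j.excluded and j.first_value_at is not None]


t0 = datetime(2025, 6, 1)
journeys = [
    AccountJourney("a", t0, t0 + timedelta(days=2)),
    AccountJourney("b", t0, t0 + timedelta(days=10)),
    AccountJourney("c", t0, None),
    AccountJourney("d", t0, t0 + timedelta(days=1), excluded=True),
]
print(activation_rate(journeys, window=timedelta(days=7)))
print(median(ttfv_days(journeys)))
```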

A practical validation loop looks like this:

Map the path from eligibility through setup to the proposed first-value event.
Use funnels and segmentation to locate the step where qualified accounts stop progressing.
Compare later retention for accounts that did and didn’t complete the candidate behavior within equivalent use-case, acquisition, and account cohorts.
Inspect the time-to-first-value distribution by segment instead of relying on one blended average.
Test a focused intervention at the identified bottleneck, such as simpler setup, clearer messaging, a contextual guide, or a revised product tour.
After any activation lift, check repeated use and cohort retention before declaring that the growth system improved.

This is where funnels, high-signal behavioral segments, retention cohorts, and A/B tests on messaging or in-app guidance belong in the same workflow. The funnel identifies friction. The cohort tests whether the behavior matters. The experiment tests whether your intervention changes it.
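The cohort step of that workflow can be sketched as a within-segment comparison. The input rows and segment names below are hypothetical, and the result is an association, not a causal estimate: accounts that activate may already have stronger intent or sponsorship.

```python
from collections import defaultdict


def retention_by_activation(rows):
    """rows: (segment, activated, retained) per account in one comparable cohort.
    Returns {(segment, activated): retention rate}."""
    tallies = defaultdict(lambda: [0, 0])  # key -> [retained, total]
    for segment, activated, retained in rows:
        tallies[(segment, activated)][0] += int(retained)
        tallies[(segment, activated)][1] += 1
    return {key: kept / total for key, (kept, total) in tallies.items()}


def activation_gap(rates):
    """Retention gap (activated minus not activated) for segments that have both groups."""
    segments = {segment for segment, _ in rates}
    return {s: rates[(s, True)] - rates[(s, False)]
            for s in segments if (s, True) in rates and (s, False) in rates}


rows = [
    ("self_serve", True, True), ("self_serve", True, True), ("self_serve", True, False),
    ("self_serve", False, True), ("self_serve", False, False),
    ("sales_assisted", True, True), ("sales_assisted", False, True),
]
print(activation_gap(retention_by_activation(rows)))
```

A gap that appears in one segment but not another is a prompt for a controlled intervention, not a verdict on the activation definition.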

If activation rises while later retention stays flat, don’t celebrate the dashboard. Either the activation behavior is too shallow, the experiment generated temporary compliance, or the product fails to deliver enough value after the first success. Each explanation produces a different roadmap decision.

Use a driver tree to show exactly where growth breaks

A flat scorecard tells you what changed. A driver tree shows where to investigate. For many B2B PLG products, the measurement chain can be expressed as:

Eligible accounts → setup complete → activated → repeated core value → retained active accounts → expansion intent → expansion or contraction revenue.

This isn’t a universal linear funnel. Renewal and expansion can overlap with ongoing adoption, and different roles may enter at different points. Its purpose is to expose the assumptions connecting product behavior to business performance.

Read movement between adjacent stages before reaching for a broad explanation:

If eligible accounts grow while activation falls, split acquisition quality from product friction. Compare equivalent acquisition and use-case segments before changing onboarding.
If onboarding completion improves while activation doesn’t, you probably removed checklist friction without improving the first-value experience.
If activation improves while repeated value doesn’t, inspect whether the activation event is too shallow or the initial experience creates novelty rather than a durable habit.
If active users increase while active accounts remain flat, adoption may be deepening inside existing customers without broadening the account base.
If repeated product value is healthy while account retention or revenue weakens, product telemetry alone can’t explain the result. Join account behavior with customer status and commercial data.
If expansion-intent signals rise while expansion revenue stays flat, validate the signal and inspect the go-to-market handoff before assuming the product created qualified demand.

These patterns narrow the search; they don’t prove a cause. A driver tree should help your team decide which segment, journey step, qualitative evidence, or experiment to inspect next.
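A first pass over adjacent stages can be made mechanical. This sketch treats the driver tree as a simple linear chain of account counts (the simplification the text cautions against taking literally), leaves revenue out because it is not an account count, and uses hypothetical stage names and numbers. It reports the transition whose conversion fell the most between two periods, which says where to look, not why.

```python
STAGES = ["eligible", "setup_complete", "activated", "repeated_value",
          "retained_active", "expansion_intent"]


def stage_conversions(counts: dict[str, int]) -> dict[str, float]:
    """Conversion between each pair of adjacent stages: accounts reaching the
    later stage divided by accounts reaching the earlier one."""
    return {f"{prev}->{nxt}": (counts[nxt] / counts[prev] if counts[prev] else 0.0)
            for prev, nxt in zip(STAGES, STAGES[1:])}


def weakest_link(before: dict[str, int], after: dict[str, int]) -> str:
    """Adjacent-stage transition whose conversion fell the most between periods."""
    b, a = stage_conversions(before), stage_conversions(after)
    return min(b, key=lambda step: a[step] - b[step])


before = {"eligible": 1000, "setup_complete": 600, "activated": 300,
          "repeated_value": 200, "retained_active": 150, "expansion_intent": 30}
after = {"eligible": 1200, "setup_complete": 720, "activated": 288,
         "repeated_value": 192, "retained_active": 144, "expansion_intent": 29}
print(weakest_link(before, after))
```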

The same tree separates leading indicators from lagging outcomes. Setup completion and high-signal power-user actions can lead into active usage and cohort retention, while expansion and contraction revenue arrive later. A leading metric earns its place only when you continue testing its relationship with the outcome it is supposed to predict.

This changes how you write product OKRs. “Launch a new onboarding tour” is an output. “Increase validated activation for qualified accounts without weakening downstream retention or support outcomes” is an outcome. The first statement rewards shipping. The second forces the team to state the behavior it expects to change and the evidence required to keep investing.

For every experiment, record the target segment, affected driver, hypothesis, exposure event, primary outcome, guardrails, analysis window, and downstream validation. Don’t call a variant successful because it increased tutorial clicks when the intended outcome was account activation.
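An experiment record can enforce that last rule directly. In this sketch, with hypothetical class, field, and metric names, a success claim is accepted only for the primary outcome declared up front and only when the guardrails held.

```python
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    target_segment: str
    driver: str                 # driver-tree link the test should move
    hypothesis: str
    exposure_event: str
    primary_outcome: str
    guardrails: list[str]
    analysis_window_days: int
    downstream_validation: str

    def can_claim_success(self, improved_metric: str, guardrails_held: bool) -> bool:
        # Only the pre-declared primary outcome counts, and only if no guardrail broke.
        return improved_metric == self.primary_outcome and guardrails_held


guide_test = ExperimentRecord(
    target_segment="self-serve, new accounts",
    driver="setup_complete->activated",
    hypothesis="A contextual guide at the import step raises account activation.",
    exposure_event="guide_shown",
    primary_outcome="account_activation_rate",
    guardrails=["support_tickets_per_account", "day_60_account_retention"],
    analysis_window_days=14,
    downstream_validation="cohort retention at day 60",
)
print(guide_test.can_claim_success("tutorial_clicks", guardrails_held=True))
```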

Keep operational guardrails beside growth outcomes. Incident management and DORA measures can complement product metrics when faster experimentation or adoption adds reliability risk. Support deflection provides another check: apparent growth is less attractive if it merely transfers unresolved friction to customer support.

Segment for decisions, then use benchmarks for calibration

A blended retention curve is an average of customers who may have different jobs, expectations, acquisition paths, and product cadences. It can improve because your customer mix changed even when no segment received a better experience.

Build cohorts from a consistent starting event, such as the moment an account becomes eligible to pursue first value. Then define retention using a value-bearing behavior appropriate to the product’s natural cadence. A product used for an occasional but critical workflow shouldn’t be forced into a weekly-use definition merely because weekly activity is easy to chart.
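A cohort curve with a configurable period lets the definition follow the product's natural cadence. In this sketch, each account has a start date (its eligibility moment) and the dates on which it completed the value-bearing behavior; both inputs, the account names, and the 30-day period in the example are assumptions to replace with your own definitions.

```python
from datetime import date


def cohort_retention(starts: dict[str, date],
                     value_days: dict[str, list[date]],
                     period_days: int, periods: int) -> list[float]:
    """Share of the cohort completing the value behavior in each period after start.
    Period p covers days [p * period_days, (p + 1) * period_days) after each start."""
    if not starts:
        return [0.0] * periods
    active_periods = {
        account: {(d - start).days // period_days
                  for d in value_days.get(account, []) if d >= start}
        for account, start in starts.items()
    }
    n = len(starts)
    return [sum(p in seen for seen in active_periods.values()) / n
            for p in range(1, periods + 1)]


starts = {"a": date(2025, 1, 1), "b": date(2025, 1, 1)}
value_days = {"a": [date(2025, 2, 5), date(2025, 3, 7)], "b": [date(2025, 2, 10)]}
print(cohort_retention(starts, value_days, period_days=30, periods=2))
```

Changing `period_days` to 7 on the same data would show zero weekly retention for a workflow that is healthy on a monthly cadence, which is the distortion the paragraph warns about.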

Keep three concepts separate:

User retention asks whether a person or role continues using the product.
Product-level account retention asks whether the original account cohort continues completing the qualifying value behavior.
Commercial retention asks what happened to the cohort’s revenue after expansion and contraction.

One cannot substitute for another. An account may retain its contract while meaningful product use declines. Another may show healthy usage while commercial contraction occurs. That gap is information, not an inconvenience to smooth out.
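The gap shows up even in a tiny example. The sketch below computes product-level account retention and one common form of commercial retention, net revenue retention: the cohort's ending revenue after expansion, contraction, and churn divided by its starting revenue. The accounts and amounts are hypothetical, and user-level retention is omitted for brevity.

```python
def product_account_retention(cohort: set[str], completing_value: set[str]) -> float:
    """Share of the original account cohort still completing the qualifying value behavior."""
    return len(cohort & completing_value) / len(cohort)


def net_revenue_retention(start_revenue: dict[str, float],
                          end_revenue: dict[str, float]) -> float:
    """Ending revenue of the starting cohort (after expansion, contraction, and churn)
    divided by its starting revenue; accounts added later are ignored."""
    start = sum(start_revenue.values())
    end = sum(end_revenue.get(account, 0.0) for account in start_revenue)
    return end / start


cohort = {"A", "B", "C"}
start = {"A": 1000.0, "B": 500.0, "C": 500.0}
end = {"A": 1200.0, "B": 500.0, "C": 300.0, "D": 800.0}  # D joined later; excluded
still_using = {"A", "C"}  # B kept its contract but stopped the value behavior
print(product_account_retention(cohort, still_using))
print(net_revenue_retention(start, end))
```

Here revenue is fully retained while one in three accounts has stopped completing the value behavior.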

Start with segments that can change an actual decision:

Primary use case or job-to-be-done, when value paths differ.
Self-serve versus sales-assisted acquisition, when expectations or onboarding support differ.
Account size or plan, when collaboration depth and feature access differ.
Administrator, champion, and end-user roles, when each role contributes differently to value.
New versus established accounts, when the same behavior means something different at each lifecycle stage.

Resist slicing until every cell becomes noisy. A useful test is simple: if a segment underperforms, would you choose a different intervention or owner? If not, it probably doesn’t belong on the operating dashboard.

Treat expansion intent with the same discipline as activation. Seat invitations, adoption of a higher-value workflow, or repeated encounters with a product limit may be plausible candidates, but none should be accepted on intuition alone. Compare each signal with later expansion outcomes by account segment. Keep the label “intent” until the behavior proves commercially predictive.
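One way to test an expansion-intent candidate is to score it like a classifier against later expansion outcomes, per segment. This sketch assumes each account can be labeled with whether it showed the candidate signal and whether it later expanded; the segments and rows are hypothetical. Precision asks how often the signal was followed by expansion; recall asks how much of the later expansion the signal anticipated.

```python
from collections import defaultdict


def signal_quality(rows):
    """rows: (segment, had_signal, expanded_later) per account.
    Returns {segment: (precision, recall)}, with None where a ratio is undefined."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for segment, signal, expanded in rows:
        c = counts[segment]
        if signal and expanded:
            c["tp"] += 1
        elif signal:
            c["fp"] += 1
        elif expanded:
            c["fn"] += 1
    result = {}
    for segment, c in counts.items():
        flagged = c["tp"] + c["fp"]
        expanded_total = c["tp"] + c["fn"]
        precision = c["tp"] / flagged if flagged else None
        recall = c["tp"] / expanded_total if expanded_total else None
        result[segment] = (precision, recall)
    return result


rows = [
    ("smb", True, True), ("smb", True, False), ("smb", False, True), ("smb", False, False),
    ("enterprise", True, True), ("enterprise", True, True), ("enterprise", False, False),
]
print(signal_quality(rows))
```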

Sales involvement doesn’t invalidate product-led measurement. Keep a shared lifecycle definition, segment the acquisition or expansion motion, and distinguish product-sourced, product-assisted, and merely product-active accounts using explicit attribution rules. Otherwise, any active customer can be retroactively called product-led.
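Explicit attribution rules can be written as a small, ordered function so that every account is classified the same way. The rules below are hypothetical examples, not a standard: they compare an active account's first activation date with its first sales touch and deal close. Whatever rules you adopt, the point is that they are explicit and applied consistently.

```python
from datetime import date
from typing import Optional


def attribute(activated_on: Optional[date],
              first_sales_touch_on: Optional[date],
              deal_closed_on: Optional[date]) -> str:
    """Hypothetical rules for a product-active account; the order of checks is part of the rule."""
    if activated_on is not None and (
            first_sales_touch_on is None or activated_on < first_sales_touch_on):
        return "product_sourced"   # product value came before any sales involvement
    if (activated_on is not None and deal_closed_on is not None
            and activated_on < deal_closed_on):
        return "product_assisted"  # sales engaged first, but product value preceded the close
    return "product_active"        # active, but no product value before the deal


print(attribute(date(2025, 2, 10), date(2025, 2, 1), date(2025, 3, 1)))
```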

External benchmarks are most useful after your internal definitions are stable. Before comparing rates, verify the unit of analysis, eligibility rule, event semantics, observation window, segment mix, and treatment of assisted accounts. Peer-informed targets can calibrate ambition and help identify gaps, but a benchmark built from a different denominator is not a target. It is a false comparison.

Your operating view should ultimately answer four questions without a forensic exercise: Which segment moved? At which stage? Did a downstream outcome confirm the movement? What decision changes because of it? If a metric can’t help answer one of those questions, it belongs in a diagnostic workspace rather than the executive scorecard.

B2B PLG measurement FAQ

Should the headline metric count users or accounts?

Use the economic value unit for the headline and user-level measures for diagnosis. For most B2B products, that means retained active accounts or workspaces completing a validated value behavior. Role-based user measures then reveal whether adoption is broad, concentrated in a champion, or blocked for a critical participant.

What is the best north-star metric for B2B product-led growth?

There is no context-free north-star metric. Choose a value-bearing account behavior that naturally recurs and has a defensible relationship with retention. Pair it with activation, expansion and contraction, and reliability guardrails so one optimized number cannot hide damage elsewhere.

Should sales-assisted accounts be excluded?

No. Excluding them can remove a material part of the customer journey and overstate the independence of the product motion. Keep the lifecycle and value definitions consistent, label the acquisition or expansion path, and compare segments. Product-led growth doesn’t require sales-free growth; it requires clarity about what product behavior contributed.

When should the activation definition change?

Change it when the product’s value proposition, target job, telemetry, or evidence linking activation with retention materially changes. Version the definition and avoid splicing incompatible measures into one time series. Backfill the new definition only when the historical event data supports it; otherwise, mark a clean break.

Before your next roadmap review, write the metric contracts for one activation behavior and one retained-account behavior. Connect them to expansion and contraction, add a reliability or support guardrail, and ask every major roadmap bet to name the link it should move. If a bet can’t state its expected behavioral outcome and downstream confirmation, you have found a strategy gap before spending the engineering effort.


December 11, 2025

Long-Horizon Company Building: How to Operate for Decades
You are looking at a roadmap full of credible near-term work, yet none of it seems likely to change your company’s position. The team is busy, customers are asking for improvements, and every investment has a reasonable explanation. What is missing is a clear connection between today’s choices and the company you want to become.

Long-horizon company building solves that problem only when it changes how you allocate capital, sequence capabilities, learn from customers, and stop work. A 25-year ambition is not permission to wait longer for results. It is a decision filter that helps you distinguish compounding investments from activity that merely fills the next planning cycle.

Choose a problem that becomes more defensible with time

Not every company should play a decades-long game. Time does not rescue weak demand, an undifferentiated product, or a market whose underlying problem is disappearing. A long horizon is useful when the work required to serve customers creates assets that become more valuable as they accumulate.

Before you commit to a long-horizon strategy, test the problem against a few concrete conditions:
- The pain is structural. Customers are constrained by an enduring workflow, infrastructure dependency, procurement model, or service failure. The opportunity does not depend entirely on a temporary technology cycle.
- Frustration and switching costs are both high. Switching costs alone protect incumbents. Frustration alone can produce shallow demand for a convenient feature. When customers are dissatisfied but cannot change easily, a substantially better end-to-end experience can open a durable market.
- The solution requires cumulative capability. Reliability knowledge, operational data, installation expertise, distribution, hardware, service operations, or customer trust should improve with continued use. If a new entrant can reproduce your advantage quickly, waiting longer will not make the business stronger.
- The first product creates credible adjacencies. Expansion should follow the same customer, capability base, or service promise. A list of unrelated markets is not a platform strategy.
- The customer outcome can support the business model. The way you charge should reinforce the result customers buy, rather than reward complexity they would prefer to avoid.
The sharpest test is simple: explain why the company should be structurally better after years of serving customers. Your answer must identify a mechanism. More telemetry may improve diagnosis. More deployments may reduce installation risk. Deeper workflow integration may increase the value of adjacent services. Trust may lower the friction of adopting the next product. Merely having more customers or more features is not enough.

A useful thesis takes this shape: for a specific customer, a costly problem will persist because of a structural constraint; repeatedly building a named capability will improve a defensible advantage; controlling certain interfaces is necessary to deliver the promise; and observable evidence will tell you when the thesis is weakening.

If you cannot complete that logic without relying on market size, ambition, or executive conviction, you do not yet have a long-horizon strategy. You have a long-range hope.

Convert a 25-year belief into present-day decisions

A decades-long horizon should not produce a decades-long roadmap. The farther out you look, the less credible feature-level precision becomes. Preserve the direction while making the route explicitly revisable.

Separate your strategy into three layers:
- Enduring commitments: the customer you serve, the problem you believe will remain important, the experience you intend to make possible, and the principles you will not trade away casually.
- Revisable hypotheses: the product architecture, distribution motion, ownership boundary, pricing model, and capability sequence that currently appear most likely to deliver the promise.
- Disposable work: features, prototypes, internal systems, campaigns, and implementation choices. These deserve no protection beyond the evidence they produce.
This separation prevents two common errors. The first is strategic thrashing: changing the destination whenever a current bet disappoints. The second is strategic stubbornness: defending a failed implementation because it has been wrapped in the language of mission.

Meter provides a useful example of the distinction. The company maintained its commitment to a full-stack networking service while spending more than four years in early research and development. It also discarded about a year of operating-system work. The durable thesis survived; a costly implementation did not. That is what conviction looks like when it remains accountable to learning.

At each planning cycle, require every major initiative to answer four questions: Which lasting capability will this build? What customer evidence should it produce? What finding would cause you to reshape or stop it? What are you deliberately declining so the investment receives enough attention?

The stop condition matters most. Without one, patient capital quietly becomes protected capital. Teams learn to explain delays instead of testing assumptions. Write the condition while enthusiasm is high, before sunk costs and personal identity enter the decision.

Key takeaways
- Use a long horizon to define durable commitments, not detailed forecasts.
- Fund work that compounds a named capability or reduces a consequential uncertainty.
- Protect the customer problem and company promise, not the current implementation.
- Give every major bet observable evidence and an explicit stop condition.
- Treat abandoned work as a valid strategic outcome when it prevents a larger misallocation.

Own only the stack required to keep the promise

Vertical integration is neither inherently bold nor inherently wasteful. It is justified when a layer you do not control repeatedly prevents you from delivering the outcome customers believe they purchased.

Start with the promise, not the architecture. Map the complete path from customer intent to customer outcome:
- How the customer evaluates and buys the product
- How the product is installed, configured, and activated
- Which interfaces determine performance and reliability
- What telemetry reveals failure before or after the customer notices
- How support diagnoses and resolves a problem
- Which service commitment makes the outcome commercially credible
Mark every point where an external dependency can break the promise. Then ask whether tighter integration would materially improve the experience and whether the capability will compound across customers or future products. Own a layer when both answers are strong. Keep partnering when the dependency is replaceable, the layer is genuinely commodity-like, or internal ownership would add cost without improving the customer outcome.

This test prevents full-stack ambition from turning into organizational vanity. Building hardware, software, installation operations, support tooling, and service delivery at once creates many ways to fail. The burden of proof therefore belongs with each added layer of ownership: it should remove a specific failure mode, improve a measurable part of the promise, or unlock a strategically important product that would otherwise remain impossible.

Physical-product teams should also treat geography as part of the operating design. When design, manufacturing, and iteration depend on one another, physical proximity can compress feedback loops. Meter used Shenzhen in this way during its development. The general lesson is not that every hardware company needs the same location. It is that organizational geography should follow the bottleneck: put the people making interdependent decisions close enough to learn at the speed the product requires.

The business model belongs in the same analysis. If customers want an outcome but must assemble vendors, equipment, installation, and support themselves, packaging the complete experience as a service can reduce complexity and clarify accountability. Service commitments then become part of the product, not language added after the product is built. The company earns recurring revenue by continuing to deliver the outcome, which aligns incentives more closely than a transaction that ends when equipment changes hands.

Distribution should reinforce learning during the early stages. A direct sales motion gives product and commercial leaders access to the buyer’s language, objections, procurement constraints, implementation concerns, and definition of value. That access is especially important when you are trying to establish seller-market fit: the ability to identify the right buyer, explain the value consistently, navigate the buying process, and deliver what was sold.

Before adding channel distance, verify that target buyers recognize the same problem, objections fall into understandable patterns, sales commitments survive the implementation handoff, and the economics support the promised service. A channel can scale a repeatable motion. It cannot repair one that the company does not yet understand.

Replace planning theater with a customer-learning system

Removing OKRs does not create focus. It removes one alignment mechanism. If you do not replace it with a visible decision system, priorities will depend on executive proximity, persuasive storytelling, and whichever escalation arrived most recently.

A lightweight operating system still needs a few explicit artifacts:
- A strategic narrative: the customer problem, the long-horizon thesis, the current constraint, and the choices the company is making because of them.
- A primary customer-value measure: evidence that the promised outcome is actually occurring, not merely that work shipped.
- Guardrails: reliability, service, economics, or trust conditions that must not deteriorate while the primary outcome improves.
- An unhappy-customer ledger: a shared record of broken promises, stuck use cases, escalations, and gaps between what was sold and what was delivered.
- A decision log: the assumption behind each consequential choice, the evidence available at the time, the owner, and the condition for revisiting it.
The unhappy-customer ledger is often more useful than another aggregate dashboard. A satisfaction score compresses many experiences into one number. An escalation exposes the precise boundary where your product, service, sales process, or ownership model failed.

For every serious case, capture the customer’s intended outcome, the point at which progress stopped, the expectation that was violated, the immediate resolution, and the systemic change required. Classify that change as product, operations, sales, support, or ownership-boundary work. Then look for recurring failure modes across cases.
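The ledger described above is essentially a small record type plus a grouping step. The sketch below is a minimal Python illustration, assuming each case carries a short normalized `failure_mode` label (a hypothetical field the text does not name) so that recurring patterns can be counted. The class and function names are likewise illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    # The five classes of systemic change named in the text.
    PRODUCT = "product"
    OPERATIONS = "operations"
    SALES = "sales"
    SUPPORT = "support"
    OWNERSHIP_BOUNDARY = "ownership-boundary"


@dataclass(frozen=True)
class LedgerCase:
    intended_outcome: str
    stopped_at: str             # the point at which progress stopped
    violated_expectation: str
    immediate_resolution: str
    systemic_change: str
    change_type: ChangeType
    failure_mode: str           # hypothetical normalized label used for grouping


def recurring_failure_modes(cases, min_count=2):
    """Return (failure_mode, count, change_types) for modes seen at least min_count times."""
    counts = Counter(case.failure_mode for case in cases)
    recurring = []
    for mode, count in counts.most_common():
        if count < min_count:
            break  # most_common() is sorted by count, so nothing later qualifies
        types = sorted({c.change_type.value for c in cases if c.failure_mode == mode})
        recurring.append((mode, count, types))
    return recurring
```

A mode that keeps reappearing with several different change types is a hint that individual tickets are being closed without removing the class of failure.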

Do not let this become a larger support queue. Closing the individual ticket is necessary, but the strategic value comes from removing the class of failure. If customers repeatedly struggle during installation, the answer may be a better workflow, different telemetry, a narrower promise, or ownership of an interface that has been treated as someone else’s problem.

This system also clarifies empowerment. A product team should know the outcome it owns, the constraints it must respect, the decisions it can make independently, and the conditions that require escalation. Empowerment without a clear outcome produces local optimization. Authority without proximity to customer evidence produces slow, brittle decisions.

The same clarity applies to performance problems. A company cannot preserve a long horizon while allowing unresolved role or behavior gaps to consume the team’s attention. Define the gap, the expected standard, the support available, the decision owner, and the process for reaching a fair conclusion. Move quickly toward clarity, while still following the appropriate people process. Delayed ambiguity is not patience.

Make patience accountable in your next strategy review

Long-horizon work will contain periods when visible output understates real progress. Research, infrastructure, reliability, manufacturing, and operational design may need to mature before customers see the complete benefit. The leadership challenge is to distinguish that legitimate incubation from drift.

Patience is working when the core customer thesis remains supported, important uncertainties are being resolved, a reusable capability is getting stronger, and customer failures are becoming better understood or less frequent. The dates may move, but the quality of evidence improves.

Drift looks different. Milestones move without producing new knowledge. Teams defend work by describing its difficulty or the effort already invested. The same customer failures return without a systemic response. Adjacent products receive attention before the original promise is dependable. Leadership keeps adding resources because it has not defined what would justify stopping.

Review the portfolio by decision, not by project status. Continue work that compounds a necessary capability. Reshape work when the thesis remains sound but the current method is failing. Stop work whose original assumption no longer holds. Keep adjacent opportunities separate until the core business has earned the capacity to pursue them.

You can run the review with the following sequence:
1. Write the customer promise in language a buyer would recognize.
2. Name the structural reason the problem should remain worth solving.
3. Identify the capability that should become more valuable as the company learns.
4. Map the interfaces, operations, and commercial dependencies that can break the promise.
5. Examine recent unhappy-customer cases for repeated failure modes.
6. For every major investment, write the evidence expected and the condition that would cause a change of course.
7. Remove work that neither improves the current promise nor builds a required future capability.
8. Assign the next consequential decision to a named owner with access to the relevant customer evidence.
Do not leave that review with a more elaborate long-range deck. Leave with fewer bets, clearer ownership, explicit learning goals, and at least one piece of work you are prepared to stop.

At your next planning meeting, ask which current investment will make the company structurally better at solving its chosen problem. If nobody can name the capability, the evidence, and the customer promise it serves, pause the work before time turns activity into strategy by accident.

References
- Shivam.Consulting Blog — Playing the 25-Year Game: Rethinking Networking, Ditching OKRs, and Owning the Full Stack
December 10, 2025

Outcome-Led Product Leadership: A Prioritization System

Your team has more plausible work than capacity. Sales has a customer commitment, support sees recurring friction, engineering sees reliability debt, and executives want a differentiator. Every item can be defended. That is exactly why ranking features is the wrong first move.

An outcome-led system changes what earns priority. You first decide which customer behavior, product condition, or business result needs to change. Then you compare opportunities and solution bets by how credibly they can cause that change. The roadmap becomes a record of choices, evidence, and trade-offs rather than a queue controlled by the loudest request.

Prioritize the change before you prioritize the work

An output is something the team delivers. An outcome is an observable change the team intends to cause. Launching an onboarding flow is an output. Increasing the share of new customers who complete setup successfully is an outcome. The distinction matters because a team can deliver the first without achieving the second.

A usable outcome needs more than a metric name. It should identify who is affected, what behavior or condition should change, why that change matters, how it will be observed, and which guardrails must remain healthy. If you cannot describe how the world should be different after the work succeeds, the item is not ready to compete for priority.

Use an outcome card before accepting solution proposals:

- Decision context: the strategic problem that makes a choice necessary.
- Target population: the customer segment, user role, or workflow affected.
- Current state: the observed behavior, baseline signal, or product condition.
- Desired movement: the direction of change and, when the evidence supports it, a meaningful target.
- Strategic connection: how the change supports growth, retention, trust, efficiency, or another declared priority.
- Guardrails: the signals that must not be harmed while the primary outcome improves.
- Review trigger: the evidence or constraint change that would cause leadership to reconsider the outcome.

Do not invent a precise target when no baseline exists. The first commitment may need to be instrumentation, observation, or a small test that establishes the current state. False precision makes an outcome look settled while hiding the most important uncertainty.
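An outcome card can be kept as a simple record with a readiness check. The Python sketch below is one hypothetical encoding: the field names mirror the card, while the numeric `baseline` and `target` fields and the `readiness_issues` method are illustrative assumptions. The check refuses a precise target when no baseline exists, which is the false-precision case described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OutcomeCard:
    decision_context: str
    target_population: str
    current_state: str
    desired_movement: str
    strategic_connection: str
    review_trigger: str
    guardrails: list = field(default_factory=list)
    baseline: Optional[float] = None   # hypothetical: measured value of the primary signal
    target: Optional[float] = None     # hypothetical: only meaningful once a baseline exists

    def readiness_issues(self):
        """List the reasons this card is not yet ready to compete for priority."""
        issues = []
        for name in ("decision_context", "target_population", "current_state",
                     "desired_movement", "strategic_connection", "review_trigger"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        if not self.guardrails:
            issues.append("missing guardrails")
        if self.target is not None and self.baseline is None:
            issues.append("target without baseline: instrument or observe first")
        return issues
```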

The following layers prevent strategy, outcomes, opportunities, bets, and outputs from collapsing into one roadmap item:

Layer	Decision question	Illustrative setup example
Strategic intent	Why does this area matter?	Make first use dependable for new customers.
Outcome	What observable change should occur?	Increase the share of new administrators who finish setup without support.
Opportunity	What unmet need or obstacle prevents that change?	Administrators cannot tell which permissions are required.
Bet	What intervention might address the opportunity?	Test guided permission configuration.
Output	What would the team actually deliver?	Release the validated setup change.

This separation gives you several places to change course. If the bet fails but the opportunity remains important, try another solution. If evidence shows the opportunity was misdiagnosed, investigate another obstacle. If the outcome no longer supports strategy, stop the entire branch. Without these layers, leaders often preserve a feature commitment long after its original reasoning has failed.

A company-level result such as revenue can be valid, but it may be too distant for a product team to manage directly. Connect it to customer behavior and product signals the team can influence. Pair each primary signal with a guardrail: setup completion with setup errors, faster resolution with customer-reported quality, or increased usage with reliability. A metric can improve through the wrong mechanism, so success needs a boundary as well as a direction.

Translate strategy into a decision boundary teams can use

Outcome-led leadership does not mean selecting a metric and disappearing. Leadership owns the strategic context, the outcome boundary, the investment constraints, and the conflicts that individual teams cannot resolve. The team needs room to investigate opportunities, compare solutions, and stop weak bets without asking permission at every step.

Training teams in discovery while leaders continue to manage through feature requests, static roadmaps, and approval gates teaches the organization that customer evidence is secondary. Teams may perform interviews and experiments, but they will still optimize for getting a predetermined feature approved and shipped.

A clear outcome statement can act as a decision boundary:

For [target segment] in [specific situation], improve [behavior or product condition], observed through [primary signal], because [strategic reason], while protecting [guardrails]. Explore opportunities within [scope and constraints] without assuming [requested solution].

The last clause is important. A feature hidden inside an outcome statement is still a feature mandate. “Improve adoption of the new dashboard” assumes the dashboard is the answer. “Help account owners notice and act on performance risks” leaves room to discover whether a dashboard, alert, workflow change, or no new interface is the better intervention.

Build a driver tree when the connection between strategy and team behavior is unclear:

1. Place the business result at the top.
2. Identify customer behaviors or product conditions that may contribute to it.
3. Attach observable product signals to those drivers.
4. Map the customer opportunities that could change each driver.
5. Mark every unproven connection as an assumption, not a fact.

The tree is not proof of causality. It is a visible model of the current reasoning. That visibility helps teams choose what to validate and helps leaders see where a confident roadmap rests on a weak connection.
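A driver tree is an ordinary tree in which each link to a parent is either supported by evidence or still an assumption. The Python sketch below marks each node with a hypothetical `proven` flag and lists every root-to-leaf path that rests on at least one unproven link. It does not test causality; it only makes the weak connections visible.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    kind: str                       # "result", "driver", "signal", or "opportunity"
    proven: bool = False            # is the link from this node to its parent evidenced?
    children: list = field(default_factory=list)


def assumption_paths(node, path=(), weak=False):
    """Yield root-to-leaf paths that rely on at least one unproven link."""
    # The root has no parent link, so only non-root nodes can add an assumption.
    weak = weak or (bool(path) and not node.proven)
    path = path + (node.label,)
    if not node.children:
        if weak:
            yield " -> ".join(path)
        return
    for child in node.children:
        yield from assumption_paths(child, path, weak)
```

For example, a tree whose driver and signal links are evidenced but whose opportunity link is not would return only the path ending at that opportunity.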

Before assigning an outcome, leadership should answer four practical questions:

- Why does this outcome deserve investment ahead of the alternatives?
- Which constraints are fixed, and which are merely preferences?
- Which decisions can the team make without another approval?
- What evidence would cause leadership to change the outcome or its investment?

A team cannot genuinely own an outcome when every solution needs executive approval, critical dependencies remain unresolved, or performance is judged only by shipping. That arrangement gives the team accountability without authority. The leadership task is to remove those contradictions before asking the team to move a metric.

Prioritize opportunities with evidence, then shape the portfolio

Use an eligibility gate before a ranking formula

I prefer a gate before a rank. It prevents a polished request with a confident sponsor from competing against a well-understood opportunity merely because both have feature names and effort estimates.

A candidate should become eligible for prioritization only when its decision brief covers:

- Outcome relevance: the specific outcome it could affect.
- Target evidence: the segment, situation, and observed problem behind it.
- Mechanism: the reason this intervention might change the outcome.
- Measurement: the primary signal, guardrails, and method of learning.
- Critical assumption: the belief most likely to invalidate the bet.
- Constraint fit: the technical, operational, and sequencing limits that matter.
- Opportunity cost: the work, learning, or outcome investment that would be displaced.
- Reversibility: the cost of changing course if the assumption proves wrong.

If a candidate cannot name its outcome or target population, return it to intake. That does not mean it lacks value. It means the organization does not yet have enough information to compare it honestly.
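The gate can be expressed as a completeness check that runs before any score is computed. This Python sketch assumes each decision brief is a plain dictionary whose keys are hypothetical snake_case versions of the eight items above. It only checks that each item is filled in, not that the reasoning behind it is sound.

```python
# Fields the decision brief must cover before a candidate can be ranked.
REQUIRED_FIELDS = (
    "outcome_relevance", "target_evidence", "mechanism", "measurement",
    "critical_assumption", "constraint_fit", "opportunity_cost", "reversibility",
)


def missing_fields(brief):
    """Return the brief fields that are absent or blank."""
    return [name for name in REQUIRED_FIELDS if not str(brief.get(name, "")).strip()]


def split_for_ranking(briefs):
    """Apply the gate first: only complete briefs reach the ranking step."""
    eligible, intake = [], []
    for brief in briefs:
        gaps = missing_fields(brief)
        if gaps:
            intake.append((brief.get("name", "<unnamed>"), gaps))
        else:
            eligible.append(brief)
    return eligible, intake
```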

Scoring models can help expose disagreement, but arithmetic should not make weak evidence look objective. Record the reasoning behind each score. Ask which uncertain input has the greatest effect on the ranking. If a small change to that input reverses the decision, investigate the assumption before committing substantial capacity.
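The sensitivity question can be checked mechanically. The Python sketch below assumes a hypothetical score of impact times confidence divided by effort (a swapped-in formula for illustration, not one the text prescribes). It nudges each input of each candidate up and down by 20 percent and reports the inputs whose change alone would alter which candidate ranks first.

```python
def score(candidate):
    # Hypothetical scoring formula, used only for illustration.
    return candidate["impact"] * candidate["confidence"] / candidate["effort"]


def top_choice(candidates):
    return max(candidates, key=score)["name"]


def fragile_inputs(candidates, inputs=("impact", "confidence", "effort"), swing=0.2):
    """Return (candidate, input) pairs whose +/- swing alone changes the top choice."""
    baseline = top_choice(candidates)
    fragile = []
    for i, candidate in enumerate(candidates):
        for name in inputs:
            for factor in (1 - swing, 1 + swing):
                perturbed = [dict(c) for c in candidates]  # copy so the input list is untouched
                perturbed[i][name] = candidate[name] * factor
                if top_choice(perturbed) != baseline:
                    fragile.append((candidate["name"], name))
                    break  # one flip is enough to flag this input
    return fragile
```

When two candidates score closely, several inputs come back flagged, which is the signal to investigate before committing capacity. In a well-separated portfolio the list comes back empty.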

Compare opportunities before comparing solutions. Several feature requests may be different guesses about the same customer obstacle. Combining them at the opportunity level can reveal a smaller or more effective intervention. Conversely, two similar-looking features may serve different segments and outcomes, which means one score should not flatten them into a false equivalence.

Use the Kano Model to balance protection, improvement, and exploration

Outcome relevance tells you why an opportunity matters. The Kano Model adds a customer-expectation lens by separating capabilities into must-haves, satisfiers, and delighters.

- Must-haves protect the baseline. When they are missing or broken, trust and satisfaction suffer even if the product has innovative features.
- Satisfiers create more value as their performance improves. Compare the expected incremental outcome movement with the effort and risk required.
- Delighters create unexpected value and differentiation. Treat them as hypotheses worth testing, not as compensation for a broken baseline.

Run the classification by segment and context. A capability can be essential for an advanced customer and irrelevant to a new user. Ask how the target customer would feel if the capability existed and how that same customer would feel if it did not. Pairing these functional and dysfunctional questions is more informative than collecting positive reactions to a proposed feature in isolation.
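Pairing the functional and dysfunctional questions leads naturally to the standard Kano evaluation table, which maps each answer pair to a category. The Python sketch below encodes that table with the five conventional answer options and takes a simple majority per segment. The majority rule and the names are illustrative choices; "reverse" and "questionable" are the table's own categories for answers that contradict the question or each other.

```python
from collections import Counter

# Conventional Kano answer options, used as row/column indexes.
LIKE, EXPECT, NEUTRAL, TOLERATE, DISLIKE = range(5)

# Standard Kano evaluation table.
# Rows: answer to the functional question ("How would you feel if it existed?").
# Columns: answer to the dysfunctional question ("...if it did not exist?").
# A = attractive, O = one-dimensional, M = must-be, I = indifferent,
# R = reverse, Q = questionable.
TABLE = (
    "QAAAO",  # functional: like
    "RIIIM",  # functional: expect
    "RIIIM",  # functional: neutral
    "RIIIM",  # functional: tolerate
    "RRRRQ",  # functional: dislike
)

# Map the table's letters onto this article's vocabulary.
LABELS = {"A": "delighter", "O": "satisfier", "M": "must-have",
          "I": "indifferent", "R": "reverse", "Q": "questionable"}


def classify(functional, dysfunctional):
    return LABELS[TABLE[functional][dysfunctional]]


def classify_segment(answer_pairs):
    """Majority category for one segment's (functional, dysfunctional) answers."""
    counts = Counter(classify(f, d) for f, d in answer_pairs)
    return counts.most_common(1)[0][0]
```

Running `classify_segment` separately for new and advanced customers is what lets the same capability come out as a must-have for one segment and indifferent for another.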

Do not translate the categories into equal allocations. The right portfolio depends on product maturity, strategic intent, and the condition of the core experience. Make the allocation explicit instead: which investments protect required value, which improve an outcome customers already care about, and which explore future differentiation?

Revisit the classification after meaningful releases or market changes. A delighter can become an expected baseline, so yesterday’s differentiator may no longer justify the same investment. Usage, experiments, interviews, retention patterns, and support evidence should update the portfolio rather than merely confirm the original roadmap.

Run leadership reviews that force choices, not status reports

An outcome-led roadmap can still become output-led in the review meeting. If leaders ask only about delivery dates, scope, and percentage complete, teams will optimize for those signals. Separate the conversations that answer different questions:

- Outcome review: Is the customer behavior or product condition moving, for which segment, and with what guardrail effects?
- Discovery review: What changed in the team’s understanding of the opportunity, mechanism, or critical assumption?
- Commitment review: Which bet should start, continue, change, or stop, and what does that choice displace?

These conversations can share a meeting, but they should not share one vague status label. “On track” can mean delivery is proceeding to plan while the underlying evidence is weakening. Healthy delivery and healthy product reasoning are different states.

Use a compact review board with the outcome and segment, current signal relative to baseline, strongest new evidence, largest unresolved assumption, active bet, decision required, and displaced work. Feature completion belongs in the delivery portion of the review. It should not stand in for evidence that the outcome is becoming more likely.

Leaders should repeatedly ask:

- What did the team learn that it did not believe before?
- Which evidence supports or weakens the proposed mechanism?
- Is the outcome still right even if the current solution is wrong?
- What is the smallest next commitment that resolves the most consequential uncertainty?
- What will stop or move if this work receives priority?
- Does the team need a decision, a constraint removed, or simply space to continue?

Set decision conditions before attachment to a solution grows. Continue a bet when the evidence strengthens its mechanism. Change the bet when the outcome and opportunity remain valid but the solution does not. Move to another opportunity when the original problem is weaker than expected. Reconsider the outcome when its strategic premise or target segment changes. Stopping a bet is not abandoning outcome ownership; it is one of the ways outcome ownership becomes real.

Stakeholder requests need the same discipline. Translate each requested feature into an intake record that identifies the affected customer, the situation, the observed problem, the evidence, the desired behavior change, the timing constraint, and any alternatives already tried. A request earns evaluation, not an automatic roadmap position.

A useful escalation rule is simple: anyone asking to add committed work must identify what should leave, or explain which outcome or constraint has changed. This turns hidden priority overrides into visible strategy decisions. Seniority may change who has decision rights, but it should not erase opportunity cost.
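The escalation rule is simple enough to encode in whatever tool holds committed work. This Python sketch is hypothetical: a request must either name a committed item to displace or state the changed outcome or constraint, and every accepted request is logged with its requester so the override remains visible.

```python
from dataclasses import dataclass, field


@dataclass
class CommitmentLog:
    committed: set = field(default_factory=set)
    decisions: list = field(default_factory=list)

    def add(self, work, requested_by, displaces=None, changed_premise=None):
        """Add committed work only when its opportunity cost or changed premise is explicit."""
        if displaces is None and not changed_premise:
            raise ValueError("name the work that leaves, or the outcome or constraint that changed")
        if displaces is not None:
            if displaces not in self.committed:
                raise ValueError(f"{displaces!r} is not committed work")
            self.committed.remove(displaces)
        self.committed.add(work)
        # Seniority is recorded, but it does not bypass either check above.
        self.decisions.append({
            "work": work,
            "requested_by": requested_by,
            "displaced": displaces,
            "changed_premise": changed_premise,
        })
```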

Before changing the entire organization, use a pilot team to surface decision bottlenecks, incentive conflicts, stakeholder friction, and policy barriers. Track where the team still needs feature approval, where evidence loses to hierarchy, and where another function is rewarded for behavior that undermines the outcome. Those blockers are leadership work. Scaling the workflow without resolving them only distributes the same conflict more widely.

Key takeaways for your next prioritization review

- Prioritize an observable customer, product, or business change before ranking proposed outputs.
- Give each outcome a target population, baseline signal, strategic connection, guardrails, and review trigger.
- Separate outcomes, opportunities, solution bets, and outputs so a failed solution does not preserve itself as a permanent commitment.
- Use an evidence gate before scoring, and expose the assumption that could reverse the ranking.
- Balance Kano must-haves, satisfiers, and delighters deliberately instead of treating every request as the same kind of value.
- Make leadership reviews decide what starts, changes, stops, or gets displaced.
- Convert stakeholder urgency into evidence, constraints, and explicit opportunity cost.

At your next roadmap review, take the highest-ranked feature and rewrite it as an outcome statement. Require competing bets to name their evidence, critical assumption, guardrails, and displaced work. If the team cannot do that yet, commit to resolving the uncertainty rather than pretending the feature is ready.

At the following review, ask what changed in the customer signal or the team’s belief before asking what shipped. That question reveals whether your operating system actually rewards outcomes or merely uses outcome language around a feature queue.

References

December 9, 2025

Outcome-Driven Go-to-Market: From Launch Plan to Growth Loop

Your launch is on schedule. Product is shipping, marketing has a campaign, sales has enablement, and customer success has an adoption plan. Yet the leadership review still gets stuck on a basic question: what customer outcome should all this activity produce?

If you cannot answer that question with observable evidence, you do not have a go-to-market system yet. You have coordinated output. Outcome-driven go-to-market execution connects the product promise to a change in customer behavior, then connects that behavior to a commercial result. It gives every function the same causal chain and enough evidence to decide what to change when the chain breaks.

Write the outcome contract before the launch plan

The usual go-to-market plan starts with deliverables: finish the landing page, train the sales team, publish the campaign, launch the product tour, and brief customer success. Those tasks matter, but completing them does not demonstrate that the market understood the value or that customers received it.

An outcome contract establishes what the cross-functional team is trying to make true. Start with a specific segment and ideal customer profile, because a result stated for everyone will be too vague to guide positioning, product decisions, or sales execution. The contract should also identify the customer situation that makes the offer relevant. Industry and company size alone rarely explain why a buyer needs to act.

Write the contract before functions turn the strategy into separate workstreams. It needs these elements:

- Segment and situation: Identify who has the problem, what has changed in their environment, and who is explicitly outside the initial motion.
- Customer outcome: State what becomes easier, faster, safer, or more valuable for that segment. Describe the change in the customer’s world, not the feature being delivered.
- Value behavior: Name the observable action that indicates a user has begun receiving the promised value. This becomes the activation hypothesis, not merely another engagement event.
- Commercial result: Choose the business result the motion is expected to influence, such as qualified progression, paid conversion, retention, or expansion. Add guardrails so that improving an early metric cannot conceal damage later in the journey.
- Evidence window: Agree when the team expects each leading signal to become visible. Do not wait for a lagging revenue result to discover that the message or onboarding failed much earlier.
- Decision owner: Identify who convenes the functions, resolves conflicting interpretations, and records the decision when the evidence is weak.
- Failure condition: State what would cause the team to change the segment, promise, proof, onboarding, offer, or product instead of adding more activity.

A usable contract can fit into a single sentence: For [segment] facing [situation], this motion should produce [customer outcome]. We will see early evidence in [value behavior] and commercial evidence in [business result], without harming [guardrail]. If [required evidence] is absent by [decision point], [owner] will reopen [assumption or lever].

This is the practical difference between outcome and output OKRs. A completed product tour is an output. More target users reaching the value behavior with stronger downstream retention is an outcome. The tour earns continued investment only if it contributes to that outcome.

The contract also prevents each function from quietly optimizing for a different definition of success. Marketing can still manage audience and response metrics. Sales can still manage opportunity progression. Product can still manage activation. Customer success can still manage adoption and retention. The difference is that those measures now describe connected parts of the same customer journey.

Carry the buyer from a credible promise to acceptable proof

Positioning is not a launch slogan. It is the logic that helps a buyer recognize the problem, understand why the product is relevant, distinguish it from alternatives, and believe that choosing it is safe enough.

Build that logic before producing channel assets. A useful message architecture contains:

- Situation: The trigger, constraint, or unmet job that makes action relevant now.
- Promise: The customer outcome the product can credibly help create.
- Points of parity: The capabilities buyers expect before they will consider the product a legitimate option.
- Differentiation: The meaningful reason this approach is better suited to the target situation than the available alternatives.
- Mechanism: How the product creates the promised outcome. This keeps the claim connected to product truth.
- Proof: The evidence a buyer should accept at the current decision stage.
- Risk response: How the motion addresses implementation, security, procurement, switching, and organizational concerns.
- Next decision: The smallest credible commitment that advances the buyer without pretending the entire decision has already been made.

The core promise should remain stable, but its expression should change with context. Different segments, buying stages, and channels need different versions of the message. An advertisement may help a buyer recognize a problem. A landing page must establish relevance and differentiation. A sales conversation must diagnose the use case. An in-product guide must help the user experience value. Repeating identical copy in every context produces consistency of wording, not consistency of meaning.

Enterprise execution adds another complication: the buyer is not a single person. The user wants the product to improve a workflow. A functional leader wants a measurable operating result. The economic buyer wants a credible business case. Security wants controlled risk. Procurement wants terms it can evaluate and govern. A multi-threaded buying committee needs the same value proposition translated into each stakeholder’s decision.

Do not solve this by inventing a different promise for every role. Preserve the outcome and mechanism, then change the evidence. The user may need to see a workflow completed. The economic buyer may need quantified value. Security may need an approved control narrative. Procurement may need a clear scope, packaging model, and path to renewal. If those artifacts imply different product truths, the motion will lose credibility as stakeholders compare notes.

Use an asset test before anything enters the launch plan: This asset should move [audience] from [current belief] to [next belief or action]. I will observe that change through [leading signal], and I will validate it against [downstream outcome]. If the team cannot complete that sentence, the asset is a calendar commitment without a strategic job.

For complex accounts, design the proof of value as part of the offer rather than improvising it after a promising sales call. A proof of value should specify:

- the business outcome and the baseline against which change will be judged;
- the scoped use case, users, and workflow included in the evaluation;
- the product behavior expected to indicate initial value;
- the data, access, privacy, security, and governance constraints;
- the stakeholders who must accept the evidence;
- the instrumentation required to collect that evidence;
- the criteria for expansion, redesign, or stopping; and
- the commercial decision that follows a successful evaluation.

A proof of value is not a longer demo. It is a controlled way to test whether the promised outcome can survive contact with the customer’s environment. If the customer and seller cannot agree in advance on what counts as sufficient evidence, a successful pilot can still end in indecision.

This discipline is particularly important when buying cycles are longer and switching costs are higher. Quantifying outcomes early and aligning pricing and packaging with willingness to pay reduces ambiguity at the point where technical success must become a commercial decision.

Measure the causal chain, not a pile of channel metrics

A dashboard can contain accurate numbers and still be useless for go-to-market decisions. The test is whether the measures reveal where the customer journey is breaking and which lever the team should change.

Map the journey from targeted attention through paid expansion. For every stage, name the question, the evidence, and the likely response to a weak signal.

Journey stage	Decision question	Useful evidence	Response when weak
Targeted attention	Are relevant customers recognizing the problem?	Qualified response by segment and situation	Revisit targeting, problem framing, or channel context
Evaluation	Do buyers understand the promise and difference?	Use-case engagement, progression, and objection patterns	Clarify positioning, mechanism, or supporting proof
Commitment	Has enough buyer risk been removed?	Proof-of-value acceptance and security, procurement, or approval progress	Resolve the specific risk or make decision criteria explicit
Activation	Are users reaching initial value?	Activation behavior, time-to-value, and abandonment points	Fix access, onboarding, product guidance, or product friction
Durable use	Does the value behavior repeat?	Core behavior frequency and retention by relevant cohort	Test whether the activation event predicts lasting value
Expansion	Is value spreading or deepening?	Adoption breadth, additional use cases, and paid expansion	Revisit packaging, enablement, customer success, or the next use case

This chain makes leading and lagging measures work together. Revenue is essential, but it arrives too late and aggregates too many causes to diagnose execution by itself. Click-through rate arrives early, but it says little about whether customers receive value. Activation and retention connect the two, provided the chosen activation event represents a meaningful step toward the promised outcome.

That proviso matters. Teams often label a convenient event as activation because it is easy to instrument. Account creation, a login, or a page view may only show access. The stronger question is: what behavior would be unlikely unless the user had begun to receive the value described in the positioning?
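A simple first check on a candidate activation event is whether users who perform it retain at a higher rate than users who do not. The Python sketch below assumes a hypothetical per-user record with two booleans. It shows an association only: users who choose to perform the event may differ in other ways, so a gap is evidence for the hypothesis, not proof that the event causes retention.

```python
from dataclasses import dataclass


@dataclass
class UserOutcome:
    user_id: str
    hit_candidate_event: bool  # did the user perform the candidate activation behavior?
    retained: bool             # was the user still repeating the core behavior later?


def retention_by_activation(users: list[UserOutcome]) -> tuple[float, float]:
    """Return (retention rate with the event, retention rate without it)."""
    def rate(group: list[UserOutcome]) -> float:
        return sum(u.retained for u in group) / len(group) if group else float("nan")

    with_event = [u for u in users if u.hit_candidate_event]
    without_event = [u for u in users if not u.hit_candidate_event]
    return rate(with_event), rate(without_event)


cohort = [
    UserOutcome("u1", True, True),
    UserOutcome("u2", True, True),
    UserOutcome("u3", True, False),
    UserOutcome("u4", False, False),
    UserOutcome("u5", False, True),
    UserOutcome("u6", False, False),
]
with_event, without_event = retention_by_activation(cohort)
print(f"retained with event: {with_event:.0%}, without: {without_event:.0%}")
```

In practice the comparison should be run per relevant cohort and segment, since an aggregate gap can hide a candidate event that only works for one group.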

Instrument identity and events across the relevant systems so that exposure can be followed through the funnel. To follow one journey from first touch to paid expansion, the team must be able to reconcile product behavior, campaign exposure, CRM stage, account context, and commercial status. Perfect attribution is not required to improve decisions, but incompatible definitions will create debates that no amount of dashboarding can settle.

Create a shared measurement dictionary for every outcome-critical event. Record what triggers the event, what does not, which user or account entity it belongs to, when it became reliable, and which decision it supports. If marketing, product, and sales use the words “qualified” or “activated” differently, fix the definition before interpreting the trend.
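A dictionary entry can be kept as structured data so that analysts can check it automatically. The sketch below is a hypothetical Python record whose fields mirror the items listed above; the event name, definitions, and dates are illustrative assumptions. The helper flags trend windows that start before the event became reliable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EventDefinition:
    name: str
    triggers_when: str
    does_not_trigger_when: str
    entity: str             # "user" or "account"
    reliable_since: date    # first date the event can be trusted for trend analysis
    supports_decision: str


DICTIONARY = {
    "workspace_activated": EventDefinition(
        name="workspace_activated",
        triggers_when="Account completes its first end-to-end workflow run with real data",
        does_not_trigger_when="Sample-data runs, internal test accounts, or a login alone",
        entity="account",
        reliable_since=date(2025, 3, 1),
        supports_decision="Whether onboarding changes move accounts to first value",
    ),
}


def usable_for_trend(event_name: str, window_start: date) -> bool:
    """A trend window that starts before the event was reliable mixes two definitions."""
    return window_start >= DICTIONARY[event_name].reliable_since


print(usable_for_trend("workspace_activated", date(2025, 1, 1)))
```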

Experiments should test a link in the causal chain, not simply generate a winner. Before running an A/B test, write down:

the segment and journey stage being tested;
the customer belief or behavior expected to change;
the intervention, such as a message, product tour, in-app guide, onboarding flow, or offer;
the primary outcome metric and downstream guardrails;
the minimum detectable effect that would matter to the business;
the stopping and decision rules; and
the action the team will take for a positive, negative, or inconclusive result.

Setting the minimum detectable effect before reading the result protects the team from declaring a noisy change meaningful because the preferred variant appears slightly ahead. Guardrails protect against local optimization. If creative improves click-through but reduces downstream activation, it has made the funnel busier rather than better.
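The minimum detectable effect also determines how much traffic an experiment needs. The sketch below uses the standard normal-approximation formula for comparing two conversion rates; the baseline, effect size, significance level, and power in the example are assumptions, and an experimentation platform’s own calculator may use a slightly different method.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed in each arm of a two-sided test comparing two proportions.

    baseline: current rate of the primary metric, e.g. 0.20 for 20 percent
    mde_abs:  smallest absolute change worth acting on, e.g. 0.02 for 2 points
    """
    p1, p2 = baseline, baseline + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde_abs == 0:
        raise ValueError("rates must lie strictly between 0 and 1, and mde_abs must be nonzero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


n_two_points = sample_size_per_arm(baseline=0.20, mde_abs=0.02)
n_one_point = sample_size_per_arm(baseline=0.20, mde_abs=0.01)
print(n_two_points, n_one_point)
```

With a 20 percent baseline and a 2-point MDE, the sketch asks for roughly 6,500 users per arm; halving the MDE to 1 point roughly quadruples that. If the segment cannot supply that volume, the MDE or the test design needs to change before launch, not after.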

The pattern of movement often tells you where to look. Strong attention with weak qualified progression points toward targeting or positioning. Strong conversion with weak activation suggests an expectation, handoff, or onboarding problem. Strong activation with weak retention means the supposed aha moment may not represent durable value. Strong retention with weak expansion can indicate packaging, permissions, enablement, or use-case discovery friction. These are diagnostic hypotheses, not automatic verdicts; use qualitative evidence to identify the mechanism before changing the system.
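The same diagnosis can be run mechanically as a first pass: compute each stage-to-stage rate, compare it with a target the team has agreed in advance, and report the earliest transition that falls short along with the matching hypothesis from the paragraph above. The Python sketch below is hypothetical; the stage names, counts, and targets are illustrative assumptions, and its output is a hypothesis to investigate, not a verdict.

```python
# Stage order follows the journey table; names are illustrative.
STAGES = ["attention", "qualified", "converted", "activated", "retained", "expanded"]

# Hypotheses paraphrased from the diagnostic patterns, keyed by the weak transition.
HYPOTHESES = {
    ("attention", "qualified"): "targeting or positioning",
    ("converted", "activated"): "expectation, handoff, or onboarding problem",
    ("activated", "retained"): "activation event may not represent durable value",
    ("retained", "expanded"): "packaging, permissions, enablement, or use-case discovery",
}


def earliest_break(counts: dict[str, int], targets: dict[tuple[str, str], float]):
    """Return (transition, observed rate, hypothesis) for the first rate below target, else None."""
    for prev, nxt in zip(STAGES, STAGES[1:]):
        rate = counts[nxt] / counts[prev] if counts[prev] else 0.0
        target = targets.get((prev, nxt))
        if target is not None and rate < target:
            hypothesis = HYPOTHESES.get((prev, nxt), "no pattern listed; investigate qualitatively")
            return (prev, nxt), rate, hypothesis
    return None


counts = {"attention": 10_000, "qualified": 1_200, "converted": 300,
          "activated": 120, "retained": 90, "expanded": 20}
targets = {("attention", "qualified"): 0.10, ("qualified", "converted"): 0.20,
           ("converted", "activated"): 0.60, ("activated", "retained"): 0.50,
           ("retained", "expanded"): 0.15}
print(earliest_break(counts, targets))
```

The function deliberately stops at the first shortfall, because later weaknesses may be consequences of that earlier break.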

Turn the launch into a decision loop that can scale

Outcome-driven execution needs a cadence that converts evidence into decisions. A status meeting asks whether the planned work shipped. A decision meeting asks what changed in the customer journey, what explains the change, and what the team will do next.

Use a weekly cross-functional review for the active motion. Keep the agenda anchored to the outcome contract:

Outcome and guardrail movement: Review the agreed measures, not a rotating collection of favorable metrics.
Segment and cohort variance: Check whether the aggregate hides a strong or weak response in the target group.
Current bottleneck: Identify the earliest important break in the causal chain. Later weaknesses may be consequences of that break.
Evidence: Bring behavioral data, experiment results, customer language, sales objections, and proof-of-value findings together.
Diagnosis: Decide whether the barrier is primarily belief, access, capability, risk, or commercial fit.
Next intervention: Choose the smallest change capable of testing the diagnosis.
Decision record: Capture the owner, expected signal, review point, and the assumption being tested.
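Decision records are easier to revisit when each one carries an explicit review date. The sketch below is a hypothetical Python structure with one field per element of the decision record plus the chosen intervention, and a helper that lists records due at the next weekly review; the example content is illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DecisionRecord:
    assumption: str       # the assumption this intervention tests
    intervention: str     # the smallest change capable of testing the diagnosis
    owner: str
    expected_signal: str  # which measure should move, and in which direction
    review_on: date       # the agreed review point


def due_for_review(records: list[DecisionRecord], today: date) -> list[DecisionRecord]:
    """Records whose review point has arrived and should be on this week's agenda."""
    return [r for r in records if r.review_on <= today]


records = [
    DecisionRecord(
        assumption="New users stall because data import instructions are unclear",
        intervention="Add an import checklist to the onboarding flow",
        owner="Onboarding PM",
        expected_signal="Setup completion rises in the next two weekly cohorts",
        review_on=date(2025, 6, 16),
    ),
    DecisionRecord(
        assumption="Security review delays commitment for regulated accounts",
        intervention="Share the approved control narrative at the evaluation stage",
        owner="Sales engineering lead",
        expected_signal="Fewer days between proof-of-value acceptance and signature",
        review_on=date(2025, 7, 7),
    ),
]
for record in due_for_review(records, today=date(2025, 6, 20)):
    print(record.owner, "->", record.assumption)
```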

Different evidence answers different questions. Analytics shows where behavior changes and how broadly. Customer conversations help explain motives and language. Field feedback reveals objections and decision friction. Controlled experiments provide stronger evidence that an intervention caused a change. None is sufficient alone, and a forceful anecdote should not automatically overrule a stable segment pattern.

Give each function responsibility for maintaining its link in the chain. Marketing maintains evidence about audience, problem recognition, and message response. Sales maintains evidence about diagnosis, objections, stakeholders, and commitment. Product maintains evidence about access, activation, and the ability to realize value. Customer success maintains evidence about adoption, durable outcomes, and expansion readiness. No function owns the entire customer outcome alone, but each must be able to explain its part without retreating into output metrics.

When the evidence points to a product constraint, the issue belongs in product prioritization and sprint planning. When it points to a credibility gap, another feature may be less valuable than better proof. Empowered product teams, product trios, and field insights from enterprise pilots keep those choices connected to the market without turning every objection into an unexamined roadmap request.

Use QBRs for the larger strategic questions: Is the segment still attractive? Does the product create a repeatable advantage? Are pricing and packaging aligned with received value? Should resources move between acquisition, activation, retention, and expansion? A quarterly review cannot replace the weekly learning loop, and the weekly loop should not repeatedly reopen strategy without material evidence.

Scale the motion only when its success is becoming repeatable rather than heroic. Look for:

a target segment that responds for a consistent reason;
a value proposition that survives across channels and buyer roles;
an activation behavior that has a credible relationship with retention;
a proof process with explicit evidence and decision criteria;
objections that are predictable enough to address through enablement or product changes;
instrumentation reliable enough to locate funnel breakdowns;
pricing and packaging that support the value customers are willing to buy; and
a playbook that another team can execute without recreating the strategy from memory.

A bespoke enterprise win can be valuable evidence, but it is not yet a repeatable motion. Before treating it as the model, separate what was essential from what depended on exceptional access, custom work, executive attention, or a uniquely motivated customer. Scale the elements that explain the outcome. Preserve the rest as a conscious exception or remove it from the standard motion.

If the bottleneck survives repeated tactical changes, stop expanding the activity around it. Reopen the underlying assumption. The segment may not feel the problem strongly enough, the promise may not be differentiated, the proof may not reduce the relevant risk, or the product may not deliver the claimed value. An outcome-driven system makes that uncomfortable conclusion visible early enough to act on it.

Key takeaways

Start with an outcome contract that links a target customer’s result to an observable value behavior and a commercial result.
Use a stable value proposition across the motion, but adapt the evidence and next decision to the segment, channel, stage, and buyer role.
Measure the full causal chain from targeted attention through activation, retention, and expansion; no single channel metric can represent go-to-market success.
Design experiments with a declared hypothesis, meaningful effect threshold, downstream guardrails, and decision rule before results arrive.
Run a weekly decision loop, reserve QBRs for strategic changes, and scale only after the motion is measurable, teachable, and repeatable.

At your next go-to-market review, put the causal chain on the first slide instead of the workstream tracker. Ask where the earliest important evidence breaks, name the assumption behind that break, and fund the smallest intervention that can test it. That is how a launch plan becomes a growth loop.

December 4, 2025

From Output to Outcomes: How I Align Stakeholders Around a True Product Operating Model

When I push our organization to adopt the product operating model, I’m emphasizing a foundational shift—from “shipping roadmaps of features (output)” to solving real customer and business problems, measured by “business results (outcomes)”. That’s the difference between activity and impact, and it’s the only way to build durable value at scale.
This change inevitably reaches beyond the product organization. It reshapes how company stakeholders in Sales, Marketing, Customer Success, Finance, Legal, Security, and Operations engage with product teams, and it reframes what they expect from us. Instead of asking, “When will feature X ship?” they learn to ask, “How will we move the outcome that matters?”
In practice, the product operating model is a contract: product teams commit to outcomes, and stakeholders commit to partnership. That partnership means we co-own the problem, align on evidence, and share accountability for results. The reward is clarity—everyone sees how their work ladders to strategy and why the sequence of work makes sense.
Here’s how I align stakeholders around this model. First, I ground everything in outcome OKRs rather than output OKRs. We replace feature roadmaps with a clear strategy, prioritized problems, and measurable objectives. Our product roadmapping and sprint planning then serve the objectives, not the other way around, so capacity is allocated to the highest-leverage bets.
Second, I build empowered product teams around product trios (product, design, engineering). We practice continuous discovery with stakeholders: we share opportunity trees, test riskiest assumptions early, and bring partners into research when it informs go-to-market strategy, pricing, or enablement. This keeps us honest and avoids late-stage surprises.
Third, I establish operating rhythms that make outcomes visible. Monthly stakeholder reviews focus on progress toward objectives and what we’re learning—not status theater. Quarterly, we connect OKRs to business performance so leaders can see the throughline from discovery and delivery to pipeline, retention, or margin. If priorities shift, we renegotiate objectives explicitly.
Fourth, I define metrics that stakeholders trust. We use a balanced set of leading indicators (activation, engagement, cycle time) and lagging indicators (revenue, retention, unit economics). We socialize definitions early so no one debates the scoreboard mid-game. The result: faster decisions and less “data whiplash.”
Fifth, I invest in change management. Moving from outputs to outcomes can feel threatening if your success has historically been measured by launch volume or roadmap commitments. I address this head-on with training, transparent comms, and clear decision rights. The message is simple: outcomes create more autonomy for empowered product teams and more predictability for stakeholders.
At HighLevel, this approach has been especially powerful when cross-functional dependencies are high. For example, when we set an objective to improve user activation for a new CRM integration, we didn’t promise a bundle of features. We committed to a measurable lift in activation and a shorter time-to-value, co-owned with Customer Success and Marketing. That alignment unlocked smarter experiments, tighter enablement, and a more credible launch narrative.
The anti-patterns are predictable: treating OKRs as a renaming of the roadmap, equating discovery with indecision, or isolating product decisions from go-to-market strategy. The cure is equally consistent: bring stakeholders into discovery, attach every bet to an objective, and show progress with evidence—not just demos.
Ultimately, the product operating model is a leadership choice. It asks us to trade certainty theater for learning velocity, and feature checklists for business impact. When stakeholders see that shift pay off—in faster cycles, clearer priorities, and results that matter—support for the model moves from compliance to conviction.

Inspired by this post on SVPG.

December 1, 2025
Unlock AI Product Roadmaps: Essential Tools Every PM Needs to Prioritize and Ship Faster

In my role leading product teams, the AI product roadmap isn’t just a plan—it’s the operating system for how we discover value, prioritize with rigor, and ship with confidence. The pace has changed, the stakes are higher, and the best product managers are now orchestrating AI capabilities, data, and customer insight in near-real time.

The goal is to master the evolving practice of AI product roadmapping: prioritize more effectively and turn data into direction and insight into action, only much faster.

When I say “AI product roadmap,” I’m talking about a living system that blends strategy, discovery, and delivery. It’s less about dates and more about outcomes, risk reduction, and sequencing learning. In practice, that means combining AI strategy with product roadmapping and sprint planning, then validating each bet with real customer signals.

For prioritization, I anchor on outcome OKRs rather than output OKRs and connect them to measurable signals across the funnel. Continuous discovery keeps insights flowing, while a unified approach to analytics and retention analysis tells me where the lift is. This lets me rank initiatives not just by impact and effort, but by how quickly we can learn, iterate, and compound value.

On discovery, product trios are non-negotiable. We prototype early with generative AI and LLMs to accelerate concept validation and reduce ambiguity. When customers can co-create through in-app guides or lightweight product tours, we turn vague needs into crisp problem statements and testable hypotheses far faster.

On delivery, I pair tight feedback loops with experimentation. A deliberate cadence of A/B testing and strong instrumentation ensures we’re learning every sprint, not just launching. The goal is to de-risk decisions quickly, keep momentum high, and translate signals into roadmap movement without thrash.

Under the hood, the AI stack matters. I rely on a retrieval-first pipeline to ground models in trusted data, and I’m intentional about privacy-by-design and data governance from day one. As agentic AI patterns emerge, I put evaluation workflows in place so we can ship confidently—and safely—without slowing down innovation.

Finally, alignment is the multiplier. Clear narrative roadmaps tied to customer outcomes help stakeholders see trade-offs, while crisp interfaces with go-to-market and CRM integration close the loop from roadmap to revenue. When everyone can trace a line from AI strategy to shipped value, prioritization becomes easier and trust grows.

If you’re feeling the acceleration, you’re not alone. With the right AI product toolbox—rooted in discovery, grounded in data, and delivered through tight feedback loops—you can move faster, learn smarter, and build products your customers can’t live without.

Inspired by this post on Product School.

December 1, 2025
AI Product Owner in 2026: The High-Impact Role Every Team Needs to Win With AI

By 2026, the AI Product Owner will be the keystone role that turns AI strategy into measurable business outcomes. In my teams, this seat bridges market insight, model capability, data governance, and shipping velocity—so product decisions are not just clever, but compliant, reliable, and fast.

I describe the remit through four lenses: the skills the role needs, its responsibilities, how it differs from product management, and the ways AI tools speed up delivery. In practice, the AI Product Owner translates business goals into model-backed experiences, aligns cross-functional execution, and ensures the product’s AI behavior remains safe, lawful, and on-brand under real-world constraints.

How does this differ from a traditional PM? While Product Management sets portfolio strategy, positioning, and market narratives, the AI Product Owner owns the AI experience end-to-end: data readiness, evaluation harnesses, safety guardrails, and the iterative model improvements that move outcome OKRs rather than output targets. I anchor the role inside empowered product teams and product trios (PM/Design/ML Eng) to keep discovery continuous and delivery disciplined.

On responsibilities, I expect four pillars. First, discovery: continuous discovery with customers and internal experts to uncover use cases where generative AI or LLMs beat the status quo. Second, experience: define the right interaction patterns for AI UX, including retrieval-first pipeline choices, context window management, and feedback loops for human-in-the-loop correction. Third, governance: privacy-by-design, AI risk management, data governance, and regulatory compliance built into the roadmap. Fourth, delivery: CI/CD for models and prompts, observable evaluation through A/B tests sized around a minimum detectable effect (MDE), and SRE-grade incident management when AI behavior drifts.

Skills-wise, I look for product sense plus technical fluency. That includes a working knowledge of LLMs (prompting, grounding, RAG), analytics mastery (Amplitude analytics, retention analysis, activation metrics), and comfort with DORA metrics such as deployment frequency to keep iteration high but safe. Strong stakeholder management and clear writing are non-negotiable: AI capabilities evolve fast, and leaders must see risk, cost, and ROI without ambiguity.

AI tools truly supercharge delivery when they eliminate bottlenecks. My practical stack: an AI product toolbox with Claude Code and a ChatGPT connector for rapid prototyping; CustomGPT workflows for support triage and internal knowledge; Pendo product tours and in-app guides to validate behavior changes; Intercom for our customer support AI strategy; and tight CRM integration via HubSpot to measure revenue impact. The outcome is faster idea-to-learning cycles, sharper telemetry, and far cleaner handoffs.

For roadmapping, I prioritize thin slices that prove value early—shipping narrowly scoped assistants or copilots, then expanding with product roadmapping and sprint planning that ties capability unlocks to outcomes. A unified analytics platform helps compare human-only baselines to augmented workflows, while agentic AI patterns automate routine steps under strict guardrails.

Risk is a product surface, not a side task. I require explicit policy gates (PII handling, red-teaming, bias audits), clear escalation paths, and incident playbooks. When we treat policy and reliability as features, customers reward us with deeper adoption and higher trust.

If you’re pursuing the AI Product Owner path, build a portfolio around shipped learnings: the experiment you killed with data, the safety constraint you designed, the postmortem you led, and the business metric you moved. That story—evidence of disciplined discovery, responsible delivery, and real-world results—is exactly what teams (and boards) want to see in 2026.

Inspired by this post on Product School.

November 26, 2025
How I Use ChatGPT to Supercharge Product Management: Workflows, Prompts, and PM Playbooks

I treat ChatGPT as a force multiplier across the entire product lifecycle, from discovery and strategy to delivery and growth. Below are the workflows, prompts, and practical tips that show how ChatGPT is quietly reshaping my product management work behind the scenes.

My goal is pragmatic: turn generative AI into repeatable, measurable leverage for product discovery, product roadmapping and sprint planning, stakeholder management, and product-led growth without sacrificing quality, privacy-by-design, or judgment. This is how I apply LLMs as a product manager in a way that strengthens customer empathy and speeds up decision cycles.

In discovery, I use ChatGPT to synthesize interviews, categorize sentiment, and surface emergent themes faster than a manual pass. I’ll feed it anonymized notes and ask for Jobs-to-be-Done statements, contradictory signals to validate, and the top three risks to our hypotheses. When the corpus gets large, I pair it with a retrieval-first pipeline and apply context window management so outputs stay grounded in real customer data.
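The post does not describe its retrieval setup, so the Python sketch below is a hypothetical illustration of the idea: rank interview notes by relevance to a question, then pack only the best ones into a fixed context budget. Word overlap stands in for embedding similarity and word counts stand in for model tokens; both are simplifying assumptions, and the notes are assumed to be anonymized already.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for a real tokenizer or embedding."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def select_context(question: str, notes: list[str], budget_words: int) -> list[str]:
    """Rank notes by word overlap with the question, then pack greedily under a word budget."""
    q = tokenize(question)
    ranked = sorted(
        ((len(q & tokenize(note)), i, note) for i, note in enumerate(notes)),
        key=lambda item: (-item[0], item[1]),  # highest overlap first, ties by original order
    )
    picked, used = [], 0
    for score, _, note in ranked:
        cost = len(note.split())
        if score == 0 or used + cost > budget_words:
            continue  # skip irrelevant notes and notes that would overflow the budget
        picked.append(note)
        used += cost
    return picked


notes = [
    "Interview 4: export to CSV breaks when invoices exceed 500 rows",
    "Interview 7: onboarding video was helpful",
    "Interview 9: invoice export times out for large customers",
]
question = "Why do customers struggle with invoice export?"
print(select_context(question, notes, budget_words=15))
```

A tight budget keeps only the most relevant note; raising it admits the next one, which is the trade-off context window management has to make explicitly.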

On strategy and positioning, I draft and refine a crisp value proposition, clarify points of parity, and identify competitive differentiation. I ask ChatGPT to convert inputs into outcome OKRs rather than output OKRs, pressure-test assumptions, and produce a one-page narrative that even non-technical stakeholders can engage with. The result is faster alignment and fewer meetings to get to the same level of clarity.

For planning and delivery, I use ChatGPT to accelerate PRD outlines, user stories, and acceptance criteria, while explicitly requesting edge cases, failure states, and non-functional requirements. I’ll have it map risks to mitigations and suggest simple instrumentation aligned to DORA metrics and incident management readiness—useful when we’re iterating within a CI/CD cadence.

In experimentation, ChatGPT helps me frame strong A/B testing plans, calculate a minimum detectable effect (MDE), and sanity-check sample sizes. I also use it to translate metrics into plain language updates for the team, connect learnings to the next experiment, and propose follow-up analyses for retention analysis or activation bottlenecks.

For growth and onboarding, I prompt ChatGPT to generate hypotheses for user activation, in-app guides, and tooltip design that match personas and JTBDs. It drafts variations I can quickly test through Pendo or similar tools, supports product-led growth motions, and helps craft contextual copy that aligns with our value proposition without adding cognitive load.

Stakeholder communications get sharper and faster. I’ll ask for concise executive summaries, a version tailored for engineering leaders, and another for customer-facing teams. It’s especially effective for QBR and OKR updates, where I need crisp narratives tied to outcomes, plus a plain-English articulation of risks and trade-offs for empowered product teams.

The guardrails matter. I set clear AI risk management boundaries, prevent any sensitive data from entering prompts, and align usage with data governance and regulatory compliance requirements. I also version and review prompts just like product artifacts, so the best ones evolve into a durable AI product toolbox the whole team can use.
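A redaction pass before text reaches a prompt is one concrete form of that boundary. The Python sketch below is a hypothetical illustration, not a complete PII filter: its two regular expressions catch common email and phone formats, miss names and many other identifiers, and can over-match long digit runs such as dates.

```python
import re

# Illustrative patterns only; production PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace matches with placeholders and report how many of each were found."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


clean, found = redact("Call Dana at +1 (415) 555-0100 or email dana@example.com about the renewal.")
print(clean)  # Call Dana at [PHONE] or email [EMAIL] about the renewal.
```

The name survives redaction in the example, which illustrates why pattern matching should be one layer of a data-governance process rather than the whole of it.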

If you’re getting started, pick one high-friction workflow, say interview synthesis or PRD drafting, and timebox a week to build a repeatable prompt set and review rubric. Measure cycle-time savings and quality deltas, then expand to a second workflow. Within a month, you’ll have a lightweight operating model for AI strategy that compounds across your roadmap.

Inspired by this post on Product School.

November 20, 2025
