Month: June 2026

Behavioral Analytics for AI Agent Activation and Retention
AI agent growth is not simply a matter of attracting more users or generating more conversations. The central product question is whether people reach a useful outcome quickly enough to return, and whether the organization can respond intelligently when that journey breaks down.

The two source accounts describe complementary parts of that challenge. The Pendo account focuses on measuring and improving the path from first use to recurring engagement, while the Amplitude account focuses on turning observed behavior into workflows across product and go-to-market systems. Together, they suggest an operating model in which analytics first identifies meaningful behavior and then helps teams act on it.

Treat the agent as a measurable product experience

An AI agent can appear busy without becoming valuable. Conversation counts, prompt volume, and feature exposure show activity, but they do not establish that users completed meaningful work. Behavioral analytics becomes more useful when the agent is treated as an end-to-end product experience rather than an isolated interface.

The Pendo account describes mapping the journey from activation and a first successful task through repeat usage and habit formation. It also reports that the team defined stickiness around the agent’s jobs to be done instead of relying on a generic engagement measure. That distinction matters because a meaningful return pattern depends on the work the agent is intended to support.

The Amplitude account extends the same reasoning beyond analysis. It describes agents operating on verified product events, including high-intent milestones, changes in feature adoption, and signals associated with churn risk. In this model, instrumentation is not merely a reporting layer. It supplies the evidence used to trigger a subsequent decision or workflow.

A practical measurement chain therefore begins with eligibility and exposure, continues through an attempted interaction and a verified first success, and then examines whether users achieve additional useful outcomes over later sessions. The exact events must reflect the agent’s purpose. The durable principle is to measure completed value, not just interface activity.
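
As a rough illustration, the chain above could be computed as a strict funnel. This is a sketch under stated assumptions: the stage names (`eligible`, `exposed`, `attempted`, `first_success`, `repeat_success`) and the `(user_id, stage)` event format are hypothetical and do not come from either source account.

```python
from collections import defaultdict

# Hypothetical stage names, ordered from earliest to latest in the journey.
STAGES = ["eligible", "exposed", "attempted", "first_success", "repeat_success"]


def funnel_counts(events):
    """Count users who reached each stage and every stage before it.

    `events` is an iterable of (user_id, stage) pairs. A user counts toward a
    stage only if they also have an event for all earlier stages.
    """
    stages_by_user = defaultdict(set)
    for user_id, stage in events:
        stages_by_user[user_id].add(stage)

    counts = {stage: 0 for stage in STAGES}
    for reached in stages_by_user.values():
        for stage in STAGES:
            if stage not in reached:
                break  # stop at the first missing stage
            counts[stage] += 1
    return counts


def step_conversion(counts):
    """Return the conversion rate from each stage to the next (None if empty)."""
    rates = {}
    for prev, nxt in zip(STAGES, STAGES[1:]):
        rates[f"{prev}->{nxt}"] = counts[nxt] / counts[prev] if counts[prev] else None
    return rates
```

Reporting each step separately, rather than one end-to-end rate, keeps the distinction between interface activity and completed value visible.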

Define activation as the first meaningful success

Activation is most informative when it marks a result that demonstrates the agent’s value. Opening the agent, viewing a suggested prompt, or sending a message may be necessary steps, but none necessarily proves that the user accomplished the intended task.

Pendo’s account reports that activation contained unnecessary cognitive load and that the first-session path did not consistently lead users to a quick win. The reported response included simplifying onboarding, clarifying prompts, and using in-app guidance to make valuable capabilities easier to recognize. This connects activation analysis directly to product design: when users stall before a first success, the remedy may involve reducing choices, clarifying expectations, or improving contextual guidance rather than adding more agent functionality.

Journey analysis should separate several different failure modes. A user who never starts may not understand the value proposition. A user who starts but abandons the task may encounter interaction friction. A user who receives an answer but does not act on it may lack confidence, context, or a clear next step. Combining these outcomes into one conversion rate would hide the product decision each one implies.
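
The failure modes described above can be kept apart with a simple classifier. The sketch below is illustrative: the three boolean signals and the `Outcome` labels are assumed names, and a real implementation would derive them from whatever verified events the product emits.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    NEVER_STARTED = "never_started"            # value proposition may be unclear
    ABANDONED = "abandoned"                    # possible interaction friction
    ANSWERED_NOT_ACTED = "answered_not_acted"  # low confidence or no clear next step
    SUCCEEDED = "succeeded"


def classify_session(started, received_answer, acted_on_answer):
    """Map three hypothetical session signals to one outcome label."""
    if not started:
        return Outcome.NEVER_STARTED
    if not received_answer:
        return Outcome.ABANDONED
    if not acted_on_answer:
        return Outcome.ANSWERED_NOT_ACTED
    return Outcome.SUCCEEDED


def outcome_breakdown(sessions):
    """Count outcomes for an iterable of (started, received_answer, acted_on_answer)."""
    return Counter(classify_session(*session) for session in sessions)
```

A breakdown by outcome gives each failure mode its own count instead of folding them into a single conversion rate.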

Activation should also be connected to the behavior that follows it. If an event labelled as success has no observable relationship with later value, it may be a convenient instrumentation point rather than a meaningful milestone. Behavioral cohorts can help compare subsequent engagement among users who reached different early outcomes, although those relationships should initially be treated as diagnostic evidence rather than proof of causation.

Measure retention as repeated value, not raw frequency

Retention analysis asks whether users continue to obtain value after activation. For an AI agent, that requires more context than a simple count of returning users. A return can indicate trust and usefulness, but it can also reflect an unresolved task, repeated correction, or a workflow that unnecessarily forces the user back.

The Pendo account presents stickiness as a proxy for trust and reports a 61% increase after the team established Agent Analytics and ran a series of product experiments. The same source associates stronger return behavior with proactive anticipation of intent and associates context-rich interactions, supported by timely nudges and in-app guides, with deeper engagement over later sessions. These are reported findings from one product account, not an independently verified benchmark for other agents.

The more transferable lesson is methodological. Teams can segment retention by the early behavior users completed, the type of task attempted, and the context surrounding the interaction. They can then examine whether retained users are repeating successful work, expanding into additional useful tasks, or merely revisiting the same point of friction.
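
One hedged way to picture that segmentation is a retention rate per early-behavior cohort, counting a user as retained only when they returned and reached another verified success. The field names `cohort` and `returned_with_success` are assumptions made for this sketch, not terms from the Pendo account.

```python
from collections import defaultdict


def retention_by_cohort(users):
    """Return {cohort: share of users who returned with a verified success}.

    `users` is an iterable of dicts with two hypothetical keys:
    'cohort' (str, e.g. the early behavior completed) and
    'returned_with_success' (bool).
    """
    totals = defaultdict(int)
    retained = defaultdict(int)
    for user in users:
        totals[user["cohort"]] += 1
        if user["returned_with_success"]:
            retained[user["cohort"]] += 1
    return {cohort: retained[cohort] / totals[cohort] for cohort in totals}
```

Differences between cohorts in such a table are diagnostic only; they suggest where to run an experiment rather than proving that the early behavior caused the later return.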

This approach also guards against optimizing stickiness in isolation. Frequent use is desirable only when it reflects repeated useful outcomes. Where the agent’s job is to resolve work efficiently, fewer interactions may sometimes represent a better experience than a longer conversation. The retention definition must therefore stay anchored to the user’s intended result.

Turn behavioral signals into controlled interventions

Analytics creates leverage when it changes what the product or organization does next. The sources cover two levels of intervention. Pendo describes changes inside the experience, such as onboarding simplification, prompt clarification, contextual guides, tuned triggers, and tighter feedback loops. Amplitude describes workflows that cross system boundaries, such as initiating outreach for churn risk, triggering experimentation when adoption falls, activating users after high-intent milestones, and updating CRM records.

These approaches are complementary. In-product interventions can help a user complete the current journey, while cross-functional workflows can coordinate actions that require product, sales, or customer-success involvement. The behavioral signal should determine which response is appropriate: interface friction calls for a product change, an unmet need may call for research, and an account-level risk signal may justify a carefully governed human follow-up.

Automation does not remove the need for experimentation. Pendo reports using A/B tests to evaluate changes, while the Amplitude account emphasizes success criteria, governance guardrails, observability, iteration, and aligned performance measures. A sound operating loop combines those ideas: define the target behavior, verify the underlying events, choose an intervention, test its effect, monitor unintended outcomes, and retain only changes that improve the intended user result.

That loop is especially important when an agent both interprets behavior and initiates action. Poor event quality, ambiguous thresholds, or drifting agent performance can otherwise scale an incorrect decision. Human ownership, visible workflow history, and clear evaluation criteria help distinguish useful orchestration from automated noise.

Key takeaways
- Define activation around a verified first useful outcome, not merely opening the agent or sending a prompt.
- Analyze each stage between exposure, attempted use, successful completion, and later return so different forms of friction remain visible.
- Interpret retention through repeated value and task context; activity alone is not sufficient evidence of trust.
- Use behavioral cohorts to generate hypotheses, then apply controlled experiments before treating an observed relationship as causal.
- Match interventions to the signal: improve the experience when friction is local, and use governed cross-functional workflows when follow-through spans multiple systems or teams.
- Monitor data quality and agent performance because automated actions can amplify both accurate and inaccurate interpretations.

The next stage of AI agent maturity will depend less on adding visible capabilities and more on connecting meaningful outcomes to disciplined follow-through. Teams that can measure the first win, recognize repeated value, and govern the actions between them will be better positioned to turn agent adoption into durable product behavior.

References
- Shivam.Consulting Blog – Stop Guessing: Deploy AI Agents That Act on Real User Behavior with Amplitude Workflows
- Shivam.Consulting Blog – Inside the 61% Stickiness Lift for Pendo’s AI Agent: My Agent Analytics Playbook

June 23, 2026
Designing Awe: Intentional, Sensory-Rich Experiences to Elevate Product Leadership

What makes an event truly unforgettable—and what can product teams learn from it? As I listened to an illuminating conversation about crafting experiences, I found myself reflecting on how the same principles translate directly to product strategy, continuous discovery, and the day-to-day work of product management leadership.

In this episode, the conversation explores how Petra Wille and her co-organizer Arne design experiences (not just events) at Product at Heart and their Product Leadership gatherings. From a candlelit speakers' dinner in a rosemary-covered greenhouse to a disco ball that appeared for exactly 20 seconds, the details reveal how intentional design, sensory cues, and a little bit of goofy magic help people shed their corporate armor and open up to real inspiration and connection. The parallels back to product design are unmistakable—from designing for delight and awe, to the classic question of who you're choosing to serve.

In my role leading product teams, I see how these choices map directly to empowered product teams and the rigor of product discovery: you can’t please everyone, so you design deliberately for the right someone. That means curating for depth over breadth, and giving people agency through self-select paths—much like the "Hard Problems Club"—so niche audiences feel seen within a broader experience. It’s the same discipline we apply to product strategy and value proposition: clarity about the segment, the problem, and the kind of transformation we’re creating.

The programming choices here are also instructive. The team designed the Product at Heart Leadership Event across one and a half days, including a farm excursion and a leadership improv workshop. Those decisions weren’t ornamental; they were part of a deliberate journey that builds safety, curiosity, and connection—precisely the conditions that help leaders generate better ideas and have the real conversations that move work forward. In product, we build that journey through thoughtful onboarding, product tours, and progressive discovery.

I was struck by the role of sensory experience in unlocking inspiration—rosemary, zucchinis-as-instruments, and a three-meter disco ball. Too often, we conflate more features with more value; in practice, well-placed sensory or interaction details do more to create delight than another settings panel ever will. The same is true in software: microinteractions, purposeful motion, and small moments of surprise can change how people feel about your product, which changes how they use it.

What Petra calls "serendipity moments" resonated with me. Creating space for people to shed their corporate armor and make unexpected connections is as critical in community and conference networking as it is in a product’s information architecture. When we design pathways that invite contribution—opt-in tracks, intimate circles, and unstructured time—we invite the kind of learning and collaboration most teams say they want but rarely experience by accident.

The reflections on the World Domination Summit and the idea of designing for awe added a useful distinction: the difference between novelty and awe. Novelty is pleasant but fleeting; awe takes people out of the mundane and expands what feels possible. In product terms, awe is the moment a user realizes a new capability not only solves a task but changes how they think about their work. That’s the bar I want my teams aiming for in our roadmapping and journey mapping.

There’s also a pragmatic lesson in investment. The details that seem extravagant are often the ones that matter most—and not because they’re expensive, but because they’re intentional. A disco ball that appears for exactly 20 seconds signals care, timing, and narrative. In product, that’s the difference between a scattered backlog and a cohesive story: choosing the few standout moments that deliver meaning, not just motion.

For product leaders, the translation is clear: define who you serve, design for choice and delight, and invest in the details that unlock connection and insight. Whether it’s a farm excursion and leadership improv or a carefully crafted advanced-user path, the goal is the same—create conditions for real breakthroughs and lasting behavior change.

"If we can get through that armor and shut off the business reflexes, then inspiration is more likely to hit." — Petra Wille

Resources & Links

Follow Teresa Torres: https://ProductTalk.org

Follow Petra Wille: https://Petra-Wille.com

Mentioned in this episode

Strong Product People by Petra Wille

Product at Heart — Speakers Dinner Leadership (see the rosemary garden!)

Reflections on Product at Heart’s 2026 Leadership Event

Arne Kittler of Product at Heart

Product at Heart Conference — Hamburg 2026 (read about the Hard Problem Clubs)

House of Beautiful Business — an event that inspired Petra and Arne's approach to sensory experience

Petra’s recap for this year’s House of Beautiful Business in Tangier — Rituals, Rugs, and Radical Tenderness – My Experience at the House of Beautiful Business in Tangier

World Domination Summit — founded by Chris Guillebeau; "How to live a remarkable life in a conventional world"

Derek Sivers — mentioned as a spoken word contributor at experiential events

Have thoughts on this episode? I’d love to hear your perspective in the comments—what “awe moments” are you intentionally designing for your teams and your users?

Inspired by this post on Product Talk.

June 23, 2026
Migrate Analytics Platforms Without Chaos: 7 Proven Lessons to Plan, Move, and Land Cleanly

I’ve led and rescued more analytics migrations than I can count, and I know the pressure: every event, dashboard, and decision pipeline depends on getting it right. Migrating analytics platforms doesn't have to be painful, though. The seven lessons below, drawn from Human37 and Amplitude, can help your team plan, migrate, and land cleanly.

Here’s how I approach this work so teams keep momentum, regain trust in their numbers, and accelerate product-led growth on a unified analytics platform—without the rework and stakeholder fatigue that typically follow.

Lesson 1 — Start with outcomes, not events. Before moving a single event, I align leaders on the questions we must answer and the decisions we must speed up: activation, retention, and expansion. I map those goals to a simple driver tree, then back into the behavioral analytics we need. This trims noise, tightens scope, and ensures Amplitude analytics (or any destination) is instrumented for decisions, not vanity metrics.

Lesson 2 — Audit and map your data with rigor. I inventory current events, properties, IDs, and sources, then define a target schema with clear naming conventions, ownership, and versioning. Data governance and privacy-by-design are non-negotiable: we separate PII, document consent paths, and remove legacy debris. This step prevents schema drift and makes platform scalability sustainable.

Lesson 3 — De-risk the cutover with a phased plan. Rather than a big-bang switch, I dual-run critical flows, compare telemetry, and use feature flags to roll forward (and back) safely. Observability and anomaly detection are my guardrails: I monitor volume, cardinality, and event timeliness to spot regressions early—long before executives notice broken charts.
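
As a rough illustration of the dual-run comparison, the sketch below flags events whose counts diverge between the legacy and new platforms over the same time window. The function name, the dictionary inputs, and the 5% default tolerance are all assumptions chosen for the example; a real threshold depends on how noisy each pipeline is.

```python
def compare_event_volumes(legacy_counts, new_counts, tolerance=0.05):
    """Flag events whose counts differ by more than `tolerance` between platforms.

    Both arguments map event name -> count for the same time window.
    Returns {event_name: relative_difference}, where the difference is measured
    against the larger of the two counts. An event missing on one side is
    treated as a count of zero, so it shows up with a difference of 1.0.
    """
    flagged = {}
    for name in sorted(set(legacy_counts) | set(new_counts)):
        old = legacy_counts.get(name, 0)
        new = new_counts.get(name, 0)
        baseline = max(old, new)
        if baseline == 0:
            continue  # nothing recorded on either side
        diff = abs(new - old) / baseline
        if diff > tolerance:
            flagged[name] = diff
    return flagged
```

Volume is only one of the signals named above; the same pattern could be repeated for property cardinality and event timeliness.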

Lesson 4 — Treat instrumentation like product code. I wire schema checks into CI/CD, enforce typed analytics wrappers, and validate payloads pre-merge. With docs-as-code, the tracking plan stays current and reviewable. This keeps quality high at scale and avoids the slow death of broken funnels caused by well-meaning quick fixes.
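
A pre-merge payload check could be as small as the sketch below. It is a minimal illustration, not a specific vendor's API: the `TRACKING_PLAN` structure, the `signup_completed` event, and its properties are hypothetical stand-ins for a real tracking plan.

```python
# Hypothetical tracking plan: event name -> required property names and types.
TRACKING_PLAN = {
    "signup_completed": {"user_id": str, "plan": str, "duration_ms": int},
}


def validate_event(name, properties, plan=TRACKING_PLAN):
    """Return a list of human-readable problems; an empty list means valid."""
    if name not in plan:
        return [f"unknown event: {name}"]
    schema = plan[name]
    problems = []
    for prop, expected in schema.items():
        if prop not in properties:
            problems.append(f"missing property: {prop}")
        elif not isinstance(properties[prop], expected):
            problems.append(f"wrong type for {prop}: expected {expected.__name__}")
    for prop in properties:
        if prop not in schema:
            problems.append(f"unexpected property: {prop}")
    return problems
```

Run in CI against sample payloads, a check like this fails the build when an event drifts from the plan, before the drift reaches a dashboard.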

Lesson 5 — Enable the people, not just the platform. Tools don’t create insight—teams do. I run hands-on enablement with product tours and in-app guides tailored to each role, establish communities of practice, and publish short playbooks for common questions (activation analysis, cohort retention, and journey mapping). When customer success and growth marketers can self-serve, adoption sticks.

Lesson 6 — Land cleanly with fast, visible wins. Within the first two weeks post-cutover, I showcase analyses that matter: retention analysis by use-case, friction points via session replay and heatmaps, and conversion lift by segment. These quick proofs build confidence, reinforce the value proposition, and keep stakeholders engaged through the longer tail of hardening.

Lesson 7 — Govern and evolve continuously. After go-live, I schedule schema reviews, backlog grooming, and QBRs to prune events and refine definitions. Ownership is explicit, and changes flow through the same review process as code. This keeps the unified analytics platform trustworthy as the product (and org) changes.

I’ve seen this playbook turn skepticism into momentum. In one migration I inherited mid-flight, we refocused on decisions, tightened governance, and phased the rollout; the team moved from fire drills to confident launches—and stakeholders finally believed the numbers again.

If your team is staring down a migration, anchor on outcomes, automate quality, and invest in enablement. With disciplined execution and the lessons I’ve applied alongside partners like Human37 and platforms like Amplitude, you can move fast, reduce risk, and land cleanly—without the chaos.

Inspired by this post on Amplitude – Perspectives.

June 22, 2026
How I Make AI Agents Speak Like Our Team: A Conversation Design Playbook That Lifts CSAT

If nobody on our team trains the Agent on how to communicate, it will sound like an LLM when it speaks to customers—because it is one. I never want a customer to feel like they’re talking to a machine that doesn’t get them. That’s why I treat conversation design as a core product capability, not an afterthought.

Conversation design is an emerging discipline in AI-first support teams built to solve this exact problem. In practice, I make someone explicitly own how the Agent communicates—tone, structure, level of detail, customer experience, and the handoff and escalation process—because that’s where trust is won or lost.

When there’s no clear owner and no explicit guidance, the Agent starts making its own choices. I’ve seen it over-explain when a short answer would do, reply in a flat tone when a customer is frustrated, or trigger a handoff too late. None of those are model problems; they’re design problems.

The cost is measurable. Customers who get awkwardly structured responses won’t trust the answer—even when it’s accurate—so they escalate to a human to hear the same thing phrased differently. Others will skip the Agent entirely. And when the Agent does hand off, a poor transition means the support rep inherits a frustrated customer. Every one of these outcomes is avoidable; conversation design exists to prevent them.

I’ve seen A/B tests where a warmer, more conversational opening message meaningfully lifted customer satisfaction—CSAT moved from 72.8% to 78.4%. A single design change, applied to the very first message, drove a measurable difference. That’s the kind of leverage I look for as a product leader.
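
To put those two figures in perspective, the move from 72.8% to 78.4% is a gain of 5.6 percentage points, or roughly 7.7% relative to the starting score. The small helper below, with a hypothetical name, shows the arithmetic. Sample sizes are not given here, so it makes no claim about statistical significance.

```python
def csat_lift(before_pct, after_pct):
    """Return (absolute lift in percentage points, relative lift in percent)."""
    absolute = after_pct - before_pct
    relative = absolute / before_pct * 100
    return round(absolute, 1), round(relative, 1)


print(csat_lift(72.8, 78.4))  # (5.6, 7.7)
```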

Here’s the scope I use when I talk about conversation design—five areas that shape the customer experience end to end:

1) Tone and personality: Define the Agent’s voice, level of detail, and how formal or casual it should sound—and specify where that register adapts to the situation (for example, urgent access issues versus exploratory product questions).

2) Response structure: Ensure the Agent matches the level of detail to the customer’s request, keeping answers tight when the ask is simple and expanding only when complexity demands it.

3) Handoff logic: Decide when to escalate, how to communicate the transition, and what context to carry over so the human teammate can help immediately without rework.

4) Interaction flow: Map how a conversation progresses—clarifying questions, answers, resolution, or handoff—and design for smooth pivots when customers change direction.

5) Response quality: Go beyond technical correctness to ensure answers feel clear, helpful, and on-brand. Accuracy without clarity erodes trust.

To put this into practice, I start with the feel of the conversation. Before tuning individual responses, I write down one tight paragraph describing the Agent’s voice. I don’t need a full brand bible—just a north star I can use to make consistent decisions about tone. The voice stays consistent, while the register adapts to the context: a locked-out customer needs directness and speed; a feature explorer might value more context and examples.

I design the handoff with extreme care because it’s one of the highest-friction moments. Customers shouldn’t have to re-explain anything. The support rep should receive the full conversation history, the underlying context, what the Agent already tried, and why the escalation happened. Even the phrasing matters—“Let me connect you with a teammate who can help with this” feels very different from a silent handover.
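
One way to make that handoff checklist enforceable is to treat it as a structured payload. The sketch below is an illustration only: `HandoffContext` and its field names are assumptions, not part of Fin or any specific product.

```python
from dataclasses import dataclass


@dataclass
class HandoffContext:
    """Hypothetical payload passed from the Agent to a human teammate."""
    conversation_history: list[str]   # full transcript so far
    customer_context: dict[str, str]  # e.g. plan, account state
    attempted_steps: list[str]        # what the Agent already tried
    escalation_reason: str            # why the handoff happened

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]
```

A check such as `missing_fields()` could gate the escalation, so a rep never receives a handoff with an empty history or no stated reason.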

I also build a failsafe. If the Agent can’t resolve the issue cleanly, a graceful fallback still gives the customer a smooth experience. A customer might be frustrated with AI at that point, but a well-handled transition can turn that around.

Follow-ups deserve the same rigor as handoffs. If someone drops mid-conversation—with the Agent or a human—how do we reach back out to confirm they got what they needed? Most teams miss this moment; customers don’t.

Another common pitfall is over-explaining. The Agent has access to a lot of information, and left unguided, it will overshare. The fix is simple: match the answer’s depth to the question. A password reset shouldn’t take three paragraphs; a complex integration might. When there’s more to offer, the Agent should ask before expanding.

I also design for the conversation the customer is actually having—not the script I wish they’d follow. Customers change direction, stack questions, or bring up unrelated follow-ups. The Agent should pivot with them, not force them back into a rigid flow. I also consider whether flows vary by channel and whether different segments merit distinct experiences.

On the instruction side, I keep guidance short. Teams often react to edge cases by adding more rules until the LLM is parsing paragraphs before it can reply. I’ve seen it everywhere. My rule: if it’s about content or information, it belongs in the knowledge base. If it’s about tone or handling specific situations, it belongs in the Agent’s instructions. “Be direct about pricing” does more than a paragraph explaining the philosophy behind your pricing communication strategy.

If you’re using Fin, much of this work happens in Guidance. It’s where conversation design takes shape, helping you define how the Agent should sound, how much it should say, and how it should respond in different situations.

Most teams won’t hire a dedicated conversation designer on day one—that’s fine. But someone still needs to own the Agent’s communication, even if it’s part of an existing role. I’ve often seen this start within support operations or knowledge management. As the Agent scales to more conversations, the responsibility becomes formal—and eventually becomes a dedicated role.

Here’s how I’d start, step by step:

1) Name an owner. Make accountability explicit; it doesn’t have to be a new hire.

2) Pick one conversation type that isn’t landing well. Look for cases where the Agent answered correctly but the customer still escalated or left negative feedback. If you’re using Fin, CX Score can help you surface these; it shows which topics and conversation types are scoring poorly and why, so you can see whether the issue is answer quality, customer effort, or something else.

3) Audit the Agent’s instructions. If they’ve grown beyond a few focused rules, trim them. Move content into the knowledge base and keep instructions focused on behavior.

4) Fix your worst handoff. Review a handful of conversations that escalated. Did the customer have to repeat themselves? Did the rep have enough context? Redesign that single transition first.

The impact of these small improvements compounds. A warmer opening can lift CSAT, trimming instructions makes responses sharper, and a better handoff prevents reps from inheriting frustrated customers. None of this requires new knowledge—just someone paying close attention to the conversation itself and designing it with intention.

Inspired by this post on The Intercom Blog.

June 18, 2026
A Systematic Product Launch Strategy Beyond Announcement Day
A product launch is most useful when treated as an operating system for adoption, not a communications deadline. The central challenge is to connect a technically credible product story with clear positioning, coordinated execution, and evidence that customers are reaching the intended outcomes.

The supplied practitioner account from Shivam.Consulting Blog provides one perspective rather than a set of independently corroborated benchmarks. Its value lies in connecting solutions engineering, product marketing, partner coordination, and product analytics into a coherent launch model that teams can adapt to their own context.

Launch strategy begins with a credible customer problem

The source describes Darshil Gandhi as a Director of Product Marketing at Amplitude responsible for product and partner launches, with previous experience as a solutions engineering principal. It argues that this combination is valuable because solutions engineering develops customer intimacy and technical credibility, while product marketing adds segmentation, positioning, and narrative discipline.

That career path points to a broader launch principle: positioning should not be created separately from the conditions in which customers evaluate and use the product. A technically accurate message can still fail if it does not identify a meaningful audience or outcome. A polished market narrative can likewise fail if sales teams cannot defend it, demonstrations do not substantiate it, or the product experience does not deliver on it.

The practical unit of launch planning is therefore not the feature alone. It is the connection among a target customer, a recognizable problem, a product capability, and an observable outcome. That connection should shape the value proposition, demonstration, enablement materials, onboarding path, and success measures. When those elements describe different versions of the product, friction appears between initial interest and sustained use.

Readiness requires one narrative across the organization

The source recommends crisp ownership and recurring execution-readiness reviews. It also emphasizes alignment among product management, engineering, solutions engineering, sales, and partner teams around a shared narrative, demonstration story, and definition of readiness. This frames stakeholder management as part of the launch design rather than an administrative task performed near release.

A useful readiness review should test whether the launch can survive contact with a customer. Product and engineering can confirm what the product does and where its boundaries lie. Solutions engineering can identify implementation questions, proof requirements, and likely objections. Product marketing can ensure that the message identifies a relevant audience and differentiates the offer without exceeding the evidence. Sales and customer-facing teams can verify whether the story is usable in real conversations.

Clear ownership does not mean that one function performs every task. It means that decision rights are visible: who approves positioning, who verifies product claims, who owns enablement, who decides whether an unresolved issue blocks launch, and who monitors adoption afterward. Recurring reviews then become decision forums rather than status meetings. Their purpose is to expose contradictions early enough to correct the message, demonstration, onboarding, or product experience.

Partner launches must reduce adoption risk

Partner launches introduce a second organization, another audience, and additional dependencies. According to the source, effective co-marketing should extend beyond a feature announcement to include validated use cases, shared success measures, and coordinated enablement. This shifts the objective from maximizing announcement visibility to making the combined proposition easier to understand, evaluate, and adopt.

A shared use case is particularly important because an integration can be technically functional without having an obvious customer purpose. The joint narrative should explain what the customer can accomplish through the combination, which part each product plays, and what conditions must be present for the experience to work. The joint demonstration should then show that same value path rather than presenting two adjacent product tours.

Shared success measures also prevent each partner from declaring success against a different outcome. Attention may matter to marketing teams, while activation, repeated use, retention, or expansion may matter more to the business case. The appropriate measures will vary by product, but they should be agreed upon before launch and connected to a defined customer behavior. Partner enablement should use the same language and proof so that customers do not receive conflicting explanations from the two companies.

Measurement turns launch activity into a learning loop

The source advocates instrumenting execution with Amplitude analytics, defining activation, conducting retention analysis, and using A/B testing across important touchpoints to evaluate messaging. These are reported practices from the supplied account, not independently verified evidence that a particular tool or method will produce the same result in every organization.

The larger strategic lesson is that launch measurement should follow the customer journey. Awareness metrics can show whether the market encountered the message, but they cannot establish whether the promise led to meaningful product use. Activation measures whether users reach an early behavior associated with value. Retention analysis examines whether that behavior continues. Experiments can help determine whether changes to messaging or onboarding improve a defined outcome, provided teams specify the hypothesis and success measure in advance.

This creates a feedback path from behavior to strategy. If the intended audience engages with the message but does not activate, the break may lie in qualification, onboarding, product friction, or a mismatch between promise and experience. If users activate but do not return, the initial use case may lack durable value or require stronger enablement. If a message variant improves response without improving product behavior, the team has learned about attention rather than adoption.
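
The diagnostic reasoning above can be sketched as a small check over funnel counts. This is a minimal illustration, not a method described in the source: the `LaunchFunnel` fields, the `diagnose` helper, and the 0.2 conversion floor are all hypothetical, and a real team would set stage definitions and thresholds from its own baselines.

```python
from dataclasses import dataclass


@dataclass
class LaunchFunnel:
    """Counts for one launch audience; all field names are illustrative."""
    reached: int      # users who encountered the message
    engaged: int      # users who responded to the message
    activated: int    # users who reached the defined activation behavior
    retained: int     # activated users who repeated the behavior later


def rate(numerator: int, denominator: int) -> float:
    """Return a ratio, treating an empty denominator as zero."""
    return numerator / denominator if denominator else 0.0


def diagnose(funnel: LaunchFunnel, floor: float = 0.2) -> str:
    """Name the first stage whose conversion falls below the assumed floor."""
    if rate(funnel.engaged, funnel.reached) < floor:
        return "awareness: the message is not drawing a response"
    if rate(funnel.activated, funnel.engaged) < floor:
        return "activation: check qualification, onboarding, friction, or promise fit"
    if rate(funnel.retained, funnel.activated) < floor:
        return "retention: the first use case may lack durable value or enablement"
    return "no stage is below the floor"
```

Reading the stages in order mirrors the argument: a message variant that lifts engagement without lifting activation would leave the diagnosis unchanged at the activation stage.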

Measurement should therefore influence decisions after the release date. Teams can refine positioning when customer behavior challenges the original assumptions, improve onboarding where the value path breaks, and revise enablement when customer-facing teams repeatedly encounter the same confusion. The launch becomes repeatable when these lessons are preserved and applied to the next release rather than disappearing into a retrospective.

Key takeaways
- Build the launch around a target customer, a meaningful problem, a defensible capability, and an observable outcome.
- Combine technical credibility with segmentation and positioning so that the promise is both persuasive and supportable.
- Use readiness reviews to resolve contradictions across the narrative, demonstration, enablement, onboarding, and product experience.
- Treat partner launches as joint adoption programs with a shared use case, coordinated enablement, and agreed success measures.
- Connect awareness to activation and retention, then use behavioral evidence to improve the message and customer journey.

The strongest launch capability compounds over time: each release improves the organization’s understanding of its customers, its cross-functional decision process, and its ability to translate product value into sustained behavior. The next launch should begin with the evidence and unresolved questions left by the last one.

References
- Shivam.Consulting Blog — From Solutions Engineering to Product Marketing: Battle-Tested Launch Lessons from Amplitude
June 17, 2026
AI Inference Economics: Optimize for Value, Not Cost
AI inference economics cannot be reduced to the price of a model call. The financially relevant question is whether a change in model, latency, caching, or token use improves total product value after its effects on conversion, retention, support, and revenue are included.

A reported decision to reject a projected $2 million in inference savings illustrates the distinction. The supplied source describes lower infrastructure costs alongside weaker downstream product signals, making the proposed optimization look attractive in a FinOps report but less compelling at the business level.

The correct unit of analysis is the customer outcome

Cost per request is useful for operating an AI product, but it is not a complete measure of its economics. A cheaper request can still be expensive if it makes a user more likely to abandon a session, fail a task, contact support, or leave the product.

The source article reports that routing traffic to lower-cost options produced immediate cloud cost optimization. It also associates small increases in time to first token with greater session abandonment, subtle quality declines with lower task completion, and weaker performance in support deflection. According to the account, the resulting revenue exposure exceeded the projected expense reduction.

This reframes inference efficiency as a value equation. Direct serving cost belongs on one side; incremental conversion, retained revenue, successful task completion, and avoided support demand belong on the other. The decision should be based on the net effect rather than whichever metric is easiest to retrieve from a cloud bill.
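
A worked version of that equation may make the trade-off easier to see. The sketch below is an assumption-laden illustration: the function name and parameters are hypothetical, and every input except the 2,000,000 projected saving is an invented placeholder, since the account does not publish the downstream figures.

```python
def net_value_of_change(
    serving_cost_saved: float,
    conversion_revenue_delta: float,
    retained_revenue_delta: float,
    support_cost_delta: float,
) -> float:
    """Net annual effect of an inference change, in one currency.

    Revenue deltas are negative when the change loses revenue.
    support_cost_delta is positive when support demand rises.
    """
    return (
        serving_cost_saved
        + conversion_revenue_delta
        + retained_revenue_delta
        - support_cost_delta
    )


# Hypothetical inputs; only the 2,000,000 projected saving echoes the account.
result = net_value_of_change(
    serving_cost_saved=2_000_000,
    conversion_revenue_delta=-1_200_000,
    retained_revenue_delta=-900_000,
    support_cost_delta=300_000,
)
print(result)  # prints -400000
```

With these placeholder inputs the saving is more than offset by downstream losses, which is the shape of outcome the account reports, though not its actual magnitude.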

Cost, latency, and quality form a coupled system

Model cost, response speed, and output quality are often managed as separate workstreams. In practice, changing one can move the others. A smaller or cheaper model may reduce inference expense while changing answer quality. More restrictive token limits may shorten responses but remove information needed to complete a task. Caching may improve both cost and speed for repeatable requests, yet become unsuitable where fresh or highly contextual output matters.

The source argues for treating these variables as one product system. That view prevents a local optimization from being mistaken for an overall improvement. It also makes latency distributions more informative than a single average: even when aggregate performance appears acceptable, slower experiences within particular workflows may coincide with abandonment or failed completion.
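
One way to look past a single average is to summarize latency per workflow with percentiles. The sketch below assumes a hypothetical input of `(workflow_name, seconds)` pairs for time to first token; the function name and the choice of p50 and p95 are illustrative rather than taken from the source.

```python
from collections import defaultdict
from statistics import mean, quantiles


def latency_summary(samples):
    """Summarize time to first token per workflow.

    samples: iterable of (workflow_name, seconds) pairs.
    Returns {workflow: {"mean": ..., "p50": ..., "p95": ...}}.
    Each workflow needs at least two samples for quantiles() to work.
    """
    by_workflow = defaultdict(list)
    for workflow, seconds in samples:
        by_workflow[workflow].append(seconds)

    summary = {}
    for workflow, values in by_workflow.items():
        # quantiles(n=100) returns the 99 cut points between percentiles.
        cuts = quantiles(values, n=100, method="inclusive")
        summary[workflow] = {
            "mean": mean(values),
            "p50": cuts[49],
            "p95": cuts[94],
        }
    return summary
```

A workflow whose p95 sits far above its mean is a candidate for closer inspection, even when the aggregate average looks acceptable.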

The same principle applies to quality. A model-level score matters only insofar as it represents what users need from the workflow. For a support agent, that might involve resolving an issue without escalation. For another product experience, it might involve completing a task, activating a feature, or continuing to use the service. Business instrumentation gives technical measures an economic interpretation.

Experiments must detect product harm, not just cost movement

The reported evaluation combined eval-driven development with A/B testing and defined success through conversion, retention cohorts, and Net Recurring Revenue rather than cost per call alone. It also used minimum detectable effect calculations to determine whether the tests had enough statistical power to reveal meaningful changes in latency and answer quality.
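
The account does not say which formula was used for the minimum detectable effect calculation. As a rough sketch, the widely used normal-approximation formula for comparing two proportions gives a sense of the sample sizes involved. The function name, the two-sided 0.05 significance level, and the 0.8 power default are conventional assumptions, not details from the source.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per arm to detect an absolute change in a rate.

    Uses the common normal-approximation formula for comparing two
    proportions with a two-sided test. mde_abs may be negative when the
    concern is a drop, for example in task completion.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)
```

For example, `sample_size_per_arm(0.10, -0.01)` estimates how many users each arm needs before a one-point drop from a 10 percent conversion rate becomes detectable. Halving the effect size roughly quadruples the requirement, which is why small regressions are easy to miss in short tests.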

That approach suggests two complementary layers of evidence. Evaluations can identify whether model behavior changes on representative tasks, while controlled product experiments can show whether those changes matter to users and the business. Neither layer is sufficient by itself: an offline quality score may miss behavioral consequences, and a topline business metric may conceal the mechanism behind a regression.

Guardrails are especially important when the expected saving is immediate but the product damage may emerge later. Infrastructure spend can fall as soon as traffic moves. Retention and recurring-revenue effects may take longer to appear. Conversion, task completion, session abandonment, support deflection, and cohort retention therefore provide signals across different time horizons.

The evidence supplied here is one first-person case account, not independent corroboration. Its projected $2 million saving, observed correlations, and business conclusion should consequently be treated as case-specific rather than universal benchmarks. The transferable value lies in the measurement framework, not in assuming that every higher-cost model will produce a better commercial outcome.

Key takeaways
- Evaluate inference changes against total product value, including conversion, retention, support demand, and recurring revenue.
- Measure cost, latency, and AI quality together because an intervention in one dimension can alter the others.
- Pair task-level evaluations with controlled product experiments and size tests to detect economically meaningful regressions.
- Apply optimization selectively: a technique is valuable where evidence shows that it lowers cost without harming the customer outcome.

A selective optimization roadmap

The alternative to indiscriminate cost cutting is not unlimited inference spending. The source describes a balanced roadmap built around targeted caching where experiments showed no adverse outcome, dynamic routing for task-specific workloads, and stronger observability to detect quality regressions early.

Each method addresses a different part of the economics. Targeted caching can remove redundant work in stable interactions. Dynamic routing can reserve more capable models for tasks that justify them while sending simpler work to less expensive paths. End-to-end observability can connect routing, model, token, latency, and quality data with the behavior that follows.
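
The interaction between these methods can be sketched in a few lines. Everything below is hypothetical: the task labels, the `InferenceRouter` class, and the callables standing in for models are illustrative placeholders, and the source does not describe its routing implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task labels considered stable enough to cache.
CACHEABLE_TASKS = {"faq"}
# Hypothetical task labels that justify the more capable model.
COMPLEX_TASKS = {"troubleshooting", "billing_dispute"}


class InferenceRouter:
    """Route requests by task type and cache only approved task types."""

    def __init__(self, small_model: Callable[[str], str],
                 large_model: Callable[[str], str]) -> None:
        self.small_model = small_model
        self.large_model = large_model
        self.cache: Dict[str, str] = {}
        # (task, route) pairs, kept so routing can be joined to outcomes later.
        self.log: List[Tuple[str, str]] = []

    def answer(self, task: str, prompt: str) -> str:
        # Serve from cache only for task types cleared for caching.
        if task in CACHEABLE_TASKS and prompt in self.cache:
            self.log.append((task, "cache"))
            return self.cache[prompt]
        if task in COMPLEX_TASKS:
            route, model = "large", self.large_model
        else:
            route, model = "small", self.small_model
        response = model(prompt)
        if task in CACHEABLE_TASKS:
            self.cache[prompt] = response
        self.log.append((task, route))
        return response
```

Recording the route for every request is the piece that supports observability: without it, a later change in task completion cannot be traced back to a routing or caching decision.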

This also clarifies governance. FinOps teams can continue applying pressure to unit costs, while product teams define outcome guardrails and analytics teams verify the net effect. A proposed saving becomes ready for broader rollout only when the organization can see both the expense reduction and the customer or revenue impact.

As AI products scale, the strongest operating discipline will be selective rather than reflexive: spend less where evidence supports it, invest more where inference creates measurable value, and revisit routing decisions as workflows and user behavior change.

References
- Shivam.Consulting Blog — Why I Rejected $2M in AI Inference Savings to Protect Conversion, Retention, and Revenue
June 17, 2026
How I Use Novus, the First Product Agent, to Turn Rapid Releases into Measurable Wins

In a world of relentless CI/CD and accelerating release trains, product leaders like me can’t afford lagging signals or fuzzy readouts on what’s truly moving the needle. I need immediate, trustworthy feedback that connects code shipped to outcomes achieved and customer value created.

Coding agents compress weeks of development into hours, but the faster your codebase changes, the harder it is to know what’s actually helping end-users.

That tension is exactly why I brought Novus into my product toolbox. To keep up with the pace of development, over 600 product teams are already using Novus, the first-of-its-kind product agent, which sets itself up automatically, monitors product data, and recommends what to do next.

From my chair, that promise matters only if it translates into clear decisions. With Novus, I’ve been able to tighten the loop between experimentation and learning: it pairs eval-driven development with behavioral analytics and observability so I can see how a release influences activation, engagement, and retention—without spelunking through fragmented dashboards. The agentic AI backbone reduces the manual stitching I used to do across events, cohorts, and funnels, letting me focus on prioritization and product strategy instead of report wrangling.

Day to day, Novus fits naturally into our AI workflows. It surfaces anomalies early, clarifies trade-offs, and frames next-best actions in the language of outcomes. Because it plugs into a unified analytics platform approach, I can maintain continuous discovery at scale while preserving the rigor of Agent Analytics: hypotheses are explicit, telemetry is consistent, and results are traceable. That’s the operating cadence I expect from modern product management leadership.

If your roadmap moves faster than your learning loops, a product agent can be the missing link between speed and certainty. Novus helps me convert rapid releases into measurable wins, keeping the team aligned and confident about what to build next—and just as importantly, what to stop doing.

Inspired by this post on Pendo – Best Practices.

June 17, 2026
Stop Forcing Organizational Change: How I Create Impactful Product Habits Without Burnout

Organizational change is exhausting—so I stopped trying to force it. After years of leading product teams, I’ve learned that trying to fix the people and processes around me is almost always wasted energy. If you’re eager to champion a better way of working inside a resistant organization, there’s a more sustainable path that actually drives results.

Here’s my starting point: individuals can’t change their organizations. I’m often asked to “train the PMs” or “install discovery practices,” but without executive sponsorship, organizational pain, and urgency, nothing moves. I now decline those well-intentioned requests and focus instead on creating the conditions for change.

My readiness check is simple and ruthless. Pain — organizational pain felt by leadership, not just you. Urgency — there has to be a cost to inaction. Awareness — people need to know solutions exist. If I can’t articulate these three clearly, I narrow the scope to what my team and I can control and demonstrate.

Practically, I elevate organizational pain by making it visible and quantifiable: missed outcomes vs output OKRs, customer churn tied to unmet needs, increased operational load from legacy workflows, or cycle time and deployment friction that slow learning. I create urgency by modeling cost-of-delay and showing the trade-offs we’re already making. And I build awareness by running small, transparent experiments that show there’s a credible alternative—continuous discovery, empowered product teams, and product trios solving for outcomes, not output.

“Organizational change starts with you — but it starts with you changing you, not your organization.” I take that literally. I refine my own discovery habits, make my assumptions explicit, and raise the quality bar on evidence. Whether it’s adopting AI responsibly in our workflow or redesigning how we do customer interviews, I change me first and let the results speak.

Show your work, don’t advocate your conclusions. Instead of arguing for “the right way,” I surface the pain, share how I reached my conclusion, and let others draw their own insights. I circulate decision logs that link customer evidence to product decisions, include short snippets from interviews, and map outcomes to proposals. That transparency lowers defenses, builds stakeholder buy-in, and shifts the conversation from opinion to observable facts.

Working within constraints, not against them. Stuck in a rigid, feature-factory process? You don’t have to change quarterly planning to do great discovery. Add customer context. Frame features around outcomes. Layer in the habits without touching the formal process. I’ve embedded discovery into existing rituals: adding customer insights to PRDs, tying features to measurable outcomes, and using thin-slice experiments that fit inside current delivery cadences. Over time, those habits compound.

The ripple effect is real. Teams that do great work and show it publicly become the ones everyone wants to emulate. That’s how influence actually spreads. I make results visible—brief Looms walking through our reasoning, dashboards that track outcome movement, and internal write-ups that highlight how the work changed a customer behavior. Visibility turns quiet wins into organization-wide momentum.

If you want a place to start this week, try this: define a sharp outcome, run three quick customer interviews, share your notes and decision rationale openly, and ship one small experiment tied to that outcome. Use the data to refine your next step and repeat. In a month, you’ll have a trail of evidence, not a pitch deck—and that’s what shifts minds.

In the end, sustainable change comes from consistent practice, not fiery advocacy. Focus on outcomes, make the pain and cost-of-inaction undeniable, and keep showing your work. The organization will move when it’s ready—your job is to make “ready” happen sooner by modeling what good looks like and making it impossible to ignore.

Inspired by this post on Product Talk.

June 16, 2026
Why Product Engineers Are Transforming Software Delivery: Ownership, Speed, and Real Impact

I’ve watched the rise of product engineering up close, and it’s reshaping how we build software. The old model of rigid handoffs and separate functions is giving way to small, empowered product teams where engineers own the customer problem end to end. That shift isn’t just cultural—it’s a performance advantage that compounds with every release.

I often summarize it this way: “Product engineers are taking over. They ship code, talk to users, and own outcomes—no handoff required. Here’s what the role is, and why it matters now.”

When I say “product engineer,” I’m describing a builder who goes beyond writing code. I expect them to partner in product trios with product management and design, participate in continuous discovery, and make decisions grounded in product strategy and real customer insight. They don’t toss features over a wall; they own the problem, the solution, and the measurable outcome.

Why now? Modern delivery practices like CI/CD and feature flags compress feedback loops, while behavioral analytics and session replay make customer friction visible in real time. As expectations rise for quick iterations and clear value, teams that reduce handoffs and align around outcomes outperform on DORA metrics such as deployment frequency and lead time for changes.

Day to day, a strong product engineer blends discovery and delivery. They join customer interviews, review support tickets, analyze usage patterns, and run A/B testing to validate hypotheses. Then they ship code in small, safe increments, instrument telemetry, and watch adoption and retention signals to confirm they’re moving the numbers that matter.

Team shape matters. I favor compact, cross-functional squads anchored by product trios, each with explicit outcomes vs output OKRs. Product engineers often operate like forward deployed engineers, partnering with customer success and solutions engineering to learn at the edge of real-world usage. This proximity to customers turns ambiguity into insight—and insight into product leverage.

Accountability is concrete. We track DORA metrics for delivery health and pair them with product outcomes such as activation, time-to-value, and Net Recurring Revenue (NRR) drivers. The combination keeps us honest about both how fast we move and whether what we ship truly works for customers.
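
For readers who have not computed them before, two of the DORA measures mentioned here reduce to simple arithmetic over deployment records. The sketch below is a minimal illustration with hypothetical function names and input shapes; real pipelines would pull these timestamps from version control and deployment tooling.

```python
from datetime import datetime, timedelta
from statistics import median


def deployment_frequency(deploy_times, window_days):
    """Average deployments per day over a fixed observation window."""
    return len(deploy_times) / window_days


def median_lead_time(changes):
    """Median time from commit to production deploy.

    changes: iterable of (committed_at, deployed_at) datetime pairs.
    """
    seconds = [
        (deployed - committed).total_seconds()
        for committed, deployed in changes
    ]
    return timedelta(seconds=median(seconds))
```

These delivery numbers say nothing about customer value on their own, which is why they are paired with outcome measures such as activation and time-to-value.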

The hiring profile is distinct. I look for engineers who are curious about the “why,” comfortable with trade-offs, and energized by customer conversations. They can navigate architectural complexity, but they also translate user feedback into crisp product bets. Many grow into natural facilitators of discovery rituals and developer evangelism across the organization.

If you’re getting started, pilot a single squad. Establish clear outcomes vs output OKRs, invest in CI/CD and feature flags, and commit to continuous discovery with weekly customer interviews. Give the team ownership of a KPI tied to product strategy, and measure progress with DORA metrics plus usage and retention signals. The early wins—fewer handoffs, faster learning, tighter feedback loops—build momentum quickly.

In short, product engineers thrive where accountability, autonomy, and user empathy meet. They reduce wasteful coordination, shorten the path from insight to impact, and ensure we ship code that customers actually adopt. That’s why this role is reshaping how software gets built—and why the teams that embrace it will set the pace for everyone else.

Inspired by this post on Pendo – Perspectives.

June 15, 2026
Salesforce to Acquire Fin for ~$3.6B: Powerful AI Synergy, Product Strategy Takeaways

I’m processing a milestone moment for SaaS, AI strategy, and product leadership. One statement captures the news with clarity: “We’re excited to share that we just signed an agreement for Salesforce to acquire Fin for ~$3.6B. The transaction is expected to close in the fourth quarter of Salesforce’s fiscal year 2027.” As a product leader, I see this as a high-conviction bet on agentic AI, Customer Agents, and CRM integration at massive scale.

The backstory matters, and it’s remarkable: “Fin started as Intercom 15 years ago. We changed our name to cap our transformation just weeks ago. We were a darling of the SaaS era and invented so many of the patterns you see in software today. Nearly four years ago, in need of a reboot, we jumped on weeks-old modern LLMs to create and define the category we know as Customer Agents today.” That arc—from SaaS pioneer to LLM-powered category creator—illustrates how bold pivots, shipped with urgency and clear product strategy, can reset the trajectory of a company and a market.

From a product management lens, this deal reinforces a few truths: category creation rewards those who move first with conviction; “reboots” succeed when they’re anchored in genuine customer value; and modern LLMs, applied through disciplined roadmapping and eval-driven development, can unlock step-change outcomes in customer support AI strategy and product-led growth. It also signals the rising centrality of agentic AI and operational AI workflows inside the CRM.

The leadership dimension is just as instructive. As the announcement framed it: “Salesforce invented modern software and SaaS. And Marc Benioff is like the final boss of tech founder CEOs. In seat for 27 years, he’s one of the last of his era. Still pushing, pivoting, placing big bets.” That ethos—placing big, principled bets while adapting the operating model—sets the tone for what sustained product management leadership looks like at scale.

Customer continuity and acceleration are clearly emphasized: “To our customers: Over the past few years we’ve been shipping intensely. Including recently our groundbreaking model, Apex, and our paradigm-defining internal agent, Operator. With the resources of Salesforce this will only accelerate. And yet little will practically change. I’ll still be CEO, Des will still be running R&D, we’ll both still be committed to continuing to lead this category. Thank you very sincerely and deeply for your belief in us.” For practitioners, the signal is strong: continued focus on shipping, sharper execution readiness, and tighter integration paths inside the Salesforce ecosystem.

There’s a human heartbeat here too: “While this is not the end, it is a major, pivotal, special, and emotional moment for us.” Moments like this remind me that building enduring products is equal parts craft and courage—powered by teams who commit to the long game, navigate uncertainty, and still ship relentlessly.

Strategically, I expect near-term priorities to center on secure data flow and governance, deep CRM integration, and unifying telemetry for Agent Analytics across channels. On the roadmap, I’d anticipate tighter alignment between LLM safety, retrieval-first pipelines, and enterprise-grade observability—plus thoughtful go-to-market strategy enabling sales-led growth to complement product-led growth. The real unlock comes when Customer Agents are natively orchestrated with Service, Sales, and Marketing workflows—measured with clear outcomes vs output OKRs and reinforced by robust knowledge management.

For fellow product leaders, the takeaways are actionable: define category boundaries with crisp value propositions; balance speed with governance; invest in eval-driven development and continuous discovery; and keep your product trios aligned around measurable customer outcomes. Above all, build the operating cadence—metrics, rituals, and talent—that lets you compound small wins into durable differentiation.

And I appreciate the spirit of this closing line: “And now, time to get back to work. See you at our next product launch in a few weeks. (:” That’s the mindset that turns a headline into execution: celebrate briefly, then ship the next proof point.

Inspired by this post on The Intercom Blog.

June 15, 2026
Claude Code for Product Managers: Accelerate Prototypes, Validate Faster, Ship with Confidence

I build products under constant pressure to learn faster without breaking trust. Claude Code has become a pragmatic addition to my AI product toolbox because it helps me move from idea to evidence with less friction—while keeping engineering, design, and compliance in the loop.

“Claude Code for Product Managers explained: what it is, why it matters, and how it helps PMs prototype, validate, and move faster.” That line captures the essence. In practice, I use it to turn ambiguous problem statements into tangible artifacts—API stubs, SQL queries, test data, and lightweight prototypes—that sharpen conversation and accelerate decision cycles.

What is it in PM terms? A code-aware assistant that helps me prototype safely and quickly. I can generate example API calls, transform messy CSVs for retention analysis, draft instrumentation plans for Amplitude analytics, or spin up a mock service to validate an integration. Because it understands structure, it’s effective at scaffolding small utilities (e.g., a data cleaner or a CLI harness) that make discovery and validation faster.

Day to day, Claude Code reduces handoffs. If I’m exploring a new partner integration, I’ll have it produce a set of curl commands and a Postman collection, then annotate each step with acceptance criteria and expected responses. When I’m shaping a feature, I lean on it to outline event taxonomies and feature flags so that engineering can wire telemetry without guesswork. For insights work, I’ll ask it to propose SQL for cohort, funnel, and retention analysis, always verifying against source schemas before anything touches production.

Speed is only useful when it improves signal quality. I anchor the workflow in continuous discovery: small hypotheses, thin-slice prototypes, and fast instrumentation. Claude Code helps me estimate A/B testing readiness (including minimum detectable effect), generate smoke tests for critical user paths, and structure an eval-driven development loop so we learn from every iteration. It also supports context window management by summarizing long PRDs into the few constraints a prototype must respect.

Governance matters. I apply AI readiness and AI risk management principles: never paste secrets or PII, isolate sandboxes, and log prompts as docs-as-code for auditability. I prefer a retrieval-first pipeline that feeds approved product docs, OpenAPI specs, and design tokens so generations stay grounded. When tools are integrated, I favor the Model Context Protocol (MCP) to constrain capabilities and maintain least-privilege access. Human-in-the-loop review is non-negotiable—especially for anything that might influence customer data or pricing.

The best outcomes show up in product trios. I’ll facilitate a live session with design and engineering: we co-create prompts, compare alternatives, and converge on a thin slice we can ship. That collaboration keeps us empowered, reduces interpretation drift, and turns Claude Code into an accelerant rather than a sidecar. Over time, the trio curates a reusable prompt library for PRD outlines, experiment checklists, and integration playbooks.

Getting started is straightforward: define a safe environment, assemble your authoritative corpus (requirements, specs, taxonomies), and codify a few high-value templates—API exploration, instrumentation plans, sandbox data generators, and acceptance tests. Track impact with simple, objective metrics: cycle time from hypothesis to instrumented prototype, time-to-first-signal, and the proportion of decisions made with data versus opinion.

There are pitfalls. Hallucinated fields can creep into API calls, schema drift can break generated queries, and “clever” refactors may miss edge cases. I mitigate this by grounding generations in current specs, asking for unit tests alongside any code, and validating against a staging environment before anyone talks about production. Treat Claude Code as a collaborator, not an oracle.

If your mandate is to learn faster, de-risk bets, and ship with confidence, Claude Code is worth adopting. Used thoughtfully, it compresses the distance between questions and answers, elevates product discovery, and lets teams validate more ideas with fewer meetings—without compromising on governance or quality.

Inspired by this post on Product School.

June 12, 2026
Beyond Black‑Box Scores: Custom AI That Elevates Trust & Safety Without Burnout

What do you do when off-the-shelf moderation scores aren't good enough—and the alternative is paying human contractors to spend their days reviewing traumatizing content at scale? I’ve wrestled with that exact trade-off in enterprise environments, and it’s why I was eager to unpack how custom AI can raise the bar on trust and safety without compromising accuracy, latency, or the well-being of our teams.

In this episode of Just Now Possible, I sit down with Nikki Marinsek (Data Scientist), Brian McCaffrey (Software Engineer), and Dan Means (Machine Learning Engineer) from Musubi, an AI-native trust and safety toolkit for content platforms. Musubi builds custom-trained ML models and LLM-powered moderation tools that adapt to each platform's unique policies—from dating apps to social networks to AI inference endpoints. As a product leader, I’m drawn to their blend of eval-driven development, agentic AI, and pragmatic deployment pipelines that actually meet real-world SLAs.

We walk through their full journey, starting with a first prototype on tabular data and the discovery that the system was sometimes catching issues human moderators missed. That insight became a forcing function to formalize evaluation, calibrate thresholds, and design feedback loops that help humans and models converge. Just as importantly, they built a policy optimizer that uses agentic flows so non-technical trust and safety teams can iterate on LLM moderation policies without needing a data scientist in the room.

If you’ve ever had to balance latency, accuracy, and cost at scale, you’ll appreciate how Musubi tests trade-offs across traditional ML, embedding-driven classification, and LLMs. Their approach mirrors the patterns I expect in high-throughput stacks: cache and pre-compute where possible, contain worst-case latencies, and push evaluation tooling to customers so policy changes are safe, observable, and fast to deploy.

What resonated most with me is their core product strategy: put eval tools directly in customers’ hands. When teams can benchmark AI against humans, referee disagreements using “LLM as judge,” and make policy gaps visible, trust increases and operational drift decreases. That’s the foundation for durable product strategy in sensitive domains like content moderation, fraud management, and risk scoring.

Guests: Nikki Marinsek, Data Scientist, Musubi; Brian McCaffrey, Software Engineer, Musubi; Dan Means, Machine Learning Engineer, Musubi.

In this episode:
- Why off-the-shelf moderation scores fail and how custom-trained models fix that
- How Musubi combines traditional ML with LLMs for different moderation tasks
- The discovery that AI can outperform human moderators, and how to communicate that to clients
- Using AI as a judge to referee disagreements between AI and human decisions
- How Musubi onboards new customers with "reverse demos"
- What custom model training actually means: fine-tuning, feature engineering, and reusable deployment pipelines
- The policy optimizer: an agentic flow that helps customers iterate on their LLM moderation policies
- Why pushing eval tools directly to customers is a core product strategy
- How Musubi is building flexible orchestration workflows for non-technical trust and safety teams

From a product management lens, a few highlights stand out. First, the disciplined separation of concerns: use traditional ML for high-precision, low-latency pattern detection and LLMs for nuanced policy interpretation. Second, invest in golden sets and policy loops early so you can quantify improvement and avoid subjective debates. Third, productize customization—create reusable deployment pipelines, parameterized policies, and self-serve evaluation—so each customer’s “custom model” still scales like a platform.
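
To make the golden-set idea concrete, here is a minimal sketch of scoring a moderation policy against labeled examples. It is not Musubi's tooling: the function name, the input shape, and the keyword rule in the usage note are hypothetical stand-ins for whatever model or policy is under test.

```python
def evaluate_against_golden_set(predict, golden_set):
    """Compare a moderation policy or model with labeled examples.

    predict: callable mapping content text to True (violation) or False.
    golden_set: list of (content, is_violation) pairs agreed by reviewers.
    Returns precision, recall, and the examples where the two disagree.
    """
    tp = fp = fn = 0
    disagreements = []
    for content, expected in golden_set:
        predicted = bool(predict(content))
        if predicted and expected:
            tp += 1
        elif predicted and not expected:
            fp += 1
        elif expected and not predicted:
            fn += 1
        if predicted != expected:
            disagreements.append(content)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall,
            "disagreements": disagreements}
```

Running the same golden set before and after a policy change turns "this feels better" into a comparison of precision, recall, and a concrete list of disagreements to review. A toy rule such as `lambda text: "follow" in text` is enough to exercise the function.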

I also appreciated the onboarding tactic of "reverse demos." Rather than a canned walkthrough, the team invites customers to bring real policies and edge cases, then instruments the workflow live. That move builds credibility, accelerates discovery, and surfaces the fastest paths to value—an approach I recommend whenever you’re selling complex AI workflows to non-technical stakeholders.

If you’re navigating cost and latency trade-offs, the conversation goes deep on techniques like embedding-driven classification, fine-tuning vs. training, and when to route decisions through LLM adjudication. My takeaway: treat the router, the evaluator, and the policy as first-class products. When those elements are observable and testable, you can raise quality without exploding compute costs or creating operational bottlenecks.

Resources & Links: Musubi — AI-powered trust and safety toolkit for content platforms. Maven AI Evals Course — AI evals course.

Chapters:
- 00:00 Meet the Team
- 01:18 Why Everyone Wears Product
- 02:32 What Musubi Builds
- 04:51 AI for Human Moderation
- 09:59 Adversaries and Asymmetry
- 11:48 Early Days and Low Latency
- 13:35 First Prototype Slice
- 15:33 Traditional ML Meets LLMs
- 19:52 Benchmarking Against Humans
- 23:09 LLM as Judge and Policy Gaps
- 29:53 From Prototype to Platform
- 31:15 Customer Onboarding Reverse Demos
- 36:08 Custom Models Per Customer
- 38:05 Fine Tuning vs Training
- 39:14 Embedding Driven Classification
- 40:04 Cost and Latency Tradeoffs
- 43:21 Productizing Customization
- 49:16 Scaling Prototypes to Production
- 51:58 Golden Sets and Policy Loops
- 56:17 Coaching Customers Safely
- 01:02:06 Gamified Feedback Signals
- 01:06:19 Agentic Toolkit Roadmap
- 01:09:05 Workflow Orchestration Future
- 01:12:06 Wrap Up and Thanks

Ultimately, this is a playbook for modern trust and safety: align your models to your policies, make evals a habit not an event, and empower non-technical teams with agentic workflows and transparent metrics. That’s how we move beyond black-box scores to systems we can measure, manage, and trust.

Inspired by this post on Product Talk.

June 11, 2026