Category: AI Strategy

How Snapbar Turned Crisis Into an AI-Native Photo Experience Revolution

What does it take to reinvent a 14-year-old company, not once, but twice? I ask that question often when I look at mature product organizations, because the hardest transformations rarely start with a clean slate. They start with real customers, legacy expectations, operational muscle memory, and a market that suddenly refuses to behave the way it used to.

Snapbar is a useful case study in that kind of transformation. The company began as a wedding photo booth side hustle, grew into a national events company, and then watched COVID wipe out the entire business overnight. As a product leader, I find that moment especially important because it separates teams that are attached to the current expression of their product from teams that understand the deeper customer need underneath it.

The deeper need was never just a physical photo booth. It was identity, participation, memory, brand engagement, and a shareable experience that people could take with them. When in-person events disappeared, Snapbar went from physical photo booths to a cross-platform virtual product built on WebRTC in spring 2020. That was not a cosmetic pivot. It was a first-principles rebuild under pressure.

I have seen many teams talk about innovation when conditions are favorable. Snapbar’s story is more interesting because the team had to innovate when the existing business model was unavailable. That kind of constraint can be clarifying. It forces product teams to ask: What job are we really doing for customers, and what parts of our current solution are merely historical artifacts?

The next reinvention came from generative AI. Pushed by declining repeat business, Snapbar dove deep into Stable Diffusion, custom LoRA fine-tunes on H100/H200 GPUs, and eventually a reasoning-model-powered generative image and video pipeline. What stands out to me is not simply that the team adopted generative AI. It is that they connected AI capabilities to a domain they already understood deeply: photography, events, brand activations, and experiential marketing.

This distinction matters. In product management, technology FOMO can lead teams to bolt AI onto workflows without a clear strategic advantage. Snapbar appears to have moved differently. They used 14 years of industry knowledge to identify where AI could change the experience itself, not just automate a back-office task or generate a novelty output.

The product evolution is a strong example of applied AI. Snapbar integrated Stable Diffusion 1.5 as their first generative AI model and ran custom LoRA fine-tunes on H100/H200 GPUs to produce brand-quality outputs nobody else in their space could match. That level of execution shows the difference between experimenting with a model and building a differentiated product system around it.

I also appreciate the way the team moved from negative prompts to reasoning model long-form prompts. In brand environments, creative control and safety control are not optional. A brand activation must feel imaginative, but it also has to remain on-message, inclusive, and predictable enough for a live event setting. Better prompt engineering becomes part of the product’s trust layer.

One of the most important product details is the meta-prompting pre-processing pipeline designed to ensure user likenesses, including non-obvious details like disabilities, are accurately represented in generated images. That is not a minor implementation detail. It reflects a more mature view of AI risk management, representation, and customer experience.

From my perspective, this is where product strategy and ethical technology intersect. Generative AI systems can easily flatten people into generic outputs. A thoughtful product team has to decide what fidelity means, what consent means, and how much control users and brands should have over the final artifact. Snapbar’s approach suggests that representation is not just a model-quality problem; it is a product-design problem.

Just Now Possible spotlights Snapbar’s journey from photo booths to AI-powered brand experiences, framing reinvention, creativity, and applied AI as the center of the conversation.

The company’s experiential marketing platform lets brands “world build” at conferences, trade shows, and live events by bringing fans into branded creative worlds. That phrase matters because it reframes the photo booth from a capture device into a participatory brand system. The user is not merely photographed. The user becomes part of a designed world.

I see this as a broader shift in product experience. Static brand impressions are giving way to co-created moments. Snapbar added participatory user inputs through Mad Lib-style prompts and prompt injection, turning photo experiences into co-creation moments between brands and their audiences. That is a more durable engagement loop than simply asking someone to pose in front of a branded backdrop.
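To make the Mad Lib idea concrete, here is a minimal sketch of how guest answers could be slotted into a brand-controlled prompt. It is not Snapbar's implementation: the template text, the slot names, the `fill_prompt` helper, and the validation rule are all assumptions chosen for illustration.

```python
import re
from string import Template

# Hypothetical brand-authored template; slot names are illustrative only.
BRAND_TEMPLATE = Template(
    "A portrait of the guest as a $role exploring $place, in the brand's signature style"
)

# Assumed guardrail: short answers made of letters, spaces, and hyphens.
ALLOWED = re.compile(r"^[A-Za-z][A-Za-z \-]{0,29}$")


def fill_prompt(template: Template, answers: dict[str, str]) -> str:
    """Insert guest answers into brand-controlled slots after validation."""
    cleaned = {}
    for slot, value in answers.items():
        value = " ".join(value.split())  # collapse stray whitespace
        if not ALLOWED.fullmatch(value):
            raise ValueError(f"rejected input for slot {slot!r}")
        cleaned[slot] = value
    # substitute() raises KeyError if the guest skipped a required slot.
    return template.substitute(cleaned)


prompt = fill_prompt(BRAND_TEMPLATE, {"role": "deep-sea diver", "place": "a neon reef"})
print(prompt)
```

The design choice worth noticing is that the brand owns the sentence and the guest owns only the blanks, which keeps the co-created output inside the creative boundaries the brand has set.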

The operational story is just as relevant for product and engineering leaders. Snapbar used Claude Code and Codex to build and ship features rapidly as a small bootstrapped team, and developed a four-pillar agent orchestration framework: context, tools, verification, and workflows. I like that framing because it treats AI-assisted development as a system of work, not a magic shortcut.

In my own product leadership work, I keep coming back to the same lesson: AI workflows only become reliable when the team defines the surrounding operating model. Context determines whether the agent understands the problem. Tools determine what it can actually do. Verification determines whether the output is trustworthy. Workflows determine whether the capability compounds across the organization.

Snapbar is now building customer-facing “vibe coding” using the Claude Agent SDK so brands can configure and create experiences themselves within Snapbar’s platform. That is a meaningful product move. It shifts creation closer to the customer while keeping the workflow inside a controlled product environment. For brand teams, that could reduce dependency on custom service work while still preserving creative flexibility.

This is the kind of AI strategy I find most compelling: not a generic claim that AI will transform everything, but a specific path from domain expertise to product capability to customer empowerment. Snapbar did not abandon its past. It converted years of event, photography, and brand knowledge into a new interface for generative AI.

The core lesson for product teams is clear. Reinvention does not always mean discarding the original business. Sometimes it means identifying the durable customer need, rebuilding the delivery mechanism, and then using new technology to expand what the experience can become. Snapbar’s journey from wedding photo booths to virtual WebRTC experiences to AI world building shows how a team can preserve its market intuition while changing nearly everything about the product surface.

For product leaders evaluating generative AI opportunities, I would take three practical lessons from this story. First, start with the customer experience, not the model. Second, treat brand safety, representation, and verification as product requirements from the beginning. Third, use agentic AI internally only when the team has a clear framework for context, tools, verification, and workflows.

Snapbar’s story resonates because it is not about chasing a trend. It is about a team using necessity, curiosity-led self-education, and disciplined product thinking to build something that feels native to the generative AI era. That is the difference between adopting AI and becoming AI-native.

Inspired by this post on Product Talk.

July 9, 2026
From Static Scores to Adaptive Customer Health Intelligence
Customer health should help a team change an account outcome, not merely describe it after the fact. That requires moving beyond a fixed score toward intelligence that detects meaningful changes, explains their likely significance, and supports timely intervention.

The source article frames this transition as a response to changing product usage, buyer behavior, and support patterns. Its larger implication is operational: customer health becomes a continuously examined hypothesis about adoption, value, risk, and expansion rather than a permanent formula embedded in a dashboard.

Static health fails when its assumptions stop matching the account

A conventional health score usually compresses several indicators into one status or number. This can make a portfolio easier to scan, but the simplicity conceals a critical dependency: the result is only as useful as the rules, weights, thresholds, and data behind it.

The source argues that those assumptions gradually diverge from reality as customer behavior and product usage change. A score may retain the appearance of precision even when it reflects an earlier version of the product, customer journey, or commercial relationship. The resulting problem is not simply stale data. It is model drift: the organization continues interpreting current accounts through assumptions that may no longer describe them.

This limitation becomes especially consequential when customer success teams are expected to protect Net Revenue Retention (NRR) and improve retention analysis. A delayed score may confirm that adoption has weakened or support pressure has increased, yet arrive too late to influence the underlying outcome. Portfolio visibility is useful, but retrospective classification alone does not provide the cause, urgency, or appropriate response.

Adaptive intelligence connects signals, interpretation, and action

Adaptive customer health is better understood as a system than as a more sophisticated score. The source identifies behavioral analytics, anomaly detection, journey mapping, AI workflows, and risk scoring as capabilities that can reveal movement before a formal review or escalation makes it obvious. It also calls for a connected view spanning onboarding, adoption, support activity, value realization, and expansion potential.

Those elements perform different jobs. Behavioral analytics describes how engagement is changing. Anomaly detection calls attention to departures from an account’s expected pattern. Journey mapping places activity within a stage or intended path. Risk scoring estimates the significance of the combined evidence. Workflow then routes that interpretation to a person or process capable of acting on it.

The distinction matters because faster calculation is not necessarily adaptation. A fixed formula refreshed in real time can still reproduce obsolete assumptions. A genuinely adaptive approach must re-examine which changes are meaningful, compare signals in context, and make its reasoning visible enough for a team to judge. The useful output is therefore not just a revised number, but an intelligible account narrative: what changed, why it may matter, how urgent it appears, and what action deserves consideration.
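The difference between a fixed rule and a check against an account's own baseline can be shown with a small sketch. Everything here is an assumption made for illustration: the weekly login series, the portfolio-wide minimum of 20, the z-score cutoff of 2.0, and the function names. The source does not prescribe a particular detection method.

```python
from statistics import mean, stdev


def fixed_threshold_flag(weekly_logins: list[int], minimum: int = 20) -> bool:
    """Static rule: flag when the latest week falls below one portfolio-wide minimum."""
    return weekly_logins[-1] < minimum


def baseline_anomaly(weekly_logins: list[int], z_cutoff: float = 2.0) -> dict:
    """Compare the latest week with the account's own history."""
    history, latest = weekly_logins[:-1], weekly_logins[-1]
    mu, sigma = mean(history), stdev(history)
    z = 0.0 if sigma == 0 else (latest - mu) / sigma
    return {
        "what_changed": f"weekly logins {latest} vs usual {mu:.0f}",
        "z_score": round(z, 2),
        "flag": z <= -z_cutoff,  # only unusually large drops are flagged
    }


# Hypothetical accounts: a large one that dropped sharply, a small stable one.
big_account = [120, 118, 125, 122, 119, 80]
small_account = [12, 11, 13, 12, 12, 12]

for name, series in (("big", big_account), ("small", small_account)):
    print(name, fixed_threshold_flag(series), baseline_anomaly(series))
```

In this toy data the fixed rule misses the large account's drop because 80 logins still clears the minimum, and it flags the small account even though nothing changed. The baseline check reverses both results and returns a short description of what moved, which is closer to the account narrative described above.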

Product and customer success need one behavioral model

The source positions product management and customer success as parts of the same operating system. That connection is essential because many health signals originate in the product, while their meaning often depends on commercial and relationship context. Product data can show a change in activation or adoption; customer success can add knowledge about expected value, organizational priorities, stakeholder changes, and renewal conversations.

Neither perspective is sufficient by itself. A decline in activity can be concerning, expected, or irrelevant depending on the customer’s journey and intended outcomes. Conversely, positive usage can coexist with unresolved support friction or weak value recognition. Combining product behavior with support and relationship context reduces the risk that one visible metric becomes a misleading proxy for the entire account.

This shared model also creates a feedback loop. Customer success teams can identify alerts that were useful, noisy, or missing important context. Product teams can use recurring patterns to examine onboarding, activation, and adoption barriers. The health system then becomes more than an account-ranking mechanism: it becomes a structured way to learn how product experience and customer outcomes interact.

Key takeaways
- A health score is only reliable while its underlying assumptions continue to reflect customer behavior and the product experience.
- Adaptive health combines signals across onboarding, adoption, support, value realization, and expansion rather than treating one metric as the complete account story.
- Anomaly detection and behavioral analytics become operationally useful when they are connected to context, urgency, and workflow.
- Product management supplies behavioral and journey insight, while customer success contributes relationship and outcome context.
- The practical test is whether the system helps a team choose an appropriate action while the account outcome remains changeable.

Accountable action matters more than algorithmic complexity

The source does not argue for removing human judgment. It explicitly retains a role for experienced customer success managers, executive conversations, and disciplined business reviews, while proposing that these activities should be informed by timely signals rather than retrospective summaries. This establishes a useful boundary: intelligence should augment account judgment, not disguise uncertain inferences as facts.

That boundary has design implications. Teams need to know which evidence triggered an alert, whether the evidence is complete, and how strongly it supports the proposed interpretation. They also need a way to record what action was taken and whether it helped. Without that feedback, an AI-assisted workflow can scale noise as easily as insight.

Evaluation should consequently focus on decision quality rather than dashboard sophistication. A useful system should help distinguish meaningful change from ordinary variation, reveal the factors behind a risk assessment, place the account within its journey, and connect the finding to an accountable next step. Its models and thresholds should also be reviewed as products, customer behavior, and business priorities evolve.

The next stage of customer health intelligence will be defined less by a universal score than by an organization’s ability to learn from changing behavior. Teams that preserve explainability, human review, and workflow accountability can make adaptation practical without mistaking automated confidence for customer understanding.

References
- Shivam.Consulting Blog — Why Static Customer Health Scores Are Failing Modern Customer Success Teams

July 3, 2026
How to Operate AI Customer Agents as a Reliable CX System
AI customer agents are expanding from answering routine questions toward handling complex workflows and potentially supporting more of the customer lifecycle. The operational challenge is no longer simply whether an agent can produce a plausible answer. It is whether the organization can keep that agent accurate, controlled, measurable, and ready whenever the business changes.

Taken together, the source reports point to a practical operating model: connect product releases to knowledge updates, test behavior before exposure, measure the full interaction rather than a narrow survey sample, and assign people to improve the system continuously. That turns an AI agent from a channel feature into managed CX infrastructure.

Key takeaways
- Agent reliability depends on a continuous train, test, deploy, and analyze cycle, not a one-time implementation.
- A product release is not operationally complete until the agent has current, unambiguous, and retrievable information about it.
- Pre-release evaluation should test realistic customer questions, policy conditions, system actions, and required human handoffs.
- Survey metrics remain useful, but conversation-level analysis provides broader visibility into answer quality, effort, sentiment, and recurring friction.
- Human roles increasingly shift toward knowledge stewardship, exception handling, policy design, evaluation, and cross-functional CX improvement.

Treat the agent as a product system, not a chatbot

The Pioneer 2025 report describes Fin 3 through four operating stages: training, testing, deployment, and analysis. It reports that Procedures combines natural-language instructions with deterministic controls for complex work, while Simulations is intended to test behavior before customers encounter it. The report also describes deployment across additional channels, including Slack and Discord, improvements to Voice, and analytics features such as CX Score Reasons and Topic Trends.

These are vendor-reported capabilities, but the underlying operating principle applies beyond one platform. An agent that can act in business systems needs more than fluent language generation. It needs explicit procedures, boundaries on what it may do, test cases that expose failure modes, controlled channel deployment, and evidence showing what happened after release.

The same report presents a longer-term Customer Agent vision built around roles, goals, persistent memory, business knowledge, and interoperability. That vision should be distinguished from currently reported product functionality. It nevertheless clarifies the governance challenge: as an agent gains continuity and operational reach, errors can travel across more stages of the customer journey. Ownership of objectives, data, permissions, escalation, and measurement therefore becomes part of CX design.

This also changes how success should be framed. Resolution volume is an operational output, but a dependable CX system must also answer whether the agent followed policy, used current knowledge, completed the intended action, recognized an exception, and left the customer with an acceptable amount of effort. Automation without those checks can move work while concealing deterioration in the experience.

Move agent readiness into the product release process

The NPI playbook focuses on a common source of agent failure: products change faster than their supporting knowledge. When a feature launches without usable documentation, the source reports that the agent may hand conversations to people just as launch-related volume rises. The resulting backlog is therefore not only a support problem; it is a release-readiness problem.

A stronger definition of done includes agent readiness. The NPI source recommends bringing support or knowledge specialists into product walkthroughs, product marketing kick-offs, and pre-release testing. It also calls for a named owner, whether an NPI manager, knowledge manager, support lead, or product operations owner. The title can vary, but accountability cannot be distributed so widely that nobody verifies readiness.

The required knowledge must be designed for retrieval as well as human reading. According to the source, documentation should include both internal feature names and the phrases customers actually use, expand acronyms, state plan and availability conditions explicitly, and reproduce the substance of screenshots or videos in text. This is important because information can be technically present yet remain difficult for an agent to retrieve or apply correctly.

Release work must also remove knowledge that a launch has invalidated. Searching related articles, macros, notes, and workflows can reveal stale or contradictory guidance. Duplicate content deserves particular attention: competing versions of an answer can create inconsistent agent behavior even when the newest article is accurate.

Testing then connects knowledge preparation to customer outcomes. The NPI playbook recommends assembling likely questions from launch content, beta feedback, and early support conversations; running them in the environment customers will use; rating the answers; correcting the underlying content or structure; and repeating the evaluation. Conditions such as phased rollout, plan eligibility, regional availability, and mandatory human escalation require explicit coverage rather than an assumption that the agent will infer the right behavior.
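The test-rate-correct-repeat loop can be sketched as a small harness. This is an illustration under stated assumptions, not the playbook's tooling: `LaunchCase`, `evaluate`, and `stub_agent` are hypothetical names, the reply shape is assumed, and the "bulk export" feature and plan names are invented examples.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LaunchCase:
    question: str
    must_mention: list[str] = field(default_factory=list)  # facts a correct answer needs
    expect_handoff: bool = False  # True when policy requires a human


def evaluate(agent: Callable[[str], dict], cases: list[LaunchCase]) -> list[dict]:
    """Run each launch question and record what the answer got wrong."""
    results = []
    for case in cases:
        reply = agent(case.question)  # assumed shape: {"text": str, "handoff": bool}
        text = reply["text"].lower()
        missing = [m for m in case.must_mention if m.lower() not in text]
        passed = not missing and reply["handoff"] == case.expect_handoff
        results.append({"question": case.question, "passed": passed, "missing": missing})
    return results


def stub_agent(question: str) -> dict:
    """Stand-in for the real agent so the sketch runs on its own."""
    if "refund" in question.lower():
        return {"text": "Let me connect you with a teammate.", "handoff": True}
    return {"text": "Bulk export is available on the Pro plan.", "handoff": False}


cases = [
    LaunchCase("Can I use bulk export on the Starter plan?", ["Pro plan"]),
    LaunchCase("Is bulk export available in the EU yet?", ["EU"]),
    LaunchCase("I want a refund for the add-on", expect_handoff=True),
]
report = evaluate(stub_agent, cases)
failures = [r for r in report if not r["passed"]]
print(failures)
```

Here the regional-availability question fails because the stub never mentions the EU, which is the kind of gap that should send the team back to the underlying content before the evaluation is repeated.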

This creates a two-speed control model. Before launch, teams test expected questions and known edge cases. After launch, they watch real conversations for unexpected language, missing scenarios, or product behavior that the original documentation did not anticipate. The feedback should return to the release tracker, knowledge source, procedure, or product team according to the root cause.

Measure experience at conversation scale

Release evaluation shows whether an agent appears ready, but production measurement shows whether that readiness survives real customer behavior. The CX measurement source reports that CSAT captures less than 10% of conversations and that respondents tend to represent more extreme reactions. As a result, survey results leave a large unobserved middle and cannot by themselves explain whether dissatisfaction arose from service, product behavior, or policy.

The source describes an alternative in which AI evaluates every human and agent interaction across dimensions such as service quality, resolution, and customer effort. It reports that Intercom’s CX Score assigns interactions a score from 1 to 5, exposes reasons behind the score, and gives most teams roughly five times the coverage of CSAT alone. Those product-specific claims are reported by the source rather than independently verified here, but they illustrate the broader distinction between voluntary feedback and systematic conversation review.

Fuller coverage does not make direct customer feedback obsolete. CSAT can still capture what a customer chooses to say, while conversation analysis can detect repeated explanations, handoff friction, weak answer quality, unresolved intent, and neutral interactions that generate no survey response. The two signals answer different questions and should be interpreted together rather than forced into a single interchangeable benchmark.

New coverage also requires new baselines. The measurement source cautions against transferring an old CSAT target directly to a conversation-scoring system because the populations and methods differ. It recommends correlating the new score with operational measures such as first response time and time to close, then examining underlying attributes including answer quality, customer effort, and product feedback. Its illustrative targets of 80% for Fin support, 70% for human support, and 78% overall are examples derived from the scenario described in that article, not universal standards.
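One way to begin building a baseline is to check how the new score moves with an operational measure the team already trusts. The sketch below computes a Pearson correlation by hand. The weekly figures are invented for illustration, and the choice of Pearson correlation is an assumption; the source does not name a specific method.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient for two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical weekly aggregates: average conversation score (1 to 5)
# and median hours to close.
avg_score = [4.4, 4.1, 3.9, 3.2, 2.8]
median_hours_to_close = [3.0, 4.5, 5.0, 9.0, 14.0]

r = pearson(avg_score, median_hours_to_close)
print(f"correlation between score and time to close: {r:.2f}")
```

A strongly negative value in data like this would suggest the score falls as cases take longer to close. That is evidence for examining the underlying attributes, not proof of cause.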

Segmentation is equally important. Complex, high-touch cases should not automatically be compared with transactional contacts, and aggregate results can hide a poorly performing topic or channel. Useful analysis separates agent and human conversations, examines topics and handoffs, and preserves context about case type. The most actionable output is not the score alone but a reason that can be routed to a responsible owner.
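A short sketch shows how an aggregate can hide a weak segment. The conversation records, field names, and the `mean_by` helper are hypothetical; real data would come from whatever conversation-scoring system the team uses.

```python
from collections import defaultdict

# Hypothetical scored conversations (score on a 1 to 5 scale).
conversations = [
    {"handler": "agent", "topic": "billing", "score": 5},
    {"handler": "agent", "topic": "billing", "score": 5},
    {"handler": "agent", "topic": "billing", "score": 4},
    {"handler": "agent", "topic": "export", "score": 2},
    {"handler": "agent", "topic": "export", "score": 1},
    {"handler": "human", "topic": "billing", "score": 4},
    {"handler": "human", "topic": "export", "score": 4},
]


def mean_by(rows: list[dict], *keys: str) -> dict[tuple, float]:
    """Average score for each combination of the given fields."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[k] for k in keys)].append(row["score"])
    return {segment: sum(v) / len(v) for segment, v in buckets.items()}


overall = sum(c["score"] for c in conversations) / len(conversations)
segments = mean_by(conversations, "handler", "topic")

print(f"overall: {overall:.2f}")
for segment, value in sorted(segments.items()):
    print(segment, f"{value:.2f}")
```

In this toy data the overall average looks acceptable while agent-handled export conversations average 1.5, which points to a specific topic and owner rather than a general quality problem.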

Build one improvement loop across CX, product, and knowledge

The sources approach AI customer agents from different angles: the Pioneer report emphasizes expanding capabilities and a broader customer-agent vision; the NPI playbook concentrates on release and knowledge readiness; and the measurement article addresses visibility after deployment. Their combined implication is that these activities cannot remain separate programs.

A low-quality interaction might originate in several places. The knowledge may be missing or contradictory, the procedure may express the wrong policy, the product may behave unexpectedly, the agent may fail to retrieve applicable information, or the case may require a human specialist. Conversation-level reasons help locate the problem, but the organization still needs a route from evidence to correction and then to re-evaluation.

That operating loop changes human work. Customer-facing specialists remain essential for sensitive, ambiguous, or exceptional cases, while also contributing customer language, testing scenarios, escalation criteria, and knowledge improvements. Product and engineering teams become accountable for the support consequences of releases. Knowledge teams manage information as production input, and CX leaders set objectives that balance resolution, effort, policy compliance, and service quality.

The most revealing opportunities may sit in interactions that are neither failures nor successes. Broader conversation analysis can surface answers that were technically acceptable but unnecessarily difficult, impersonal, or incomplete. Improving that middle ground requires more than tuning a model: it may require clearer documentation, a better workflow, a product fix, or a different escalation rule.

As agents acquire more roles, memory, knowledge, and access to business systems, CX operations will increasingly resemble product operations for a continuously changing service. Organizations that establish release gates, evaluation sets, conversation-level diagnostics, and unambiguous ownership will be better positioned to expand agent responsibility without allowing reliability to become an afterthought.

July 3, 2026
AI Product Leadership: Faster Learning, Safer Systems
AI-enabled product leadership is not primarily a contest to automate more work. The stronger opportunity is to shorten learning loops while improving the quality, traceability, and safety of product decisions.

Across the five source articles, a common operating model emerges: begin with bounded problems, connect AI to real customer evidence, define quality through domain expertise, and make safeguards proportional to the consequences of failure. This model applies both to internal product workflows and to customer-facing AI systems.

Move from an AI tool stack to an evidence system

The article on essential tools for product managers presents AI as a working layer across product intelligence, research, analytics, roadmapping, design, prioritization, and delivery. Its most useful implication is that tool selection should begin with the decision a team needs to improve, not with the number of AI features available.

A feedback summarizer, behavioral analytics platform, prototyping assistant, and requirements generator can each save time. Their strategic value appears when their outputs are connected: qualitative feedback helps explain observed behavior, behavioral evidence tests assumptions raised in interviews, and both inform prioritization. The product manager still has to reconcile customer pain, business outcomes, engineering effort, differentiation, and stakeholder expectations.

The practical guide to finding AI use cases reaches the same conclusion from a different direction. It recommends starting with a concrete item from everyday work, testing how AI might help, and studying the gap between the desired result and the output. It specifically proposes a 15-minute daily practice and treats an initially poor result as evidence about instructions, context, constraints, or model capability.

Together, these perspectives suggest two complementary levels of adoption. At the individual level, task-first experimentation builds judgment about what AI can do. At the team level, connected evidence workflows turn that judgment into a repeatable product operating system. Buying tools without the first creates shallow adoption; isolated personal experiments without the second produce scattered efficiency rather than organizational learning.

Use AI to deepen discovery, not to create distance from customers

The 2026 roadmap article frames roadmaps as portfolios of experiments involving products, learning methods, teaching models, and choices about what to stop doing. It argues that AI can reduce tedious discovery work and provide feedback on demanding skills, including interviewing, assumption testing, and opportunity mapping. At the same time, it warns against substituting agents or dashboards for human curiosity and direct customer contact.

That tension supplies an important boundary for AI-enabled discovery. Models can organize notes, identify recurring themes, critique an interview guide, expose possible confirmation bias, or compare evidence across sources. They cannot independently determine whether the team asked the right customers, understood the social context, or interpreted ambiguous language correctly. Those remain product and research judgments.

The safety-first consent coach described in the Override Labs article illustrates why context matters. According to that account, the nonprofit examined 2,000 Reddit posts per subreddit to validate demand and understand how vulnerable questions were expressed. The discovery material included uncertainty, shame, peer pressure, and the possibility that someone might be seeking permission rather than reflection. A conventional feature request or decontextualized summary could have obscured those conditions.

The cross-team review reinforces this point through other domains. It reports that former teachers at eSpark created evaluation rubrics based on how educators assess student work and enriched educational content with domain-specific metadata when generic embeddings produced weak matches. It also describes how local-government knowledge at Zencity changed the interpretation of sentiment, and how incident-response experience informed Incident.io’s investigation architecture. Across these examples, AI increased the importance of domain expertise because people still had to define what relevance, quality, and failure meant.

Let the consequence of failure determine the product architecture

Not every AI-assisted task needs the same controls. A weak draft of an internal stakeholder update can be reviewed and corrected cheaply. A response that could be interpreted as permission in a consent-related situation has a fundamentally different risk profile. Responsible product development begins by distinguishing those cases before selecting architecture or interaction patterns.

The Override Labs account offers the clearest high-stakes pattern. The team reportedly defined a "South star" around the worst outcome: a teenager using the product response as a green light for harmful action. The product therefore avoids giving a green-flag verdict. It runs deterministic risk classification before calling Claude, adjusts responses by risk tier, and uses a structure that validates, reflects, and invites further reflection. A licensed therapist contributed to the evaluation rubric, while positive masculinity coaches helped shape the tone.

The underlying principle is broader than that implementation. A generative model should operate inside a product-defined safety system rather than becoming the safety system. Product leaders can translate that principle into four design questions: what outcome must never be encouraged, which decisions require deterministic handling, when should generation be constrained or withheld, and which domain experts are qualified to judge the response?
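The idea of running deterministic classification before any generation can be sketched in a few lines. This is not the Override Labs implementation. The tier names, phrase lists, instruction strings, and the decision to withhold generation entirely at the highest tier are assumptions for illustration, and `generate` is a stand-in for whatever model call a team uses.

```python
from enum import Enum
from typing import Callable


class Tier(Enum):
    LOW = "low"
    ELEVATED = "elevated"
    HIGH = "high"


# Illustrative phrase lists only; a real classifier would be designed and
# reviewed with qualified domain experts.
HIGH_PHRASES = ("said no", "said stop")
ELEVATED_PHRASES = ("not sure", "pressure")

# Placeholder for fixed, expert-reviewed copy.
FIXED_HIGH_RESPONSE = "[expert-reviewed fixed response]"

BASE = "Validate, reflect, and invite further reflection. Never give a verdict."
TIER_INSTRUCTIONS = {
    Tier.LOW: BASE,
    Tier.ELEVATED: BASE + " Name the uncertainty and encourage slowing down.",
}


def classify(message: str) -> Tier:
    """Deterministic rules run before any model call."""
    text = message.lower()
    if any(p in text for p in HIGH_PHRASES):
        return Tier.HIGH
    if any(p in text for p in ELEVATED_PHRASES):
        return Tier.ELEVATED
    return Tier.LOW


def respond(message: str, generate: Callable[[str, str], str]) -> tuple[Tier, str]:
    tier = classify(message)
    if tier is Tier.HIGH:
        # Assumed policy: generation is withheld at the highest tier.
        return tier, FIXED_HIGH_RESPONSE
    return tier, generate(message, TIER_INSTRUCTIONS[tier])


def stub_model(message: str, instructions: str) -> str:
    """Stand-in for a real model call so the sketch runs on its own."""
    return f"(generated under: {instructions})"


print(respond("I'm not sure what she wants", stub_model))
```

The point of the structure is that the model never decides which tier applies. The product does, and the model only operates inside the constraints that tier allows.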

The review of AI product teams adds another trust boundary: deciding when a system should admit that it does not know. This is both a model-quality issue and a product behavior. Teams need to specify what insufficient evidence looks like, what the interface communicates in that state, and whether the user should retry, provide more context, consult a person, or stop the workflow.

This risk-based approach avoids two unhelpful extremes. Applying high-stakes controls to every low-consequence drafting task can make experimentation needlessly heavy. Treating sensitive decisions like ordinary content generation can leave critical failure modes to probabilistic behavior. The appropriate control set follows the plausible harm, reversibility, affected population, and the user’s ability to detect an error.

Make evaluation, privacy, and leadership part of delivery

The production-team review describes evaluation as an evolving operational capability rather than a final test. It reports that Stack Overflow ran about 50 experiments across five pods in three months, produced four versions of an AI-powered search product, and ultimately stopped that effort. Arize began building its Alyx agent before established agent frameworks were available, while eSpark’s former teachers learned to write evaluation code with LLM assistance. These are source-reported examples, not independently verified benchmarks, but they demonstrate how structured learning can support both shipping and stopping decisions.

Evaluation should therefore start when the use case is defined. Early rubrics can be simple: representative tasks, expected properties, unacceptable outputs, and a review process. As the product matures, teams can add risk tiers, regression sets, production observations, and explicit release criteria. The goal is not to claim that a model is universally good; it is to establish whether a particular system performs acceptably within a bounded workflow.
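
An early rubric of the kind described above can be very small. The sketch below assumes substring checks are an acceptable first approximation of "expected properties" and "unacceptable outputs"; the `EvalCase` and `run_rubric` names are hypothetical, and `system` stands in for whatever function wraps the product's AI call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EvalCase:
    task: str                                            # representative input
    required: list[str] = field(default_factory=list)    # expected properties (substrings here)
    forbidden: list[str] = field(default_factory=list)   # unacceptable outputs (substrings here)


def run_rubric(system: Callable[[str], str], cases: list[EvalCase]) -> dict:
    """Run every case through the system and collect failures for human review."""
    failures = []
    for case in cases:
        output = system(case.task).lower()
        missing = [r for r in case.required if r.lower() not in output]
        violations = [f for f in case.forbidden if f.lower() in output]
        if missing or violations:
            failures.append({"task": case.task, "missing": missing, "violations": violations})
    return {"total": len(cases), "passed": len(cases) - len(failures), "failures": failures}
```

The returned failure list is meant to feed the review process. Risk tiers, regression sets, and release criteria can be layered on later without changing the basic shape.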

Privacy belongs in the same product definition. The consent-coach article reports that the service uses no accounts, cookies, or cross-session tracking. That choice limits conventional retention analytics, but it also supports the trust required for a sensitive interaction. It shows that less data can be a deliberate product feature when identification or surveillance would discourage honest use.

Leadership determines whether these practices persist. The roadmap article argues that training alone does not change an organization when leaders continue to reward old behaviors. Its proposed learning model combines on-demand material, AI-generated feedback, coaching resources, and human support. The practical-use-case article similarly recommends peer demonstrations and structured practice. Both suggest that AI readiness is a management system: teams need permission to experiment, shared examples, quality standards, and leaders who reinforce evidence-based behavior.

Key takeaways
- Start with a bounded task and a defined outcome; use repeated practice to learn where AI adds leverage and where it fails.
- Connect research, feedback, behavioral data, prioritization, and delivery so that AI improves decisions rather than producing isolated artifacts.
- Keep direct customer contact and domain expertise at the center of discovery, synthesis, and quality judgment.
- Define the worst credible outcome before designing a customer-facing AI experience, then match controls to that risk.
- Build evaluation and privacy into the product operating model, including criteria for refusing, escalating, or admitting uncertainty.
- Measure AI leadership by better learning and safer outcomes, not by tool count, output volume, or automation alone.

Building the next product operating rhythm

The next step for product organizations is not a universal AI playbook. It is a disciplined rhythm in which teams choose a real problem, gather contextual evidence, define acceptable and unacceptable behavior, test a bounded intervention, and revise or stop it based on results. As AI capabilities change, that rhythm can remain stable. It gives product leaders a way to pursue faster learning without treating speed as a substitute for responsibility.

July 3, 2026
Connecting Product Analytics, Attribution, and Growth Decisions
Connected product analytics is not simply a larger collection of events, dashboards, and campaign reports. Its practical value comes from preserving the context behind customer behavior, applying consistent definitions, and carrying trustworthy insights into the systems where teams make decisions.

The four source articles describe complementary parts of that operating model: journey-aware attribution, governed product data, AI-assisted analysis across tools, and continuous measurement. Combined, they offer a framework for turning scattered signals into more defensible growth decisions.

Key takeaways
- Attribution becomes more informative when relevant campaign, session, and product context remains connected to later outcomes.
- Persisted context can reveal associations across a journey, but it does not by itself prove that a touchpoint caused a conversion.
- Naming standards, ownership, metadata, and shared customer definitions determine whether connected analytics can be trusted.
- AI agents and connectors can reduce the effort required to investigate and communicate insights, provided permissions and analytical boundaries are explicit.
- Growth improves through a repeatable learning loop that connects observed behavior to a decision, an intervention, and subsequent measurement.

Attribution improves when journey context survives the final click

The source on persisted properties challenges the idea that the last recorded interaction adequately explains a conversion. It reports that customer decisions may be shaped by activity distributed across sessions, channels, campaigns, and product experiences. In its examples, an e-commerce purchase may follow product discovery, promotions, and cart activity; a financial-services outcome may depend on education, trust-building, eligibility checks, and compliance-sensitive steps; and a B2B lead may emerge after product tours, comparison pages, demos, onboarding interactions, stakeholder reviews, and CRM touchpoints.

Persisted properties address part of this measurement problem by retaining meaningful context as a user continues through a journey. This gives analysts more than the attributes attached to the final event and supports questions such as which acquisition context is associated with later activation, which discovery experience precedes stronger conversion, or which onboarding path appears among retained users.
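
The general idea can be illustrated with a small sketch. It is not a description of how any analytics vendor implements persisted properties: the event shape (`user_id`, `ts`, arbitrary keys), the `first_` prefix, and the first-touch policy are all assumptions made for the example.

```python
def persist_properties(events, keys):
    """Attach each user's first-seen value for `keys` to all of that user's later events.

    `events` is a list of dicts with at least 'user_id' and 'ts' (a sortable timestamp).
    Persisted values are stored under 'first_<key>' so they never overwrite the
    event's own attributes.
    """
    first_seen = {}  # user_id -> {key: first observed value}
    enriched = []
    for event in sorted(events, key=lambda e: e["ts"]):
        context = first_seen.setdefault(event["user_id"], {})
        for key in keys:
            if key in event and key not in context:
                context[key] = event[key]
        persisted = {f"first_{key}": value for key, value in context.items()}
        enriched.append({**event, **persisted})
    return enriched
```

With this shape, a purchase event that carries no campaign attribute of its own can still be grouped by the campaign that first brought the user in. That supports the association questions above without implying that the campaign caused the purchase.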

That richer context should not be confused with automatic causal proof. Attribution assigns or interprets credit according to available data and a chosen analytical approach. A recurring touchpoint may be a useful signal, a proxy for user intent, or an actual contributor to an outcome. Connected journey data makes those possibilities easier to investigate, while controlled experiments and other appropriate evaluation methods remain necessary when a team needs to establish whether changing a touchpoint changes the result.

The practical shift is therefore from asking which interaction deserves all the credit to asking which sequence of interactions warrants attention. That framing is more useful for product roadmaps, campaign investment, onboarding design, and retention analysis because it treats conversion as the outcome of a journey rather than an isolated click.

Data governance supplies the shared meaning behind every signal

More connected data creates more analytical value only when teams agree on what the data represents. The Pendo administration source emphasizes naming conventions, ownership rules, and review cycles for pages, features, segments, guides, and reports. It also describes visitor, account, and product metadata as a strategic asset that should reflect concepts such as onboarding stage, plan type, activation, customer-success motion, and retention.

The marketing analytics source approaches the same requirement from an organizational angle. It argues that analytics works best as a shared language across product, marketing, sales, and customer success. Instead of allowing each function to interpret campaign and product signals independently, teams can align around customer journeys, funnel behavior, and the points at which users find value or leave.

Together, these sources show that the semantic layer is as important as the technical connection. A campaign label, user segment, account tier, activation event, and retention definition must remain intelligible when they move between an analytics platform, a CRM integration, a product report, or an AI-assisted workflow. Otherwise, a connected system can distribute ambiguity more efficiently without improving judgment.

Governance also affects interventions, not just reports. The Pendo source recommends contextual and concise in-app guides, product tours, and tooltips tied to measurable outcomes. This connects the measurement layer to the product experience: the same governed definitions used to identify friction should inform who receives guidance, what behavior the guidance is intended to change, and how the result will be evaluated.

AI connectors reduce workflow friction but do not repair weak analytics

The agent-connectors source extends connected analytics beyond dashboards. It describes an agent working across tools already used by product, analytics, and go-to-market teams, allowing context, analysis, and action to be brought into a more unified interaction. Its central benefit is operational: people can spend less effort moving information between tabs and systems while maintaining the flow of an investigation.

The marketing source similarly presents AI as most useful when paired with behavioral analytics, customer context, disciplined measurement, positioning, and a clear go-to-market strategy. In that account, AI workflows improve the scale and speed of judgment; they do not create durable growth independently of a sound measurement practice.

This distinction matters because an agent can make an answer easier to obtain without making its underlying evidence more reliable. If event definitions conflict, metadata is incomplete, or attribution assumptions are hidden, a connected agent may produce a fluent response to the wrong question. The connector source therefore places importance on permissions, appropriate context, governance, and boundaries alongside prompt design.

A well-designed workflow should preserve the path from a business question to the supporting behavioral evidence. It should also make clear which system supplied the context, which segment or journey definition was used, and whether the result is a descriptive association, an attributed outcome, or evidence from a stronger evaluation. That transparency helps an agent accelerate analysis without becoming an unexamined source of truth.

A connected growth loop joins evidence, intervention, and learning

The sources converge on a continuous operating loop even though each enters it at a different point. Persisted properties preserve the journey context needed to form a better question. Governance and metadata make the relevant users, accounts, features, and outcomes consistently identifiable. Behavioral analytics helps teams locate meaningful movement or friction. Product guidance, campaigns, positioning changes, and go-to-market decisions then become interventions whose effects can be measured.

The Pendo source makes this learning loop explicit by recommending that initiatives record the expected behavior, the observed result, the change in the customer journey, and the team’s next response. The marketing source adds that product, marketing, sales, and customer success should use those findings collectively. The agent-connectors source supplies a potential interface for carrying the analysis across their tools, while the attribution source supplies the longitudinal context needed to avoid judging the intervention solely by the final interaction.

This model also clarifies what a useful growth insight looks like. It is not merely a rising metric or a generated explanation. It connects a defined audience and journey to an observable outcome, states the limits of the attribution, identifies a decision the organization can make, and establishes what should be measured afterward. That standard directs attention toward learning and resource allocation rather than dashboard activity.

The next stage of connected analytics will depend less on adding isolated reports and more on maintaining reliable context as questions move across teams and tools. Organizations that preserve that context, govern its meaning, and test the decisions made from it will be better positioned to turn analytics and AI into a durable growth capability.

July 3, 2026
Behavioral Analytics for AI Agent Activation and Retention
AI agent growth is not simply a matter of attracting more users or generating more conversations. The central product question is whether people reach a useful outcome quickly enough to return, and whether the organization can respond intelligently when that journey breaks down.

The two source accounts describe complementary parts of that challenge. The Pendo account focuses on measuring and improving the path from first use to recurring engagement, while the Amplitude account focuses on turning observed behavior into workflows across product and go-to-market systems. Together, they suggest an operating model in which analytics first identifies meaningful behavior and then helps teams act on it.

Treat the agent as a measurable product experience

An AI agent can appear busy without becoming valuable. Conversation counts, prompt volume, and feature exposure show activity, but they do not establish that users completed meaningful work. Behavioral analytics becomes more useful when the agent is treated as an end-to-end product experience rather than an isolated interface.

The Pendo account describes mapping the journey from activation and a first successful task through repeat usage and habit formation. It also reports that the team defined stickiness around the agent’s jobs to be done instead of relying on an unspecified generic engagement measure. That distinction matters because a meaningful return pattern depends on the work the agent is intended to support.

The Amplitude account extends the same reasoning beyond analysis. It describes agents operating on verified product events, including high-intent milestones, changes in feature adoption, and signals associated with churn risk. In this model, instrumentation is not merely a reporting layer. It supplies the evidence used to trigger a subsequent decision or workflow.

A practical measurement chain therefore begins with eligibility and exposure, continues through an attempted interaction and a verified first success, and then examines whether users achieve additional useful outcomes over later sessions. The exact events must reflect the agent’s purpose. The durable principle is to measure completed value, not just interface activity.
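
As a rough sketch, the chain can be expressed as a strict funnel over verified events. The stage names below are hypothetical stand-ins for whatever events match the agent's purpose, and the sketch ignores event ordering in time, which a real analysis would need to respect.

```python
# Hypothetical stage names; replace with events that reflect the agent's actual job.
STAGES = ["eligible", "exposed", "attempted", "first_success", "repeat_success"]


def funnel_counts(events):
    """Count users at each stage, requiring that they also reached every earlier stage.

    `events` is an iterable of (user_id, stage) pairs.
    """
    reached = {}  # user_id -> set of stages observed for that user
    for user_id, stage in events:
        reached.setdefault(user_id, set()).add(stage)

    counts = []
    for index, stage in enumerate(STAGES):
        needed = set(STAGES[: index + 1])
        users_at_stage = sum(1 for stages in reached.values() if needed <= stages)
        counts.append((stage, users_at_stage))
    return counts
```

Reading the counts stage by stage shows where users stall, which a single conversation count cannot.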

Define activation as the first meaningful success

Activation is most informative when it marks a result that demonstrates the agent’s value. Opening the agent, viewing a suggested prompt, or sending a message may be necessary steps, but none necessarily proves that the user accomplished the intended task.

Pendo’s account reports that activation contained unnecessary cognitive load and that the first-session path did not consistently lead users to a quick win. The reported response included simplifying onboarding, clarifying prompts, and using in-app guidance to make valuable capabilities easier to recognize. This connects activation analysis directly to product design: when users stall before a first success, the remedy may involve reducing choices, clarifying expectations, or improving contextual guidance rather than adding more agent functionality.

Journey analysis should separate several different failure modes. A user who never starts may not understand the value proposition. A user who starts but abandons the task may encounter interaction friction. A user who receives an answer but does not act on it may lack confidence, context, or a clear next step. Combining these outcomes into one conversion rate would hide the product decision each one implies.

Activation should also be connected to the behavior that follows it. If an event labelled as success has no observable relationship with later value, it may be a convenient instrumentation point rather than a meaningful milestone. Behavioral cohorts can help compare subsequent engagement among users who reached different early outcomes, although those relationships should initially be treated as diagnostic evidence rather than proof of causation.

Measure retention as repeated value, not raw frequency

Retention analysis asks whether users continue to obtain value after activation. For an AI agent, that requires more context than a simple count of returning users. A return can indicate trust and usefulness, but it can also reflect an unresolved task, repeated correction, or a workflow that unnecessarily forces the user back.

The Pendo account presents stickiness as a proxy for trust and reports a 61% increase after the team established Agent Analytics and ran a series of product experiments. The same source associates stronger return behavior with proactive anticipation of intent and associates context-rich interactions, supported by timely nudges and in-app guides, with deeper engagement over later sessions. These are reported findings from one product account, not an independently verified benchmark for other agents.

The more transferable lesson is methodological. Teams can segment retention by the early behavior users completed, the type of task attempted, and the context surrounding the interaction. They can then examine whether retained users are repeating successful work, expanding into additional useful tasks, or merely revisiting the same point of friction.
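
A minimal version of that segmentation might look like the sketch below. The field names `early_outcome` and `later_successes` are assumptions, and "retained" is deliberately defined as having at least one later successful outcome rather than simply returning.

```python
from collections import defaultdict


def retention_by_cohort(users):
    """Share of users in each early-outcome cohort with at least one later success.

    `users` is an iterable of dicts with 'early_outcome' (str) and
    'later_successes' (int, successful outcomes in later sessions).
    """
    totals = defaultdict(int)
    retained = defaultdict(int)
    for user in users:
        cohort = user["early_outcome"]
        totals[cohort] += 1
        if user["later_successes"] > 0:
            retained[cohort] += 1
    return {cohort: retained[cohort] / totals[cohort] for cohort in totals}
```

Differences between cohorts are a starting point for hypotheses, not evidence that the early outcome caused the later behavior.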

This approach also guards against optimizing stickiness in isolation. Frequent use is desirable only when it reflects repeated useful outcomes. Where the agent’s job is to resolve work efficiently, fewer interactions may sometimes represent a better experience than a longer conversation. The retention definition must therefore stay anchored to the user’s intended result.

Turn behavioral signals into controlled interventions

Analytics creates leverage when it changes what the product or organization does next. The sources cover two levels of intervention. Pendo describes changes inside the experience, such as onboarding simplification, prompt clarification, contextual guides, tuned triggers, and tighter feedback loops. Amplitude describes workflows that cross system boundaries, such as initiating outreach for churn risk, triggering experimentation when adoption falls, activating users after high-intent milestones, and updating CRM records.

These approaches are complementary. In-product interventions can help a user complete the current journey, while cross-functional workflows can coordinate actions that require product, sales, or customer-success involvement. The behavioral signal should determine which response is appropriate: interface friction calls for a product change, an unmet need may call for research, and an account-level risk signal may justify a carefully governed human follow-up.

Automation does not remove the need for experimentation. Pendo reports using A/B tests to evaluate changes, while the Amplitude account emphasizes success criteria, governance guardrails, observability, iteration, and aligned performance measures. A sound operating loop combines those ideas: define the target behavior, verify the underlying events, choose an intervention, test its effect, monitor unintended outcomes, and retain only changes that improve the intended user result.

That loop is especially important when an agent both interprets behavior and initiates action. Poor event quality, ambiguous thresholds, or drifting agent performance can otherwise scale an incorrect decision. Human ownership, visible workflow history, and clear evaluation criteria help distinguish useful orchestration from automated noise.

Key takeaways
- Define activation around a verified first useful outcome, not merely opening the agent or sending a prompt.
- Analyze each stage between exposure, attempted use, successful completion, and later return so different forms of friction remain visible.
- Interpret retention through repeated value and task context; activity alone is not sufficient evidence of trust.
- Use behavioral cohorts to generate hypotheses, then apply controlled experiments before treating an observed relationship as causal.
- Match interventions to the signal: improve the experience when friction is local, and use governed cross-functional workflows when follow-through spans multiple systems or teams.
- Monitor data quality and agent performance because automated actions can amplify both accurate and inaccurate interpretations.

The next stage of AI agent maturity will depend less on adding visible capabilities and more on connecting meaningful outcomes to disciplined follow-through. Teams that can measure the first win, recognize repeated value, and govern the actions between them will be better positioned to turn agent adoption into durable product behavior.

References
- Shivam.Consulting Blog – Stop Guessing: Deploy AI Agents That Act on Real User Behavior with Amplitude Workflows
- Shivam.Consulting Blog – Inside the 61% Stickiness Lift for Pendo’s AI Agent: My Agent Analytics Playbook
June 23, 2026
How I Make AI Agents Speak Like Our Team: A Conversation Design Playbook That Lifts CSAT

If nobody on our team trains the Agent on how to communicate, it will sound like an LLM when it speaks to customers—because it is one. I never want a customer to feel like they’re talking to a machine that doesn’t get them. That’s why I treat conversation design as a core product capability, not an afterthought.

Conversation design is an emerging discipline in AI-first support teams built to solve this exact problem. In practice, I make someone explicitly own how the Agent communicates—tone, structure, level of detail, customer experience, and the handoff and escalation process—because that’s where trust is won or lost.

When there’s no clear owner and no explicit guidance, the Agent starts making its own choices. I’ve seen it over-explain when a short answer would do, reply in a flat tone when a customer is frustrated, or trigger a handoff too late. None of those are model problems; they’re design problems.

The cost is measurable. Customers who get awkwardly structured responses won’t trust the answer—even when it’s accurate—so they escalate to a human to hear the same thing phrased differently. Others will skip the Agent entirely. And when the Agent does hand off, a poor transition means the support rep inherits a frustrated customer. Every one of these outcomes is avoidable; conversation design exists to prevent them.

I’ve seen A/B tests where a warmer, more conversational opening message meaningfully lifted customer satisfaction—CSAT moved from 72.8% to 78.4%. A single design change, applied to the very first message, drove a measurable difference. That’s the kind of leverage I look for as a product leader.

Here’s the scope I use when I talk about conversation design—five areas that shape the customer experience end to end:

1) Tone and personality: Define the Agent’s voice, level of detail, and how formal or casual it should sound—and specify where that register adapts to the situation (for example, urgent access issues versus exploratory product questions).

2) Response structure: Ensure the Agent matches the level of detail to the customer’s request, keeping answers tight when the ask is simple and expanding only when complexity demands it.

3) Handoff logic: Decide when to escalate, how to communicate the transition, and what context to carry over so the human teammate can help immediately without rework.

4) Interaction flow: Map how a conversation progresses—clarifying questions, answers, resolution, or handoff—and design for smooth pivots when customers change direction.

5) Response quality: Go beyond technical correctness to ensure answers feel clear, helpful, and on-brand. Accuracy without clarity erodes trust.

To put this into practice, I start with the feel of the conversation. Before tuning individual responses, I write down one tight paragraph describing the Agent’s voice. I don’t need a full brand bible—just a north star I can use to make consistent decisions about tone. The voice stays consistent, while the register adapts to the context: a locked-out customer needs directness and speed; a feature explorer might value more context and examples.

I design the handoff with extreme care because it’s one of the highest-friction moments. Customers shouldn’t have to re-explain anything. The support rep should receive the full conversation history, the underlying context, what the Agent already tried, and why the escalation happened. Even the phrasing matters—“Let me connect you with a teammate who can help with this” feels very different from a silent handover.
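
As a rough sketch, the handoff payload described above could be modeled like this. The class and field names are hypothetical and not tied to Fin or any other product. The point is only that the four pieces of context travel together and that gaps are visible before the transfer happens.

```python
from dataclasses import dataclass


@dataclass
class HandoffContext:
    transcript: list[str]              # full conversation history
    customer_context: dict[str, str]   # underlying context, e.g. plan or affected feature
    attempted_steps: list[str]         # what the Agent already tried
    escalation_reason: str             # why the escalation happened
    customer_message: str = "Let me connect you with a teammate who can help with this."

    def missing_fields(self) -> list[str]:
        """Names of context fields that are empty, so gaps surface before the transfer."""
        required = {
            "transcript": self.transcript,
            "customer_context": self.customer_context,
            "attempted_steps": self.attempted_steps,
            "escalation_reason": self.escalation_reason,
        }
        return [name for name, value in required.items() if not value]
```

An empty `attempted_steps` list can be legitimate if the Agent escalated immediately. The check flags the field for review rather than blocking the handoff.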

I also build a failsafe. If the Agent can’t resolve the issue cleanly, a graceful fallback still gives the customer a smooth experience. A customer might be frustrated with AI at that point, but a well-handled transition can turn that around.

Follow-ups deserve the same rigor as handoffs. If someone drops mid-conversation—with the Agent or a human—how do we reach back out to confirm they got what they needed? Most teams miss this moment; customers don’t.

Another common pitfall is over-explaining. The Agent has access to a lot of information, and left unguided, it will overshare. The fix is simple: match the answer’s depth to the question. A password reset shouldn’t take three paragraphs; a complex integration might. When there’s more to offer, the Agent should ask before expanding.

I also design for the conversation the customer is actually having—not the script I wish they’d follow. Customers change direction, stack questions, or bring up unrelated follow-ups. The Agent should pivot with them, not force them back into a rigid flow. I also consider whether flows vary by channel and whether different segments merit distinct experiences.

On the instruction side, I keep guidance short. Teams often react to edge cases by adding more rules until the LLM is parsing paragraphs before it can reply. I’ve seen it everywhere. My rule: if it’s about content or information, it belongs in the knowledge base. If it’s about tone or handling specific situations, it belongs in the Agent’s instructions. “Be direct about pricing” does more than a paragraph explaining the philosophy behind your pricing communication strategy.

If you’re using Fin, much of this work happens in Guidance. It’s where conversation design takes shape, helping you define how the Agent should sound, how much it should say, and how it should respond in different situations.

Most teams won’t hire a dedicated conversation designer on day one—that’s fine. But someone still needs to own the Agent’s communication, even if it’s part of an existing role. I’ve often seen this start within support operations or knowledge management. As the Agent scales to more conversations, the responsibility becomes formal—and eventually becomes a dedicated role.

Here’s how I’d start, step by step:

1) Name an owner. Make accountability explicit; it doesn’t have to be a new hire.

2) Pick one conversation type that isn’t landing well. Look for cases where the Agent answered correctly but the customer still escalated or left negative feedback. If you’re using Fin, CX Score can help you surface these; it shows which topics and conversation types are scoring poorly and why, so you can see whether the issue is answer quality, customer effort, or something else.

3) Audit the Agent’s instructions. If they’ve grown beyond a few focused rules, trim them. Move content into the knowledge base and keep instructions focused on behavior.

4) Fix your worst handoff. Review a handful of conversations that escalated. Did the customer have to repeat themselves? Did the rep have enough context? Redesign that single transition first.

The impact of these small improvements compounds. A warmer opening can lift CSAT, trimming instructions makes responses sharper, and a better handoff prevents reps from inheriting frustrated customers. None of this requires new knowledge—just someone paying close attention to the conversation itself and designing it with intention.

Inspired by this post on The Intercom Blog.

June 18, 2026
AI Inference Economics: Optimize for Value, Not Cost
AI inference economics cannot be reduced to the price of a model call. The financially relevant question is whether a change in model, latency, caching, or token use improves total product value after its effects on conversion, retention, support, and revenue are included.

A reported decision to reject a projected $2 million in inference savings illustrates the distinction. The supplied source describes lower infrastructure costs alongside weaker downstream product signals, making the proposed optimization look attractive in a FinOps report but less compelling at the business level.

The correct unit of analysis is the customer outcome

Cost per request is useful for operating an AI product, but it is not a complete measure of its economics. A cheaper request can still be expensive if it makes a user more likely to abandon a session, fail a task, contact support, or leave the product.

The source article reports that routing traffic to lower-cost options produced immediate cloud cost savings. It also associates small increases in time to first token with greater session abandonment, and subtle quality declines with lower task completion and weaker support deflection. According to the account, the resulting revenue exposure exceeded the projected expense reduction.

This reframes inference efficiency as a value equation. Direct serving cost belongs on one side; incremental conversion, retained revenue, successful task completion, and avoided support demand belong on the other. The decision should be based on the net effect rather than whichever metric is easiest to retrieve from a cloud bill.
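
The value equation can be written down in a few lines. This is a simplified sketch: the function name and the three-term breakdown are assumptions, and the numbers in the usage note are purely illustrative, not figures from the source.

```python
def net_value_delta(revenue_delta: float, serving_cost_delta: float, support_cost_delta: float) -> float:
    """Net change in value for a proposed inference change.

    Each delta is (after - before) over the same period and in the same currency.
    A positive result means the change improves total value under these estimates.
    """
    return revenue_delta - serving_cost_delta - support_cost_delta
```

For example, a change that saves 100 in serving cost (`serving_cost_delta=-100.0`) but is estimated to lose 150 in retained revenue and add 20 in support cost has a net delta of -70, so it looks good on the cloud bill and bad overall.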

Cost, latency, and quality form a coupled system

Model cost, response speed, and output quality are often managed as separate workstreams. In practice, changing one can move the others. A smaller or cheaper model may reduce inference expense while changing answer quality. More restrictive token limits may shorten responses but remove information needed to complete a task. Caching may improve both cost and speed for repeatable requests, yet become unsuitable where fresh or highly contextual output matters.

The source argues for treating these variables as one product system. That view prevents a local optimization from being mistaken for an overall improvement. It also makes latency distributions more informative than a single average: even when aggregate performance appears acceptable, slower experiences within particular workflows may coincide with abandonment or failed completion.
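
The gap between an average and a distribution is easy to see in a small sketch. The function name and the choice of percentiles are assumptions. In practice the samples would be grouped per workflow before summarizing.

```python
from statistics import mean, quantiles


def latency_summary(samples_ms):
    """Mean plus selected percentiles for a list of latency samples (needs >= 2 samples)."""
    cuts = quantiles(samples_ms, n=100, method="inclusive")  # 99 percentile cut points
    return {
        "mean": mean(samples_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }
```

With 95 requests at 100 ms and 5 at 2000 ms, the median stays at 100 ms while the 99th percentile sits at 2000 ms. That tail is invisible when only the median is tracked, and the mean alone does not show which users experience it.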

The same principle applies to quality. A model-level score matters only insofar as it represents what users need from the workflow. For a support agent, that might involve resolving an issue without escalation. For another product experience, it might involve completing a task, activating a feature, or continuing to use the service. Business instrumentation gives technical measures an economic interpretation.

Experiments must detect product harm, not just cost movement

The reported evaluation combined eval-driven development with A/B testing and defined success through conversion, retention cohorts, and Net Recurring Revenue rather than cost per call alone. It also used minimum detectable effect calculations to determine whether the tests had enough statistical power to reveal meaningful changes in latency and answer quality.
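
The source does not say how its minimum detectable effect was computed. As one common planning approximation, the sketch below estimates the absolute MDE for a proportion metric such as task completion rate, assuming a two-sided test, equal arm sizes, and the baseline variance in both arms. The function name and defaults are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate, n_per_arm, alpha=0.05, power=0.8):
    """Approximate absolute MDE for a two-sided, two-proportion test with equal arms."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the significance level
    z_power = NormalDist().inv_cdf(power)           # quantile for the desired power
    std_err = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * std_err
```

With a 10% baseline and 10,000 users per arm, this gives an MDE of roughly 1.2 percentage points. A test of that size would be unlikely to detect a smaller regression, so the absence of a significant result would not show that the cheaper configuration is harmless.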

That approach suggests two complementary layers of evidence. Evaluations can identify whether model behavior changes on representative tasks, while controlled product experiments can show whether those changes matter to users and the business. Neither layer is sufficient by itself: an offline quality score may miss behavioral consequences, and a topline business metric may conceal the mechanism behind a regression.
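The source does not give its minimum detectable effect calculation, so the following is only a sketch of one common approach: the normal-approximation sample size formula for comparing two conversion rates. The baseline rate, effect sizes, significance level, and power used here are hypothetical defaults.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Users needed per arm to detect an absolute change of mde_abs.

    Uses the two-sided, two-proportion normal approximation.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical: 10% baseline conversion, detecting 1 vs 2 percentage points.
print(sample_size_per_arm(0.10, 0.01))
print(sample_size_per_arm(0.10, 0.02))
```

Halving the detectable effect roughly quadruples the required sample, which is why a test sized only for large cost movements can miss a smaller but economically meaningful regression.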

Guardrails are especially important when the expected saving is immediate but the product damage may emerge later. Infrastructure spend can fall as soon as traffic moves. Retention and recurring-revenue effects may take longer to appear. Conversion, task completion, session abandonment, support deflection, and cohort retention therefore provide signals across different time horizons.

The evidence supplied here is one first-person case account, not independent corroboration. Its projected $2 million saving, observed correlations, and business conclusion should consequently be treated as case-specific rather than universal benchmarks. The transferable value lies in the measurement framework, not in assuming that every higher-cost model will produce a better commercial outcome.

Key takeaways
- Evaluate inference changes against total product value, including conversion, retention, support demand, and recurring revenue.
- Measure cost, latency, and AI quality together because an intervention in one dimension can alter the others.
- Pair task-level evaluations with controlled product experiments and size tests to detect economically meaningful regressions.
- Apply optimization selectively: a technique is valuable where evidence shows that it lowers cost without harming the customer outcome.

A selective optimization roadmap

The alternative to indiscriminate cost cutting is not unlimited inference spending. The source describes a balanced roadmap built around targeted caching where experiments showed no adverse outcome, dynamic routing for task-specific workloads, and stronger observability to detect quality regressions early.

Each method addresses a different part of the economics. Targeted caching can remove redundant work in stable interactions. Dynamic routing can reserve more capable models for tasks that justify them while sending simpler work to less expensive paths. End-to-end observability can connect routing, model, token, latency, and quality data with the behavior that follows.
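A minimal sketch of how these three pieces might fit together is shown below. It is not the source's implementation: the task labels, the two stub models, and the log format are all hypothetical, and the cache is applied only to tasks assumed to have passed an experiment.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task labels; real values would come from experiment results.
CACHEABLE_TASKS = {"faq"}            # caching showed no adverse outcome
PREMIUM_TASKS = {"troubleshooting"}  # tasks that justify the more capable model

Model = Callable[[str], str]


class Router:
    """Routes requests by task type and records each decision for observability."""

    def __init__(self, small_model: Model, large_model: Model) -> None:
        self.models: Dict[str, Model] = {"small": small_model, "large": large_model}
        self.cache: Dict[Tuple[str, str], str] = {}
        self.log: List[Dict[str, str]] = []

    def handle(self, task: str, prompt: str) -> str:
        key = (task, prompt)
        if task in CACHEABLE_TASKS and key in self.cache:
            # Targeted caching: reuse answers only for approved task types.
            route, answer = "cache", self.cache[key]
        else:
            # Dynamic routing: reserve the larger model for premium tasks.
            route = "large" if task in PREMIUM_TASKS else "small"
            answer = self.models[route](prompt)
            if task in CACHEABLE_TASKS:
                self.cache[key] = answer
        # Observability: keep the route so it can be joined with outcome data.
        self.log.append({"task": task, "route": route})
        return answer


# Stub models standing in for real inference endpoints.
router = Router(small_model=lambda p: f"small:{p}", large_model=lambda p: f"large:{p}")
router.handle("faq", "reset password")
router.handle("faq", "reset password")
router.handle("troubleshooting", "sync fails")
print([entry["route"] for entry in router.log])  # ['small', 'cache', 'large']
```

Logging the route next to the task is the design choice that matters here: it lets later analysis connect each routing decision to conversion, completion, or retention data.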

This also clarifies governance. FinOps teams can continue applying pressure to unit costs, while product teams define outcome guardrails and analytics teams verify the net effect. A proposed saving becomes ready for broader rollout only when the organization can see both the expense reduction and the customer or revenue impact.

As AI products scale, the strongest operating discipline will be selective rather than reflexive: spend less where evidence supports it, invest more where inference creates measurable value, and revisit routing decisions as workflows and user behavior change.

References
- Shivam.Consulting Blog: Why I Rejected $2M in AI Inference Savings to Protect Conversion, Retention, and Revenue

June 17, 2026
How I Use Novus, the First Product Agent, to Turn Rapid Releases into Measurable Wins

In a world of relentless CI/CD and accelerating release trains, product leaders like me can’t afford lagging signals or fuzzy readouts on what’s truly moving the needle. I need immediate, trustworthy feedback that connects code shipped to outcomes achieved and customer value created.

Coding agents compress weeks of development into hours, but the faster your codebase changes, the harder it is to know what’s actually helping end-users.

That tension is exactly why I brought Novus into my product toolbox. To keep up with the pace of development, over 600 product teams are already using Novus, the first-of-its-kind product agent, which sets itself up automatically, monitors product data, and tells teams what to do next.

From my chair, that promise matters only if it translates into clear decisions. With Novus, I’ve been able to tighten the loop between experimentation and learning: it pairs eval-driven development with behavioral analytics and observability so I can see how a release influences activation, engagement, and retention—without spelunking through fragmented dashboards. The agentic AI backbone reduces the manual stitching I used to do across events, cohorts, and funnels, letting me focus on prioritization and product strategy instead of report wrangling.

Day to day, Novus fits naturally into our AI workflows. It surfaces anomalies early, clarifies trade-offs, and frames next-best actions in the language of outcomes. Because it plugs into a unified analytics platform approach, I can maintain continuous discovery at scale while preserving the rigor of Agent Analytics: hypotheses are explicit, telemetry is consistent, and results are traceable. That’s the operating cadence I expect from modern product management leadership.

If your roadmap moves faster than your learning loops, a product agent can be the missing link between speed and certainty. Novus helps me convert rapid releases into measurable wins, keeping the team aligned and confident about what to build next—and just as importantly, what to stop doing.

Inspired by this post on Pendo – Best Practices.

June 17, 2026
Salesforce to Acquire Fin for ~$3.6B: Powerful AI Synergy, Product Strategy Takeaways

I’m processing a milestone moment for SaaS, AI strategy, and product leadership. One statement captures the news with clarity: “We’re excited to share that we just signed an agreement for Salesforce to acquire Fin for ~$3.6B. The transaction is expected to close in the fourth quarter of Salesforce’s fiscal year 2027.” As a product leader, I see this as a high-conviction bet on agentic AI, Customer Agents, and CRM integration at massive scale.

The backstory matters, and it’s remarkable: “Fin started as Intercom 15 years ago. We changed our name to cap our transformation just weeks ago. We were a darling of the SaaS era and invented so many of the patterns you see in software today. Nearly four years ago, in need of a reboot, we jumped on weeks-old modern LLMs to create and define the category we know as Customer Agents today.” That arc—from SaaS pioneer to LLM-powered category creator—illustrates how bold pivots, shipped with urgency and clear product strategy, can reset the trajectory of a company and a market.

From a product management lens, this deal reinforces a few truths: category creation rewards those who move first with conviction; “reboots” succeed when they’re anchored in genuine customer value; and modern LLMs, applied through disciplined roadmapping and eval-driven development, can unlock step-change outcomes in customer support AI strategy and product-led growth. It also signals the rising centrality of agentic AI and operational AI workflows inside the CRM.

The leadership dimension is just as instructive. As the announcement framed it: “Salesforce invented modern software and SaaS. And Marc Benioff is like the final boss of tech founder CEOs. In seat for 27 years, he’s one of the last of his era. Still pushing, pivoting, placing big bets.” That ethos—placing big, principled bets while adapting the operating model—sets the tone for what sustained product management leadership looks like at scale.

Customer continuity and acceleration are clearly emphasized: “To our customers: Over the past few years we’ve been shipping intensely. Including recently our groundbreaking model, Apex, and our paradigm-defining internal agent, Operator. With the resources of Salesforce this will only accelerate. And yet little will practically change. I’ll still be CEO, Des will still be running R&D, we’ll both still be committed to continuing to lead this category. Thank you very sincerely and deeply for your belief in us.” For practitioners, the signal is strong: continued focus on shipping, sharper execution readiness, and tighter integration paths inside the Salesforce ecosystem.

Smiles, clinking glasses, and a roundtable toast in a cozy private room capture the energy of a big day—celebrating Salesforce's definitive agreement to acquire Fin and the teams joining forces for what's next.

There’s a human heartbeat here too: “While this is not the end, it is a major, pivotal, special, and emotional moment for us.” Moments like this remind me that building enduring products is equal parts craft and courage—powered by teams who commit to the long game, navigate uncertainty, and still ship relentlessly.

Strategically, I expect near-term priorities to center on secure data flow and governance, deep CRM integration, and unifying telemetry for Agent Analytics across channels. On the roadmap, I’d anticipate tighter alignment between LLM safety, retrieval-first pipelines, and enterprise-grade observability—plus thoughtful go-to-market strategy enabling sales-led growth to complement product-led growth. The real unlock comes when Customer Agents are natively orchestrated with Service, Sales, and Marketing workflows—measured with clear outcomes vs output OKRs and reinforced by robust knowledge management.

For fellow product leaders, the takeaways are actionable: define category boundaries with crisp value propositions; balance speed with governance; invest in eval-driven development and continuous discovery; and keep your product trios aligned around measurable customer outcomes. Above all, build the operating cadence—metrics, rituals, and talent—that lets you compound small wins into durable differentiation.

And I appreciate the spirit of this closing line: “And now, time to get back to work. See you at our next product launch in a few weeks. (:” That’s the mindset that turns a headline into execution: celebrate briefly, then ship the next proof point.

Inspired by this post on The Intercom Blog.

June 15, 2026
Claude Code for Product Managers: Accelerate Prototypes, Validate Faster, Ship with Confidence

I build products under constant pressure to learn faster without breaking trust. Claude Code has become a pragmatic addition to my AI product toolbox because it helps me move from idea to evidence with less friction—while keeping engineering, design, and compliance in the loop.

“Claude Code for Product Managers explained: what it is, why it matters, and how it helps PMs prototype, validate, and move faster.” That line captures the essence. In practice, I use it to turn ambiguous problem statements into tangible artifacts—API stubs, SQL queries, test data, and lightweight prototypes—that sharpen conversation and accelerate decision cycles.

What is it in PM terms? A code-aware assistant that helps me prototype safely and quickly. I can generate example API calls, transform messy CSVs for retention analysis, draft instrumentation plans for Amplitude analytics, or spin up a mock service to validate an integration. Because it understands structure, it’s effective at scaffolding small utilities (e.g., a data cleaner or a CLI harness) that make discovery and validation faster.

Day to day, Claude Code reduces handoffs. If I’m exploring a new partner integration, I’ll have it produce a set of curl commands and a Postman collection, then annotate each step with acceptance criteria and expected responses. When I’m shaping a feature, I lean on it to outline event taxonomies and feature flags so that engineering can wire telemetry without guesswork. For insights work, I’ll ask it to propose SQL for cohort, funnel, and retention analysis, always verifying against source schemas before anything touches production.

Speed is only useful when it improves signal quality. I anchor the workflow in continuous discovery: small hypotheses, thin-slice prototypes, and fast instrumentation. Claude Code helps me estimate A/B testing readiness (including minimum detectable effect), generate smoke tests for critical user paths, and structure an eval-driven development loop so we learn from every iteration. It also supports context window management by summarizing long PRDs into the few constraints a prototype must respect.

Governance matters. I apply AI readiness and AI risk management principles: never paste secrets or PII, isolate sandboxes, and log prompts as docs-as-code for auditability. I prefer a retrieval-first pipeline that feeds approved product docs, OpenAPI specs, and design tokens so generations stay grounded. When tools are integrated, I favor the Model Context Protocol (MCP) to constrain capabilities and maintain least-privilege access. Human-in-the-loop review is non-negotiable—especially for anything that might influence customer data or pricing.

The best outcomes show up in product trios. I’ll facilitate a live session with design and engineering: we co-create prompts, compare alternatives, and converge on a thin slice we can ship. That collaboration keeps us empowered, reduces interpretation drift, and turns Claude Code into an accelerant rather than a sidecar. Over time, the trio curates a reusable prompt library for PRD outlines, experiment checklists, and integration playbooks.

Getting started is straightforward: define a safe environment, assemble your authoritative corpus (requirements, specs, taxonomies), and codify a few high-value templates—API exploration, instrumentation plans, sandbox data generators, and acceptance tests. Track impact with simple, objective metrics: cycle time from hypothesis to instrumented prototype, time-to-first-signal, and the proportion of decisions made with data versus opinion.

There are pitfalls. Hallucinated fields can creep into API calls, schema drift can break generated queries, and “clever” refactors may miss edge cases. I mitigate this by grounding generations in current specs, asking for unit tests alongside any code, and validating against a staging environment before anyone talks about production. Treat Claude Code as a collaborator, not an oracle.
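One lightweight way to catch hallucinated fields is to compare a generated payload against the fields the current spec allows. The sketch below assumes a hand-written spec excerpt with hypothetical field names; in practice the allowed and required sets would be loaded from the team's actual OpenAPI document.

```python
from typing import Dict, List, Set

# Hypothetical spec excerpt for an "order" request body.
ORDER_SPEC: Dict[str, Set[str]] = {
    "allowed": {"customer_id", "sku", "quantity", "coupon"},
    "required": {"customer_id", "sku", "quantity"},
}


def check_payload(payload: dict, spec: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Report fields the spec does not define and required fields that are absent."""
    fields = set(payload)
    return {
        "unknown": sorted(fields - spec["allowed"]),
        "missing": sorted(spec["required"] - fields),
    }


# A generated payload that uses "qty" where the spec expects "quantity".
generated = {"customer_id": "c-1", "sku": "A100", "qty": 2}
print(check_payload(generated, ORDER_SPEC))
# {'unknown': ['qty'], 'missing': ['quantity']}
```

A check like this only covers top-level field names. It complements, rather than replaces, unit tests and validation against a staging environment.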

If your mandate is to learn faster, de-risk bets, and ship with confidence, Claude Code is worth adopting. Used thoughtfully, it compresses the distance between questions and answers, elevates product discovery, and lets teams validate more ideas with fewer meetings—without compromising on governance or quality.

Inspired by this post on Product School.

June 12, 2026
Beyond Black‑Box Scores: Custom AI That Elevates Trust & Safety Without Burnout

What do you do when off-the-shelf moderation scores aren't good enough—and the alternative is paying human contractors to spend their days reviewing traumatizing content at scale? I’ve wrestled with that exact trade-off in enterprise environments, and it’s why I was eager to unpack how custom AI can raise the bar on trust and safety without compromising accuracy, latency, or the well-being of our teams.

In this episode of Just Now Possible, I sit down with Nikki Marinsek (Data Scientist), Brian McCaffrey (Software Engineer), and Dan Means (Machine Learning Engineer) from Musubi, an AI-native trust and safety toolkit for content platforms. Musubi builds custom-trained ML models and LLM-powered moderation tools that adapt to each platform's unique policies—from dating apps to social networks to AI inference endpoints. As a product leader, I’m drawn to their blend of eval-driven development, agentic AI, and pragmatic deployment pipelines that actually meet real-world SLAs.

We walk through their full journey, from a first prototype on tabular data to the discovery that the system was sometimes catching issues human moderators missed. That insight became a forcing function to formalize evaluation, calibrate thresholds, and design feedback loops that help humans and models converge. Just as importantly, they built a policy optimizer that uses agentic flows so non-technical trust and safety teams can iterate on LLM moderation policies without needing a data scientist in the room.

If you’ve ever had to balance latency, accuracy, and cost at scale, you’ll appreciate how Musubi tests trade-offs across traditional ML, embedding-driven classification, and LLMs. Their approach mirrors the patterns I expect in high-throughput stacks: cache and pre-compute where possible, contain worst-case latencies, and push evaluation tooling to customers so policy changes are safe, observable, and fast to deploy.

What resonated most with me is their core product strategy: put eval tools directly in customers’ hands. When teams can benchmark AI against humans, referee disagreements using “LLM as judge,” and make policy gaps visible, trust increases and operational drift decreases. That’s the foundation for durable product strategy in sensitive domains like content moderation, fraud management, and risk scoring.

Listen to this episode on: Spotify | Apple Podcasts

Guests: Nikki Marinsek, Data Scientist, Musubi; Brian McCaffrey, Software Engineer, Musubi; Dan Means, Machine Learning Engineer, Musubi.

In this episode:
- Why off-the-shelf moderation scores fail and how custom-trained models fix that
- How Musubi combines traditional ML with LLMs for different moderation tasks
- The discovery that AI can outperform human moderators, and how to communicate that to clients
- Using AI as a judge to referee disagreements between AI and human decisions
- How Musubi onboards new customers with "reverse demos"
- What custom model training actually means: fine-tuning, feature engineering, and reusable deployment pipelines
- The policy optimizer: an agentic flow that helps customers iterate on their LLM moderation policies
- Why pushing eval tools directly to customers is a core product strategy
- How Musubi is building flexible orchestration workflows for non-technical trust and safety teams

From a product management lens, a few highlights stand out. First, the disciplined separation of concerns: use traditional ML for high-precision, low-latency pattern detection and LLMs for nuanced policy interpretation. Second, invest in golden sets and policy loops early so you can quantify improvement and avoid subjective debates. Third, productize customization—create reusable deployment pipelines, parameterized policies, and self-serve evaluation—so each customer’s “custom model” still scales like a platform.
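As a rough illustration of the golden-set idea, the sketch below scores model and human decisions against the same labeled examples and lists the items where the two disagree, which are the natural candidates for an "LLM as judge" review. The labels and decisions are made up for the example, and this is not Musubi's tooling.

```python
from typing import List, Tuple


def precision_recall(predicted: List[str], golden: List[str],
                     positive: str = "violation") -> Tuple[float, float]:
    """Precision and recall of the positive label against the golden set."""
    pairs = list(zip(predicted, golden))
    tp = sum(p == positive and g == positive for p, g in pairs)
    fp = sum(p == positive and g != positive for p, g in pairs)
    fn = sum(p != positive and g == positive for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def disagreements(model: List[str], human: List[str]) -> List[int]:
    """Indices where model and human decisions differ."""
    return [i for i, (m, h) in enumerate(zip(model, human)) if m != h]


# Hypothetical golden set and decisions for five pieces of content.
golden = ["violation", "ok", "violation", "ok", "violation"]
model = ["violation", "ok", "violation", "violation", "violation"]
human = ["violation", "ok", "ok", "ok", "violation"]

print(precision_recall(model, golden))  # (0.75, 1.0)
print(disagreements(model, human))      # [2, 3]
```

Putting both sets of decisions on the same scale is what turns a subjective debate about "who is right" into a measurable comparison.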

I also appreciated the onboarding tactic of "reverse demos." Rather than a canned walkthrough, the team invites customers to bring real policies and edge cases, then instruments the workflow live. That move builds credibility, accelerates discovery, and surfaces the fastest paths to value—an approach I recommend whenever you’re selling complex AI workflows to non-technical stakeholders.

If you’re navigating cost and latency trade-offs, the conversation goes deep on techniques like embedding-driven classification, fine-tuning vs. training, and when to route decisions through LLM adjudication. My takeaway: treat the router, the evaluator, and the policy as first-class products. When those elements are observable and testable, you can raise quality without exploding compute costs or creating operational bottlenecks.

Resources & Links: Musubi — AI-powered trust and safety toolkit for content platforms. Maven AI Evals Course — AI evals course.

Chapters:
- 00:00 Meet the Team
- 01:18 Why Everyone Wears Product
- 02:32 What Musubi Builds
- 04:51 AI for Human Moderation
- 09:59 Adversaries and Asymmetry
- 11:48 Early Days and Low Latency
- 13:35 First Prototype Slice
- 15:33 Traditional ML Meets LLMs
- 19:52 Benchmarking Against Humans
- 23:09 LLM as Judge and Policy Gaps
- 29:53 From Prototype to Platform
- 31:15 Customer Onboarding Reverse Demos
- 36:08 Custom Models Per Customer
- 38:05 Fine Tuning vs Training
- 39:14 Embedding Driven Classification
- 40:04 Cost and Latency Tradeoffs
- 43:21 Productizing Customization
- 49:16 Scaling Prototypes to Production
- 51:58 Golden Sets and Policy Loops
- 56:17 Coaching Customers Safely
- 01:02:06 Gamified Feedback Signals
- 01:06:19 Agentic Toolkit Roadmap
- 01:09:05 Workflow Orchestration Future
- 01:12:06 Wrap Up and Thanks

Ultimately, this is a playbook for modern trust and safety: align your models to your policies, make evals a habit not an event, and empower non-technical teams with agentic workflows and transparent metrics. That’s how we move beyond black-box scores to systems we can measure, manage, and trust.

Inspired by this post on Product Talk.

June 11, 2026