Author: Shivam Tiwari

Churn Prediction: A Practical Build-Versus-Buy Framework

You need a churn score soon. Customer success wants a prioritized account list, engineering wants requirements, and finance wants to know whether it is funding a vendor contract or a permanent internal capability. A polished model can still leave all three teams waiting if nobody has decided what happens after an account is flagged.

Start with the retention decision, not the algorithm. Once you know who will act, what they will do, and how you will measure the result, the build-versus-buy choice becomes much clearer.

Decide which capability you actually need to own

Churn prediction is often discussed as if it were a single model. In practice, it is an operating loop with several layers:

Define the outcome. Specify which customers can churn, what event counts as churn, and the prediction window that gives your team enough time to intervene.
Assemble the signals. Connect product usage, account attributes, engagement, support, billing, and other permitted data to a consistent customer identity.
Estimate risk. Produce a score, category, or ranking that separates accounts requiring attention from the rest of the portfolio.
Activate the prediction. Route the result into the CRM, customer-success workflow, lifecycle message, or in-product experience where somebody can respond.
Learn from the intervention. Measure whether the action changed retention, adoption, engagement, or net revenue retention (NRR) rather than assuming that a plausible score created value.

You do not necessarily need to own every layer. A vendor might provide behavioral analytics, scoring, in-app guides, and CRM integration while you retain ownership of the churn definition, intervention policy, and experiment design. Conversely, you might build a specialized risk model but continue using commercial tools to collect events and deliver treatments.

My default is to separate model ownership from outcome ownership. Your company must own the definition of success, the permitted uses of the score, and the learning loop. It only needs to own the model code when that ownership creates a strategic advantage.

Before evaluating an architecture or vendor, complete this sentence:

When a customer in [defined population] crosses [risk condition], [named owner] will take [specific action] through [named system], and I will judge the intervention using [business outcome].

If you cannot complete it, pause the model decision. You have an intervention-design problem. Buying software will automate the ambiguity, while building will make the ambiguity more expensive.

Run six decision gates before choosing a path

The right answer depends on more than whether your team can train a model. Use these gates to expose the constraint that should control the decision.

Decision gate	Evidence to inspect	What pushes you toward a path
Time to value	Decision deadline, current churn visibility, and readiness of the first intervention	Urgent activation favors buying; a longer strategic horizon makes building more viable
Data readiness	Outcome labels, identity resolution, event consistency, signal freshness, and usable history	Immature data favors a packaged baseline while you repair foundations; reliable proprietary data strengthens the case to build
Strategic differentiation	Signals or decisions competitors and general-purpose vendors cannot reproduce	A commodity retention capability favors buying; a defensible product advantage favors building
Operating talent	Named owners for data pipelines, production scoring, monitoring, governance, and intervention design	Missing ownership favors buying; durable cross-functional capacity makes building credible
Activation fit	CRM, customer-success, messaging, analytics, and in-product delivery requirements	Standard integrations favor buying; specialized actions or product-embedded scoring may require a build or hybrid approach
Risk and explainability	Privacy, access, retention, audit, explanation, and regulatory requirements	Standard controls may fit a vendor platform; domain-specific constraints can justify owning selected layers

Time to value: is speed useful, or merely urgent?

A short deadline only matters when an intervention is ready. If customer success already knows what it will do with a high-risk account, buying can put usable signals into existing workflows sooner. If the team has not agreed on an action, a fast score simply creates a faster queue of unanswered alerts.

Ask for the date on which a real user must receive the first actionable score. Then work backward through integration, workflow design, governance review, enablement, and experiment setup. This prevents a vendor demonstration or model prototype from being mistaken for operational readiness.

Data readiness: can your records support the decision?

A custom model cannot rescue an unstable churn definition or inconsistent customer identity. Inspect whether product events can be joined to the correct account, whether the churn outcome is recorded consistently, whether important segments have comparable coverage, and whether signals arrive early enough to support action.

Do not interpret weak data as an automatic reason to buy. A vendor cannot manufacture missing labels or repair every instrumentation gap. It can, however, give you a practical baseline using the signals already available while your team improves the data foundation.

Differentiation: would model ownership change your product advantage?

Build when proprietary context can materially improve the decision. That may include distinctive behavioral signals, domain-specific anomaly detection, specialized explanations, or a risk score embedded directly into your product. These are stronger reasons than a general preference to own technology.

If competitors could buy an equivalent capability and churn prediction mainly helps customer success prioritize outreach, ownership is unlikely to be the differentiator. Put product and engineering attention into the intervention, customer experience, and learning loop instead.

Talent: can you operate the system after launch?

Having someone who can train a model is not the same as having an operating team. A production capability also needs data engineering, scoring infrastructure, monitoring for drift, feature maintenance, incident ownership, governance, and a product owner who connects model changes to retention outcomes.

Put a name beside every continuing responsibility. An empty cell is not a future hiring plan; it is part of the build cost. If the same scarce people are also responsible for your core product, include the opportunity cost of redirecting them.

Activation: can the score reach the moment of action?

A prediction trapped in a dashboard has little retention value. Confirm that a score can create the right CRM task, customer-success play, lifecycle message, product tour, contextual tooltip, or in-app nudge. The recipient also needs enough explanation to choose an appropriate response.

Evaluate activation with a concrete scenario, not a feature checklist. Give a candidate vendor or internal team one representative account and ask it to show the full path from new behavior to updated risk, reason, assigned owner, intervention, and measured outcome. Any manual handoff in that path belongs in the decision record.

Governance: what must remain controlled and explainable?

Document which data may be used, who may see the result, how long inputs and scores are retained, what explanations users need, and how a customer could be affected by a mistaken classification. Privacy-by-design, data governance, regulatory compliance, and AI risk management apply whether the prediction is purchased or built.

Building gives you more design control, but it also transfers the burden of evidence, monitoring, and remediation to your organization. Buying transfers implementation work, not accountability. Require the same governance review for both paths.

The pattern is straightforward: buy when speed, standard coverage, and workflow activation dominate; build when proprietary signals, specialized explanations, or product differentiation dominate; blend when you need results now but have a credible reason to own selected layers later. A useful default is to buy a working baseline and build only where your context can create an outsized advantage.

Compare the full economics, not a license and a prototype

The most common cost comparison is structurally wrong: an annual software license is placed beside the effort required to train an initial model. One is closer to an operating capability; the other is an experiment. Compare both options across the same time horizon and include four cost classes: starting, running, changing, and exiting.

What belongs in the buy case

License, usage, seat, and service costs that apply to the intended customer population.
Implementation work for event collection, identity mapping, historical data, and system integrations.
Security, privacy, legal, regulatory, and procurement review.
Internal administration, score interpretation, workflow ownership, and user enablement.
Configuration or services needed for segments, reason codes, guides, alerts, and experiments.
Limits on data access, exports, custom features, scoring frequency, and downstream activation.
Migration effort if the vendor no longer fits, including preservation of historical scores and experiment records.

What belongs in the build case

Instrumentation, data quality, identity resolution, label construction, and feature pipelines.
Exploration, training, evaluation, explanation design, and production validation.
Batch or real-time scoring, storage, APIs, access control, and reliability engineering.
CRM, messaging, customer-success, analytics, and in-product integrations.
Monitoring for drift, broken inputs, coverage gaps, and unexpected segment behavior.
Retraining, feature maintenance, documentation, incident response, and ongoing product ownership.
Privacy controls, audit evidence, risk review, retention rules, and regulatory compliance.
Replacement or migration work when the architecture, churn definition, or business workflow changes.

Add cost of delay to both cases. Buying may carry a visible contract cost, but waiting for a custom capability can defer retention experiments and leave customer-success capacity poorly targeted. Building may require more internal investment, but a vendor that cannot express your signals or deliver the required intervention can delay learning in a different way.

Keep benefit assumptions separate from cost estimates. The model’s theoretical accuracy is not a financial return. Estimate value only through an intervention that can plausibly affect customer behavior, then validate that assumption with an experiment.

Your comparison should therefore show three views for each path:

Capability: which parts of the signal-to-action loop will actually work?
Economics: what will it cost to start, operate, change, and exit?
Evidence: what experiment will determine whether the capability improves retention or NRR?

If one option looks cheaper only because a row is blank, resolve the missing responsibility before approving it.
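A minimal sketch of that comparison follows. It assumes hypothetical names (`PathCosts`, `total_cost`) and illustrative figures that do not come from any real vendor or project. It totals the four cost classes over one shared horizon and refuses to total a path that still has a blank row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathCosts:
    """Costs for one path over a shared horizon; None marks a blank row."""
    name: str
    starting: Optional[float]          # one-time setup
    running_per_year: Optional[float]  # recurring operation
    changing: Optional[float]          # expected rework over the horizon
    exiting: Optional[float]           # migration or replacement

    def blank_rows(self) -> list[str]:
        rows = {
            "starting": self.starting,
            "running_per_year": self.running_per_year,
            "changing": self.changing,
            "exiting": self.exiting,
        }
        return [label for label, value in rows.items() if value is None]


def total_cost(path: PathCosts, horizon_years: int) -> float:
    """Sum all four cost classes; raise if any row is still unestimated."""
    missing = path.blank_rows()
    if missing:
        raise ValueError(f"{path.name}: unresolved cost rows {missing}")
    return (
        path.starting
        + path.running_per_year * horizon_years
        + path.changing
        + path.exiting
    )


# Illustrative figures only (assumptions, not benchmarks).
buy = PathCosts("buy", starting=40_000, running_per_year=90_000,
                changing=15_000, exiting=25_000)
build = PathCosts("build", starting=180_000, running_per_year=None,
                  changing=30_000, exiting=10_000)

print(total_cost(buy, horizon_years=3))  # 40000 + 270000 + 15000 + 25000
print(build.blank_rows())                # the row that still needs an owner
```

Raising an error on a blank row is a deliberate choice here: treating a missing estimate as zero is exactly what makes one option look cheaper than it is.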

Use a hybrid path without creating two disconnected systems

A hybrid strategy is more than running a vendor score and an internal score at the same time. Done well, it sequences the work: buy the common layers needed for speed and activation, learn which proprietary signals matter, and build only the components that earn their continuing cost.

Phase one: establish a usable baseline

Choose one defined customer population, one churn outcome, and one intervention. Configure the purchased capability to produce a risk signal and a usable reason, then route both into the workflow where the named owner can act.

Record three different kinds of evidence:

Prediction evidence: coverage, signal freshness, ranking or precision, stability across relevant segments, and the usefulness of explanations.
Operational evidence: whether scores arrive in time, whether users understand them, and whether a flagged account reliably receives the intended treatment.
Business evidence: whether the intervention changes retention, adoption, engagement, or NRR.

Do not use prediction quality to claim business impact. It is possible to identify high-risk accounts accurately and still deliver an ineffective intervention. It is also possible for a broad model to create value because it reaches the right team at the right moment. These are different questions and need different measures.
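As one possible way to record the "ranking or precision" part of prediction evidence, the sketch below computes precision among the top-scored accounts. The function name, account IDs, scores, and outcomes are all hypothetical.

```python
def precision_at_k(scores: dict[str, float], churned: set[str], k: int) -> float:
    """Share of the k highest-scored accounts that actually churned."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(account in churned for account in ranked) / len(ranked)


# Hypothetical scores and observed outcomes for six accounts.
scores = {"a1": 0.91, "a2": 0.85, "a3": 0.40, "a4": 0.77, "a5": 0.12, "a6": 0.66}
churned = {"a1", "a4", "a5"}

print(precision_at_k(scores, churned, k=3))  # two of the top three churned
```

A number like this belongs in the prediction column only. It says nothing about whether flagged accounts received the treatment or whether the treatment changed retention.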

Phase two: test where proprietary context adds value

Use retention analysis to identify behaviors that appear meaningfully connected to continued use or churn. Focus on information a general-purpose platform cannot represent well, such as domain-specific sequences, unusual account structures, specialized failure states, or product-specific anomalies.

Introduce one material improvement at a time. Compare the resulting decisions with the baseline: which accounts move, whether the reason becomes more actionable, and whether the intervention performs better. A more complex score is not automatically a better product.

Use A/B testing or another appropriate controlled rollout to evaluate the intervention. Set the minimum detectable effect before the test so the team agrees on the smallest change worth detecting and whether the experiment can support the decision. Where withholding an intervention is inappropriate, compare credible treatments or use a phased rollout rather than treating measurement as optional.
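To make the minimum detectable effect concrete, here is a rough sizing sketch. It assumes a two-sided test of two independent proportions with equal arm sizes and the usual normal approximation; the function name and the example rates are illustrative, and a real experiment design may call for a different method.

```python
from math import ceil
from statistics import NormalDist


def accounts_per_arm(baseline_rate: float, mde: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate accounts needed per arm for a two-sided two-proportion test.

    baseline_rate: expected retention rate in the control arm.
    mde: smallest absolute lift worth detecting (0.05 means five points).
    """
    p1, p2 = baseline_rate, baseline_rate + mde
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must stay strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


# Assumed inputs: 80% baseline retention, five-point minimum detectable effect.
print(accounts_per_arm(baseline_rate=0.80, mde=0.05))
# Halving the effect you want to detect roughly quadruples the accounts needed.
print(accounts_per_arm(baseline_rate=0.80, mde=0.025))
```

If the eligible population is smaller than the result, the experiment cannot support the decision at that effect size, which is worth knowing before the test starts.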

Phase three: build only the layer that proved distinctive

The result may not be a complete vendor replacement. You might own a proprietary feature pipeline, domain-specific anomaly detector, custom explanation layer, or specialized risk score while retaining commercial analytics and activation. That is often a cleaner boundary than recreating collection, dashboards, integrations, guides, and workflow delivery.

Before moving a custom component into production, require evidence that:

The proprietary signal changes a meaningful decision rather than merely changing a score.
The resulting intervention has a credible path to measurable retention or NRR impact.
A named team owns data quality, production reliability, drift monitoring, governance, and retraining.
The migration preserves the activation loop instead of sending users to a separate dashboard.
The added value justifies both the continuing cost and the engineering capacity displaced by the work.

Create a canonical risk contract before two systems coexist. Define the eligible population, outcome, prediction window, score meaning, reason codes, refresh expectations, owner, permitted actions, and measurement plan. Without that contract, teams will compare incompatible scores and select whichever one confirms their prior belief.
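One way to make the contract concrete is a small typed record. The sketch below uses the fields listed above; the class name, the example values, and the choice of which fields must match before two scores are comparable are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RiskContract:
    eligible_population: str
    outcome: str
    prediction_window_days: int
    score_meaning: str
    reason_codes: tuple[str, ...]
    refresh_expectation: str
    owner: str
    permitted_actions: tuple[str, ...]
    measurement_plan: str


# Assumption: these fields must agree before two scores can be compared at all.
COMPARABILITY_FIELDS = ("eligible_population", "outcome",
                        "prediction_window_days", "score_meaning")


def comparability_gaps(a: RiskContract, b: RiskContract) -> list[str]:
    """Return the contract fields on which two scoring systems disagree."""
    return [name for name in COMPARABILITY_FIELDS
            if getattr(a, name) != getattr(b, name)]


# Hypothetical contracts for a purchased baseline and a custom component.
vendor = RiskContract(
    eligible_population="paid accounts older than 90 days",
    outcome="subscription cancelled or not renewed",
    prediction_window_days=60,
    score_meaning="probability of churn within the window",
    reason_codes=("usage_drop", "support_escalation"),
    refresh_expectation="daily",
    owner="Customer success operations",
    permitted_actions=("csm_outreach", "in_app_guide"),
    measurement_plan="controlled rollout on retention",
)
custom = replace(vendor, prediction_window_days=30, owner="Data science")

print(comparability_gaps(vendor, custom))  # ['prediction_window_days']
```

An empty result does not mean the scores agree; it only means they are answering the same question.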

Run the custom component beside the baseline before switching interventions. Inspect coverage, stability, explanations, workflow behavior, and segment differences without changing several parts of the retention program at once. This makes the eventual migration a product decision supported by evidence, not an infrastructure milestone searching for a justification.

Key takeaways

Buy when your immediate need is dependable coverage, rapid activation, and standard integrations for customer success or product-led growth.
Build when proprietary signals, domain-specific risk scoring, specialized explainability, or product differentiation can create material value and you can fund continuing operations.
Blend when you need a working baseline now and have a testable hypothesis about where your data or context can outperform a general-purpose capability.
Do not approve any path until every score has a named recipient, a defined action, a delivery system, and a business outcome.
Compare equivalent total costs, including data work, integrations, monitoring, governance, activation, opportunity cost, and migration.
Measure the model and the intervention separately. Prediction quality can prioritize attention; only an effective action can improve retention.

Take a one-page decision memo into your next review. It should name the churn definition, first population, intervention, deadline, available signals, proprietary advantage, workflow, operating owners, governance constraints, total-cost boundary, and experiment. End the meeting with a selected path and an explicit condition for reconsidering it.

Start with the smallest path that closes the loop from behavior to action to measured outcome. Earn the right to build more by proving that your own data changes the decision and that the decision changes retention.

References

Pendo – Build vs. Buy for Churn Prediction: My Proven Playbook for Faster Retention and ROI

April 16, 2026

From Brain Dump to Done: How Todoist’s Ramble Captures Tasks in Real Time with AI

Turning a rambling stream of consciousness into a clean task list while someone is still talking has been a longtime product dream of mine. With Ramble, Todoist brought that dream to life by using live audio AI to capture tasks in real time—no transcription step required. The result is a voice-to-task flow that feels natural, fast, and surprisingly disciplined.

As I listened to the Doist team (Ernesto Garcia, Front-end Product Engineer; Thomas Jost, Backend Software Engineer; and Hugo Fauquenoi, Product Manager) walk through their approach, I heard a blueprint for building pragmatic GenAI features. What began as a two-to-three-month AI exploration became one of their most technically deliberate releases: a “Gemini-powered pipeline that makes tool calls while the user is still speaking, surfacing tasks on screen in real time without any text output from the model.”

The breakthrough started with user research. People weren’t merely dictating tasks; they were doing a “brain dump” first—often into pen and paper or even ChatGPT voice—and only then committing items to Todoist. Meeting users where they already are reframed the problem: don’t force structure upfront; capture fluid thought and translate it into actionable tasks instantly.

That insight led to a bold architectural choice: skip transcription entirely and process raw audio directly with a Gemini live audio model. By removing the brittle middleman of text, the team reduced latency and kept the model focused on one job—turning intent into structured actions. It’s a crisp example of AI workflows designed for reliability over novelty.

The real magic is in the real-time “tool calls.” As the user speaks, the model triggers add task, edit task, and delete task operations immediately. For high-friction contexts like driving, they paired visual task cards with subtle sound effects as confirmation cues. It’s thoughtful conversation design that respects attention and safety without sacrificing speed.
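To illustrate the general pattern, here is a small sketch of a tool-only dispatch loop. It is not Todoist's code and makes no calls to any model API: the payload shape, task IDs, and handler names are hypothetical, and an in-memory dictionary stands in for the on-screen task cards.

```python
from typing import Any, Callable

tasks: dict[str, str] = {}  # task_id -> content, standing in for task cards


def add_task(task_id: str, content: str) -> None:
    tasks[task_id] = content


def edit_task(task_id: str, content: str) -> None:
    if task_id not in tasks:
        raise KeyError(f"unknown task {task_id}")
    tasks[task_id] = content


def delete_task(task_id: str) -> None:
    tasks.pop(task_id, None)


# The only operations the model is allowed to trigger.
TOOLS: dict[str, Callable[..., None]] = {
    "add_task": add_task,
    "edit_task": edit_task,
    "delete_task": delete_task,
}


def handle_tool_call(call: dict[str, Any]) -> None:
    """Apply one tool call as soon as it arrives; reject unknown tool names."""
    handler = TOOLS.get(call["name"])
    if handler is None:
        raise ValueError(f"unsupported tool: {call['name']}")
    handler(**call["args"])


# Hypothetical calls emitted while the user is still speaking.
stream = [
    {"name": "add_task", "args": {"task_id": "t1", "content": "Email Sam"}},
    {"name": "add_task", "args": {"task_id": "t2", "content": "Book dentist"}},
    {"name": "edit_task", "args": {"task_id": "t1", "content": "Email Sam about budget"}},
    {"name": "delete_task", "args": {"task_id": "t2"}},
]
for call in stream:
    handle_tool_call(call)

print(tasks)  # {'t1': 'Email Sam about budget'}
```

Keeping the tool table closed is what makes the output predictable: anything outside add, edit, and delete is rejected rather than interpreted.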

Teaching the model to capture tasks literally—without over-interpreting or trying to complete the work—required careful prompt engineering for voice and temperature tuning. Drawing a bright line between “capture versus do” kept the experience trustworthy. In my own AI Strategy work, I’ve found that establishing explicit agentic guardrails early prevents unintended autonomy later.

Dates were the sleeper challenge. The team had to inject the current date, normalize to days vs. months, and always output dates in English for the natural language parser—while preserving the user’s original language for everything else. If you’ve ever shipped date handling across locales, you’ll appreciate how many edge cases hide in “Taming Dates and Time.”

Quality didn’t hinge on intuition alone. They built an LLM-judge eval system using real employee recordings from 100+ people across 35 countries in 20+ languages to catch prompt regressions. That’s eval-driven development done right: representative data, repeatable scoring, and tight feedback loops as models and prompts evolve.

For project and label matching, they chose direct context injection over RAG. Instead of building a retrieval pipeline, they injected the full project/label list into the system prompt. With smart context window management and a sharply constrained task schema, this was both simpler and more accurate. Sometimes the fastest path to product-market fit is removing moving parts, not adding them.
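A rough sketch of direct context injection, combined with the date handling described earlier, might look like the following. The prompt wording, function name, and example lists are invented for illustration and are not the team's actual prompt.

```python
from datetime import date


def build_system_prompt(today: date, projects: list[str], labels: list[str]) -> str:
    """Assemble a system prompt that carries the full project and label lists."""
    project_lines = "\n".join(f"- {name}" for name in projects)
    label_lines = "\n".join(f"- {name}" for name in labels)
    return (
        f"Today's date is {today.isoformat()}.\n"
        "Capture tasks exactly as spoken; do not perform them.\n"
        "Write due dates in English; keep task text in the user's language.\n"
        "Assign only these projects:\n"
        f"{project_lines}\n"
        "Assign only these labels:\n"
        f"{label_lines}\n"
    )


# Hypothetical user data.
prompt = build_system_prompt(
    today=date(2026, 4, 16),
    projects=["Work", "Home"],
    labels=["urgent", "errands"],
)
print(prompt)
```

The trade-off is context length: this stays simple only while the project and label lists remain small enough to fit comfortably in the prompt.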

One product principle stood out: easy correction beats perfect first-time accuracy. Natural language interfaces earn trust when users can fix misfires in a tap or two. That bias toward quick recovery over false precision is how you ship AI that feels useful from day one.

Looking ahead, the roadmap is compelling: multimodal task capture from images and text blobs, Apple Watch support, and automation integrations. As voice AI agent patterns mature, this “tool-only architecture” sets a solid foundation for going from capture to coordinated execution—without losing the simplicity that makes Ramble shine.

If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. It’s a masterclass in building focused GenAI features that trade cleverness for clarity—and still delight.

Resources & Links: Todoist • Doist • Google Vertex AI (Gemini)

Inspired by this post on Product Talk.

April 16, 2026

How to Build a Trusted AI Product Platform That Scales

Your teams have AI pilots that work in a demo. Then the questions start. Security wants to know what data the system can reach. Product wants to know whether the answers are dependable. Support wants a fallback when the model fails. Executives want evidence that the investment is changing a customer or business outcome.

You do not need another impressive model response. You need a product platform that makes AI behavior understandable, controllable, and repeatable across use cases. That requires a trust architecture, a path from prototype to production, and metrics that expose failure instead of averaging it away.

Trust fails where an AI output crosses a decision boundary

Most teams discuss AI trust as if it were a property of the model. It is better understood as a property of the whole product system. A capable model can still create an untrustworthy experience if it uses the wrong context, hides a consequential assumption, calls an unauthorized tool, or leaves the user unable to correct an action.

The important moment is the handoff from generation to decision. Before that handoff, the output is a possibility. After it, someone may use it to answer a customer, change a record, prioritize work, or trigger another system. The controls you need depend on what crosses that boundary.

A practical way to classify AI use cases is by the authority you give the system:

Inform: The system summarizes, explains, retrieves, or drafts. A person still interprets the result.
Recommend: The system ranks options or proposes a next action. Its framing can materially influence a decision.
Act: The system invokes tools, changes state, communicates externally, or starts a workflow.

Use mode	Primary trust failure	Required product control	Evidence needed before release
Inform	An incorrect, incomplete, or untraceable answer	Visible scope, supporting evidence, uncertainty, and an easy correction path	An evaluator can reproduce the evidence path and identify known limitations
Recommend	A hidden assumption, weak comparison, or recommendation that ignores the user’s constraints	Explicit assumptions, alternatives, decision criteria, and user-editable constraints	Representative cases show whether the recommendation applies the intended rubric
Act	An unauthorized, excessive, or difficult-to-reverse change	Least-privilege access, previews, confirmation, audit records, and reversal where the underlying system supports it	Authorized reviewers validate simulated actions, denied actions, failure recovery, and a limited production path

This classification prevents a common planning error: giving every AI feature the same review process. A summarizer and an autonomous account-management agent should not pass through identical gates. The second system needs stronger identity, permission, confirmation, and recovery controls because its mistakes can propagate beyond the conversation.
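The classification can be sketched as a lookup from use mode to required controls, so a review gate can list what a proposed feature still lacks. The control identifiers below are shorthand for the table's "Required product control" column, and the names are illustrative assumptions.

```python
from enum import Enum


class UseMode(Enum):
    INFORM = "inform"
    RECOMMEND = "recommend"
    ACT = "act"


# Shorthand labels for the controls named in the table above.
REQUIRED_CONTROLS: dict[UseMode, frozenset[str]] = {
    UseMode.INFORM: frozenset({
        "visible_scope", "supporting_evidence", "uncertainty", "correction_path",
    }),
    UseMode.RECOMMEND: frozenset({
        "explicit_assumptions", "alternatives", "decision_criteria",
        "editable_constraints",
    }),
    UseMode.ACT: frozenset({
        "least_privilege", "preview", "confirmation", "audit_record",
        "reversal",  # only where the underlying system supports it
    }),
}


def missing_controls(mode: UseMode, implemented: set[str]) -> set[str]:
    """Controls this use mode requires that the feature does not yet have."""
    return set(REQUIRED_CONTROLS[mode] - implemented)


# A hypothetical agent that can change records but only has two controls so far.
print(sorted(missing_controls(UseMode.ACT, {"least_privilege", "preview"})))
```

The same feature checked as `UseMode.INFORM` would face a different and shorter list, which is the point of not sending every AI feature through one gate.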

For each proposed use case, ask five questions before discussing a model:


April 15, 2026

Never Lose Your AI Superpowers: How I Sync Context and Skills Across Every Device

I spend a meaningful portion of my week helping teams operationalize AI workflows, and one theme comes up over and over: how to share context files and skills seamlessly across devices and with colleagues. Hosting Claude Code office hours has only reinforced it—sharing context and skills is the single biggest blocker to reliable, repeatable outcomes.

I hear from leaders driving AI adoption who have built robust, high-signal context systems and carefully crafted skills. Their challenge isn’t creating value—it’s distributing it. They need a way to make the same trusted workflows available to teammates and to keep everything in sync across laptops, desktops, and phones.

I hit the same wall myself. I work across multiple devices (a Mac Mini for day-to-day, a MacBook Air on the road, and an iPhone) and I collaborate with a full-time admin. I wanted my context and skills to be consistent everywhere, for both of us. In this piece, I’ll share my setup—what I store where, how I share it across devices and with my team, the trade-offs of each option, and how I keep everything current. We’ll cover four different syncing services: git/GitHub, Obsidian Sync, Dropbox and iCloud.

If you’re new to this series, this is the eighth installment. Earlier pieces provide foundational context: Claude Code: What It Is, How It's Different, and Why Non-Technical People Should Use It; Stop Repeating Yourself: Give Claude Code a Memory; How to Use Claude Code Safely: A Non-Technical Guide to Managing Risk; How to Choose Which Tasks to Automate with AI (+50 Real Examples); How to Build AI Workflows with Claude Code (Even If You're Not Technical); How to Use Claude Code: A Guide to Slash Commands, Agents, Skills, and Plug-ins; and Context Rot: Why AI Gets Worse the Longer You Chat (And How to Fix It).

The day it really hit me was right before my interview with Claire Vo on How I AI. I was staying in an AirBnB with only my laptop, and I planned to demo my /today command along with my context file structure. Minutes before the session, I realized the latest version of my /today command wasn’t on that machine. I was able to remote into my Mac Mini and grab it—crisis averted—but it was a wake-up call. I needed a more reliable, shareable approach for syncing context and skills across devices and with my admin.

I started by testing the tools I already used—Dropbox, iCloud, and GitHub—to see what might fit. Each got me partway there, but each also introduced friction that mattered in daily use.

First, absolute file paths don’t travel well. I began with Dropbox but quickly ran into cross-linking headaches. Good context systems rely on rich interlinking—index files point to other context files, and those context files link to each other. When Claude creates a link from one context file to another, it tends to use the full file path: /Users/ttorres/Library/CloudStorage/Dropbox. That worked on my Mac Mini and MacBook (same user name), but not on my phone—and not for my admin. I tried to force relative links (~/Dropbox), but couldn’t get Claude to do it consistently, which led to broken links. This isn’t unique to Dropbox; Claude prefers full paths because they’re reliable on a single machine, but they’re brittle across devices and useless when sharing with colleagues. Claude is trained to use relative file paths when working within a git repository, but I struggled to get it to work reliably in Dropbox.
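To see why relative links travel better, the sketch below rewrites an absolute link target relative to the file that contains the link. The folder layout and user name are hypothetical, and `posixpath` is used so the result is the same on any operating system.

```python
import posixpath


def to_relative_link(source_file: str, target_file: str) -> str:
    """Express target_file relative to the folder that contains source_file."""
    return posixpath.relpath(target_file, start=posixpath.dirname(source_file))


# Hypothetical layout; only the part below the synced folder matters.
index = "/Users/example/Dropbox/context/index.md"
client = "/Users/example/Dropbox/context/clients/acme.md"

print(to_relative_link(index, client))  # clients/acme.md
print(to_relative_link(client, index))  # ../index.md
```

Neither result mentions the user name or the sync service's mount point, so the same link resolves on another machine as long as the folder structure underneath is identical.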

Second, skills live in a user directory by default: ~/.claude/skills. Most sync services aren’t designed to share your ~/ folder. iCloud is the exception, but then you’re limited to Apple devices, with no Windows or Android. There is a workaround: set up a claude folder in Dropbox and create a symlink from ~/.claude to your synced claude folder, so all skills, commands, and settings live in Dropbox. Then, on each device (yours or a colleague’s), you set up a symlink to that folder so Claude can find the files. This works, but I ran into another limitation that made Dropbox a poor fit.
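Here is a minimal sketch of the symlink workaround, run inside a throwaway directory so it never touches a real home folder. The function name is hypothetical, and the sketch assumes a platform where creating symlinks is permitted (on Windows this may require extra privileges).

```python
import tempfile
from pathlib import Path


def link_claude_folder(home: Path, synced_folder: Path) -> Path:
    """Point home/.claude at the synced folder, refusing to overwrite anything."""
    link = home / ".claude"
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists; move it first")
    link.symlink_to(synced_folder, target_is_directory=True)
    return link


# Demonstrated with stand-ins for the home folder and the synced folder.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    synced = Path(tmp) / "Dropbox" / "claude"
    home.mkdir()
    (synced / "skills").mkdir(parents=True)

    link = link_claude_folder(home, synced)
    # Files in the synced folder are now reachable through home/.claude.
    skills_visible = (home / ".claude" / "skills").is_dir()

print(skills_visible)
```

The refusal to overwrite matters in practice: an existing ~/.claude folder should be moved into the synced location first, or its contents will be hidden behind the link.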

Third, Obsidian on iOS doesn’t sync cleanly with Dropbox. I rely on Obsidian’s file browser alongside my notes to navigate context quickly. Storing vaults in Dropbox gave me parity across my Mac Mini and MacBook Air, but I couldn’t get the iOS Obsidian app to reliably load my Dropbox vaults. That friction was a dealbreaker for on-the-go work.

At that point, I explored git/GitHub. GitHub is cloud storage for git repositories. A git repository is a folder of shared files used so engineers can collaborate on the same code base. Each person clones a local copy, works locally, then pushes changes back to the hosted repo on GitHub; others pull to update. Git’s merge and conflict tooling is excellent. Git is the powerhouse of file syncing and version control. It easily handles syncing context and skills, Claude behaves better with relative links in a git repo, and I can open the repo in my IDE with a clean file browser. For me, that checked all the boxes—until I factored in my admin. Git has a learning curve, requires manual pull/push hygiene, and often assumes an IDE workflow. That overhead was too heavy for a non-technical collaborator.

The turning point was Obsidian Sync. A colleague suggested it, and it ended up being the sweet spot. Obsidian is a markdown reader; files are stored locally in a normal folder you can open in Finder or File Explorer. There’s no proprietary format—you can read files with any text editor, and Claude can access them via bash commands. Obsidian Sync is simpler than git: open a note and it syncs in the background. I can access the same vaults across my Mac Mini, MacBook Air, and iPhone, and I can share a vault with my admin so we can both create and access notes.

Because we’re in different time zones and rarely edit the same note simultaneously, limited conflict handling hasn’t been an issue. Obsidian’s internal link notation also means one note can link to another and those links just work across devices. Claude can follow these links, so the brittle file path problem disappears.

Here’s where I landed. After a lot of trial and error, I have a setup that works across my devices and for my admin, who uses both a Windows desktop and a Mac laptop. I keep my core context in Obsidian vaults synced with Obsidian Sync, which preserves portability, link integrity, and ease of use. For skills, I avoid scattering files in machine-specific locations and instead centralize what Claude needs to reference in shared, human-readable folders. If you require advanced version control with branching and reviews, git/GitHub is excellent. If your priority is low-friction, cross-device access for non-technical teammates, Obsidian Sync is a practical, reliable choice. And if you must use Dropbox or iCloud, consider symlinks and be vigilant about relative paths—just know that absolute paths won’t travel well.

Inspired by this post on Product Talk.

April 15, 2026

Cracking the Hardest Percentages: Turn Complex Support into Scalable, Trust-Building Automation

I’ve learned that the smallest slice of your support queue often dictates the majority of your operating cost, customer memory, and automation ceiling. In product reviews and CX ops deep-dives, I see the same pattern: the “easy” tickets pad your resolution counts, but the complex, multi-step queries quietly own your handle time and your brand trust. If you care about compounding impact, your customer support AI strategy has to target that hardest percentage first.

Complex queries are a small percentage of your queue, but they consume a disproportionate share of your team’s time.

Take a typical queue: password resets outnumber refund disputes ten to one, but a reset takes five minutes and a dispute takes thirty. The “rare” query accounts for over a third of total handling time. The same pattern holds for account investigations, subscription changes, and billing disputes.
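
The arithmetic behind that claim can be checked in a few lines. This is a minimal sketch that uses only the illustrative numbers from the paragraph above (ten resets per dispute, five and thirty minutes); the function name and dictionary layout are my own, not part of any tool.

```python
# Illustrative queue from the paragraph above: ten password resets per refund dispute.
tickets = {
    "password_reset": {"count": 10, "minutes_each": 5},
    "refund_dispute": {"count": 1, "minutes_each": 30},
}


def handling_time_share(queue: dict) -> dict:
    """Return each ticket type's share of total handling minutes."""
    totals = {name: t["count"] * t["minutes_each"] for name, t in queue.items()}
    grand_total = sum(totals.values())
    return {name: minutes / grand_total for name, minutes in totals.items()}


shares = handling_time_share(tickets)
# Resets: 50 of 80 minutes. Disputes: 30 of 80 minutes, or 37.5%.
print(shares)
```

Swapping in your own counts and handle times shows quickly whether the same imbalance holds in your queue.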

How you handle complex queries is also what customers actually remember about their support experience. When someone is dealing with a damaged order or a billing dispute, the stakes are higher, and a fast, good resolution is what separates a forgettable interaction from one that builds lasting trust.

Most AI Agents automate the easy, informational queries well. The question for your automation rate is whether they can handle the hard ones. That’s where agentic AI and robust AI workflows make or break your outcomes.

We’ve gotten really good at informational queries – the hard part is what comes next. I’ve seen teams invest deeply here, and for good reason: it lifts containment quickly and cheaply. But to break through the plateau, you have to execute actions across systems, not just answer with text.

We’ve invested deeply in informational Q&A. We built Apex, a specialized customer service model trained on billions of support interactions, as Fin’s core answering engine. Beneath that sits a custom retrieval model, a purpose-built reranker, and a unified RAG pipeline, all trained specifically for customer service. Fin resolves issues at a higher rate than general-purpose frontier models, with fewer hallucinations and at lower cost.

But informational Q&A only covers queries where text is the answer. Most Agents can handle that. Far fewer let you configure complex, multi-step actions without a forward-deployed engineer setting it up for you, which creates a gap.

Every query your team handles falls into one of three categories:

Informational: “Can you ship transatlantic by priority next day?” Answered with text from your knowledge base.

Personalized: “Where is my order?” Requires data unique to that user.

Action-led: “My order arrived damaged, I need a refund.” Requires doing something: checking a return window, cross-referencing transaction data, making a judgment call – reading from multiple systems and acting across them.

These complex queries, the ones that require multi-step processes across systems, aren’t edge cases; they’re the reason your support team exists. This is the gap Fin Procedures was built to close.

It works in practice, and the trajectory matters for product strategy and ops planning.

Procedures is live, it’s scaling, and the results are clear. Since launching in managed availability, Procedures has handled over 1.5 million conversations, and volume is doubling month over month across hundreds of apps in fintech, e-commerce, gaming, healthcare, and SaaS.

When customers hit complex, multi-step queries, the experience is dramatically better when Fin can do the work end-to-end. We tested this with a randomized 5% holdout – conversations where Procedures would normally run, but didn’t. CSAT was 28.93% higher when Procedures ran, a statistically significant result.
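
One common way to carve out a holdout like this is deterministic hashing of a conversation identifier. The sketch below is an assumption about how such an assignment could work, not a description of how Fin does it; the `conversation_id` format, the salt, and the function name are all hypothetical.

```python
import hashlib


def in_holdout(conversation_id: str, holdout_rate: float = 0.05,
               salt: str = "procedures-holdout") -> bool:
    """Assign a conversation to the holdout group using a salted hash.

    Hashing gives a stable, pseudo-random bucket in [0, 1), so the same
    conversation always lands in the same group.
    """
    digest = hashlib.sha256(f"{salt}:{conversation_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < holdout_rate


# Hypothetical IDs, used only to show that roughly 5% fall in the holdout.
ids = [f"conv-{i}" for i in range(20000)]
holdout_share = sum(in_holdout(c) for c in ids) / len(ids)
print(round(holdout_share, 3))
```

The point of the holdout is that both groups contain conversations where the Procedure would have run, so the CSAT comparison is like for like.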

A product, not a services engagement. I’ve sat through too many “automation” projects that were really solutions engineering gigs: workshops, custom scripts, then a queue of change requests when policies shift. It’s fragile and slow.

The B2B AI industry has a consultingware problem. It’s not databases being forked anymore, it’s prompts. The economics of maintaining bespoke setups per customer don’t work. Either the application falls behind new models, or the vendor changes the model and quality degrades invisibly.

In my view, an agentic AI platform should be a product your team owns end to end: a natural language editor – literally paste your existing SOPs – branching logic, data connectors, and AI-powered simulations for testing. Your CX ops team configures this, iterates on it, owns it. If you need help, a forward-deployed team can assist, but they’re optional, not a dependency. You always have control.

And because it’s a unified product, improvement compounds. When the vendor optimizes a prompt, every customer’s Procedures get better. When they upgrade the model, they can A/B test across the entire customer base and know it’s better before rolling out. You can’t do that when every customer has a bespoke prompt. The consulting model isn’t just expensive, it’s structurally unable to compound.

Today, Fin Procedures is available to every Intercom customer – no waitlist or managed rollout, ready for all 8,000+ customers.

We’re iterating fast based on real customer feedback. Here’s what’s landed since the last major update, and why it matters for reliability and governance:

AI-powered Procedure review: Flags broken logic, missing references, and unreachable conditions before you deploy.

Procedure failure reporting: A new reporting dimension that lets you drill into conversations where Procedures failed, so you can diagnose and fix.

Version history with rollback: Track every change, compare versions, roll back if needed.

Data connector health monitoring: See at a glance if your integrations are healthy, degraded, or failing.

Optional data connector parameters: Fin only asks customers for information when it’s actually needed, instead of prompting for every field.

Email Simulation support: Test how your Procedures behave across chat and email before going live.

Agent in the Loop (Beta) unlocks the next tranche of automation. Even with Procedures, two things hold teams back from automating their most complex queries: missing integrations and policies that require a human sign-off on sensitive decisions.

“Agent in the Loop” is built for both. Need Fin to check your internal admin tools but haven’t built a data connector yet? Put a human checkpoint at that step. Fin handles the conversation, gathers context, and pauses, surfacing a structured summary for a human agent to verify or act, then resumes. You get automation on the 80% that doesn’t need the integration.

For compliance – identity verification, high-value refunds – Fin does the legwork, a human makes the final call and then hands it back to Fin. This works natively in the Intercom Inbox and via Slack. Some competitors don’t have an inbox-native variant at all, meaning humans need to leave their primary workspace to review AI actions.

Procedures are also built to let you collaborate with all your teammates – both human agents and AI Agents. Fin can work with them directly inside a Procedure, using APIs and webhooks to loop in another teammate mid-flow, hand off context, and pick back up once they’re done.

Making it easier, faster. Procedures is already self-serve, but the next step is making Procedure creation, testing, and maintenance significantly easier, with less manual editing and more AI-assisted building and debugging. There’s a lot coming in this space over the next few months, and it fits with a retrieval-first pipeline and stronger governance at scale.

The hardest percentages matter the most. The biggest unlock for your automation rate won’t be answering more FAQs, it will be handling the complex, multi-step queries that consume your team’s time and define what customers remember about their experience with you.

That means working with an Agent that goes beyond answering questions and executes processes. A product your team owns and configures, not a service you buy and hope gets maintained. And a platform where every improvement compounds across every customer. That’s Procedures. Available now, for everyone.

Inspired by this post on The Intercom Blog.

April 14, 2026
Product Work Is Relationship Work: How I Align Stakeholders Faster and Cut Team Politics

Lately, I keep hearing a familiar question: with AI making it so easy to generate ideas and build products, do we still need product managers? My answer is unequivocal—yes. Tools accelerate delivery, but they don’t build trust, reconcile competing incentives, or create the shared understanding teams need to ship outcomes. Product work is relationship work.

I recently listened to “Product Work Is Relationship Work – All Things Product with Teresa & Petra,” and it echoed what I see every day in high-performing product organizations. If you prefer to watch, here’s the episode on YouTube: https://www.youtube.com/embed/d-0f8uAfc8w?feature=oembed

While AI can help build things faster, it can’t replace the relationship work required to align stakeholders, navigate competing priorities, and create shared understanding across teams. That’s the hard, human part of product management—and it’s not going away.

In my experience, product teams stall when collaboration becomes transactional. We jump to negotiation (“What can you commit by Friday?”) before establishing context (“What problem are we solving and why now?”). When I slow down to get curious—about constraints, incentives, and assumptions—momentum actually increases because we’re rowing in the same direction.

Stakeholder alignment often breaks down when we conflate advocacy with exploration. We argue our viewpoint as if it were the only lens that matters, rather than making space to surface how others see the system. I’ve found the distinction between “dialogue vs. discussion,” rooted in work by Chris Argyris and elaborated in The Fifth Discipline by Peter Senge, to be a powerful reset. Dialogue builds shared understanding; discussion decides. You need both, in the right order.

Language matters in the room. The improv principle “Yes, and” is deceptively simple but transformative. When a designer, engineer, or executive feels heard (“Yes”) and we build on their idea (“and”), we create psychological safety without sacrificing critical thinking. I use “Yes, and” to explore perspectives before we converge on decisions—especially with product trios and senior stakeholders.

Here are the moves I rely on to keep collaboration relational and outcomes-focused. First, we align on outcomes before solutions. I explicitly separate outcome OKRs from output OKRs so we’re clear on what success looks like, independent of the features we ship. That clarity reduces rework and speeds up decision-making later.

Second, we operationalize curiosity with continuous discovery. I schedule recurring, lightweight touchpoints with customers and internal stakeholders so insights compound. When learning is continuous, debates quiet down—evidence does the heavy lifting.

Third, we invest in relationship rituals. Regular 1:1s with key partners, stakeholder maps that capture motivations, and pre-reads that frame trade-offs all prevent misalignment from surfacing in the last mile. These small habits pay huge dividends in trust and speed.

Fourth, I’m explicit about mode-switching in meetings: are we advocating a position or exploring perspectives? Calling the mode out loud prevents people from mistaking questions for opposition and keeps the conversation productive.

Fifth, we use “Yes, and” to move from possibility to practicality. We explore generously, then converge rigorously—ranking options by impact, effort, and risk so decisions are transparent and fair.

If stakeholder alignment, team dynamics, or product “politics” slow your team down, this conversation offers a practical reframe. You’ll move faster when you build the relational tissue first—because alignment is an accelerant, not a tax.

Resources & Links:

Follow Teresa Torres: https://ProductTalk.org

Follow Petra Wille: https://Petra-Wille.com

Mentioned in this episode:

Petra’s Coaching Packages

Work by Chris Argyris on organizational learning and dialogue vs. discussion

The Fifth Discipline: The Art and Practice of the Learning Organization by Peter Senge

Improv principle “Yes, and”: Saying “Yes, and” — A principle for improv, business & life and Yes, and …

Have thoughts on this episode or examples from your team? Leave a comment below—I’d love to learn what’s working (and what’s not) in your stakeholder landscape.

Inspired by this post on Product Talk.

April 14, 2026

A Practical Customer Insight System for Product-Led Growth

You have customer interviews, support tickets, sales objections, funnel dashboards, and a backlog full of requests. Yet activation is still stalled, retention remains uneven, and every team has a different explanation for why.

The problem is rarely a shortage of feedback. It is the absence of a reliable path from customer evidence to a self-serve product experience. You need an insight system that identifies the real obstacle, proves that it matters to the right segment, and scales the solution without making every customer depend on a call, a CSM, or a custom implementation.

Start with the growth outcome, not the feedback inbox

A customer insight is not a quote, a feature request, or a chart. It is an evidence-backed explanation of why a defined group of customers is or is not reaching an outcome.

That distinction matters in product-led growth. A request can be sincere and still point toward the wrong solution. A funnel can reveal a drop-off and still tell you nothing about the customer’s intent. An insight connects the customer’s job, observed behavior, friction, and business consequence closely enough to support a decision.

Begin with the outcome you need to understand:

Activation: Are new customers completing the behavior that represents first value?
Adoption: Are activated customers using enough of the core workflow to make the product part of their routine?
Retention: Are the right customers continuing to receive value after the novelty and onboarding assistance disappear?
Expansion: Is deeper usage creating a credible path to more seats, more workflows, or additional capability?

Build a driver tree beneath that outcome. If activation is the target, its branches might include setup completion, successful data connection, first completed workflow, and collaboration. If expansion is the target, the branches might include activation depth, feature breadth, additional users, and new use cases. The tree gives customer evidence a destination. An observation that cannot be tied to a growth outcome or one of its drivers may still be interesting, but it is not ready to shape the roadmap.

Define each metric before gathering evidence around it. Product teams often use terms such as activation, onboarding, and first value as if they were interchangeable. They are not. A dependable metric catalog specifies the formula, behavioral event, cohort, time window, exclusions, owner, and lineage. Without that contract, two teams can analyze the same customer journey and reach incompatible conclusions because they silently measured different things.
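
A metric contract can be as simple as a typed record with one field per item in that list. The sketch below is one possible shape, assuming Python 3.9 or later; the class name, field names, and every example value (event name, cohort, owner, table path) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricContract:
    """One catalog entry: the fields the paragraph above says a metric needs."""
    name: str
    formula: str
    behavioral_event: str
    cohort: str
    time_window_days: int
    exclusions: tuple[str, ...]
    owner: str
    lineage: str

    def missing_fields(self) -> list[str]:
        """List fields left blank or zero; an empty exclusions tuple is allowed."""
        return [key for key, value in vars(self).items() if value in ("", None, 0)]


# Hypothetical example values, for illustration only.
activation = MetricContract(
    name="activation",
    formula="activated_accounts / eligible_signups",
    behavioral_event="first_workflow_completed",
    cohort="new self-serve signups",
    time_window_days=14,
    exclusions=("internal test accounts",),
    owner="growth-analytics",
    lineage="warehouse.events.workflow_completed",
)
print(activation.missing_fields())
```

Freezing the record is a deliberate choice here: a definition that changes should become a new version, not a silent edit.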

The same discipline applies to revenue outcomes. Net recurring revenue can be expressed as starting monthly recurring revenue plus expansion, less contraction and churn, divided by starting monthly recurring revenue. That lagging result becomes useful for product discovery only after you connect it to leading behaviors by segment. Otherwise, you know revenue moved but not what product behavior might explain the movement.
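
As a worked example, the ratio described above translates directly into a small function. The function name and the sample figures are illustrative assumptions, not data from any real account.

```python
def net_recurring_revenue_ratio(starting_mrr: float, expansion: float,
                                contraction: float, churn: float) -> float:
    """(starting MRR + expansion - contraction - churn) / starting MRR."""
    if starting_mrr <= 0:
        raise ValueError("starting_mrr must be positive")
    return (starting_mrr + expansion - contraction - churn) / starting_mrr


# Hypothetical month: 100,000 starting MRR, 12,000 expansion,
# 3,000 contraction, 5,000 churned.
ratio = net_recurring_revenue_ratio(100_000, 12_000, 3_000, 5_000)
print(ratio)  # 1.04
```

Computing the same ratio per segment, rather than once for the whole book, is what makes it possible to connect it to leading behaviors.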

Capture each candidate insight in a standard record:

The target segment and use case.
The job the customer is trying to complete.
The expected growth outcome and driver-tree branch.
The observed behavior, including where the journey changes or stops.
The customer’s stated friction, goal, or workaround.
The consequence for time-to-value, adoption, retention, or expansion.
The evidence supporting the interpretation and the evidence still missing.
The next decision the insight is meant to inform.

Keep observation, interpretation, and decision separate. “Customers abandon setup after reaching permissions” is an observation. “They do not trust the permission model” is an interpretation. “Redesign the permissions screen” is a decision. Combining all three in one sentence makes an assumption look like evidence.
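
One way to enforce that separation is to give each element its own field in the insight record. This is a minimal sketch: the class and field names are my own mapping of the list above, the observation and interpretation strings come from the example in the previous paragraph, and the remaining values are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class InsightRecord:
    segment: str
    job: str
    growth_outcome: str
    driver_branch: str
    observation: str          # what was seen
    stated_friction: str      # what the customer said
    consequence: str
    interpretation: str       # the team's explanation, kept apart from the observation
    supporting_evidence: list[str] = field(default_factory=list)
    missing_evidence: list[str] = field(default_factory=list)
    next_decision: str = ""   # the decision this insight is meant to inform

    def is_assumption_only(self) -> bool:
        """True when an interpretation is recorded without any supporting evidence."""
        return bool(self.interpretation) and not self.supporting_evidence


record = InsightRecord(
    segment="new admins on team plans",          # hypothetical
    job="finish workspace setup",                # hypothetical
    growth_outcome="activation",
    driver_branch="setup completion",
    observation="Customers abandon setup after reaching permissions",
    stated_friction="",
    consequence="delayed time-to-value",         # hypothetical
    interpretation="They do not trust the permission model",
    missing_evidence=["interviews with customers who abandoned setup"],
    next_decision="Whether to redesign the permissions screen",
)
print(record.is_assumption_only())
```

Because no supporting evidence has been attached yet, the record flags its interpretation as an assumption instead of letting it pass as a finding.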

Move insights through three levels of leverage

Product-led growth does not require you to avoid high-touch customer work. It requires you to learn from that work without making high-touch intervention the permanent delivery model.

I use three levels to manage that transition: close customer diagnosis, repeatable internal operations, and customer-facing product. Each level answers a different question.

Diagnose the problem in context. Work closely enough with a customer to see the entire job: the intended outcome, existing process, product configuration, behavior path, workaround, and point of failure. The objective is learning, not merely satisfying the account’s request.
Turn the diagnosis into a repeatable internal workflow. Standardize the inputs, analysis, output, and recommended action so customer success, sales, or support can apply the learning to other relevant accounts. This stage tests whether the insight travels beyond the original customer.
Promote the proven pattern into the product. Give customers the diagnosis, recommendation, or improved workflow directly. The product must work reliably across its intended audience without a specialist interpreting every result.

The first version can be deliberately manual. One customer automation analysis took more than half a day to prepare and visualize. It predicted a 70% automation rate for that customer, which the customer then achieved. A single match does not prove that the method generalizes, but the manual work exposed a useful taxonomy and a measurable hypothesis. That taxonomy could then be operationalized across more customers before becoming customer-facing.

The lesson is not that every manual analysis deserves automation. Manual work is a discovery instrument. It lets you identify the variables, edge cases, and next actions before engineering a permanent system around them.

Use a promotion gate before moving an insight from one level to the next:

Problem repeatability: The same underlying job and obstacle appear across the intended segment, even if customers describe them differently.
Diagnostic stability: Different people can use the same inputs and reach a consistent interpretation.
Action repeatability: The recommended intervention helps customers take a recognizable next step rather than producing an interesting report.
Outcome visibility: You can observe whether the customer completed the behavior the intervention was meant to change.
Bounded exceptions: You understand where the method fails, which configurations require special handling, and when a human must intervene.
Product leverage: Self-service delivery creates enough customer value to justify product complexity, maintenance, and support.
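
The gate above can be written as an explicit checklist so that a promotion decision records which criteria were unmet. This is a sketch under my own naming; treating every criterion as a required yes/no answer is an assumption, and a real review may weigh them differently.

```python
PROMOTION_CRITERIA = (
    "problem_repeatability",
    "diagnostic_stability",
    "action_repeatability",
    "outcome_visibility",
    "bounded_exceptions",
    "product_leverage",
)


def promotion_gate(assessment: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (passes, unmet criteria). A missing answer counts as unmet."""
    unmet = [c for c in PROMOTION_CRITERIA if not assessment.get(c, False)]
    return (not unmet, unmet)


# Hypothetical assessment of an internal workflow being considered for productization.
passes, unmet = promotion_gate({
    "problem_repeatability": True,
    "diagnostic_stability": True,
    "action_repeatability": True,
    "outcome_visibility": True,
    "bounded_exceptions": False,
})
print(passes, unmet)
```

Returning the unmet list, not just a boolean, keeps the reason for holding an insight at its current level visible in the decision record.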

Stopping at the internal-workflow level can be the correct decision. If an analysis is valuable only for a narrow set of complex accounts and still requires expert judgment, forcing it into a universal feature can make the product harder to understand. Productization is earned through generalization; it is not the automatic reward for discovering a customer problem.

Combine customer language, behavior, and commercial impact

No single signal can carry a product decision. Interviews reveal goals and reasoning but not prevalence. Behavioral data reveals sequences and drop-offs but not intent. Support and sales conversations expose recurring friction but overrepresent customers who chose to speak. Revenue data shows consequence but rarely identifies the mechanism.

Signal	What it can tell you	What it cannot establish alone
Customer interview or workflow observation	The job, desired outcome, constraints, vocabulary, and workaround	How common the problem is or whether a proposed change will alter behavior
Product behavior	Where users stop, repeat actions, take alternate paths, or differ by cohort	Why the behavior occurred or what outcome the user expected
Support, success, and sales conversations	Recurring confusion, implementation blockers, objections, and unmet expectations	The rate among all customers exposed to the same experience
Commercial outcomes	Which segments expand, contract, renew, or leave	Which product mechanism caused the result
Experiment or staged intervention	Whether a defined change moves a specified behavior for the eligible cohort	Why every customer responded as they did or whether the effect will persist indefinitely

Triangulation means linking these signals at the level of a segment and use case, not dropping them into one large feedback pile. Segment retention and adoption by plan, customer size, and use case. Then compare customers who reached the target outcome with customers who encountered the suspected obstacle.

A practical investigation sequence looks like this:

Define the eligible cohort using the metric contract.
Locate the behavioral inflection: the step, sequence, or period where successful and unsuccessful journeys diverge.
Inspect the surrounding support conversations, implementation notes, and known configuration differences.
Talk with customers from both sides of the divergence. Ask them to reconstruct what they were trying to do, what they expected, and what they did next.
Write the narrowest explanation consistent with the evidence, including plausible alternatives.
Specify the behavior that should change if the explanation is correct.

For example, suppose customers complete a technical connection but do not proceed to the collaborative part of a workflow. The evidence does not yet justify “build better team invitations.” Customers may not understand the next step, may not need collaborators for that use case, may lack permission to invite them, or may not have received enough value to involve colleagues. Each explanation implies a different solution and a different test.

This is also where AI-assisted analysis either becomes valuable or produces confident noise. An agent can help retrieve related conversations, run a defined funnel, compare cohorts, and assemble prior experiment results. It cannot compensate for contradictory activation definitions or an event taxonomy that nobody owns. A retrieval-first system grounded in canonical metrics, event definitions, cohort logic, dashboards, and versioned analysis gives the agent something stable to reason from.

Preserve provenance. Every generated synthesis should retain links to the customer evidence, metric definition, query, and decision record behind it. Apply the same permissions and PII controls used for the underlying systems. Faster synthesis is useful only when a product manager or analyst can inspect how the conclusion was formed.

Prioritize customer problems by leverage, not request volume

The loudest request is not necessarily the largest opportunity. Enterprise accounts generate detailed feedback because they have access to account teams. New self-serve users often leave silently. A raw count therefore measures how feedback entered the company as much as it measures customer need.

Frequency without a denominator is especially misleading. “This appeared in many tickets” is weaker than “this appears among customers who attempted this workflow.” Always compare the affected group with the population exposed to the same experience, then keep the result segmented by use case and customer type.
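
A short sketch makes the denominator point concrete. The segment names and counts below are invented for illustration; the only idea taken from the text is dividing affected customers by the customers exposed to the same workflow, segment by segment.

```python
from typing import Optional


def friction_rate_by_segment(exposed: dict[str, int],
                             affected: dict[str, int]) -> dict[str, Optional[float]]:
    """Affected customers divided by customers who attempted the workflow, per segment."""
    rates: dict[str, Optional[float]] = {}
    for segment, exposed_count in exposed.items():
        if exposed_count == 0:
            rates[segment] = None  # no exposure, so no rate can be stated
        else:
            rates[segment] = affected.get(segment, 0) / exposed_count
    return rates


# Hypothetical counts: enterprise produces more tickets, self-serve has the higher rate.
exposed = {"enterprise": 400, "self_serve": 100}
affected = {"enterprise": 40, "self_serve": 25}
rates = friction_rate_by_segment(exposed, affected)
print(rates)
```

In this made-up case the raw ticket count points at enterprise, while the rate points at self-serve, which is exactly the distortion the paragraph warns about.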

Before assigning a score, run each opportunity through six questions:

Outcome: Which activation, adoption, retention, or expansion driver does this problem obstruct?
Segment: Does it affect the customers and use cases the product is designed to serve?
Evidence: Do customer language and observed behavior support the same explanation?
Consequence: Does the friction delay value, block a core job, increase dependency on assistance, or precede contraction?
Leverage: Can a reusable product change solve the problem without recreating a custom service for every account?
Testability: Can you define the eligible cohort, expected behavior change, observation window, guardrail, and decision rule before shipping?

If an opportunity fails the outcome or evidence question, investigate it rather than prioritizing it. If it fails the leverage question, consider an internal tool, implementation playbook, or targeted service. If it fails testability, improve the instrumentation before making a broad release.
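
The routing rules in that paragraph can be sketched as a small function. The text gives no rule for an opportunity that fails only the segment or consequence question, so the "needs judgment" branch below is my assumption, as are the function name and return strings.

```python
def triage(answers: dict[str, bool]) -> str:
    """Route an opportunity based on yes/no answers to the six questions."""
    if not answers["outcome"] or not answers["evidence"]:
        return "investigate"
    if not answers["leverage"]:
        return "internal tool, implementation playbook, or targeted service"
    if not answers["testability"]:
        return "improve instrumentation"
    if not answers["segment"] or not answers["consequence"]:
        return "needs judgment"  # assumption: the text does not specify this case
    return "ready to score"


# Hypothetical opportunity that is well evidenced but cannot yet be measured.
example = {
    "outcome": True, "segment": True, "evidence": True,
    "consequence": True, "leverage": True, "testability": False,
}
print(triage(example))
```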

Write the opportunity without embedding the preferred feature:

For [segment] trying to [complete a job], [observable friction] prevents or delays [customer outcome]. This appears in [customer evidence] and [behavioral or commercial evidence]. If the explanation is correct, changing [part of the experience] should move [leading behavior] while preserving [guardrail].

That statement gives design and engineering room to find the smallest appropriate intervention. The answer may be clearer positioning, contextual education, a better default, a repaired workflow, a new capability, or a human-assisted path for exceptional cases.

Do not use in-app guidance to wallpaper over a missing capability. Guides, tours, and contextual tooltips can make the next value-producing action clearer when the product already supports the customer’s job. They cannot make an irrelevant or broken workflow valuable.

Define success before implementation. Name the primary behavior, eligible cohort, measurement window, and guardrails. If you run an A/B test, establish the minimum detectable effect and decision criteria in advance. If the eligible population cannot support a reliable experiment, use a staged release and cohort evidence, but do not describe an observational difference as causal proof.
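
To judge whether the eligible population can support an experiment, a rough sample-size estimate helps. The sketch below uses a standard normal-approximation formula for comparing two proportions; it is an approximation under stated assumptions (two-sided test, equal arm sizes, absolute MDE), and the baseline and MDE values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1 and mde_abs > 0):
        raise ValueError("rates must be in (0, 1) and mde_abs must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical: 20% baseline activation, 5-point absolute minimum detectable effect.
needed = sample_size_per_arm(0.20, 0.05)
print(needed)
```

If the eligible cohort over the measurement window is well below twice this number, that is a signal to use the staged-release path instead of calling the result an experiment.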

Make customer learning part of the operating cadence

A strong insight repository can still become a graveyard if it is separated from planning and growth reviews. Put customer learning into the cadence where trade-offs are made.

Use a weekly driver-tree review for operating decisions. Review movement in the relevant outcome by segment, newly triangulated insight records, active tests, and results from prior interventions. Every item should leave the meeting in a named state: gather evidence, test a hypothesis, operationalize internally, productize, monitor, or stop. Record the owner, missing evidence, next action, and reason for the decision.

Keep strategic and execution cadences distinct. QBRs are useful for examining value realized, retention risks, and expansion paths with customers. OKRs translate the resulting priorities into owned cross-functional work. The weekly review manages learning and experiments between those larger checkpoints. This prevents a quarterly customer conversation from becoming an isolated presentation with no route into product execution.

Assign clear responsibilities:

Product owns the problem framing, decision record, and promotion between levels of leverage.
Research and design protect the customer’s context, job, language, and workflow.
Data protects metric validity, cohort logic, instrumentation, and the distinction between correlation and causation.
Customer success, support, and sales contribute evidence with account and use-case context rather than forwarding unqualified requests.
Engineering identifies technical constraints, edge cases, observability needs, and the ongoing cost of operating the solution.

Close the loop in three places. Tell participating customers what was learned or changed without promising that every request will ship. Tell internal teams why the opportunity advanced or stopped. Feed the result back into the insight system so future decisions can retrieve the hypothesis, intervention, and observed outcome.

If AI helps operate that system, evaluate it as a product rather than trusting plausible prose. Useful quality dimensions include faithfulness to metric definitions, numerical accuracy, latency, and actionability. Log accepted answers, material edits, and overrides. Those signals reveal where retrieval, taxonomy, permissions, or evaluation cases need improvement.

Key takeaways

Anchor customer evidence to a defined growth outcome and driver tree before discussing solutions.
Treat feature requests as entry points for investigation, not as validated insights.
Combine customer language with behavioral and commercial evidence at the segment and use-case level.
Move learning from close diagnosis to a repeatable internal workflow before making it a universal product feature.
Prioritize problems that are consequential, repeatable, measurable, and suitable for self-service delivery.
Use a weekly decision cadence so each insight advances, gathers evidence, or stops.

At your next growth review, choose one stalled outcome and trace it to one customer segment, one behavioral inflection, and the conversations surrounding that moment. Write the insight record before proposing a feature. Then decide whether the next move is deeper diagnosis, an internal workflow, or a product experiment.

That small discipline changes the purpose of customer feedback. It stops being material for a backlog and becomes a system for helping more customers reach value on their own.

April 13, 2026

How We Taught Agentic AI to Speak Product Analytics—and Unlocked Actionable Insights

I set out to solve a deceptively simple problem: help our teams ask product questions in plain English and get trustworthy, analysis-grade answers—fast. That required more than a powerful model; it demanded agents that genuinely understand the language of product analytics, from behavioral analytics nuances to the messy reality of event taxonomies, funnels, and cohorts. In this post, I share how we engineered agentic AI that speaks our domain fluently and turns questions into decisions.

The core challenge wasn’t data volume or dashboard sprawl; it was semantics. Different teams said “activation,” “onboarding,” or “first value” and meant overlapping but distinct things. Our PMs, analysts, and engineers navigated a maze of synonyms across Amplitude analytics, Pendo, and our unified analytics platform. Generic LLMs stumbled on these nuances, so we built a shared ontology—driver trees anchored to a clear North Star—with canonical definitions for activation, retention, and conversion, plus consistent event naming and cohort logic.

We started with a rigorous metric catalog: every KPI linked to its drivers, exact formulas, cohorts, and time windows; every event mapped to a product taxonomy; every dashboard and SQL snippet versioned with ownership and lineage. That catalog became the ground truth for agents. We embedded data governance and privacy-by-design from the start—permissioning for fields and queries, PII redaction, and scoped access that reflected how product teams actually work.

Next, we built a retrieval-first pipeline to ground the agents in our corpus before generation. We indexed metric definitions, dashboards, experiment readouts, runbooks, and high-signal Slack threads so the agent could cite relevant artifacts, not just predict plausible text. With careful context window management and prompt engineering, the agent retrieves definitions and prior analyses, then plans multi-step actions: run a query, compare cohorts, check the minimum detectable effect (MDE) for an A/B test, and summarize findings with references.
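
To make "retrieve first, then generate" concrete, here is a toy sketch. It is not our production pipeline: the corpus entries, document IDs, and the keyword-overlap scoring are stand-ins chosen so the example runs without any external service, where a real system would use a proper index and a model call.

```python
import re

# Hypothetical mini-corpus keyed by artifact ID.
CORPUS = {
    "metric:activation": "activation is the share of new accounts that complete "
                         "a first workflow within the time window",
    "metric:retention": "retention is the share of activated accounts still "
                        "active in a later period",
    "runbook:funnel": "funnel analysis steps for measuring drop-off between "
                      "setup and first workflow",
}


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, corpus: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank artifacts by shared words with the question (toy scoring)."""
    q = tokenize(question)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in corpus.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in ranked[:top_k]]


def grounded_prompt(question: str, corpus: dict[str, str]) -> str:
    """Build a prompt that carries the retrieved artifacts and their IDs as citations."""
    cited = retrieve(question, corpus)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in cited)
    return f"Answer using only the cited context.\n{context}\nQuestion: {question}"


print(grounded_prompt("How do we define activation for new accounts?", CORPUS))
```

The design choice that matters is the order: the definitions are fetched and attached with their IDs before any text is generated, so every answer can point back to an artifact.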

Architecturally, we treated this as “Agent Analytics”: an orchestrator that selects tools based on intent—querying Amplitude analytics or Pendo for behavioral paths and funnels, hitting our warehouse for cohort tables, or pulling experiment metadata and anomaly detection alerts. Tool use is permission-aware, auditable, and designed to fail safe. The agent’s outputs include citations back to the exact definitions, dashboards, and SQL used, so reviewers can validate and iterate.
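
The orchestration idea can be sketched as intent-based tool selection with a permission check and an audit entry for every decision. Everything here is a hypothetical simplification: the intent labels, tool names, and log format are placeholders, not the real Amplitude, Pendo, or warehouse integrations.

```python
from typing import Optional

# Hypothetical intent-to-tool map; tool names are placeholders, not real APIs.
TOOL_FOR_INTENT = {
    "funnel": "behavioral_analytics",
    "cohort_table": "warehouse",
    "experiment": "experiment_metadata",
}

audit_log: list[dict] = []


def select_tool(intent: str, allowed_tools: set[str]) -> Optional[str]:
    """Pick a tool for the intent, or return None when unknown or not permitted."""
    tool = TOOL_FOR_INTENT.get(intent)
    permitted = tool is not None and tool in allowed_tools
    # Every decision is logged, including refusals, so reviewers can replay it.
    audit_log.append({"intent": intent, "tool": tool, "permitted": permitted})
    return tool if permitted else None


analyst_tools = {"behavioral_analytics", "warehouse"}
print(select_tool("funnel", analyst_tools))
print(select_tool("experiment", analyst_tools))
```

Returning `None` for an unknown or unpermitted request is the fail-safe default in this sketch: the agent does nothing rather than guessing at a tool.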

Quality came from eval-driven development, not intuition. We built a gold set of representative product questions (activation inflections, retention analysis by segment, funnel drop-offs after feature launches) and scored the agent on faithfulness to definitions, numerical accuracy, latency, and actionability. We incorporated regression checks to catch drift after schema changes, and we tuned prompts to reduce overconfident answers and push for clarifying questions when context was missing.
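A gold-set scorer along these lines might look like the sketch below. It covers three of the four dimensions (numerical accuracy, faithfulness to a cited definition, latency) and leaves actionability to human review. The tolerances, class names, and sample numbers are assumptions chosen for illustration, not measured results.

```python
from dataclasses import dataclass


@dataclass
class GoldCase:
    question: str
    expected_value: float    # reference number from a trusted analysis
    required_citation: str   # definition the answer must cite


@dataclass
class AgentAnswer:
    value: float
    citations: list
    latency_s: float


def score_case(case, answer, rel_tol=0.01, max_latency_s=30.0):
    """Return pass/fail per dimension for one gold question."""
    denom = max(abs(case.expected_value), 1e-12)
    return {
        "numerical_accuracy": abs(answer.value - case.expected_value) / denom <= rel_tol,
        "faithfulness": case.required_citation in answer.citations,
        "latency": answer.latency_s <= max_latency_s,
    }


def summarize(results):
    """Pass rate per dimension across the gold set."""
    dims = results[0].keys()
    return {d: sum(r[d] for r in results) / len(results) for d in dims}


# Made-up gold cases and agent answers, for illustration only.
gold = [
    GoldCase("14-day activation rate for March signups?", 0.42, "metric:activation"),
    GoldCase("Week-4 retention for the SMB segment?", 0.61, "metric:retention"),
]
answers = [
    AgentAnswer(0.421, ["metric:activation"], 4.2),
    AgentAnswer(0.55, ["dashboard:smb"], 6.0),
]
report = summarize([score_case(c, a) for c, a in zip(gold, answers)])
```

Re-running the same gold set after every schema or prompt change turns this into the regression check: a drop in any pass rate flags the change before users see it.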

Safety and reliability were non-negotiable. We layered AI risk management with role-based access, guardrails that block destructive queries, and risk scoring for unfamiliar joins or sudden spikes in metric deltas. The agent logs every step—what it retrieved, which tools it called, and why—so analysts can replay and refine the chain of thought with transparent provenance.
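As one illustration of a guardrail that blocks destructive queries and logs its decision, here is a deliberately conservative sketch. The keyword list and the rule "only a single SELECT or WITH statement" are assumptions. The check fails safe, so it will also reject some harmless queries, such as one with a string literal containing the word "update".

```python
import re

# Keywords treated as destructive in this sketch; the list is an assumption.
DESTRUCTIVE = re.compile(
    r"\b(drop|delete|truncate|alter|update|insert|merge|grant|revoke|create)\b",
    re.IGNORECASE,
)
READ_ONLY_START = re.compile(r"\s*(select|with)\b", re.IGNORECASE)


def check_query(sql, audit_log):
    """Allow a single read-only statement; block and log everything else."""
    body = sql.strip().rstrip(";")
    if ";" in body:
        allowed, reason = False, "multiple statements"
    elif not READ_ONLY_START.match(body):
        allowed, reason = False, "not a SELECT/WITH statement"
    elif DESTRUCTIVE.search(body):
        allowed, reason = False, "destructive keyword present"
    else:
        allowed, reason = True, "ok"
    # Every decision is recorded so a reviewer can replay it later.
    audit_log.append({"sql": sql, "allowed": allowed, "reason": reason})
    return allowed


log = []
check_query("SELECT plan, COUNT(*) FROM accounts GROUP BY plan;", log)  # allowed
check_query("SELECT 1; DELETE FROM accounts", log)                      # blocked
```

A pattern check like this is only a first line of defense. Read-only database credentials for the agent would enforce the same rule at the layer that actually executes the query.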

The payoff: product teams now self-serve nuanced questions in minutes instead of days, and our analysts spend more time on discovery than report wrangling. Retention analysis improved as the agent standardized cohort logic; conversion investigations accelerated thanks to consistent funnel definitions; and cross-functional decisions aligned around the same driver trees and shared language. Most importantly, the agent turned ambiguous asks into structured analyses that stand up to scrutiny.

For fellow product leaders, my lesson is simple: start with semantics, not models. A crisp ontology, disciplined taxonomy, and clear ownership will outperform a flashy stack riddled with ambiguity. Avoid technology FOMO; favor retrieval-first grounding, small sharp tools, and continuous discovery with your product trios. When your organization speaks a common analytics language, agents can finally think with you, not just for you.

Next, we’re extending the agent’s planning skills to recommend experiment designs, estimate power and minimum detectable effect (MDE), and propose driver-tree-informed bet sizing. We’re also tightening feedback loops so every accepted answer, edit, or override strengthens the retrieval corpus and evaluations. The vision: a calm, reliable layer that makes rigorous product analytics feel conversational and helps teams move from questions to confident action.

Inspired by this post on Amplitude – Best Practices.

April 13, 2026
Stop Drowning in Tasks: How AI Marketing Agents Restore Focus and Maximize Impact

Every week I meet marketers who are working harder than ever—more campaigns, more content, more dashboards—yet seeing less movement on metrics that matter. The surge of AI tooling has amplified activity, not necessarily impact. That’s the focus problem: we confuse motion with momentum, and our backlogs look great while our outcomes stall.

Learn how AI agents for marketing can help you prioritize impact so you can do important work, instead of just more work.

In my role leading product and growth teams, I’ve learned that AI only compounds value when it is pointed squarely at outcomes. If we don’t define what “good” looks like, agentic AI will simply scale busywork. The antidote is a disciplined operating model that connects strategy to execution and instruments agents with clear success criteria.

First, anchor your program with outcomes vs output OKRs. Choose one or two measurable business outcomes—such as qualified pipeline, conversion rate, or activation—and make everything else subordinate. This provides the compass agents need to make effective trade-offs when speed and volume tempt you to do “one more thing.”

Second, map a driver tree from the target outcome down to the controllable levers: audience segments, offers, channels, messaging, and experience friction. This traceability shows where agents can move the needle fastest—whether that’s accelerating research, sharpening positioning, or eliminating handoffs that slow experimentation.

Third, design a small, agentic AI workforce aligned to those levers. For example: a Research Agent that synthesizes market insights and past performance; a Copy Agent that generates on-brief, on-brand variants; a Distribution Agent that adapts content to each channel and schedules posts; and an Analytics Agent that runs A/B tests, summarizes results, and flags anomalies. Keep human oversight where judgment matters most—strategy, brand voice, and high-stakes decisions.

Fourth, instrument rigor from day one with Agent Analytics and eval-driven development. Define offline evals for brand consistency, factuality, safety, and response time; pair them with online experiments that quantify lift on your target outcomes. Set a minimum detectable effect (MDE) so you stop shipping changes that cannot plausibly move the metric.

Fifth, operationalize your AI workflows. Standardize prompts, inputs, and handoffs; templatize briefs and acceptance criteria; and keep a change log so improvements compound rather than reset. Use short, frequent feedback loops to prune low-impact work and double down on what demonstrably advances your objectives.

I’ve seen teams reclaim focus and momentum when they treat agents as teammates, not toys. The magic isn’t in producing more assets—it’s in consistently choosing the next best action in service of a clear outcome. When you combine outcome clarity, a driver tree, targeted agents, and tight evals, AI becomes a force multiplier for marketing impact.

If you’re feeling overwhelmed by AI’s possibilities, start small: commit to one outcome, one driver you believe is material, and one agent designed for that job. Prove lift, codify the workflow, then scale. Velocity is only valuable when it’s pointed in the right direction.

Inspired by this post on Amplitude – Best Practices.

April 10, 2026
Inside the Most Politically Dangerous C‑Suite Role: Hard Truths on Culture, Layoffs, and Leadership

I’ve long believed the people function is a strategic engine, not a support lane. That conviction was only reinforced in a recent deep dive with Katie Burke, now COO at Harvey after joining as Chief People Officer. Before Harvey, she spent 11 years in HR leadership at HubSpot, helping build one of tech’s most distinctive cultures. In this piece, I unpack what resonated most for me as a product leader: a marketing-minded approach to HR, deliberate hiring from hospitality, and the non-negotiable case for culture as a core business strategy.

The first principle is simple and often overlooked: HR leaders should think like marketers. Employer brand is a product; your candidate and employee journeys are funnels; and your programs deserve the same rigor we bring to product—segmentation, positioning, channels, and continuous A/B testing. When we treat onboarding, performance, and manager enablement like iterative product launches—complete with activation metrics, retention curves, and NPS—we stop guessing and start compounding results.

One line has become a north star for how I approach executive leadership: “Don’t ask for a seat at the table. Build the table.” In practice, that means codifying the operating system—decision rights, principles, cadences, and accountability—so the organization isn’t improvising strategy in every meeting. Product, People, and Finance should co-own this OS; that’s how you scale clarity faster than headcount.

Transparency is the tax we pay for alignment, and it compounds trust. After an IPO, the impulse can be to close ranks. The better move is radical transparency with context: what changed, why it matters, and how decisions get made now. On my teams, that looks like publishing decision records, sharing tradeoffs explicitly, and using written docs to reduce rumor velocity—core muscles in stakeholder management as complexity grows.

I also loved the counterintuitive hiring bet: prioritize hospitality backgrounds alongside traditional corporate pedigrees. People who’ve thrived in service environments bring customer empathy, operational resilience, and a bias for proactive care—traits that elevate everything from onboarding to incident response. In product terms, they’re culturally accretive hires with high signal on service quality and consistency.

The trickiest part of the Chief People Officer role isn’t process—it’s politics. You are the executive team’s own HR business partner, which requires coaching, candor, and conflict mediation at the highest stakes. The goal is to “Be the Michael Jordan of your exec team”—the teammate who elevates standards, makes others better, and chooses the hard right over the easy familiar.

Layoffs create a culture debt that accrues interest. Expect a “2.5-year cultural hangover after a layoff” (in many companies, at least two years) unless you actively repay it. That repayment plan includes narrating the why with specificity, rebuilding trust through manager enablement, and re-anchoring on performance and values. Measure leading indicators (manager effectiveness, time-to-decision, psychological safety) alongside lagging ones (regretted attrition) to track the true recovery arc.

People leaders also need to create “graceful exits.” Doing this well preserves dignity for the person, protects the team’s morale, and safeguards the company’s brand. The bar is straightforward: clear rationale, fair process, useful feedback, generous support, and alumni pathways. A graceful exit signals that even when business realities bite, respect is non-negotiable.

Expectation-setting matters. Two truths cut through the noise: “The workplace shouldn’t be Disneyland” and “Our job is not to make you happy every day.” The promise is not perpetual happiness; it’s meaningful work, fair standards, growth opportunities, and leaders who tell the truth. When we set that contract clearly, engagement becomes an outcome of purpose and progress—not perks.

On feedback, I use the protein vs. sugar rule. Sugar feedback is pleasant and perishable; protein feedback is specific, sometimes uncomfortable, and growth-driving. Great cultures build a taste for protein: clear role expectations, crisp examples, and written follow-ups. Mechanically, that looks like structured 1:1s, decision retros, skip-levels, and manager training that demystifies “what good looks like.”

Being a Chief People Officer isn’t for the faint of heart. The role must be demanding by design—on executive hiring quality, performance management courage, and values enforcement. Moments like “Berry-Gate” are reminders that small symbolic issues can balloon when feedback loops are unclear. Close the loop fast, publish the rationale, and ensure there’s a predictable path for concerns to be heard and resolved.

When hiring, beware patterns that predict friction. That’s why “frequent flyers” are a new-hire red flag. Movement can signal adaptability, but weather-vane pivots and blame-shifting often repeat. Probe for ownership, learning moments, and sustained impact; you want people who compound value, not just sample it.

Clarity on scope prevents leadership whiplash. Which company decisions fall to the Chief People Officer? Think leveling frameworks, compensation philosophy and bands, performance calibration, manager standards, ER policies, and org design guardrails—always in lockstep with Finance and the CEO. Escalate when there are values collisions or systemic risks; otherwise, push decisions to the right altitude and owner.

Scaling exposes the same few failure modes on repeat: fuzzy decision rights, a thin manager bench, brittle processes that don’t flex, and inconsistent leveling that erodes trust. The antidote is an operating model that pairs clear principles with lightweight mechanisms—documented roles, regular calibration, and reviews that audit for both outcomes and operating behaviors.

Comparing a scaled SaaS like HubSpot with an AI-native company like Harvey surfaces important differences. The former optimizes for durable systems, predictable cadences, and governance; the latter optimizes for rapid learning loops, emergent org design, and a higher tolerance for ambiguity. The art is porting the right controls at the right time without crushing velocity.

AI is already changing the people function. GenAI can draft job descriptions, summarize performance notes, classify themes from engagement surveys, and power AI workflows that resolve common HR tickets. The human-in-the-loop remains essential for judgment, context, and ethics—especially around data governance and privacy-by-design. A pragmatic AI Strategy here frees HRBPs for higher-order coaching and organizational development work.

One practice I recommend widely: share your own performance reviews. Modeling openness normalizes growth and turns feedback into a shared craft, not a secret ritual. It also builds trust when you later ask the organization to lean into sharper, protein-rich feedback.

Finally, disagreements with the CEO are inevitable—and healthy. Handle them with pre-briefs, crisp written proposals, explicit tradeoffs, and a shared decision record. Argue like scientists, not politicians; once a call is made, disagree and commit. That combination of candor and alignment is what keeps executive teams high-trust and high-velocity.

The people leader’s chair may be the most politically dangerous role in the C-suite—but it’s also one of the most leveraged. Build the table, tell the truth, design for standards and dignity, and treat culture like the product that powers everything else.

April 10, 2026
Commercial vs. Internal Products: Hard Truths, High Leverage, and How I Make the Call

Internal Products Are Hard; Commercial Products Are Harder. That line captures years of hard-won lessons from leading both internal platforms and market-facing SaaS at HighLevel. I’ve seen how the two demand different muscles—even when the tech stack, talent, and timelines look the same on paper.

When I talk about internal products, I mean services and solutions that our own employees use to take care of customers—customer-enabling tools and services, agent consoles, fulfillment and billing workflows, operations dashboards, and the underlying platforms that keep them fast, compliant, and resilient. These tools don’t generate revenue directly, but they quietly determine customer experience, gross margin, and how quickly we can ship, resolve issues, and scale.

Commercial products, by contrast, add a second layer of challenge. Beyond discovery, usability, and reliability, we must conquer positioning, pricing and packaging, competitive differentiation, sales enablement, procurement hurdles, and an ongoing customer success motion. The surface area for failure is bigger, and the time-to-signal on product-market fit is slower and noisier.

Here’s how I decide where to invest. First, I anchor on outcomes, not output. If the business priority is net revenue retention, faster onboarding, or reduced cost-to-serve, internal products often provide the highest-leverage path. If the priority is new revenue, new market entry, or a must-have differentiator, we lean commercial. I make the trade explicit in outcomes vs output OKRs so we can defend the decision when pressure mounts.

Second, I run a clear build vs buy calculus. For internal needs, the default is buy if a mature, configurable solution exists that meets our security, data governance, and integration requirements. I only build when the workflow is core to our differentiation, the TCO of customization is lower than vendor sprawl, or we can capture unique proprietary advantage. For commercial products, I avoid embedding third-party IP in a way that caps differentiation or compresses margins as we scale.

Third, I insist on continuous discovery. Internal audiences are not a captive market—they’re discerning experts with real jobs to do. I treat them like customers, with structured customer interviews, journey mapping, and opportunity solution trees. I rely on empowered product teams and product trios to validate problems and reduce solution risk before we commit engineering time.

Fourth, I frame commercial vs internal work with capacity guardrails. In most planning cycles, I reserve explicit allocation for platform scalability and internal tooling, separate from feature bets. Without this, internal products become backlog filler, which guarantees we’ll pay the interest later in churn, SLA breaches, and slower delivery.

Execution differs too. For internal products, change management is make-or-break. I plan enablement as a first-class deliverable: clear rollouts, in-app guides, training, and feedback loops with frontline champions. I track adoption, time-to-resolution, error rate, and satisfaction for internal users with the same rigor we apply to external users.

For commercial products, I design the discovery-to-GTM handshake early. Pricing and packaging must reflect value drivers discovered in research, not what’s easiest to meter. Sales and solutions engineering need crisp narratives, objection handling, and proof points. Customer success needs activation plans and health signals tied directly to leading indicators of retention.

Across both, I instrument the product and process. I lean on feature flags and progressive delivery to manage risk, and I protect SLOs with error budgets so teams balance reliability with iteration speed. CI/CD isn’t a badge—it’s how we earn the right to ship continuously without eroding trust.
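The error-budget arithmetic is simple enough to show directly. In this sketch the SLO target, request counts, and the freeze rule are hypothetical inputs. The only fixed idea is that the budget is the share of requests the SLO allows to fail.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Return allowed failures, failures remaining, and fraction of budget consumed."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")
    allowed = (1 - slo_target) * total_requests
    remaining = allowed - failed_requests
    consumed = failed_requests / allowed if allowed else float("inf")
    return allowed, remaining, consumed


def can_ship(consumed, freeze_threshold=1.0):
    """Hypothetical policy: pause risky releases once the budget is spent."""
    return consumed < freeze_threshold


# Illustrative numbers: a 99.9% SLO over 2,000,000 requests with 1,400 failures.
allowed, remaining, consumed = error_budget(0.999, 2_000_000, 1_400)
ship = can_ship(consumed)
```

With these inputs the budget is about 2,000 failed requests and roughly 70% of it is used, so the team can keep shipping. Once `consumed` reaches 1.0, the same rule shifts effort from new features to reliability work.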

Common pitfalls recur. Teams skip UX for employee tools because “they have to use it”—which backfires as shadow workflows and rework. Leaders underfund internal platforms, then wonder why velocity stalls. On the commercial side, teams over-index on features and under-invest in positioning and onboarding, leading to poor activation and elongated sales cycles.

What’s the payoff? When we treat internal products as products, we unlock scale: shorter handling times, fewer escalations, clearer accountability, and higher customer satisfaction. When we approach commercial products with the same discovery rigor plus smart GTM, we compress time-to-value and amplify differentiation. The craft is knowing which lever to pull when—and having the discipline to measure what matters.

My rule of thumb is simple. If the goal is operational excellence that compounds across the entire customer journey, invest in internal products with the same intensity you reserve for revenue-generating features. If the goal is market expansion or category leadership, invest in commercial products with a tight discovery-to-GTM loop. In either case, clarity of outcomes, disciplined discovery, and empowered teams win the day.

Inspired by this post on SVPG.

April 9, 2026
Stop Forcing AI to Prove ROI: A Product Leader’s Playbook to Measure Real Business Value

Every planning cycle, I feel the drumbeat: “Show me the AI ROI—this quarter.” The pressure is real, especially when boards and CFOs expect immediate payback. Yet when I review stalled initiatives across teams and peers, the pattern is consistent: most companies treat AI like a feature to ship, not a system to manage. That mindset almost guarantees we measure the wrong things, declare victory (or failure) too early, and miss the durable value AI can create.

Here’s the core problem I see: we leap to solution and skip the counterfactual. Without a baseline, a clear control, or a defined “what would have happened otherwise,” we’re guessing. We also fixate on lagging, financial KPIs that move slowly (revenue, cost, risk), then use outputs—not outcomes—as OKRs. If we don’t align on outcomes vs output OKRs upfront, the best team in the world can still optimize for activity over impact.

My AI Strategy starts from a simple truth: value shows up along three vectors—revenue, cost, and risk—on different timelines. In the near term, we must validate leading indicators (adoption, engagement, activation) that ladder to those vectors through a transparent driver tree. Over time, those drivers compound into the lagging KPIs finance cares about. When we make the driver tree explicit, everyone can see how model precision, response time, and workflow integration roll up to conversion lift, case deflection, time-to-resolution, or reduced exposure.

To make this rigorous, I run a five-step playbook. First, define the decision and business outcome in plain terms. Second, instrument the baseline with behavioral analytics on a unified analytics platform—tools like Amplitude analytics or Pendo help expose friction points we’ll later target. Third, create a counterfactual using A/B testing and specify a minimum detectable effect (MDE) so we know how long to run and how much traffic we need. Fourth, quantify costs (training, inference, integration, change management) and include AI risk management, privacy-by-design, and data governance up front. Fifth, lock a measurement plan that connects leading indicators to lagging ROI through the driver tree.
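To show how the MDE in step three translates into traffic and run time, here is a sketch using the standard normal-approximation formula for a two-sided, two-proportion test. The baseline rate, MDE, and daily traffic are hypothetical inputs. A real plan should also account for multiple metrics, peeking, and weekly seasonality.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Users needed per arm to detect an absolute lift of mde_abs."""
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and mde_abs must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def days_to_run(n_per_arm, daily_eligible_users, arms=2):
    """Days of traffic needed to fill every arm."""
    return ceil(n_per_arm * arms / daily_eligible_users)


# Hypothetical scenario: 10% baseline conversion, 2-point absolute MDE,
# 1,000 eligible users per day split across control and treatment.
n = sample_size_per_arm(baseline_rate=0.10, mde_abs=0.02)
days = days_to_run(n, daily_eligible_users=1_000)
```

Because the required sample grows with the inverse square of the MDE, halving the effect you want to detect roughly quadruples the traffic. That trade-off is why the MDE belongs in the plan before the test starts.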

Most AI initiatives don’t fail on model quality—they fail on adoption. If the workflow isn’t smoother, trust isn’t earned, or value isn’t obvious, users revert. That’s why I invest early in onboarding, in-app guides, product tours, and thoughtful tooltip design to reduce the time-to-first-value. Then I watch user activation, retention analysis, and task completion to ensure the assistive experience is not just novel—it’s habit-forming.

For generative use cases, eval-driven development is non-negotiable. I maintain offline evaluations for accuracy and safety, and online evaluations for business impact. Retrieval-first pipeline health, context window management, and prompt engineering affect reliability; so do latency and grounding quality. We ship behind feature flags, measure guardrail effectiveness, and tighten feedback loops from human-in-the-loop reviews into model updates—continuously.

On the business side, I avoid “AI theater” by structuring benefits like a CFO. Revenue: increased conversion or expansion driven by better recommendations, faster sales cycles, or higher trial activation. Cost: case deflection, agent time saved, fewer escalations, and lower rework. Risk: reduced exposure via automated checks, anomaly detection, and consistent policy application. If any claim can’t be tied to measured deltas—via A/B testing or strong quasi-experiments—it doesn’t go in the deck.

Build vs buy deserves the same discipline. I map platform scalability, governance requirements, and total cost of ownership against time-to-impact. Teams often underestimate integration and maintenance drag; a pragmatic mix of bought components with thin custom layers can accelerate outcomes while keeping options open. The goal isn’t to own every layer—it’s to own the learning loop and the differentiated experience.

I also remind teams that tooling should serve the strategy, not replace it. I’ve seen concise, effective messaging that captures the point: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” The words are compelling because they reflect the three-vector value model and the adoption imperative. The same standard should apply to any AI initiative we propose.

If you’re under pressure to prove ROI, shift the conversation: lead with the driver tree, specify your counterfactual, and anchor on leading indicators you can move in weeks—not quarters. Then connect those to the lagging KPIs finance expects over time. When we manage AI like a product—grounded in evidence, experimentation, and user-centered adoption—we don’t have to force ROI. We compound it.

Inspired by this post on Pendo – Perspectives.

April 8, 2026