Tag: AI Strategy

How Snapbar Turned Crisis Into an AI-Native Photo Experience Revolution

What does it take to reinvent a 14-year-old company, not once, but twice? I ask that question often when I look at mature product organizations, because the hardest transformations rarely start with a clean slate. They start with real customers, legacy expectations, operational muscle memory, and a market that suddenly refuses to behave the way it used to.

Snapbar is a useful case study in that kind of transformation. The company began as a wedding photo booth side hustle, grew into a national events company, and then watched COVID wipe out the entire business overnight. As a product leader, I find that moment especially important because it separates teams that are attached to the current expression of their product from teams that understand the deeper customer need underneath it.

The deeper need was never just a physical photo booth. It was identity, participation, memory, brand engagement, and a shareable experience that people could take with them. When in-person events disappeared, Snapbar went from physical photo booths to a cross-platform virtual product built on WebRTC in spring 2020. That was not a cosmetic pivot. It was a first-principles rebuild under pressure.

I have seen many teams talk about innovation when conditions are favorable. Snapbar’s story is more interesting because the team had to innovate when the existing business model was unavailable. That kind of constraint can be clarifying. It forces product teams to ask: What job are we really doing for customers, and what parts of our current solution are merely historical artifacts?

The next reinvention came from generative AI. Pushed by declining repeat business, Snapbar dove deep into Stable Diffusion, custom LoRA fine-tunes on H100/H200 GPUs, and eventually a reasoning-model-powered generative image and video pipeline. What stands out to me is not simply that the team adopted gen ai. It is that they connected AI capabilities to a domain they already understood deeply: photography, events, brand activations, and experiential marketing.

This distinction matters. In product management, technology FOMO can lead teams to bolt AI onto workflows without a clear strategic advantage. Snapbar appears to have moved differently. They used 14 years of industry knowledge to identify where AI could change the experience itself, not just automate a back-office task or generate a novelty output.

The product evolution is a strong example of applied AI. Snapbar integrated Stable Diffusion 1.5 as their first generative AI model and ran custom LoRA fine-tunes on H100/H200 GPUs to produce brand-quality outputs nobody else in their space could match. That level of execution shows the difference between experimenting with a model and building a differentiated product system around it.

I also appreciate the way the team moved from negative prompts to reasoning model long-form prompts. In brand environments, creative control and safety control are not optional. A brand activation must feel imaginative, but it also has to remain on-message, inclusive, and predictable enough for a live event setting. Better prompt engineering becomes part of the product’s trust layer.

One of the most important product details is the meta-prompting pre-processing pipeline designed to ensure user likenesses, including non-obvious details like disabilities, are accurately represented in generated images. That is not a minor implementation detail. It reflects a more mature view of AI risk management, representation, and customer experience.

From my perspective, this is where product strategy and ethical technology intersect. Generative AI systems can easily flatten people into generic outputs. A thoughtful product team has to decide what fidelity means, what consent means, and how much control users and brands should have over the final artifact. Snapbar’s approach suggests that representation is not just a model-quality problem; it is a product-design problem.

Just Now Possible spotlights Snapbar’s journey from photo booths to AI-powered brand experiences, framing reinvention, creativity, and applied AI as the center of the conversation.

The company’s experiential marketing platform lets brands “world build” at conferences, trade shows, and live events by bringing fans into branded creative worlds. That phrase matters because it reframes the photo booth from a capture device into a participatory brand system. The user is not merely photographed. The user becomes part of a designed world.

I see this as a broader shift in product experience. Static brand impressions are giving way to co-created moments. Snapbar added participatory user inputs through Mad Lib-style prompts and prompt injection, turning photo experiences into co-creation moments between brands and their audiences. That is a more durable engagement loop than simply asking someone to pose in front of a branded backdrop.

The operational story is just as relevant for product and engineering leaders. Snapbar used Claude Code and Codex to build and ship features rapidly as a small bootstrap team, and developed a four-pillar agent orchestration framework: context, tools, verification, and workflows. I like that framing because it treats AI-assisted development as a system of work, not a magic shortcut.

In my own product leadership work, I keep coming back to the same lesson: AI workflows only become reliable when the team defines the surrounding operating model. Context determines whether the agent understands the problem. Tools determine what it can actually do. Verification determines whether the output is trustworthy. Workflows determine whether the capability compounds across the organization.

Snapbar is now building customer-facing “vibe coding” using the Claude Agent SDK so brands can configure and create experiences themselves within Snapbar’s platform. That is a meaningful product move. It shifts creation closer to the customer while keeping the workflow inside a controlled product environment. For brand teams, that could reduce dependency on custom service work while still preserving creative flexibility.

This is the kind of AI Strategy I find most compelling: not a generic claim that AI will transform everything, but a specific path from domain expertise to product capability to customer empowerment. Snapbar did not abandon its past. It converted years of event, photography, and brand knowledge into a new interface for generative AI.

The core lesson for product teams is clear. Reinvention does not always mean discarding the original business. Sometimes it means identifying the durable customer need, rebuilding the delivery mechanism, and then using new technology to expand what the experience can become. Snapbar’s journey from wedding photo booths to virtual WebRTC experiences to AI world building shows how a team can preserve its market intuition while changing nearly everything about the product surface.

For product leaders evaluating gen ai opportunities, I would take three practical lessons from this story. First, start with the customer experience, not the model. Second, treat brand safety, representation, and verification as product requirements from the beginning. Third, use agentic AI internally only when the team has a clear framework for context, tools, verification, and workflows.

Snapbar’s story resonates because it is not about chasing a trend. It is about a team using necessity, curiosity-led self-education, and disciplined product thinking to build something that feels native to the generative AI era. That is the difference between adopting AI and becoming AI-native.

Inspired by this post on Product Talk.

July 9, 2026
Durable Product and Platform Leadership Beyond the Launch
A successful product can create momentum, but durable leadership determines whether that momentum becomes an enduring company or platform. The distinction is especially important for infrastructure businesses, where trust, scalability, and operating discipline must keep pace with adoption.

Taken together, the two source articles suggest a practical leadership test: can an organization preserve customer value while strengthening the strategy, governance, and systems surrounding the product?

Product strength can conceal organizational weakness

Why Great Products Can Still Fail argues that product excellence is necessary but insufficient for company health. A compelling product may temporarily mask unclear strategy, weak accountability, poor tradeoffs, or an operating culture that values output more than outcomes. Adoption and market opportunity do not automatically prove that the organization can make sound decisions as it grows.

This changes the leadership question. The issue is not simply whether teams can ship something customers value, but whether the company can repeatedly direct talent and capital toward the right problems. Product discovery, stakeholder management, roadmapping, and sprint planning become parts of a governance system: they connect customer evidence to decisions and expose assumptions before those assumptions harden into costly commitments.

The article also emphasizes ethical decision-making and corporate governance. That perspective broadens product leadership beyond roadmap ownership. Leaders remain responsible for the organizational conditions under which a successful product is developed, sold, and extended.

Durable platforms reduce uncertainty at every stage

The Supabase article approaches durability through a developer-platform case study. It reports that Supabase started with an open-source PostgreSQL proposition intended to combine rapid application development with an architecture developers would not have to abandon as their needs became more serious. In that account, the platform’s value rests on fast setup, predictable building blocks, reliable documentation, sensible defaults, and a credible path to scale.

Those qualities reveal a broader platform principle: durability is not the same as having the largest feature set. A durable platform lowers uncertainty. It helps customers understand what they are adopting, begin using it without unnecessary friction, and remain confident that early speed will not create an architectural trap later.

The source attributes part of that confidence to Supabase’s alignment with PostgreSQL and its open-source approach. Community trust and commercial growth are presented as mutually reinforcing rather than competing motions. This complements the governance argument from the first article: trust is created when a company’s operating choices support the product promise, not merely when its marketing states that promise.

Leadership durability comes from connected operating loops

The Supabase account reports that founder Paul Copplestone’s earlier startup experiences contributed to an emphasis on finding product-market fit before blitzscaling and on separating fundraising from building. It also describes the company as operating with a constraint mindset even after raising capital. Read alongside the warning that strong products can disguise structural problems, the lesson is that available resources should not be mistaken for validated demand or organizational readiness.

Positioning forms another operating loop. According to the Supabase article, a tagline change preceded the project reaching the top position on Hacker News and was treated as an early product-market-fit signal. The useful interpretation is not that wording alone establishes fit. It is that positioning can test whether the market recognizes the job a product performs. When the message and the customer problem align, feedback becomes clearer and acquisition friction may fall.

Measurement must then distinguish genuine contribution from inherited momentum. The source reports that Supabase designed sales compensation around incremental uplift over a control group. In a product-led business, that approach asks whether sales created conversion or expansion beyond what self-service adoption would probably have generated. It places evidence above activity and limits the temptation to claim credit for demand already produced by the product.

Organizational learning completes the system. The article describes a fully distributed, asynchronous team with near-zero attrition and connects its scaling philosophy to kaizen, or continuous improvement. Because these are claims from a single company-focused account rather than independently verified comparisons, they should be treated as reported characteristics. Their leadership relevance is still clear: asynchronous execution depends on strong writing and explicit ownership, while continuous improvement requires teams to identify and remove recurring friction.

AI readiness should amplify a durable foundation

The Supabase article reports three AI-related waves involving pgvector, Bolt and Lovable, and Claude Code. It presents these developments as successive ways in which retrieval, rapid application creation, and AI-native development workflows increased the relevance of an existing backend platform.

The sequence matters because it separates readiness from trend chasing. The reported AI opportunities could compound platform value because the underlying customer need already existed: developers wanted to build quickly on a backend they could trust. AI changed workflows and urgency, but it did not replace the platform’s core value proposition.

For leadership teams, this implies a disciplined filter for emerging technology. A new capability deserves investment when it strengthens an established customer job, improves the platform’s trusted primitives, or opens a coherent path for existing users. Technology excitement alone cannot resolve weak positioning, unclear ownership, or an unproven operating model.

Key takeaways
- Treat product success as evidence, not immunity. Adoption does not eliminate the need for governance, ethical judgment, and explicit accountability.
- Design platforms around customer confidence. Fast onboarding, dependable primitives, clear documentation, and a credible scaling path matter together.
- Preserve constraints after capital or demand arrives. Resources should follow validated customer value rather than substitute for it.
- Measure incremental impact. Product-led and sales-led motions need a method for separating created lift from revenue that would have occurred anyway.
- Use AI to extend a durable value proposition. Emerging workflows are most useful when they compound an existing platform advantage.
Durable leadership is ultimately visible in what happens after early success: whether the organization converts attention into learning, learning into disciplined choices, and those choices into a platform customers can continue to trust.

References
- Shivam.Consulting Blog – Why Great Products Can Still Fail: The Leadership Lesson Every Team Needs
- Shivam.Consulting Blog – Why Supabase’s AI-Era Growth Playbook Should Reshape How We Build Platforms
July 3, 2026
How I Use Novus, the First Product Agent, to Turn Rapid Releases into Measurable Wins

In a world of relentless CI/CD and accelerating release trains, product leaders like me can’t afford lagging signals or fuzzy readouts on what’s truly moving the needle. I need immediate, trustworthy feedback that connects code shipped to outcomes achieved and customer value created.

Coding agents compress weeks of development into hours, but the faster your codebase changes, the harder it is to know what’s actually helping end-users.

That tension is exactly why I brought Novus into my product toolbox. To keep up with the pace of development, over 600 product teams are already using Novus, the first-of-its-kind product agent, to automatically set itself up, monitor product data, and tell you what to do next.

From my chair, that promise matters only if it translates into clear decisions. With Novus, I’ve been able to tighten the loop between experimentation and learning: it pairs eval-driven development with behavioral analytics and observability so I can see how a release influences activation, engagement, and retention—without spelunking through fragmented dashboards. The agentic AI backbone reduces the manual stitching I used to do across events, cohorts, and funnels, letting me focus on prioritization and product strategy instead of report wrangling.

Day to day, Novus fits naturally into our AI workflows. It surfaces anomalies early, clarifies trade-offs, and frames next-best actions in the language of outcomes. Because it plugs into a unified analytics platform approach, I can maintain continuous discovery at scale while preserving the rigor of Agent Analytics: hypotheses are explicit, telemetry is consistent, and results are traceable. That’s the operating cadence I expect from modern product management leadership.

If your roadmap moves faster than your learning loops, a product agent can be the missing link between speed and certainty. Novus helps me convert rapid releases into measurable wins, keeping the team aligned and confident about what to build next—and just as importantly, what to stop doing.

Inspired by this post on Pendo – Best Practices.

June 17, 2026
Salesforce to Acquire Fin for ~$3.6B: Powerful AI Synergy, Product Strategy Takeaways

I’m processing a milestone moment for SaaS, AI strategy, and product leadership. One statement captures the news with clarity: “We’re excited to share that we just signed an agreement for Salesforce to acquire Fin for ~$3.6B. The transaction is expected to close in the fourth quarter of Salesforce’s fiscal year 2027.” As a product leader, I see this as a high-conviction bet on agentic AI, Customer Agents, and CRM integration at massive scale.

The backstory matters, and it’s remarkable: “Fin started as Intercom 15 years ago. We changed our name to cap our transformation just weeks ago. We were a darling of the SaaS era and invented so many of the patterns you see in software today. Nearly four years ago, in need of a reboot, we jumped on weeks-old modern LLMs to create and define the category we know as Customer Agents today.” That arc—from SaaS pioneer to LLM-powered category creator—illustrates how bold pivots, shipped with urgency and clear product strategy, can reset the trajectory of a company and a market.

From a product management lens, this deal reinforces a few truths: category creation rewards those who move first with conviction; “reboots” succeed when they’re anchored in genuine customer value; and modern LLMs, applied through disciplined roadmapping and eval-driven development, can unlock step-change outcomes in customer support ai strategy and product-led growth. It also signals the rising centrality of agentic AI and operational AI workflows inside the CRM.

The leadership dimension is just as instructive. As the announcement framed it: “Salesforce invented modern software and SaaS. And Marc Benioff is like the final boss of tech founder CEOs. In seat for 27 years, he’s one of the last of his era. Still pushing, pivoting, placing big bets.” That ethos—placing big, principled bets while adapting the operating model—sets the tone for what sustained product management leadership looks like at scale.

Customer continuity and acceleration are clearly emphasized: “To our customers: Over the past few years we’ve been shipping intensely. Including recently our groundbreaking model, Apex, and our paradigm-defining internal agent, Operator. With the resources of Salesforce this will only accelerate. And yet little will practically change. I’ll still be CEO, Des will still be running R&D, we’ll both still be committed to continuing to lead this category. Thank you very sincerely and deeply for your belief in us.” For practitioners, the signal is strong: continued focus on shipping, sharper execution readiness, and tighter integration paths inside the Salesforce ecosystem.

Smiles, clinking glasses, and a roundtable toast in a cozy private room capture the energy of a big day—celebrating Salesforce's definitive agreement to acquire Fin and the teams joining forces for what's next.

There’s a human heartbeat here too: “While this is not the end, it is a major, pivotal, special, and emotional moment for us.” Moments like this remind me that building enduring products is equal parts craft and courage—powered by teams who commit to the long game, navigate uncertainty, and still ship relentlessly.

Strategically, I expect near-term priorities to center on secure data flow and governance, deep CRM integration, and unifying telemetry for Agent Analytics across channels. On the roadmap, I’d anticipate tighter alignment between LLM safety, retrieval-first pipelines, and enterprise-grade observability—plus thoughtful go-to-market strategy enabling sales-led growth to complement product-led growth. The real unlock comes when Customer Agents are natively orchestrated with Service, Sales, and Marketing workflows—measured with clear outcomes vs output OKRs and reinforced by robust knowledge management.

For fellow product leaders, the takeaways are actionable: define category boundaries with crisp value propositions; balance speed with governance; invest in eval-driven development and continuous discovery; and keep your product trios aligned around measurable customer outcomes. Above all, build the operating cadence—metrics, rituals, and talent—that lets you compound small wins into durable differentiation.

And I appreciate the spirit of this closing line: “And now, time to get back to work. See you at our next product launch in a few weeks. (:” That’s the mindset that turns a headline into execution: celebrate briefly, then ship the next proof point.

Inspired by this post on The Intercom Blog.

June 15, 2026
Crafting Beloved Tech Brands: My Moonshot Marketing Playbook for the Post-LLM Era

I spend a lot of my time asking a deceptively simple question: what does excellent marketing actually look like in 2026? From the vantage point of product leadership, the answer isn’t a spreadsheet or a channel plan—it’s a feeling. Beloved tech brands earn the benefit of the doubt, create gravity around their roadmap, and make customers proud to belong. That kind of momentum is not an accident; it’s a system.

Here’s the hard truth I’ve learned building and scaling products: giving teams different goals creates dysfunction. When brand, demand gen, product marketing, and comms run on fragmented OKRs, you manufacture internal headwinds. “Marketing is one engine – not separate pieces.” One strategy, one narrative, one set of outcomes—expressed through different craft disciplines and time horizons.

That unity of purpose clarifies executive roles, too. The real difference between an SVP and a CMO is scope and narrative ownership. A great CMO architects the whole system—portfolio allocation, brand architecture, integrated go-to-market strategy, and the bar for creative taste—while refusing to get dragged into decisions they should never be making (for example, approving every headline or micromanaging channel tactics). Leaders should decide the outcomes, standards, and constraints; teams should control the craft.

On portfolio design, I run marketing like a portfolio of moonshots. You need a healthy mix: proven programs that compound, emergent bets that learn fast, and a small set of true moonshots that can change the slope of the curve. The point isn’t bravado; it’s risk-balanced exploration. If everything ships safely, you’re under-investing in differentiation. If everything is a swing for the fences, you’re not building a repeatable growth engine.

This is where taste becomes a strategic advantage. “Ubiquity is the opposite of cool.” If you want to be beloved, you cannot treat every channel, audience, and moment as equal. Early on, selective distribution, distinctive creative codes, and tight community loops create status and meaning. Later, you scale without sanding off the edges that made the product special.

Why do a few companies build a flywheel of momentum while others stall? They align story, product, and distribution. The product earns trust, the narrative creates aspiration, and the go-to-market strategy ensures the right customers experience both at the right time. Then perception cycles kick in—the Silicon Valley clock turns—and irrational optimism or skepticism can amplify signals. The antidote is compounding proof: consistent product shipping, community advocacy, and creative that makes people care.

Scaling taste across an organization is teachable. I codify brand principles, narrative guardrails, and examples of “right” versus “almost right.” I replace abstract feedback with decision rubrics—what we keep, kill, or revise and why. I run recurring creative reviews with a small cross-functional council, so judgment compounds. Taste can’t be fully automated, but it can be operationalized: shared references, a story bible, and a high bar for craft that’s explicit, not mystical.

In a post-LLM world, the fundamentals haven’t changed—but the frontier has. Generative tools supercharge iteration and research, yet the artistry never really left. You still need a point of view, a tension worth resolving, and a value proposition that’s felt, not just stated. Can taste be encoded in software? Parts of it—pattern libraries, style constraints, data-driven feedback—absolutely. But the spark that makes work unforgettable remains human: judgment, risk tolerance, and the courage to ship something that might not fit the playbook.

That’s why telling an optimistic, yet realistic story about AI matters. Over-automation drains humanity; under-automation wastes potential. The best work pairs AI Strategy with craft leadership: LLMs for rapid exploration, humans for narrative decisions and ethical judgment. Your message should show how AI expands customer agency, not just efficiency.

The brand-versus-growth debate is a false choice. The right story accelerates pipeline, and the right demand programs reinforce the brand. Look at Apple’s discipline around product truth and design codes, or Google Chrome’s “The Web Is What You Make of It (Dear Sophie)” for proof that emotion and utility can co-exist. Notion, Pinterest, Square, HubSpot, and Harley-Davidson show how community, identity, and product-led growth interlock when the company knows exactly what it stands for.

When it comes to launches, I’ve learned that announcement videos full of humans, lack humanity. Overproduced gloss often dilutes the truth customers seek: what problem does this solve, how quickly can I feel the value, and why does it matter now? Real users, real context, and a crisp arc from problem to promise will outperform most theatrics.

Practically, I architect my week to protect taste and outcomes. Early-week for strategy, portfolio reviews, and cross-functional alignment; mid-week for deep creative and product marketing work; late-week for decision clears and postmortems. I time-box “disruptive energy”—space to chase non-obvious ideas—and I guard it like any critical meeting. Without protected cycles for exploration, the urgent will always suffocate the important.

If there’s a single takeaway: playbooks are obsolete, but the fundamentals are not. The channels change; the psychology doesn’t. Run one engine. Allocate a true portfolio. Scale taste with rigor. In the AI era, make people care. That’s how beloved tech brands are built—and how they endure.

June 4, 2026
Package Hack Wake-Up Call: My Playbook for Securing Cowork, Coding Agents, and Secrets

I love being a builder. It feels like a superpower I can’t stop using, and lately I’ve been channeling it into better workflows, faster experimentation, and sharper product thinking.

I tinker with my Claude Code workflows to make every day more effortless. I’m having a blast creating AI-generated interview snapshots and opportunity solution trees for Vistaly. I also spend time digging into traces and iterating on the AI coaches I use for our discovery courses.

Then the recent wave of malicious software spreading through the open-source community popped my bubble. It hit companies big and small—names like OpenAI, PostHog, and Zapier. As I dug in, I realized what many cybersecurity experts have long known: this is a deep rabbit hole. If I want to build responsibly, I have to get significantly better at protecting my devices, credentials, and code. And if you’re building with AI or modern tooling, you likely do, too.

Here’s why. We all rely on open-source software. Most modern applications assemble tried-and-true components—parsing a PDF, handling dates across time zones, visualizing spreadsheet data, connecting to an API—rather than reinventing them. The same is true for agent skills and MCP servers; they accelerate how we get value from models. This is overwhelmingly a good thing. But it also creates an attack surface that bad actors exploit.

We don’t need to abandon third-party code. We do need to understand the mechanisms attackers use and consistently defend against them.

When one malicious worm compromises hundreds of packages, what should dev teams do? This visual teaser maps the agenda—how it spreads, how to guard against it, AI tool risks, and concrete steps to mitigate.

On May 11th, I started seeing tweets about a TanStack hack. At that time, I didn’t know what TanStack was. But apparently, it’s a popular set of JavaScript libraries that are used by a lot of React sites. At first, I didn’t pay much attention. Then I learned the packages were compromised by a worm—malicious software that self-replicates—and it spread quickly. Within hours, dozens of packages were implicated; by day’s end, it was in the hundreds. That’s when I knew I had to lean in.

If you’ve explored safe development practices with coding agents before, you’ve seen the basics of package safety. A package is a bundle of reusable code shared through registries, and nearly every app you use depends on them. The unfortunate twist with this specific hack, known as the Mini Shai-Hulud worm, is that it shows prior “safe enough” heuristics aren’t sufficient. Popularity and trust signals don’t guarantee safety. We have to do more.

So here’s what I’ll cover today: how malicious software typically works, a practical framework for guarding against it, the specific risks of using Cowork to write and run code, and concrete steps to mitigate that risk. My goal is simple: help you keep building—despite the risks—while protecting your data and your business.

Quick disclaimer: I’m not a security expert. I’m sharing my personal journey and what I’ve learned through research and hands-on work. Please use your best judgment when applying any of this.

Package hacks share a simple playbook: get in, sweep for secrets, and phone home. This visual breaks down the 3 steps and flags new entry points—from packages to MCP servers, agent skills, and app extensions.

An agent recently scoured over 230,000 malicious software incidents and found that most malicious software follows a similar pattern. First, it needs an entry point onto your computer. Once installed, it scours your device for sensitive data, and then it uses your network connection to send that data to its own servers. The Mini Shai-Hulud worm spreads via malicious package install scripts that run at download time, then searches the device for credentials (including package publishing rights), poisons additional packages to continue replicating, and uses multiple channels—including the victim’s own GitHub public repos—to distribute secrets.

In practice, most attacks boil down to three steps: 1) It finds an entry point to your device. 2) It searches your device for sensitive data. 3) It sends that data to its own server. The good news: this pattern also tells us how to defend. We can harden entry points, minimize what code and agents can access, and constrain outgoing network traffic.

Keep in mind that install scripts aren’t the only entry vector. Any code that runs on your machine could contain malicious payloads: third-party packages, agent skills, MCP servers, browser or desktop extensions—the list is long. As coding agents and “vibe coding” tools become mainstream, more non-engineers are exposed to the same risks engineers have managed for years.

You might be at elevated risk if you do any of the following: you download and use third-party skills or MCP servers; you let Claude Code, Codex, or other coding agents write scripts that run locally and use third-party packages; you use an IDE like VS Code or Cursor with third-party extensions; or you install third-party extensions in tools like Obsidian. This isn’t an exhaustive list, but if any of these apply, it’s worth tightening your approach.

Relying on third-party code? This visual highlights four common risk zones—agent skills/MCP servers, coding agents, IDE extensions, and Obsidian plugins—and urges a review of downloads, local scripts, and add-ons.

The “safest” approach would be to avoid installing third-party software on your local device entirely. That’s not realistic. We all depend on third-party components in our stack. So I’ll start with one of the most common paths for non-engineers writing and running code today: Cowork.

Evaluating Cowork’s safety was eye-opening. Cowork offers meaningful protection—more than running code directly on your machine—but it isn’t bulletproof. There’s a notable gap you should understand.

Here’s how Cowork helps. It runs code inside a virtual machine, which isolates the execution environment from your real device—a quarantine room for code. While Cowork doesn’t fully control what comes into the room (that part is on you), if malicious code gets in, it’s contained and cannot reach the rest of your filesystem. Cowork also limits outbound network traffic from the virtual machine, which helps disrupt data exfiltration. However, it’s not foolproof.

Because Claude can install packages inside Cowork, it remains susceptible to malicious code like the Mini Shai-Hulud worm. And GitHub is on the allow list so Cowork can read and write to your repos. Since the Mini Shai-Hulud worm uses GitHub to publish secrets, this creates exposure. The crucial mitigation: if you never give Cowork access to sensitive data, there’s nothing for an attacker to steal.

A quick visual from a security deep dive on package hacks shows how Cowork handles threats: entry points are contained, data is only safe when kept outside, and network traffic is partly limited—making shared data the gap to watch.

Your responsibility is straightforward but critical: your data is only safe if it stays outside the virtual machine. When you mount folders into Cowork, those folders become accessible to any code running inside the VM. That includes malicious scripts. Before sharing, ask two questions: do the folders contain any credentials or secrets, and do they include proprietary data that would be harmful if accessed?

It’s common for code to need credentials. That’s why Cowork includes connectors to third-party sources like Google Drive and Slack. Credentials configured for these connectors never enter the VM—they remain outside the quarantine room—so they’re not exposed to malicious code. But if your code requires additional credentials inside the VM, scope them tightly and assume they could be compromised.

You can also use custom MCP servers you create yourself with Cowork. Those credentials stay outside the VM as well, provided the MCP servers are remote (hosted on a web server, not downloaded locally). It’s more work than dropping in a local server, but it keeps secrets out of reach from VM-executed code.

Beyond credentials, scrutinize the actual content you share with Cowork, including anything accessed through connectors. Least privilege is the rule: grant only what’s absolutely necessary for the task, and nothing more.

Amid a wave of package-supply attacks, this Product Talk visual launches a 3-part guide to safer AI building—starting with Cowork safety today, then Claude code config next week, and off-device development coming soon.

What about skills? Cowork supports skills, and you can add third-party skills inside the quarantine room. If you’re not placing your own data in that room, you can afford more risk. The moment you add sensitive or proprietary data, be selective. Skills can include third-party code, and bad actors use skill directories to distribute malicious payloads. Personally, I never use third-party skills as-is. If one looks useful, I read through the files, then ask Claude to recreate it so I understand what it does and maintain control. If I were to use third-party skills, I’d do it in Cowork and keep their data access to the minimum necessary.

Overall, Cowork is a solid, “safe-ish” option if you’re disciplined about what you share. The challenge is that utility often requires access to real data—exactly what we’re trying to protect. In an upcoming deep dive, I’ll outline strategies to keep malicious code out in the first place. While I’ll focus on local development, the same patterns can extend to Cowork with a bit of setup.

One more important clarification: don’t confuse Cowork with the Code tab in the Claude Desktop app. Cowork runs code inside a virtual machine. The Code tab does not. If you ask Claude to write and execute code from the Code tab, that code runs on your local device and you’re fully responsible for security. There is one exception: the Code tab can run code in Anthropic’s cloud; I’ll cover that approach when we get into moving development off the local machine.

To summarize Cowork’s protections against the attacker’s three-step pattern: installs and scripts still run, but they’re contained inside an isolated virtual machine instead of your real device; access to sensitive data is strongly limited to the specific folders you mount, leaving the rest of your filesystem (including unrelated credentials) out of reach; data exfiltration is partially constrained because Anthropic limits outbound network traffic from the VM—helpful, but not absolute. By contrast, local Code tab sessions offer no isolation, no filesystem restrictions, and no network limits—so any malicious install scripts run directly on your machine with full access and open egress.

My takeaways so far: I still love building with AI, but I’m doing it more cautiously. Cowork offers meaningful containment when used deliberately. I still prefer the flexibility of Claude Code, and I’ve reconfigured my setup to reduce risk. Even so, “safer” isn’t “safe,” which is why I’m increasingly shifting development off my local device to more controlled environments. I’ll share the practical details—tools, configs, and scripts—in the next installments.

If this perspective is useful, let me know. I want builders to move fast—and safely—through this new era of agentic AI. Until then, stay safe out there.

Inspired by this post on Product Talk.

June 3, 2026

An AI Operating Model That Measures Outcomes, Not Activity

Your AI team is shipping, dashboards are filling up, and executives are still asking the uncomfortable question: what changed for the customer or the business?

The answer is rarely another model metric. You need an operating model that connects AI quality to customer behavior, workflow performance, commercial results, and risk. When that chain is visible, you can decide what to scale, what to repair, and what to stop.

Key takeaways

Give every AI initiative an outcome contract that names the target behavior, business result, guardrails, and decision owner.
Measure four linked layers: AI quality, user behavior, workflow results, and business outcomes.
Preserve the context behind each interaction so you can compare outcomes by customer, workflow, model version, and acquisition path.
Run one recurring evidence review where teams make explicit scale, fix, hold, or stop decisions.
Use the first 90 days to prove a reusable learning system, not merely a functioning AI experience.

Start each initiative with an outcome contract

A feature brief tells a team what to build. An outcome contract tells it why the work exists, how evidence will be interpreted, and who can act on that evidence. It is the smallest practical unit of an outcome-led AI portfolio.

Write the contract before choosing a model or polishing a prompt. Keep it to one page and require six fields:

Target workflow: Name the repeated job being changed, such as resolving a support request or preparing a sales follow-up.
Target user behavior: Describe what a person should do differently. Adoption alone is weak; successful completion, accepted recommendations, or reduced rework is stronger.
Business outcome: Connect the behavior to retention, expansion, qualified demand, service capacity, or another commercial result.
Quality floor: Define the task-level evaluation the AI must pass before exposure expands.
Guardrails: Name the safety, privacy, latency, reliability, and cost conditions that must remain acceptable.
Decision rule: State what evidence will trigger a scale, fix, hold, or stop decision, and name the person accountable for making it.

A driver tree makes the logic inspectable. Start with the business result, work backward to the customer behavior that can influence it, then identify the product and AI capabilities that can change that behavior. This prevents a model improvement from being mistaken for business progress.

The contract also gives empowered teams useful boundaries. Leaders align the portfolio around outcomes and constraints; teams retain room to change prompts, retrieval methods, interaction design, or even the proposed solution. That is the practical connection between AI strategy, continuous discovery, evaluation, delivery, and value capture.

Build one scorecard across four layers

AI outcome analytics is not a single north-star metric. It is a chain of evidence. If you measure only the beginning of the chain, you learn whether the system produced an answer. If you measure only the end, you may see revenue move without knowing why.

Measurement layer	Question it answers	Useful examples	Typical decision
AI quality	Did the system perform the intended task?	Task success, groundedness, safety failures, response variance	Change prompts, context, retrieval, model, or fallback
User behavior	Did a person trust and use the result?	Acceptance, correction, abandonment, repeat use, human escalation	Change the interaction, explanation, or moment of assistance
Workflow outcome	Did the job become meaningfully better?	Successful completion, rework, cycle time, resolution quality	Expand, narrow, or redesign the workflow
Business and risk outcome	Did the change create durable value within constraints?	Retention, expansion, qualified leads, cost per successful outcome, incidents	Scale, repackage, hold, or stop

Read the layers from left to right. Good AI quality with weak behavior usually points to product design, trust, or workflow placement. Strong usage with no workflow improvement may indicate novelty rather than value. Workflow gains with poor economics mean the experience works but the architecture or packaging does not.

Use the workflow attempt as the basic unit of analysis whenever possible. A generic session can contain several unrelated intentions. A workflow attempt lets you connect the user request, retrieved context, model and prompt version, response, correction, completion, and downstream result.

Persist the properties needed to reconstruct that journey. Customer segment, acquisition context, workflow type, entitlement, experiment group, model version, retrieval version, and human-handoff status often matter more than another page-view event. Carrying critical context across visits lets you trace behavior from early exploration to conversion and expansion instead of losing the causal story at signup.

Keep the event taxonomy small enough to govern. Instrument decisions and state changes, not every interface movement. For each event, document its owner, trigger, required properties, prohibited sensitive data, and validation method. A dashboard built on ambiguous events creates confidence without clarity.

Run a weekly loop from evidence to decision

Analytics creates value only when it changes a decision. Give each AI initiative a recurring evidence review attended by the product trio and the engineering, data, risk, operations, or go-to-market partners needed for that workflow.

Check the contract. Reconfirm the target workflow, primary outcome, evaluation floor, and guardrails. If the goal has changed, update the contract before interpreting the data.
Inspect the scorecard. Review AI quality, behavior, workflow, business, risk, and cost in that order. Look for breaks in the chain rather than averaging them into one health score.
Segment the result. Compare the cohorts that could conceal a failure: new and experienced users, customer tiers, workflow types, channels, experiment groups, and system versions.
Review failure cases. Sample unsuccessful attempts and classify the reason: missing context, poor retrieval, incorrect generation, confusing interaction, policy restriction, latency, or a problem outside the AI system.
Make one portfolio decision. Choose scale, fix, hold, or stop. Record the evidence, owner, next test, and condition for revisiting the decision.

Do not let offline evaluations and online analytics compete. Offline evaluations test whether a candidate change can handle representative tasks and known edge cases. Online measures show whether the released experience changes real behavior under real conditions. A candidate should clear the evaluation floor before broader exposure, then earn expansion through customer and business evidence.

When you run an experiment, agree on the hypothesis, primary outcome, guardrails, minimum detectable effect, and stopping rule before looking at results. Feature flags and progressive rollout keep the decision reversible. If the result is ambiguous, improve the test or narrow the population; do not promote the most flattering proxy.

This rhythm makes learning rate operational. The useful question is not how many experiments ran. It is how many consequential uncertainties were resolved and converted into a product, portfolio, or go-to-market decision. Testable decisions, behavioral analytics, and guarded rollouts make speed credible because the evidence can survive scrutiny.

Assign decision rights before the dashboard turns red

AI products cross boundaries that ordinary feature teams can often ignore. Product owns the customer and business outcome. Engineering owns service behavior and remediation. Data or AI teams own evaluation integrity and model observability. Risk, security, legal, and operations own constraints that cannot be traded away informally.

Write those responsibilities into the operating model. For each risk tier, specify who can approve an initial release, expand exposure, pause the system, change a model or retrieval source, accept a temporary exception, and communicate an incident. A named decision owner is more useful than a large committee with shared accountability.

Governance should begin during discovery. The team can then choose acceptable data, design consent, build traceability, create fallbacks, and define escalation paths before those choices become expensive. Model cards, data records, evaluation results, release history, and incident decisions should form one audit trail rather than separate compliance paperwork.

The same principle applies to commercial decisions. Product, finance, sales, and customer success need a shared definition of value. Measure inference and support costs against successful workflow outcomes, not raw requests or tokens. Packaging can then reflect delivered value while protecting margins and avoiding incentives for wasteful usage.

Use simple decision tests:

Scale when the primary outcome improves, quality and safety floors hold, economics remain acceptable, and the result repeats in the intended cohorts.
Fix when the chain reveals a local weakness, such as adequate AI quality but low acceptance, or strong adoption but excessive rework.
Hold when the evidence is inconclusive, the measurement is unreliable, or a guardrail is close enough to its limit that broader exposure would create avoidable risk.
Stop when only proxy metrics improve, the target workflow does not change, or the value depends on manual intervention that cannot be sustained.

Use 90 days to prove the operating system

Your first 90 days should produce more than a working use case. They should leave behind a repeatable contract, event model, evaluation set, rollout path, governance record, and decision cadence that the next team can reuse.

Weeks 1–2: choose the workflow. Audit available content and data, map the highest-value repeatable workflows, and select one where behavior and business impact can be observed. Write the outcome contract and assign decision rights.
Weeks 3–4: define the evidence. Build the driver tree, establish the four-layer scorecard, create representative offline evaluations, classify risk, and document the release and stop conditions.
Weeks 5–8: build and instrument. Create the retrieval and prompt baseline, capture lineage and version context, validate events, implement observability, and test graceful fallbacks. Rehearse how the team will diagnose a failed attempt.
Weeks 9–12: release and learn. Ship behind a feature flag, begin with limited exposure, compare behavior and outcomes, inspect failure cohorts, and make explicit scale, fix, hold, or stop decisions.

At the end, ask for three forms of proof. Can the team explain which customer behavior changed? Can it connect that behavior to a workflow and business result without hand-waving? Can another team reuse the operating artifacts without rebuilding them from scratch?

If any answer is no, keep the rollout narrow and repair the system of learning. If all three are yes, fund the next workflow using the same operating model. The goal is not a larger collection of AI features. It is an organization that can turn uncertain AI capabilities into measurable outcomes, repeatedly and responsibly.

References

May 28, 2026

Speed-to-Lead Is Dead: How AI Agents End the Wait and Rebuild a High-Velocity Sales Org

A prospect lands on our site, skims pricing, watches a demo, and clicks “contact sales.” For years, that’s where momentum died. They waited, and we built entire sales motions around managing that delay.

We optimized for “speed-to-lead,” made it the hallmark of a high-performing sales development org, hired more SDRs, tuned routing rules, added shift coverage, and stared at response-time dashboards. Typical SLA targets were one hour for best-fit leads, four hours for core MQLs, forty-eight hours for everyone else. Those were considered good numbers.

No one questioned the premise because the lag felt structural—shift scheduling, routing delays, and humans working 9–5. The fastest teams could only shrink the gap; nobody could remove it.

An AI Agent closes it completely.

When a prospect arrives today, the conversation can begin immediately. That single change reshapes how I design a sales org—how we staff it, what our team prioritizes, and the metrics we hold ourselves accountable for.

Step outside our dashboards and look at the buyer experience. We spend heavily to drive traffic, then push visitors into forms and queues that add friction precisely when purchase intent peaks.

Intent is highest the moment someone seeks out our product. If an SDR follows up two or three hours later, that buyer’s in another meeting, the urgency has faded, and the moment is gone. We still call it a lead; the buyer has already moved on.

What AI changes

Agents eliminate the structural constraints that made speed-to-lead a problem—shift scheduling, routing delays, CRM batch processing, the SDR being on another call. None of it applies anymore because every single lead can be engaged immediately, at any hour and in any language.

The impact goes beyond response time. When an Agent engages at peak intent, qualification, discovery, and even an initial demo moment can unfold in a single, continuous conversation. The gated funnel collapses. There’s no reason to qualify someone today, schedule discovery for Thursday, and demo the following week when the conversation is already happening.

The constraint the industry built around simply isn’t there anymore. We’re already seeing it with Fin, a Customer Agent. As sales leaders, we need to frame this differently.

If speed-to-lead is no longer the constraint, the knock-on effects reach every part of the org.

Introduce Fin for Sales to your team with this clean hero banner: bold headline, signature blue spiral, and a clear 'Start free trial' call to action—inviting readers to explore an AI customer agent built for revenue.

SDRs focus on moving deals forward. Instead of frontline triage, they double down on phone-based selling and relationship building, complex deal navigation, and multi-threaded engagement across stakeholders—the high-leverage work that used to get crowded out by the inbox.

Pipeline gets more relevant. The old model rewarded volume: capture as many form fills as possible, respond fast, and sort quality later. When an Agent engages at the moment of intent, it qualifies during the conversation. Low-fit leads get filtered out before they reach the team, and high-fit prospects arrive with context—needs, timeline, stakeholders—instead of just a name and email.

You measure outcomes, not response time. When first response is instant, different metrics matter. I anchor on three questions:

1) Is the Agent doing the work? Completion rate, qualification rate, and contact capture rate indicate whether conversations reach clear outcomes and produce usable handoffs to the team.

2) Is the work producing pipeline? Meetings booked and pipeline created through Agent-handled conversations are the leading indicators of revenue, not how fast someone followed up.

3) Are buyers having a good experience? Conversation-level satisfaction matters more than ever because the Agent is the first interaction prospects have with your company. The experience it delivers is the first impression you make.

These three questions reveal whether the motion is working. Time-to-first-response can’t.

Sales orgs built hiring plans, workflows, and performance metrics around beating intent decay. That made sense when the lag was unavoidable. It isn’t anymore.

An Agent is always on. It engages the moment a prospect arrives on your site, qualifies them in real time, and routes them to the right outcome without waiting for someone to be free. The lag the industry built itself around doesn’t exist when the conversation starts immediately.

The companies leaning into this are investing in what happens after the conversation starts: how well the Agent qualifies, where it creates pipeline, and what SDRs should actually spend time on. What matters now is not how fast you respond, but what the conversation produces.

Speed-to-lead made sense when the delay was structural. It isn’t anymore. If you’re re-architecting go-to-market, instrument Agent Analytics, revisit SDR charters, and tighten CRM integration so every qualified handoff is instant, traceable, and revenue-linked.

Inspired by this post on The Intercom Blog.

May 26, 2026
Beyond Accuracy: How I Evaluate AI Customer Service Agents That Delight and Scale
When teams evaluate AI Agent options for customer service, I often see the rigor aimed at the wrong subset of criteria. After leading and observing dozens of proof of concept (POC) efforts with our customers and prospects, I understand why performance—accuracy scores, resolution rates, and benchmark tests on curated datasets—soaks up most of the attention. But those indicators alone won’t guarantee success once you leave the sandbox and face real customers.

If your POC only proves that the AI “works,” you’re missing the bigger picture. Here’s what else I look for to make the best long-term decision.

How does it handle your real-world setup?

Performance is table stakes, but it has to reflect the messiness of an actual support environment. The best-performing Agents don’t just get answers right—they exhibit resilient, human-like behavior under pressure. I watch how the Agent behaves when it doesn’t know an answer: does it recover or spiral? Does it stay on track through multi-step requests, and how gracefully does it hand off to human agents? If your knowledge base depends on a retrieval-first pipeline, test cross-source retrieval and grounding—not just single-document lookups.

When I build evaluation scenarios, I put the Agent through its paces with a broad, realistic mix:
- Multi-turn queries that require the Agent to carry context across a conversation, not just answer isolated questions.
- Vague or fragmented inputs, like typos, grammatical errors, and incomplete questions, because that’s how customers actually write.
- Edge cases and sensitive scenarios, like billing disputes, frustrated customers, and questions that sit at the boundary of what the Agent is trained on.
- Different phrasings of the same question. An Agent that handles one version well but fails on a rephrasing has a knowledge problem, not a performance problem.
- Queries that require pulling from multiple knowledge sources. Real issues are rarely answered by a single help article, and an Agent that can only handle single-source questions will hit a ceiling fast.
- Multilingual conversations, if your customer base requires it. Performance can vary significantly across languages and it’s better to discover that in testing than in production.
This preparation is worth the effort. Any Agent can look impressive in a demo; what matters is how it holds up as part of your team, serving your customers in production.

What does it feel like to interact with the Agent?

Two AI Agents can post the same quantitative scores—resolution rates, containment rate, and more—and still deliver very different customer experiences. Resolution rate tells me whether the Agent finishes conversations; it says nothing about how customers felt during them. I deliberately assess the experience, not just the outcome, because conversation design shapes trust and brand perception.

Here’s what I look for to ensure the AI Agent is enjoyable to interact with:
- Is the tone natural and on-brand, or does it feel robotic and generic?
- Does it build trust early in the conversation, or does it create friction that makes customers want to immediately request a human?
- When it doesn’t know the answer, does it handle that gracefully?
- When it hands off to a human, is that transition seamless, or does the customer feel abandoned?
As George Dilthey at Clay put it when evaluating their AI setup: “Keep what’s important to your business up front and center. For us, that was transparency and control over the customer experience.”

That framing is exactly right. The Agent represents your brand in every conversation. Customers don’t experience “accuracy,” they experience conversations. An Agent that’s technically accurate but tonally off-brand will erode customer trust over time.

I make the experience dimension explicit in my POCs. I have people on my team—and when possible, a small cohort of real customers—interact with the Agent under realistic conditions. Then I ask how it felt, not just whether it worked.

Can you keep improving it after launch?

This is the dimension most teams don’t evaluate at all, and it’s possibly the most important one. Choosing an Agent that works today and ensures you can continuously improve the customer experience over time requires more than a functional demo. You’re buying a system that must get better every week, not just during the first sprint.

The feedback loop

Can your team easily review conversations and identify where the Agent is underperforming? Can you pinpoint specific gaps (missing knowledge, incorrect tone, poor handoff decisions) and act on them quickly? The faster the loop between “something isn’t working” and “we’ve fixed it,” the more value compounds over time. In practice, that means instrumenting conversations, leveraging Agent Analytics, tagging misroutes and tone slips, and running targeted evals on known failure modes.

The speed of iteration

When you identify a gap, how quickly can you address it? This is partly a question of tooling (how easy is it to update knowledge, refine guidance, adjust behavior?) and partly a question of team capability. The teams getting the most out of AI are the ones that have changed how they operate and made continuous improvement a part of their everyday work. They’ve committed to going all-in for the long term, not just the first few weeks when launching their AI Agent. We treat this as eval-driven development: automate evaluations that mirror real tickets, tighten prompt engineering and retrieval settings, and ship small fixes daily.

The vendor partnership

The vendor behind the Agent matters just as much as the solution itself. You’re choosing a partner for transformation that will help you evolve how your business delivers customer experience. Ask:
- How does customer feedback influence the product roadmap, and can they show you examples?
- If you have feedback on limitations or weaknesses, do they engage transparently or get defensive?
- What kind of support will you get post-launch?
- Are they shaping where AI customer experience is going, or reacting to what others are building?
How a vendor responds to those questions tells you more about the long-term relationship than any benchmark result.

What a good POC proves

If your POC only proves “the AI works,” you haven’t done enough. A strong proof of concept tests performance in realistic conditions, evaluates the experience from the customer’s perspective, and validates the system that will support continuous improvement after launch. Done well, it sets you up for long-term operational success and builds organizational AI readiness—not just a flashy demo.

Inspired by this post on The Intercom Blog.
May 22, 2026
Old-School Selling Beats PLG in the AI Era: My GTM Playbook for 8‑Day Enterprise Deals

Old-school, in-person selling is having a renaissance in the AI era, and I’ve seen why up close. From leading product and go-to-market teams through hypergrowth, I keep returning to one lesson: enterprise buyers still reward the teams who show up, orchestrate change management, and own outcomes end-to-end. The tech has changed; the human dynamics haven’t.
Has the sales playbook changed in the AI era? The tools are faster and the surface area is bigger, but the core motion remains the same: “showing up” beats letting the marketplace decide. That’s why in-person enterprise rollouts still beat product-led motions, especially when the stakes include security, governance, and cross-functional adoption. You win by reducing organizational risk, not by assuming free trials will do the heavy lifting.
Great enterprise sellers collapse silos. They sell to engineers and executives in one motion, pairing deeply technical validation with crisp business narratives. In my org, that means every high-velocity pilot has a dual thread: hands-on, eval-driven proof for the builders and a value architecture for the budget owners. When those motions run in parallel, time-to-value plummets and procurement friction fades.
Selling to AI-native buyers who grew up on ChatGPT changes tempo, not fundamentals. The same seller, different tempo: 8 weeks vs. 8 business days. These buyers evaluate fast, expect clear ROI, and push for automation-first workflows. How AI-native buyers handle build vs. buy decisions comes down to build for differentiation and buy for acceleration. If you make procurement feel like product—frictionless, instrumented, and transparent—you’ll meet their bar.
Process matters, but humanity wins. Building a robust sales process that still leaves room for unscripted moments is where trust is formed. I’ll never forget the story of the rep who taught a champion’s son guitar over Zoom—an unscripted moment that cemented a partnership. The lesson: raise the floor without capping the ceiling. Equip every rep with repeatable plays, then celebrate the creative instincts that make champions out of customers.
In early GTM, why the three highest-leverage early sales hires aren’t sellers at all resonates with my experience. I prioritize a solutions engineer who can de-risk integration, a forward-deployed operator who can run the first rollout like a product manager, and a customer success lead who designs adoption paths from day zero. Together, they compress the value journey from proof to production.
Compensation design shapes your talent market. The case for outsized commission accelerators for star sellers — and the kind of person they attract is real: magnets for competitors who close complex, multi-threaded deals and thrive with ownership. But beware: why too much process narrows the kind of seller you attract. Over-script it and you filter out the very people who can navigate ambiguity with customers.
Under the hood, instrumenting the funnel from stage zero to close keeps the system honest. I track intent signals before pipeline, conversion by persona and use case, proof milestones, and time-to-value in production. The three pillars of GTM excellence for me are repeatable discovery, referenceable outcomes, and relentless enablement. And inside the leadership team, building peers who are 80% aligned, not 100% preserves healthy tension while keeping execution fast.
AI is expanding the definition of enablement—whether AI is changing what good enablement looks like isn’t a theoretical question anymore. I see world-class teams arming reps with retrieval-first knowledge bases, sandbox environments, and objection libraries that evolve weekly. Meanwhile, selling against direct and implied competitors at once is the norm: your battlecard must cover “do nothing,” internal tools, adjacent categories, and new AI entrants—while you still remember why in-person enterprise rollouts still beat product-led motions for durable adoption.
Planning horizons tighten in AI markets. How far out should a GTM leader be planning? I work a dual cadence: a rolling 6-week operating plan that’s ruthlessly tactical and a 2–3 quarter roadmap for coverage, enablement, and category storytelling. What a normal week looks like in hypergrowth blends customer time, pipeline triage, onboarding and enablement, deal engineering, and process tuning—always with one or two high-conviction bets that could bend the curve.
References: Ahead: https://www.ahead.com; Amazon: https://www.amazon.com; Anthropic: https://www.anthropic.com; Attio: https://www.attio.com; Augment Code: https://www.augmentcode.com/; Cognition: https://cognition.ai; Cursor: https://cursor.com; Dani McCabe: https://www.linkedin.com/in/danielle-mccabe/; Datadog: https://www.datadoghq.com; GitHub Copilot: https://github.com/features/copilot; HubSpot: https://www.hubspot.com; Jeremy Powers: https://www.linkedin.com/in/jeremypowers/; JPMorgan: https://www.jpmorgan.com; Matt McClernan: https://www.linkedin.com/in/mattmcclernan/; MongoDB: https://www.mongodb.com; Nicole Rettinger: https://www.linkedin.com/in/nicole-rettinger-23b20465/; Notion: https://www.notion.com; OpenAI: https://openai.com; Parag Agrawal: https://www.linkedin.com/in/paragagr/; Parallel: https://parallel.ai; Snowflake: https://www.snowflake.com; University of Chicago: https://www.uchicago.edu; Windsurf: https://windsurf.com
If you’re scaling an AI product today, pair a disciplined sales-led growth engine with the best of product-led growth: fast paths to proof, hands-on validation for builders, executive-level value mapping, and human moments that turn customers into advocates. That’s how you compress an eight-week cycle into five business days—and keep the expansion flywheel spinning.

May 21, 2026
Built for Your Biggest Days: How We Engineer Fair, Reliable Scale Without Downtime

I’m getting sharper, more specific questions about scale from enterprise customers every quarter, and that’s exactly how it should be. Teams want to know how our platform behaves during their highest-volume moments — the Black Friday sales, the sporting events, the production incidents — and they want confidence their growth won’t outpace the systems they depend on. We welcome those questions. They’re the right ones to ask of any critical component of your business. Today, our systems handle serious scale. At daily peak, we see over 150,000 customer requests per second coming into the platform, with more than 70,000 asynchronous requests per second flowing through the background systems. During our busiest days of the week, we handle over five million conversations and more than 100 million comments being added across the platform. We also design for individual customer spikes, not just aggregate platform traffic. We can handle a single customer workspace spiking with hundreds of comments per second, or around 100 new conversations per second. Sustained over a full day, that would map to millions of conversations from a single customer. While those numbers matter, they age quickly. Every growing software company can publish a bigger number every year, month, week. What ultimately matters is whether the architecture has clear scaling levers, whether we understand the pressure points in the system, and whether we can add capacity before customers need it. Every system has limits. Competence is knowing where they are, measuring them, and moving them before customers reach them. Here’s how we do that in practice. We build on boring foundations because at the edges, we try hard not to be clever. We use AWS for the infrastructure primitives AWS is very good at running. We do not want our engineers spending their best energy recreating S3, load balancers, queues, or commodity infrastructure patterns. We want that energy spent on the parts of the system that are specific to our customers and our product. “That is a deliberate trade-off. It gives us fewer systems to understand, deeper expertise in the ones we do run, and more leverage when we need to scale.” This extends a principle I’ve embraced for years: run less software. The point isn’t to minimize the stack for its own sake; it’s to compound expertise. When many teams build on the same small set of technologies, our tooling, observability, and operational practice all improve together. Boring technology choices aren’t a lack of ambition — they reserve our ambition for the nuanced scaling challenges that matter. The source of truth is the hard part. You can scale stateless web traffic by adding machines, add queue consumers, and add cache. Those are real problems — just not the hardest ones. The source-of-truth database is where the most important data lives, where the hardest correctness guarantees exist, and where maintenance windows often come from. It has to be correct, fast, resilient to failover, capable of large migrations, and able to keep serving traffic while we improve it. As customers grow, it cannot require a full re-architecture every time the next ceiling appears. That is why we moved to Vitess, managed by PlanetScale. The goals were clear: improve availability, reduce operational complexity, make large table migrations safer, simplify MySQL scaling, and eliminate customer downtime from routine database maintenance and failovers. When we first laid out this direction, the largest part of the migration was still ahead of us. We completed that migration in 2025, and the benefits are now part of how we operate the platform day to day. Today, our highest-scale source-of-truth data is spread across 128 shards. The database layer handles around two million requests per second, with more than ten million cache reads per second in front of it. For the largest customers, we can isolate and scale database capacity independently, including dedicating a shard to a single customer when needed. We have not come close to needing that, which is significant. The goal of architecture like this is not to run every system at the edge of its capacity, but rather to have room to move before customers need it. Vitess gives us native sharding, query routing, online schema change capabilities, connection pooling, and resharding primitives built for this kind of workload. Instead of application code carrying all of the sharding complexity, the database layer can do more of the work. That reduces cognitive load for engineers and removes whole classes of operational risk. Ultimately, this gives us practical scaling options instead of hard architectural rewrites, and lets us do routine database improvement without planned customer-impacting maintenance windows. Search is not a hidden bottleneck for us. Search underpins core product surfaces across the platform — from vector search in our AI features to realtime reporting — and if it’s slow or unhealthy, customers feel it. Scaling isn’t just adding more machines; often the better approach is making the product do less unnecessary work. Today, our Elasticsearch clusters support a much higher-throughput product than in the past, with more than 650TB of storage, more than 1.7 trillion documents, and peaks above 40,000 requests per second. We’re serving a larger product surface more efficiently, not just running a bigger cluster. More importantly, when an index gets too large or traffic distribution turns unhealthy, we don’t want a high-risk, manual migration. We reshape Elasticsearch indexes online by partitioning by customer ID, dual-writing to old and new indexes, backfilling, validating, gradually moving customers with feature flags, and deleting the old index only when we’re confident. We’ve used this pattern for years to make large search migrations safer and more incremental — a core playbook in our platform scalability and SRE practices. Fairness is non-negotiable in a multi-tenant system. A single customer’s high-volume moment should not quietly become everyone else’s latency problem. We design for this at multiple layers. For asynchronous work, we use overflow queues and queueing strategies that prevent one high-volume workload from consuming shared capacity in a way that hurts quieter tenants. AWS SQS fair queues are one example of a primitive we use extensively. They’re designed for exactly this class of problem. When one tenant creates a backlog in a shared queue, fair queues help reduce the dwell-time impact on other tenants. We also build application-level guardrails so customer isolation doesn’t depend on every engineer remembering every rule in every code path. In a large multi-tenant Rails application, the safe path must be built into the system. The focus is primarily about correctness and customer data separation, but the broader operating principle is the same: important customer boundaries should be enforced by infrastructure and application frameworks. The same thinking applies to scale. We want customer-specific load to be visible, attributable, and controlled. When a customer spike happens, we should be able to understand it as that customer’s workload, protect the rest of the platform, and add capacity where it’s actually needed. Fin adds a new dimension to scaling. Our AI Agent Fin introduces a new set of infrastructure challenges. To provide reliable AI-powered support at scale, we need to operate across multiple model providers, route across them based on capacity and latency, and protect customer-facing workloads from lower-priority work. The details differ from traditional SaaS infrastructure, but the principle is the same: understand the bottlenecks, build clear scaling levers, and monitor the customer outcome. AI providers are not commodity storage systems, and we do not design as if they are. That is why we have invested in Fin-specific reliability systems. Fin now fully resolves over two million conversations per week. At that scale, high availability cannot depend on a single model, a single provider, a single region, or a single pool of capacity. Our LLM routing layer supports cross-vendor failover, cross-model failover, latency-based routing, capacity isolation, and load testing. We also maintain buffer capacity with major providers, with headroom to handle 2x to 3x normal Fin traffic at any point. For enterprise customers, this matters because AI support volume can spike just like human support volume — and the AI layer must absorb that spike without relying on one fragile upstream path. When customers depend on Fin to absorb a spike in support demand, the AI layer needs the same operational discipline as the rest of the platform. Performance tests help, but production traffic is reality. Real customers use products in ways no synthetic test will perfectly predict: launches, incidents, seasonal patterns, gaming events, sudden changes in end-user behavior. Those moments give us data that no lab can fully reproduce. Often, a large customer event barely moves the platform-wide graphs because our customer base is broad enough that one industry’s peak aligns with another’s quiet period. Black Friday and Cyber Monday are good examples. Many ecommerce customers are at their busiest, while many B2B SaaS customers are quieter. At the aggregate platform level, the change can be much less dramatic than people expect. “That does not mean those events are unimportant. It means we need to look at both levels: the health of the overall platform and the experience of the individual customer having the spike.” Sometimes, these events teach us something specific. In one case, a very large customer used the Messenger in a way that exercised the full Messenger lifecycle even though the visible user experience did not require it. Under normal traffic, this was fine. During a major customer-side incident, their users refreshed aggressively, generating a much larger burst of Messenger traffic than the integration actually needed. The platform stayed available, but the event exposed unnecessary work in that integration path. We built a lighter-weight integration path that served the customer’s actual use case with far less work per request, making future spikes easier to absorb. We treat large customer events this way even when there’s no broad customer impact. They’re opportunities to understand real scaling properties and make the next event safer — a habit that anchors our incident management, observability, and FinOps practices. Scale is also an operating model. The infrastructure matters, but it’s not enough. You can have the right database architecture and still hurt customers if you detect issues late, recover slowly, communicate poorly, or fail to learn from incidents. “That is why our operating model starts with customer outcomes. If the customer cannot do the job they came to do, the system is unhealthy. It does not matter how many dashboards are green.” Heartbeat metrics tell us whether customers can do the core jobs they hire us to do. They cut through infrastructure noise and answer the question that matters most during an incident: are customers able to use the product successfully? This shapes how we ship. Today, we average around 250 ships to production per workday, with an average merge-to-production time under 10 minutes. That isn’t a vanity metric — it’s part of the safety model. Smaller changes are easier to understand, easier to observe, and easier to roll back. Feature flags let us separate deployment from release. Automatic rollback and heartbeat-driven detection help us recover quickly when a change hurts customers. These are the very DORA metrics we hold ourselves to in order to balance CI/CD speed with stability. “Fast shipping is not the opposite of reliability. Done properly, it is one of the ways you stay in control of change.” The bar is high. Engineers are expected to understand the impact of their changes, watch them go live, and act quickly if something looks wrong. Resuming service is not the end of an incident. We expect teams to understand the root cause, fix the contributing systems, and prevent recurrence. That’s how scale stays safe over time. Scheduled maintenance should be extraordinary. Historically, database maintenance was a main reason for maintenance windows: upgrading a database, changing instance sizes, performing failovers, or moving large tables could require customer-impacting downtime. With the move to Vitess and PlanetScale, we changed what routine database improvement looks like. We can upgrade, scale, and improve critical database infrastructure without turning that work into planned customer-impacting downtime — and we do this in practice, not just as a goal. This matters because customers rely on our platform for live operations. If their support team, Messenger, Help Desk, or AI Agent is unavailable, the impact is immediate. Scheduled maintenance cannot be treated as a casual operational convenience. “Our posture is simple: routine infrastructure improvement should not require planned customer-impacting downtime.” Scheduled maintenance should be exceptional, non-routine, clearly communicated, and minimized in frequency, duration, and customer impact. That’s the practical benefit of the architecture work: better scaling is not only about handling more traffic, but also reducing the operational moments that might inconvenience customers. What this means for customers is simple: be skeptical of vague scale claims. The question isn’t whether a vendor says they can scale — it’s whether they can explain how, where the limits are, what they measure, how they recover, and what they’ve changed after learning from production. We understand the scaling properties of our systems, have clear levers to add capacity at the right layers, design for customer isolation and fairness, monitor customer outcomes directly, and use real production events to make the next one safer. Scale is never finished. Every large customer event, traffic spike, migration, and incident teaches us something about the real behavior of the system — and we use that data to keep improving. That’s what you should expect from a platform you depend on during your busiest moments.

Inspired by this post on The Intercom Blog.

May 19, 2026
Unlocking AI Agents: The Real Barrier Is Readiness—Not Capability—Here’s How to Scale

There’s a question that runs underneath every AI Agent evaluation: what can it do?

Two years ago, that was the right question to ask because Agents were limited and capability was a genuine constraint. The gap between what organizations needed and what the technology could deliver was wide. I felt that gap acutely in early pilots—plenty of ambition, not enough dependable execution.

That gap has since narrowed considerably, and yet most organizations are running their Agents well below what’s technically possible. I see teams lean on answering and routing, but stop short of looking things up, taking actions, or resolving complex, multi-step problems—especially where data, process variance, or risk come into play.

The standard explanation is that AI isn’t good enough yet—models must improve, or vendors must ship more features. But after studying organizations across industries actively expanding their AI automation, I’ve found that this explanation holds up less often than people assume. The blockers tend to be elsewhere.

The teams I’ve observed weren’t primarily constrained by what their AI could do; they were constrained by what their organization was structured to let it do. In other words, the ceiling wasn’t the Agent’s capability—it was organizational readiness, governance, and risk tolerance.

“Readiness” for AI breaks into five distinct types, and most organizations have some but not all of them. Below is how I assess them with product, operations, and engineering leaders.

Content readiness is whether you can explain your product and policies clearly and consistently. Most companies can. In practice, that means up-to-date knowledge bases, unified policy language, and clear versions that Agents can cite and apply.

Scope readiness is whether you’ve defined the edges: when should AI engage, and when should it step aside? Edge cases multiply, intent varies by customer segment, sensitive topics surface mid-conversation, but most teams can work through this with effort. Clear guardrails reduce ambiguity and shrink risk.

Procedural readiness is where things start to get harder. This is about whether you can articulate your processes clearly enough for something other than a human with years of tacit knowledge to follow. The happy path is rarely the problem. It’s the failure paths, decision branches, variations that have never been written down because they’ve always lived in someone’s head.

Data readiness is the first real cliff. Can you reliably identify the right user, account, or object at the moment a decision needs to be made? Is the data trustworthy in real time? Are the APIs stable, accessible, and actually connected? For most organizations, the honest answer is “partially, but we’re not always sure when it breaks.”

Execution readiness is the highest bar. Not just technically (can the Agent make the change?) but organizationally. Who owns it when the wrong refund gets processed? Who detects it? Who recovers? Does someone with authority actually accept the risk?

Most companies have the first two, some have the third, fewer have the fourth and fifth. When I map this with teams, we often discover that their Agent’s ceiling is really a reflection of operational maturity and data plumbing, not model quality.

We studied companies across six industries – energy, healthcare, ecommerce, gaming, financial services, property management – all trying to expand what their Agents could do. The pattern was consistent: teams set out to automate real actions—looking up account status, processing changes, handling transactions. In most cases, the AI could technically do it, but at a certain point (somewhere between guiding a user through a process and looking something up on their behalf) they hit a wall.

One team tried to automate application changes but couldn’t reliably identify which application to modify across their internal systems. Another explored billing automation but couldn’t access live account data due to regulatory constraints. A third needed to verify status across third-party vendor systems their Agent couldn’t reliably reach. I’ve seen similar constraints surface around CRM integration, data governance, and vendor SLAs—none of which are model issues.

In most cases, the team redesigned around what their infrastructure could support. They moved toward guiding—walking users through processes step by step, rather than executing changes on their behalf. It worked, it resolved conversations and delivered real value, just differently than anyone planned. In customer support, this often looks like consultative flows that shorten time-to-resolution even without direct writes.

Most Agent evaluations are built around capability. Can it handle complex queries? Does it support multiple channels? Can it integrate with our systems? These are reasonable things to evaluate for, but they produce a capability score, and that doesn’t tell you whether your organization can actually use what you’re buying.

The teams that got to deeper automation, the ones executing actions early, didn’t have “better AI,” they had more standardized operations. Actions that were already well-defined, consistently applied, and exposed through stable systems with clear rules. Automation wasn’t inventing new behavior, it was triggering actions that were already tightly controlled elsewhere.

Readiness enables capability, not the other way around. Which reframes the evaluation question from “can the AI do this?” to “are we actually ready for it to?”

Something that gets lost in most conversations about AI readiness is that organizations are often further along than they assume, just not for the kind of work they were planning for. A team that set out to automate refunds but can reliably guide users through complex troubleshooting has genuine capability deployed. They’re operating at the level their readiness supports, which is a starting point, not a deficit.

The more useful frame isn’t “are we ready?” – it’s “what are we ready for, and what specifically stands between here and the next level?” The gaps tend to be concrete: a missing API, data that lives in three systems that don’t agree, a process that’s never been documented, or an ownership question nobody has answered. These are solvable problems. They just require a different kind of investment than buying a more capable Agent.

What nobody has worked through seriously yet is how organizations actually build readiness. Does it develop naturally through using AI at shallower levels first? Or is it mostly a function of prior decisions, like system architecture choices made years ago, operational maturity that accumulated over time, engineering investments that have nothing to do with AI? When readiness does increase, what actually changes? Does the support team develop it? Does engineering grant it? Does it require executive sponsorship and investment in infrastructure with no obvious AI label on it?

In my experience, progress comes from a joint effort: product to define scope and guardrails, operations to codify procedures and edge cases, engineering to harden APIs and observability, and leadership to underwrite risk with clear ownership. When those pieces align, agentic AI moves from guided assistance to safe, auditable execution.

Until there are clearer answers, the pattern is likely to continue. Companies will buy capable Agents, plan ambitious rollouts, and find that the harder work is building the organizational infrastructure. The Agents can do the work. The question is what it takes to let them.

Inspired by this post on The Intercom Blog.

May 18, 2026

Tag: AI Strategy

Product strength can conceal organizational weakness

Durable platforms reduce uncertainty at every stage

Leadership durability comes from connected operating loops

AI readiness should amplify a durable foundation

Key takeaways

References

Key takeaways

Start each initiative with an outcome contract

Build one scorecard across four layers

Run a weekly loop from evidence to decision

Assign decision rights before the dashboard turns red

Use 90 days to prove the operating system

References