Category: Generative AI

How Snapbar Turned Crisis Into an AI-Native Photo Experience Revolution

What does it take to reinvent a 14-year-old company, not once, but twice? I ask that question often when I look at mature product organizations, because the hardest transformations rarely start with a clean slate. They start with real customers, legacy expectations, operational muscle memory, and a market that suddenly refuses to behave the way it used to.

Snapbar is a useful case study in that kind of transformation. The company began as a wedding photo booth side hustle, grew into a national events company, and then watched COVID wipe out the entire business overnight. As a product leader, I find that moment especially important because it separates teams that are attached to the current expression of their product from teams that understand the deeper customer need underneath it.

The deeper need was never just a physical photo booth. It was identity, participation, memory, brand engagement, and a shareable experience that people could take with them. When in-person events disappeared, Snapbar went from physical photo booths to a cross-platform virtual product built on WebRTC in spring 2020. That was not a cosmetic pivot. It was a first-principles rebuild under pressure.

I have seen many teams talk about innovation when conditions are favorable. Snapbar’s story is more interesting because the team had to innovate when the existing business model was unavailable. That kind of constraint can be clarifying. It forces product teams to ask: What job are we really doing for customers, and what parts of our current solution are merely historical artifacts?

The next reinvention came from generative AI. Pushed by declining repeat business, Snapbar dove deep into Stable Diffusion, custom LoRA fine-tunes on H100/H200 GPUs, and eventually a reasoning-model-powered generative image and video pipeline. What stands out to me is not simply that the team adopted gen ai. It is that they connected AI capabilities to a domain they already understood deeply: photography, events, brand activations, and experiential marketing.

This distinction matters. In product management, technology FOMO can lead teams to bolt AI onto workflows without a clear strategic advantage. Snapbar appears to have moved differently. They used 14 years of industry knowledge to identify where AI could change the experience itself, not just automate a back-office task or generate a novelty output.

The product evolution is a strong example of applied AI. Snapbar integrated Stable Diffusion 1.5 as their first generative AI model and ran custom LoRA fine-tunes on H100/H200 GPUs to produce brand-quality outputs nobody else in their space could match. That level of execution shows the difference between experimenting with a model and building a differentiated product system around it.

I also appreciate the way the team moved from negative prompts to reasoning model long-form prompts. In brand environments, creative control and safety control are not optional. A brand activation must feel imaginative, but it also has to remain on-message, inclusive, and predictable enough for a live event setting. Better prompt engineering becomes part of the product’s trust layer.

One of the most important product details is the meta-prompting pre-processing pipeline designed to ensure user likenesses, including non-obvious details like disabilities, are accurately represented in generated images. That is not a minor implementation detail. It reflects a more mature view of AI risk management, representation, and customer experience.

From my perspective, this is where product strategy and ethical technology intersect. Generative AI systems can easily flatten people into generic outputs. A thoughtful product team has to decide what fidelity means, what consent means, and how much control users and brands should have over the final artifact. Snapbar’s approach suggests that representation is not just a model-quality problem; it is a product-design problem.

Just Now Possible spotlights Snapbar’s journey from photo booths to AI-powered brand experiences, framing reinvention, creativity, and applied AI as the center of the conversation.

The company’s experiential marketing platform lets brands “world build” at conferences, trade shows, and live events by bringing fans into branded creative worlds. That phrase matters because it reframes the photo booth from a capture device into a participatory brand system. The user is not merely photographed. The user becomes part of a designed world.

I see this as a broader shift in product experience. Static brand impressions are giving way to co-created moments. Snapbar added participatory user inputs through Mad Lib-style prompts and prompt injection, turning photo experiences into co-creation moments between brands and their audiences. That is a more durable engagement loop than simply asking someone to pose in front of a branded backdrop.

The operational story is just as relevant for product and engineering leaders. Snapbar used Claude Code and Codex to build and ship features rapidly as a small bootstrap team, and developed a four-pillar agent orchestration framework: context, tools, verification, and workflows. I like that framing because it treats AI-assisted development as a system of work, not a magic shortcut.

In my own product leadership work, I keep coming back to the same lesson: AI workflows only become reliable when the team defines the surrounding operating model. Context determines whether the agent understands the problem. Tools determine what it can actually do. Verification determines whether the output is trustworthy. Workflows determine whether the capability compounds across the organization.

Snapbar is now building customer-facing “vibe coding” using the Claude Agent SDK so brands can configure and create experiences themselves within Snapbar’s platform. That is a meaningful product move. It shifts creation closer to the customer while keeping the workflow inside a controlled product environment. For brand teams, that could reduce dependency on custom service work while still preserving creative flexibility.

This is the kind of AI Strategy I find most compelling: not a generic claim that AI will transform everything, but a specific path from domain expertise to product capability to customer empowerment. Snapbar did not abandon its past. It converted years of event, photography, and brand knowledge into a new interface for generative AI.

The core lesson for product teams is clear. Reinvention does not always mean discarding the original business. Sometimes it means identifying the durable customer need, rebuilding the delivery mechanism, and then using new technology to expand what the experience can become. Snapbar’s journey from wedding photo booths to virtual WebRTC experiences to AI world building shows how a team can preserve its market intuition while changing nearly everything about the product surface.

For product leaders evaluating gen ai opportunities, I would take three practical lessons from this story. First, start with the customer experience, not the model. Second, treat brand safety, representation, and verification as product requirements from the beginning. Third, use agentic AI internally only when the team has a clear framework for context, tools, verification, and workflows.

Snapbar’s story resonates because it is not about chasing a trend. It is about a team using necessity, curiosity-led self-education, and disciplined product thinking to build something that feels native to the generative AI era. That is the difference between adopting AI and becoming AI-native.

Inspired by this post on Product Talk.

July 9, 2026
Reliable AI Coding Requires Four Kinds of Control
Reliable AI coding is not primarily a matter of finding a better prompt or a more capable model. It is a workflow-design problem: teams must control what the product should do, what the repository currently does, what the model can see, and what the agent is allowed to change.

Managing those four kinds of state turns an AI coding session from an open-ended conversation into a bounded engineering process. The payoff is faster iteration without treating plausible output, confident status messages, or large context windows as substitutes for evidence.

Reliability depends on the surrounding system

A large language model generates an answer token by token from the input available to it. That input can include more than the visible request: an application may add system instructions, conversation history, project files, enabled tools, skills, and other supporting context. As Shivam.Consulting Blog’s guide to how ChatGPT works explains, the surrounding application therefore helps shape the result even when two products use the same underlying model.

This mechanism has an important operational consequence. An agent can produce code that looks convincing without possessing a stable model of the intended product, the complete repository, or the runtime environment. Fluency indicates that the output fits learned patterns; it does not establish that the implementation satisfies the requirement.

A dependable workflow consequently controls four connected states. Product state covers requirements, constraints, permissions, edge cases, and acceptance criteria. Repository state covers the actual code, data model, dependencies, tests, and uncommitted changes. Model state covers the instructions and evidence present in the context window. Execution state covers tools, filesystem access, commands, network activity, and other permissions. A failure in any one can appear to be a coding error even when the code is not the original cause.

Tool selection should reflect that distinction. Shivam.Consulting Blog’s vibe-coding playbook recommends managed app builders when the purpose is to explore an interaction or answer an early product question, while positioning developer-oriented coding agents as more appropriate for existing repositories, multi-file changes, tests, and review workflows. The useful dividing line is not whether a tool can generate code. It is whether the environment exposes enough control and evidence for the consequence of the change.

Convert product intent into a bounded change contract

Many unreliable sessions begin before an agent edits a file. If the requested behavior, non-goals, affected users, data rules, and observable success conditions remain ambiguous, the model must fill the gaps. Each follow-up correction can then preserve a different assumption, creating a chain of locally plausible patches without a coherent final design.

A stronger starting point is a compact change contract written outside the chat. It should identify the outcome, relevant current behavior, permitted scope, important invariants, expected edge cases, and the evidence that will demonstrate completion. For a defect, that evidence begins with a reproducible failing case. For a feature, it includes examples of accepted and rejected behavior. The contract should also record explicit non-goals so that an agent does not broaden a narrow request while attempting to be helpful.

Blast radius deserves separate attention. The vibe-coding playbook uses data, controller, and view as a practical three-layer model. A request involving permissions, sorting, filtering, workflow state, or reporting may cross all three even if it appears in the interface as a small change. Reviewing the planned impact across storage, logic, and presentation helps reveal missing migrations, inconsistent validation, stale queries, and user-interface states before implementation begins.

The same source proposes separate plan-review-fix and implement-review-fix loops. Combined with the change contract, these become distinct gates rather than one continuous conversation. The plan gate asks whether the proposed files, layers, and tests match the requirement. The implementation gate asks whether the resulting diff and observed behavior match the approved plan. Separating the gates makes it easier to reject a mistaken approach before it accumulates code.

This structure also clarifies the human role. The agent can explore the repository, propose a plan, implement a bounded change, and help investigate failures. Product and engineering owners remain responsible for deciding what behavior is correct, which tradeoffs are acceptable, and what evidence is sufficient to ship.

Treat context as a limited working set, not permanent memory

A long conversation can feel comprehensive while becoming less dependable. Shivam.Consulting Blog’s context-rot analysis reports research showing that model performance can deteriorate as input length grows and that information at different positions may receive unequal attention. The article’s practical conclusion is more useful than any advertised context-window maximum: available capacity should not be confused with reliable attention.

Context should therefore be curated as a task-specific working set. Durable facts belong in versioned project documents; the active session should receive only the instructions, files, decisions, and evidence needed for the current change. Old tool output, abandoned plans, duplicate explanations, and superseded requirements consume attention without improving the task.

Shivam.Consulting Blog’s guide to Claude Code workflows describes a layered memory pattern: broad preferences in global instructions, project-specific conventions in repository-level files, and reference material loaded when relevant. It also presents stored commands as a way to make recurring procedures explicit, and sub-agents as a way to isolate context or perform independent work. The transferable principle is architectural rather than product-specific: stable policy, project knowledge, task instructions, and transient evidence should not be mixed into one ever-growing transcript.

A clean session boundary can be a reliability control. When a conversation has accumulated contradictory instructions or repeated failed fixes, the next step should not automatically be another patch request. A new session can begin from a short handoff containing the approved change contract, current repository state, attempted approaches, observed failures, and unresolved questions. This preserves useful evidence without carrying the entire history forward.

Sub-agents require the same discipline. Parallelism is valuable when work can be partitioned into independent questions, such as locating relevant code, examining tests, or reviewing a proposed diff. It is less useful when several agents can modify overlapping files or make incompatible architectural assumptions. Each delegated task needs a narrow scope, an expected output, and a rule for whether it may write or only report.

Require evidence, limited authority, and a recovery path

An agent’s statement that a problem is fixed is a claim to verify, not completion evidence. Verification should return to the original reproducer or acceptance criteria, then examine the diff and run the smallest relevant checks. Broader tests can follow when the change crosses modules, alters shared behavior, or affects data. This sequence distinguishes a real correction from a patch that merely changes the visible symptom.

Review should inspect both behavior and change shape. A diff may pass a narrow test while introducing unrelated refactoring, weakening validation, swallowing errors, or duplicating logic. Unexpected file changes, new dependencies, disabled checks, and unusually broad edits are signals to pause. If the evidence is inconclusive, the workflow should return to diagnosis rather than asking the same context-saturated agent to keep editing.

Reliability also depends on limiting what an agent can do. Shivam.Consulting Blog’s Claude Code risk guide describes escalating exposure as an agent moves from reading a project folder to reading elsewhere, fetching external material, writing files, executing generated code, and installing third-party packages or extensions. Although permission models vary by product, the general control is consistent: grant the least authority required for the current step and review the exact path or command before approval.

Folder boundaries should match the task boundary. Credentials, customer information, confidential documents, and unrelated projects should not be placed within an agent’s working scope. One-time approval is preferable when an operation is unusual or its future use would be difficult to predict. Commands that delete, overwrite, upload, install, or execute deserve more scrutiny than read-only inspection because their impact is larger or harder to reverse.

Reversibility completes the control system. The safety guide emphasizes backups and version control because an AI coding interface may not provide a dependable undo operation. A clean checkpoint before implementation, small commits, reviewable diffs, protected secrets, and a tested rollback path reduce the cost of both model errors and human approval mistakes. For higher-risk work, the agent should operate in a disposable branch, isolated environment, or similarly constrained workspace rather than directly against valuable state.

These safeguards are mutually reinforcing. A bounded contract limits scope; curated context reduces instruction drift; verification exposes incorrect claims; least privilege limits blast radius; and version control makes recovery practical. Removing any one of them shifts too much trust onto probabilistic output.

Key takeaways
- Control product state, repository state, model context, and execution authority as separate parts of one workflow.
- Write a change contract with scope, non-goals, invariants, edge cases, and acceptance evidence before implementation.
- Keep context task-specific; store durable knowledge in files and start a clean session when history becomes contradictory or noisy.
- Treat an agent’s completion report as a hypothesis until the original reproducer, relevant tests, observed behavior, and diff support it.
- Match permissions and isolation to the risk of the operation, and create a recovery point before allowing changes.
As coding agents gain more tools and autonomy, reliable teams will distinguish themselves less by how much work they delegate than by how clearly they define authority, evidence, and recovery. The durable advantage will come from workflows in which faster generation is paired with tighter control.

References
July 3, 2026
How I Make AI Agents Speak Like Our Team: A Conversation Design Playbook That Lifts CSAT

If nobody on our team trains the Agent on how to communicate, it will sound like an LLM when it speaks to customers—because it is one. I never want a customer to feel like they’re talking to a machine that doesn’t get them. That’s why I treat conversation design as a core product capability, not an afterthought.

Conversation design is an emerging discipline in AI-first support teams built to solve this exact problem. In practice, I make someone explicitly own how the Agent communicates—tone, structure, level of detail, customer experience, and the handoff and escalation process—because that’s where trust is won or lost.

When there’s no clear owner and no explicit guidance, the Agent starts making its own choices. I’ve seen it over-explain when a short answer would do, reply in a flat tone when a customer is frustrated, or trigger a handoff too late. None of those are model problems; they’re design problems.

The cost is measurable. Customers who get awkwardly structured responses won’t trust the answer—even when it’s accurate—so they escalate to a human to hear the same thing phrased differently. Others will skip the Agent entirely. And when the Agent does hand off, a poor transition means the support rep inherits a frustrated customer. Every one of these outcomes is avoidable; conversation design exists to prevent them.

I’ve seen A/B tests where a warmer, more conversational opening message meaningfully lifted customer satisfaction—CSAT moved from 72.8% to 78.4%. A single design change, applied to the very first message, drove a measurable difference. That’s the kind of leverage I look for as a product leader.

Here’s the scope I use when I talk about conversation design—five areas that shape the customer experience end to end:

1) Tone and personality: Define the Agent’s voice, level of detail, and how formal or casual it should sound—and specify where that register adapts to the situation (for example, urgent access issues versus exploratory product questions).

Design how your AI agent talks. Set tone, style, and product naming rules, then preview replies instantly. Clear callouts showcase brand voice consistency and flexible formatting so your bot communicates like your team.

2) Response structure: Ensure the Agent matches the level of detail to the customer’s request, keeping answers tight when the ask is simple and expanding only when complexity demands it.

3) Handoff logic: Decide when to escalate, how to communicate the transition, and what context to carry over so the human teammate can help immediately without rework.

4) Interaction flow: Map how a conversation progresses—clarifying questions, answers, resolution, or handoff—and design for smooth pivots when customers change direction.

5) Response quality: Go beyond technical correctness to ensure answers feel clear, helpful, and on-brand. Accuracy without clarity erodes trust.

To put this into practice, I start with the feel of the conversation. Before tuning individual responses, I write down one tight paragraph describing the Agent’s voice. I don’t need a full brand bible—just a north star I can use to make consistent decisions about tone. The voice stays consistent, while the register adapts to the context: a locked-out customer needs directness and speed; a feature explorer might value more context and examples.

I design the handoff with extreme care because it’s one of the highest-friction moments. Customers shouldn’t have to re-explain anything. The support rep should receive the full conversation history, the underlying context, what the Agent already tried, and why the escalation happened. Even the phrasing matters—“Let me connect you with a teammate who can help with this” feels very different from a silent handover.

The new CX Score adds context to every conversation: a donut chart surfaces drivers like policy feedback and effort, while a side panel explains why this interaction earned a 3 based on signals from an AI agent chat.

I also build a failsafe. If the Agent can’t resolve the issue cleanly, a graceful fallback still gives the customer a smooth experience. A customer might be frustrated with AI at that point, but a well-handled transition can turn that around.

Follow-ups deserve the same rigor as handoffs. If someone drops mid-conversation—with the Agent or a human—how do we reach back out to confirm they got what they needed? Most teams miss this moment; customers don’t.

Another common pitfall is over-explaining. The Agent has access to a lot of information, and left unguided, it will overshare. The fix is simple: match the answer’s depth to the question. A password reset shouldn’t take three paragraphs; a complex integration might. When there’s more to offer, the Agent should ask before expanding.

I also design for the conversation the customer is actually having—not the script I wish they’d follow. Customers change direction, stack questions, or bring up unrelated follow-ups. The Agent should pivot with them, not force them back into a rigid flow. I also consider whether flows vary by channel and whether different segments merit distinct experiences.

On the instruction side, I keep guidance short. Teams often react to edge cases by adding more rules until the LLM is parsing paragraphs before it can reply. I’ve seen it everywhere. My rule: if it’s about content or information, it belongs in the knowledge base. If it’s about tone or handling specific situations, it belongs in the Agent’s instructions. “Be direct about pricing” does more than a paragraph explaining the philosophy behind your pricing communication strategy.

If you’re using Fin, much of this work happens in Guidance. It’s where conversation design takes shape, helping you define how the Agent should sound, how much it should say, and how it should respond in different situations.

On a crisp grid, 'Blueprint' appears as editable vector paths, underscoring a methodical plan. The image promotes the AI Agent Blueprint—a framework to launch and scale customer service automation with confidence.

Most teams won’t hire a dedicated conversation designer on day one—that’s fine. But someone still needs to own the Agent’s communication, even if it’s part of an existing role. I’ve often seen this start within support operations or knowledge management. As the Agent scales to more conversations, the responsibility becomes formal—and eventually becomes a dedicated role.

Here’s how I’d start, step by step:

1) Name an owner. Make accountability explicit; it doesn’t have to be a new hire.

2) Pick one conversation type that isn’t landing well. Look for cases where the Agent answered correctly but the customer still escalated or left negative feedback. If you’re using Fin, CX Score can help you surface these; it shows which topics and conversation types are scoring poorly and why, so you can see whether the issue is answer quality, customer effort, or something else.

3) Audit the Agent’s instructions. If they’ve grown beyond a few focused rules, trim them. Move content into the knowledge base and keep instructions focused on behavior.

4) Fix your worst handoff. Review a handful of conversations that escalated. Did the customer have to repeat themselves? Did the rep have enough context? Redesign that single transition first.

The impact of these small improvements compounds. A warmer opening can lift CSAT, trimming instructions makes responses sharper, and a better handoff prevents reps from inheriting frustrated customers. None of this requires new knowledge—just someone paying close attention to the conversation itself and designing it with intention.

Inspired by this post on The Intercom Blog.

June 18, 2026
Salesforce to Acquire Fin for ~$3.6B: Powerful AI Synergy, Product Strategy Takeaways

I’m processing a milestone moment for SaaS, AI strategy, and product leadership. One statement captures the news with clarity: “We’re excited to share that we just signed an agreement for Salesforce to acquire Fin for ~$3.6B. The transaction is expected to close in the fourth quarter of Salesforce’s fiscal year 2027.” As a product leader, I see this as a high-conviction bet on agentic AI, Customer Agents, and CRM integration at massive scale.

The backstory matters, and it’s remarkable: “Fin started as Intercom 15 years ago. We changed our name to cap our transformation just weeks ago. We were a darling of the SaaS era and invented so many of the patterns you see in software today. Nearly four years ago, in need of a reboot, we jumped on weeks-old modern LLMs to create and define the category we know as Customer Agents today.” That arc—from SaaS pioneer to LLM-powered category creator—illustrates how bold pivots, shipped with urgency and clear product strategy, can reset the trajectory of a company and a market.

From a product management lens, this deal reinforces a few truths: category creation rewards those who move first with conviction; “reboots” succeed when they’re anchored in genuine customer value; and modern LLMs, applied through disciplined roadmapping and eval-driven development, can unlock step-change outcomes in customer support ai strategy and product-led growth. It also signals the rising centrality of agentic AI and operational AI workflows inside the CRM.

The leadership dimension is just as instructive. As the announcement framed it: “Salesforce invented modern software and SaaS. And Marc Benioff is like the final boss of tech founder CEOs. In seat for 27 years, he’s one of the last of his era. Still pushing, pivoting, placing big bets.” That ethos—placing big, principled bets while adapting the operating model—sets the tone for what sustained product management leadership looks like at scale.

Customer continuity and acceleration are clearly emphasized: “To our customers: Over the past few years we’ve been shipping intensely. Including recently our groundbreaking model, Apex, and our paradigm-defining internal agent, Operator. With the resources of Salesforce this will only accelerate. And yet little will practically change. I’ll still be CEO, Des will still be running R&D, we’ll both still be committed to continuing to lead this category. Thank you very sincerely and deeply for your belief in us.” For practitioners, the signal is strong: continued focus on shipping, sharper execution readiness, and tighter integration paths inside the Salesforce ecosystem.

Smiles, clinking glasses, and a roundtable toast in a cozy private room capture the energy of a big day—celebrating Salesforce's definitive agreement to acquire Fin and the teams joining forces for what's next.

There’s a human heartbeat here too: “While this is not the end, it is a major, pivotal, special, and emotional moment for us.” Moments like this remind me that building enduring products is equal parts craft and courage—powered by teams who commit to the long game, navigate uncertainty, and still ship relentlessly.

Strategically, I expect near-term priorities to center on secure data flow and governance, deep CRM integration, and unifying telemetry for Agent Analytics across channels. On the roadmap, I’d anticipate tighter alignment between LLM safety, retrieval-first pipelines, and enterprise-grade observability—plus thoughtful go-to-market strategy enabling sales-led growth to complement product-led growth. The real unlock comes when Customer Agents are natively orchestrated with Service, Sales, and Marketing workflows—measured with clear outcomes vs output OKRs and reinforced by robust knowledge management.

For fellow product leaders, the takeaways are actionable: define category boundaries with crisp value propositions; balance speed with governance; invest in eval-driven development and continuous discovery; and keep your product trios aligned around measurable customer outcomes. Above all, build the operating cadence—metrics, rituals, and talent—that lets you compound small wins into durable differentiation.

And I appreciate the spirit of this closing line: “And now, time to get back to work. See you at our next product launch in a few weeks. (:” That’s the mindset that turns a headline into execution: celebrate briefly, then ship the next proof point.

Inspired by this post on The Intercom Blog.

June 15, 2026
Claude Code for Product Managers: Accelerate Prototypes, Validate Faster, Ship with Confidence

I build products under constant pressure to learn faster without breaking trust. Claude Code has become a pragmatic addition to my AI product toolbox because it helps me move from idea to evidence with less friction—while keeping engineering, design, and compliance in the loop.

“Claude Code for Product Managers explained: what it is, why it matters, and how it helps PMs prototype, validate, and move faster.” That line captures the essence. In practice, I use it to turn ambiguous problem statements into tangible artifacts—API stubs, SQL queries, test data, and lightweight prototypes—that sharpen conversation and accelerate decision cycles.

What is it in PM terms? A code-aware assistant that helps me prototype safely and quickly. I can generate example API calls, transform messy CSVs for retention analysis, draft instrumentation plans for Amplitude analytics, or spin up a mock service to validate an integration. Because it understands structure, it’s effective at scaffolding small utilities (e.g., a data cleaner or a CLI harness) that make discovery and validation faster.

Day to day, Claude Code reduces handoffs. If I’m exploring a new partner integration, I’ll have it produce a curl library and a Postman collection, then annotate each step with acceptance criteria and expected responses. When I’m shaping a feature, I lean on it to outline event taxonomies and feature flags so that engineering can wire telemetry without guesswork. For insights work, I’ll ask it to propose SQL for cohort, funnel, and retention analysis—always verifying against source schemas before anything touches production.

Speed is only useful when it improves signal quality. I anchor the workflow in continuous discovery: small hypotheses, thin-slice prototypes, and fast instrumentation. Claude Code helps me estimate A/B testing readiness (including minimum detectable effect), generate smoke tests for critical user paths, and structure an eval-driven development loop so we learn from every iteration. It also supports context window management by summarizing long PRDs into the few constraints a prototype must respect.

Governance matters. I apply AI readiness and AI risk management principles: never paste secrets or PII, isolate sandboxes, and log prompts as docs-as-code for auditability. I prefer a retrieval-first pipeline that feeds approved product docs, OpenAPI specs, and design tokens so generations stay grounded. When tools are integrated, I favor the Model Context Protocol (MCP) to constrain capabilities and maintain least-privilege access. Human-in-the-loop review is non-negotiable—especially for anything that might influence customer data or pricing.

The best outcomes show up in product trios. I’ll facilitate a live session with design and engineering: we co-create prompts, compare alternatives, and converge on a thin slice we can ship. That collaboration keeps us empowered, reduces interpretation drift, and turns Claude Code into an accelerant rather than a sidecar. Over time, the trio curates a reusable prompt library for PRD outlines, experiment checklists, and integration playbooks.

Getting started is straightforward: define a safe environment, assemble your authoritative corpus (requirements, specs, taxonomies), and codify a few high-value templates—API exploration, instrumentation plans, sandbox data generators, and acceptance tests. Track impact with simple, objective metrics: cycle time from hypothesis to instrumented prototype, time-to-first-signal, and the proportion of decisions made with data versus opinion.

There are pitfalls. Hallucinated fields can creep into API calls, schema drift can break generated queries, and “clever” refactors may miss edge cases. I mitigate this by grounding generations in current specs, asking for unit tests alongside any code, and validating against a staging environment before anyone talks about production. Treat Claude Code as a collaborator, not an oracle.

If your mandate is to learn faster, de-risk bets, and ship with confidence, Claude Code is worth adopting. Used thoughtfully, it compresses the distance between questions and answers, elevates product discovery, and lets teams validate more ideas with fewer meetings—without compromising on governance or quality.

Inspired by this post on Product School.

June 12, 2026
Beyond Black‑Box Scores: Custom AI That Elevates Trust & Safety Without Burnout

What do you do when off-the-shelf moderation scores aren't good enough—and the alternative is paying human contractors to spend their days reviewing traumatizing content at scale? I’ve wrestled with that exact trade-off in enterprise environments, and it’s why I was eager to unpack how custom AI can raise the bar on trust and safety without compromising accuracy, latency, or the well-being of our teams.

In this episode of Just Now Possible, I sit down with Nikki Marinsek (Data Scientist), Brian McCaffrey (Software Engineer), and Dan Means (Machine Learning Engineer) from Musubi, an AI-native trust and safety toolkit for content platforms. Musubi builds custom-trained ML models and LLM-powered moderation tools that adapt to each platform's unique policies—from dating apps to social networks to AI inference endpoints. As a product leader, I’m drawn to their blend of eval-driven development, agentic AI, and pragmatic deployment pipelines that actually meet real-world SLAs.

We walk through their full journey—starting with a first prototype on tabular data—then discovering the system was sometimes catching issues human moderators missed. That insight became a forcing function to formalize evaluation, calibrate thresholds, and design feedback loops that help humans and models converge. Just as importantly, they built a policy optimizer that uses agentic flows so non-technical trust and safety teams can iterate on LLM moderation policies without needing a data scientist in the room.

If you’ve ever had to balance latency, accuracy, and cost at scale, you’ll appreciate how Musubi tests trade-offs across traditional ML, embedding-driven classification, and LLMs. Their approach mirrors the patterns I expect in high-throughput stacks: cache and pre-compute where possible, contain worst-case latencies, and push evaluation tooling to customers so policy changes are safe, observable, and fast to deploy.

What resonated most with me is their core product strategy: put eval tools directly in customers’ hands. When teams can benchmark AI against humans, referee disagreements using “LLM as judge,” and make policy gaps visible, trust increases and operational drift decreases. That’s the foundation for durable product strategy in sensitive domains like content moderation, fraud management, and risk scoring.

Listen to this episode on: Spotify | Apple Podcasts

Guests: Nikki Marinsek, Data Scientist, Musubi; Brian McCaffrey, Software Engineer, Musubi; Dan Means, Machine Learning Engineer, Musubi.

In this episode: Why off-the-shelf moderation scores fail and how custom-trained models fix that; How Musubi combines traditional ML with LLMs for different moderation tasks; The discovery that AI can outperform human moderators—and how to communicate that to clients; Using AI as a judge to referee disagreements between AI and human decisions; How Musubi onboards new customers with "reverse demos"; What custom model training actually means: fine-tuning, feature engineering, and reusable deployment pipelines; The policy optimizer: an agentic flow that helps customers iterate on their LLM moderation policies; Why pushing eval tools directly to customers is a core product strategy; How Musubi is building flexible orchestration workflows for non-technical trust and safety teams.

From a product management lens, a few highlights stand out. First, the disciplined separation of concerns: use traditional ML for high-precision, low-latency pattern detection and LLMs for nuanced policy interpretation. Second, invest in golden sets and policy loops early so you can quantify improvement and avoid subjective debates. Third, productize customization—create reusable deployment pipelines, parameterized policies, and self-serve evaluation—so each customer’s “custom model” still scales like a platform.

I also appreciated the onboarding tactic of "reverse demos." Rather than a canned walkthrough, the team invites customers to bring real policies and edge cases, then instruments the workflow live. That move builds credibility, accelerates discovery, and surfaces the fastest paths to value—an approach I recommend whenever you’re selling complex AI workflows to non-technical stakeholders.

If you’re navigating cost and latency trade-offs, the conversation goes deep on techniques like embedding-driven classification, fine-tuning vs. training, and when to route decisions through LLM adjudication. My takeaway: treat the router, the evaluator, and the policy as first-class products. When those elements are observable and testable, you can raise quality without exploding compute costs or creating operational bottlenecks.

Resources & Links: Musubi — AI-powered trust and safety toolkit for content platforms. Maven AI Evals Course — AI evals course.

Chapters: 00:00 Meet the Team; 01:18 Why Everyone Wears Product; 02:32 What Musubi Builds; 04:51 AI for Human Moderation; 09:59 Adversaries and Asymmetry; 11:48 Early Days and Low Latency; 13:35 First Prototype Slice; 15:33 Traditional ML Meets LLMs; 19:52 Benchmarking Against Humans; 23:09 LLM as Judge and Policy Gaps; 29:53 From Prototype to Platform; 31:15 Customer Onboarding Reverse Demos; 36:08 Custom Models Per Customer; 38:05 Fine Tuning vs Training; 39:14 Embedding Driven Classification; 40:04 Cost and Latency Tradeoffs; 43:21 Productizing Customization; 49:16 Scaling Prototypes to Production; 51:58 Golden Sets and Policy Loops; 56:17 Coaching Customers Safely; 01:02:06 Gamified Feedback Signals; 01:06:19 Agentic Toolkit Roadmap; 01:09:05 Workflow Orchestration Future; 01:12:06 Wrap Up and Thanks.

Ultimately, this is a playbook for modern trust and safety: align your models to your policies, make evals a habit not an event, and empower non-technical teams with agentic workflows and transparent metrics. That’s how we move beyond black-box scores to systems we can measure, manage, and trust.

Inspired by this post on Product Talk.

June 11, 2026
A Layered Playbook for Package Supply Chain Security
Package supply chain security is not simply a matter of choosing reputable libraries. The practical challenge is controlling an expanding dependency graph, the code that executes during installation, the resources that installed software can reach, and the automated tools allowed to make those decisions.

A useful defensive model follows the path an attack must take: enter through a package or dependency, execute in the development environment, discover valuable information, and transmit it elsewhere. Organizing safeguards around that sequence produces a stronger posture than relying on any single scanner, sandbox, or package reputation signal.

Package risk grows through the dependency graph

Developers usually evaluate the packages they select directly. The less visible risk lies in transitive dependencies: packages installed because another dependency requires them. The source article illustrates the scale of this effect by reporting that installing Jest brought in 266 packages. That example is not evidence that those dependencies were malicious; it shows how one deliberate choice can create hundreds of additional trust relationships.

This changes the unit of review. The relevant question is not only whether a named package appears legitimate, but whether its complete dependency graph is proportionate to the job. A small utility that introduces unfamiliar native modules, unrelated capabilities, or an unexpectedly broad tree deserves more scrutiny than its simple interface might suggest.

Manifests such as package.json, pyproject.toml, and requirements.txt make dependency installation repeatable. Repeatability alone, however, does not guarantee safety. If version ranges or unresolved transitive dependencies allow later releases to enter automatically, two installations based on the same manifest can produce different risk profiles. Pinning direct and transitive versions converts an evolving external graph into a more deliberate, reviewable input.

Match defenses to the stages of a package attack

The source article says an analysis covering more than 230,000 malicious-code incidents found a recurring pattern: malicious code first needs an entry point, then searches the device for sensitive data, and finally uses a network connection to exfiltrate what it finds. This reported pattern suggests three distinct control points.

Reduce risky entry and automatic execution

A waiting period for newly published packages can reduce exposure to releases that have not yet attracted community scrutiny. The article recommends installing only packages that are at least seven days old. That is a risk filter, not a guarantee: an older malicious package can remain undetected, while a legitimate urgent fix may occasionally justify an exception.

Installation scripts require separate treatment because they may execute before a developer has inspected the installed code. Disabling automatic install hooks by default creates a decision point. A package that depends on a post-install action can still be used, but the script, its purpose, and the capabilities it invokes should be reviewed first.

Constrain access after installation

Pre-install review cannot catch every problem. The next layer limits what package code can inspect or modify if it does execute. Sandboxed folders and isolated development environments can reduce the blast radius, but the source cautions that isolation by itself does not prevent malicious code from entering. Access boundaries therefore complement package controls rather than replace them.

Limit unnecessary network egress

Stolen information has less value to an attacker if malicious code cannot transmit it. Restricting unnecessary outbound connectivity addresses the final stage of the reported pattern. This layer matters because a package may evade provenance review and execute inside an environment despite earlier controls. Entry controls, resource boundaries, and egress restrictions together create independent opportunities to interrupt the attack.

Provenance is a decision process, not a trust badge

No single popularity or identity signal proves that a release is safe. The source proposes evaluating maintainer history, download patterns, repository activity, signed releases, and consistency across registries. Their value comes from comparison: a sudden change in maintainership, an unusual release pattern, or a mismatch between repository and registry information may warrant investigation even when each signal looks plausible in isolation.

Context also matters. Dependency behavior should be compared with the package’s stated purpose. A capability that is normal for a database driver may be difficult to justify in a formatting utility. This purpose-to-capability test helps teams focus limited review time on anomalies rather than treating every dependency as equally suspicious.

These checks work best when they lead to a clear disposition: approve the package and lock the reviewed version, replace it with a narrower dependency, inspect it more deeply, or decline it. Provenance information without a decision rule can become documentation that does not change behavior.

AI coding agents must inherit the same installation policy

AI-assisted development introduces a governance problem as much as a technical one. A coding agent may be able to select and install a package while pursuing a larger task, compressing several human decisions into one automated action. If it can also reach broad areas of the file system and use the network, a malicious dependency may encounter a larger potential blast radius.

The source describes workflows in which Claude searches, creates, and edits files across a broad knowledge system, including notes derived from downloaded PDFs. That breadth provides productivity value, but it also makes one-folder isolation impractical for the reported workflow. The proposed response is disciplined configuration: hooks require the agent to follow the same package-age, install-script, provenance, and dependency rules expected of a human developer.

This principle is more durable than a rule tied to one assistant. Package policy should apply consistently whether an installation is initiated by a developer, an AI agent, a local automation script, or a build process. The initiator may change; the acceptable evidence, permissions, and exceptions should not.

Key takeaways
- Review the full dependency graph, because the packages selected directly represent only part of the installed attack surface.
- Use a waiting period for new releases as one filter, while preserving a documented path for justified exceptions.
- Prevent install scripts from running automatically until their purpose and behavior have been examined.
- Combine provenance checks with a purpose-to-capability test and an explicit approve, investigate, replace, or reject decision.
- Pin direct and transitive versions, then run recurring audits to detect issues discovered after installation.
- Apply the same package rules to coding agents, automation, local development, and build environments.
- Layer installation controls, resource constraints, and network egress limits so that one missed signal does not determine the outcome.
A mature package security posture will increasingly depend on making these controls routine and machine-enforceable. As development becomes more automated, the teams best positioned to move quickly will be those that turn package trust from an informal judgment into a consistent operating policy.

References
- Shivam.Consulting Blog – Stop Package Breaches Before They Start: My Proven Playbook to Block Common Entry Points
June 10, 2026
Why We Made Fin the Most Open Agent: Instant HubSpot & Freshdesk Support With 76% Resolutions

I’ve spent my career pairing product strategy with customer reality, and nothing is more clear right now than the demand for openness and speed. Today, we’re announcing that Fin can be used as a Service Agent on top of HubSpot and Freshworks, meaning you can use the world’s best Agent without migrating off your helpdesk.

Hubspot and Freshdesk customers can now:

Get Fin live, integrated, and working seamlessly in less than an hour.

Delivering a 76% average resolution rate.

Across all customer channels (voice, email, chat, social, and more).

Resolving complex queries that require reading and writing to third party systems.

With everything fully configurable to follow the unique policies of every individual business.

This launch is a very visible step in a journey we’ve been on from day one: building an open, customer-first platform that plays well with the rest of your stack. We’ve long known that businesses want flexibility in how they configure their customer-facing tech stack. Since the very beginning, we have built Fin as an open platform, with APIs, MCPs, CLI, and opening up access to Apex, our proprietary trained model that delivers best in class performance.

To make things easy for our customers, we have extensive public documentation of our product on our website, in our help center, and in our developer docs. We are the only Agent company in our space to do this, others hide most details behind sign-in screens, which we don’t believe is the right thing to do.

Open Agent platforms will win because customers refuse to be boxed into closed ecosystems. We now believe our category has reached a stage where customers demand open platforms, that those who open up are more likely to win, and those who remain closed and protectionist will accelerate their demise.

We are operating in a fast changing world, and customers do not want to be locked into a single vendor or closed ecosystem. They want the ability to experiment, to swap things in and out, and move everything with ease, technically and commercially.

In an open world, the best product will win. In a world where businesses can easily swap vendors, the best product will win. We are happy to compete on that front, confident that Fin delivers the best customer experience and the highest performance.

From a product management lens, this openness is powered by agentic AI patterns paired with robust CRM integration. Under the hood, we use Model Context Protocol (MCP), well-documented APIs, and orchestrated AI workflows to read from and write to third-party systems. That’s how Fin handles true multi-channel work—including voice AI agent scenarios—while giving teams the observability they need through Agent Analytics.

If you are a Hubspot or Freshdesk customer, you can now have Fin integrated and live within an hour, without needing any help from us. We’re here if you want us, but as part of our commitment to building an open platform, we’ve designed everything to be self-servable—start in minutes or watch a quick demo of how everything works.

Fin for Hubspot

Fin for Freshdesk

Inspired by this post on The Intercom Blog.

June 9, 2026
A Game-Changing Leap in Voice AI: Fin Voice 2, Apex Flash, and a Live Demo You Can Trust

In competitive markets, I see two options: try to win the game competitors set, or choose to play a different game. In the "Customer Agents" category, I’ve watched too many glossy, fabricated demos—especially around voice—mask the real challenges. Voice is just extremely hard. We all know the future of customer experiences will be Agent-driven voice, yet most of us haven’t actually spoken with a modern AI Agent when calling a business because the tech hasn’t been truly ready in the wild. Today, the bar moves.

What changed? There’s a live, public demo of cutting-edge voice tech you can stress test yourself—no smoke, no mirrors. I recommend taking it for a spin: https://fin.ai/voice. It’s fast, natural, and, yes, very, very good.

For context, yesterday brought Apex Flash, their newest and fastest model, built for the unique demands of low latency channels like voice. Today comes Fin Voice 2, a major upgrade to Fin Voice with over 20 new features, and the first product built on Apex Flash.

Here are the three things that stood out to me—and why they matter for customer support AI strategy and product strategy.

First — thanks to Apex Flash, Fin Voice 2 is now the fastest, most natural Agent for phone, with higher resolution rates and customer satisfaction scores than ever before. Apex Flash is trained on millions of customer experience interactions, fine tuned for customer service, and can be configured to understand all your knowledge and follow all your policies. The result is higher resolution at significantly lower latency—the best of both worlds for voice AI agent performance.

Speed and naturalness here aren’t accidental. Most voice AI products are slow because they convert speech to text, send it to a general model, get a text answer, and then convert it back to speech. Fin Voice 2 was designed to work differently, separating the real time layer that handles speech processing, and the layer that generates answers. That architecture is purpose-built for the demands of customer service on voice.

Powered by Apex Flash, Fin Voice 2 raises the bar on quality and speed—boosting resolution rates and guidance following while cutting time to first audio and semantic search latency, with a lift in CSAT too.

Second — Fin Voice 2 can handle complex queries end to end: taking actions in external systems, verifying callers’ identities, processing refunds, booking appointments, and more. Phone is a high-stakes channel, and Fin adapts to customers across emotional states, clarifies when needed, and confirms key details before taking action. Most of the time, Fin can resolve the query in full, and when it can’t, it seamlessly hands off to the human team, maintaining full customer context and history. You also get multiple improvements to call quality, plus proactive outbound calls to follow up on unresolved issues—all orchestrated by robust AI workflows.

Third — Fin Voice 2 gives you total control with industry-leading tools to configure and manage how Fin behaves. You get rich, detailed insights into call behavior and quality, the most common topics of calls, and one-click recommendations to improve. As with everything in Fin, you can fully self-serve and then manage it all with ease, without requiring professional services. Many vendors only let you set up their voice agent under supervision; with Fin, you get everything you need to iterate fast.

If you haven’t tried the demo yet, go check it out: https://fin.ai/voice. If you prefer to wait, don’t be surprised when you end up speaking with it at a favorite brand soon.

From a product management lens, this is what matters: latency is a feature customers feel; transparency builds trust in enterprise AI; and control is non-negotiable for CX leaders. The combination of a purpose-built, agentic AI architecture, measurable gains in resolution and CSAT, and true self-serve configuration signals that voice is moving from prototype theater to production reality. That’s the different game I want our industry to play.

Inspired by this post on The Intercom Blog.

June 4, 2026
Supercharge Insights with Amplitude Agent Connectors: Connect Notion, Slack, Linear & More

I’ve led enough multi-tool product organizations to know how quickly momentum erodes when insights and actions live in different places. When my teams bounce between Notion, Atlassian, Slack, Linear, and analytics dashboards, we pay a real tax in context switching. That’s why I’m excited about what Amplitude is enabling with Agent Connectors—bringing our daily work and our data-driven decisions into one fluid, agentic AI workflow.

Connect Notion, Atlassian, Slack, Linear, and more to Amplitude's Global Agent. Get richer analysis and take action across tools without leaving Amplitude.

Practically, this means I can treat Amplitude analytics as a unified analytics platform where analysis and execution finally meet. Instead of exporting charts or copying insights into docs, I can drive Agent Analytics directly from the same surface where I manage behavioral analytics, reducing friction and accelerating decisions. For my product strategy, that’s a meaningful shift—from “insight later” to “insight-to-action now.”

Here’s how I’d use it on a typical day: I ask the agent to synthesize signals from recent feature usage, spotlight anomalies, and then draft a concise summary for our Slack channel. In the same flow, I can prompt it to reference our Notion specs for context and queue next steps in Linear, keeping Atlassian stakeholders looped in without any extra swiveling between tabs. The value isn’t just faster execution; it’s tighter alignment across teams because the analysis and the plan live together.

From an operating model perspective, this is how I scale AI workflows responsibly. I can define clear prompts, approval paths, and ownership so the agent augments—not replaces—expert judgment. Data governance and permissions remain front and center: the agent sees what your teams are allowed to see, and we maintain auditability on critical workflow steps. The outcome is a trustworthy, repeatable system that compounds learning over time.

If you’re exploring agentic AI for product teams, start small and instrument your ROI. Pick one or two connectors (Slack and Notion are great first choices), define a measurable workflow—like pushing weekly retention insights and creating prioritized follow-ups in Linear—and iterate using continuous discovery. In my experience, the first wins appear as reduced time-to-insight, fewer meetings to align, and faster cycle time from observation to shipped change.

The big picture is simple: bring your work to your analytics, and your analytics to your work. With Agent Connectors, Amplitude’s Global Agent helps close the loop from understanding behavior to taking action—without leaving the place where your insights are born.

Inspired by this post on Amplitude – Best Practices.

June 3, 2026
Package Hack Wake-Up Call: My Playbook for Securing Cowork, Coding Agents, and Secrets

I love being a builder. It feels like a superpower I can’t stop using, and lately I’ve been channeling it into better workflows, faster experimentation, and sharper product thinking.

I tinker with my Claude Code workflows to make every day more effortless. I’m having a blast creating AI-generated interview snapshots and opportunity solution trees for Vistaly. I also spend time digging into traces and iterating on the AI coaches I use for our discovery courses.

Then the recent wave of malicious software spreading through the open-source community popped my bubble. It hit companies big and small—names like OpenAI, PostHog, and Zapier. As I dug in, I realized what many cybersecurity experts have long known: this is a deep rabbit hole. If I want to build responsibly, I have to get significantly better at protecting my devices, credentials, and code. And if you’re building with AI or modern tooling, you likely do, too.

Here’s why. We all rely on open-source software. Most modern applications assemble tried-and-true components—parsing a PDF, handling dates across time zones, visualizing spreadsheet data, connecting to an API—rather than reinventing them. The same is true for agent skills and MCP servers; they accelerate how we get value from models. This is overwhelmingly a good thing. But it also creates an attack surface that bad actors exploit.

We don’t need to abandon third-party code. We do need to understand the mechanisms attackers use and consistently defend against them.

When one malicious worm compromises hundreds of packages, what should dev teams do? This visual teaser maps the agenda—how it spreads, how to guard against it, AI tool risks, and concrete steps to mitigate.

On May 11th, I started seeing tweets about a TanStack hack. At that time, I didn’t know what TanStack was. But apparently, it’s a popular set of JavaScript libraries that are used by a lot of React sites. At first, I didn’t pay much attention. Then I learned the packages were compromised by a worm—malicious software that self-replicates—and it spread quickly. Within hours, dozens of packages were implicated; by day’s end, it was in the hundreds. That’s when I knew I had to lean in.

If you’ve explored safe development practices with coding agents before, you’ve seen the basics of package safety. A package is a bundle of reusable code shared through registries, and nearly every app you use depends on them. The unfortunate twist with this specific hack, known as the Mini Shai-Hulud worm, is that it shows prior “safe enough” heuristics aren’t sufficient. Popularity and trust signals don’t guarantee safety. We have to do more.

So here’s what I’ll cover today: how malicious software typically works, a practical framework for guarding against it, the specific risks of using Cowork to write and run code, and concrete steps to mitigate that risk. My goal is simple: help you keep building—despite the risks—while protecting your data and your business.

Quick disclaimer: I’m not a security expert. I’m sharing my personal journey and what I’ve learned through research and hands-on work. Please use your best judgment when applying any of this.

Package hacks share a simple playbook: get in, sweep for secrets, and phone home. This visual breaks down the 3 steps and flags new entry points—from packages to MCP servers, agent skills, and app extensions.

An agent recently scoured over 230,000 malicious software incidents and found that most malicious software follows a similar pattern. First, it needs an entry point onto your computer. Once installed, it scours your device for sensitive data, and then it uses your network connection to send that data to its own servers. The Mini Shai-Hulud worm spreads via malicious package install scripts that run at download time, then searches the device for credentials (including package publishing rights), poisons additional packages to continue replicating, and uses multiple channels—including the victim’s own GitHub public repos—to distribute secrets.

In practice, most attacks boil down to three steps: 1) It finds an entry point to your device. 2) It searches your device for sensitive data. 3) It sends that data to its own server. The good news: this pattern also tells us how to defend. We can harden entry points, minimize what code and agents can access, and constrain outgoing network traffic.

Keep in mind that install scripts aren’t the only entry vector. Any code that runs on your machine could contain malicious payloads: third-party packages, agent skills, MCP servers, browser or desktop extensions—the list is long. As coding agents and “vibe coding” tools become mainstream, more non-engineers are exposed to the same risks engineers have managed for years.

You might be at elevated risk if you do any of the following: you download and use third-party skills or MCP servers; you let Claude Code, Codex, or other coding agents write scripts that run locally and use third-party packages; you use an IDE like VS Code or Cursor with third-party extensions; or you install third-party extensions in tools like Obsidian. This isn’t an exhaustive list, but if any of these apply, it’s worth tightening your approach.

Relying on third-party code? This visual highlights four common risk zones—agent skills/MCP servers, coding agents, IDE extensions, and Obsidian plugins—and urges a review of downloads, local scripts, and add-ons.

The “safest” approach would be to avoid installing third-party software on your local device entirely. That’s not realistic. We all depend on third-party components in our stack. So I’ll start with one of the most common paths for non-engineers writing and running code today: Cowork.

Evaluating Cowork’s safety was eye-opening. Cowork offers meaningful protection—more than running code directly on your machine—but it isn’t bulletproof. There’s a notable gap you should understand.

Here’s how Cowork helps. It runs code inside a virtual machine, which isolates the execution environment from your real device—a quarantine room for code. While Cowork doesn’t fully control what comes into the room (that part is on you), if malicious code gets in, it’s contained and cannot reach the rest of your filesystem. Cowork also limits outbound network traffic from the virtual machine, which helps disrupt data exfiltration. However, it’s not foolproof.

Because Claude can install packages inside Cowork, it remains susceptible to malicious code like the Mini Shai-Hulud worm. And GitHub is on the allow list so Cowork can read and write to your repos. Since the Mini Shai-Hulud worm uses GitHub to publish secrets, this creates exposure. The crucial mitigation: if you never give Cowork access to sensitive data, there’s nothing for an attacker to steal.

A quick visual from a security deep dive on package hacks shows how Cowork handles threats: entry points are contained, data is only safe when kept outside, and network traffic is partly limited—making shared data the gap to watch.

Your responsibility is straightforward but critical: your data is only safe if it stays outside the virtual machine. When you mount folders into Cowork, those folders become accessible to any code running inside the VM. That includes malicious scripts. Before sharing, ask two questions: do the folders contain any credentials or secrets, and do they include proprietary data that would be harmful if accessed?

It’s common for code to need credentials. That’s why Cowork includes connectors to third-party sources like Google Drive and Slack. Credentials configured for these connectors never enter the VM—they remain outside the quarantine room—so they’re not exposed to malicious code. But if your code requires additional credentials inside the VM, scope them tightly and assume they could be compromised.

You can also use custom MCP servers you create yourself with Cowork. Those credentials stay outside the VM as well, provided the MCP servers are remote (hosted on a web server, not downloaded locally). It’s more work than dropping in a local server, but it keeps secrets out of reach from VM-executed code.

Beyond credentials, scrutinize the actual content you share with Cowork, including anything accessed through connectors. Least privilege is the rule: grant only what’s absolutely necessary for the task, and nothing more.

Amid a wave of package-supply attacks, this Product Talk visual launches a 3-part guide to safer AI building—starting with Cowork safety today, then Claude code config next week, and off-device development coming soon.

What about skills? Cowork supports skills, and you can add third-party skills inside the quarantine room. If you’re not placing your own data in that room, you can afford more risk. The moment you add sensitive or proprietary data, be selective. Skills can include third-party code, and bad actors use skill directories to distribute malicious payloads. Personally, I never use third-party skills as-is. If one looks useful, I read through the files, then ask Claude to recreate it so I understand what it does and maintain control. If I were to use third-party skills, I’d do it in Cowork and keep their data access to the minimum necessary.

Overall, Cowork is a solid, “safe-ish” option if you’re disciplined about what you share. The challenge is that utility often requires access to real data—exactly what we’re trying to protect. In an upcoming deep dive, I’ll outline strategies to keep malicious code out in the first place. While I’ll focus on local development, the same patterns can extend to Cowork with a bit of setup.

One more important clarification: don’t confuse Cowork with the Code tab in the Claude Desktop app. Cowork runs code inside a virtual machine. The Code tab does not. If you ask Claude to write and execute code from the Code tab, that code runs on your local device and you’re fully responsible for security. There is one exception: the Code tab can run code in Anthropic’s cloud; I’ll cover that approach when we get into moving development off the local machine.

To summarize Cowork’s protections against the attacker’s three-step pattern: installs and scripts still run, but they’re contained inside an isolated virtual machine instead of your real device; access to sensitive data is strongly limited to the specific folders you mount, leaving the rest of your filesystem (including unrelated credentials) out of reach; data exfiltration is partially constrained because Anthropic limits outbound network traffic from the VM—helpful, but not absolute. By contrast, local Code tab sessions offer no isolation, no filesystem restrictions, and no network limits—so any malicious install scripts run directly on your machine with full access and open egress.

My takeaways so far: I still love building with AI, but I’m doing it more cautiously. Cowork offers meaningful containment when used deliberately. I still prefer the flexibility of Claude Code, and I’ve reconfigured my setup to reduce risk. Even so, “safer” isn’t “safe,” which is why I’m increasingly shifting development off my local device to more controlled environments. I’ll share the practical details—tools, configs, and scripts—in the next installments.

If this perspective is useful, let me know. I want builders to move fast—and safely—through this new era of agentic AI. Until then, stay safe out there.

Inspired by this post on Product Talk.

June 3, 2026
Speed-to-Lead Is Dead: How AI Agents End the Wait and Rebuild a High-Velocity Sales Org

A prospect lands on our site, skims pricing, watches a demo, and clicks “contact sales.” For years, that’s where momentum died. They waited, and we built entire sales motions around managing that delay.

We optimized for “speed-to-lead,” made it the hallmark of a high-performing sales development org, hired more SDRs, tuned routing rules, added shift coverage, and stared at response-time dashboards. Typical SLA targets were one hour for best-fit leads, four hours for core MQLs, forty-eight hours for everyone else. Those were considered good numbers.

No one questioned the premise because the lag felt structural—shift scheduling, routing delays, and humans working 9–5. The fastest teams could only shrink the gap; nobody could remove it.

An AI Agent closes it completely.

When a prospect arrives today, the conversation can begin immediately. That single change reshapes how I design a sales org—how we staff it, what our team prioritizes, and the metrics we hold ourselves accountable for.

Step outside our dashboards and look at the buyer experience. We spend heavily to drive traffic, then push visitors into forms and queues that add friction precisely when purchase intent peaks.

Intent is highest the moment someone seeks out our product. If an SDR follows up two or three hours later, that buyer’s in another meeting, the urgency has faded, and the moment is gone. We still call it a lead; the buyer has already moved on.

What AI changes

Agents eliminate the structural constraints that made speed-to-lead a problem—shift scheduling, routing delays, CRM batch processing, the SDR being on another call. None of it applies anymore because every single lead can be engaged immediately, at any hour and in any language.

The impact goes beyond response time. When an Agent engages at peak intent, qualification, discovery, and even an initial demo moment can unfold in a single, continuous conversation. The gated funnel collapses. There’s no reason to qualify someone today, schedule discovery for Thursday, and demo the following week when the conversation is already happening.

The constraint the industry built around simply isn’t there anymore. We’re already seeing it with Fin, a Customer Agent. As sales leaders, we need to frame this differently.

If speed-to-lead is no longer the constraint, the knock-on effects reach every part of the org.

Introduce Fin for Sales to your team with this clean hero banner: bold headline, signature blue spiral, and a clear 'Start free trial' call to action—inviting readers to explore an AI customer agent built for revenue.

SDRs focus on moving deals forward. Instead of frontline triage, they double down on phone-based selling and relationship building, complex deal navigation, and multi-threaded engagement across stakeholders—the high-leverage work that used to get crowded out by the inbox.

Pipeline gets more relevant. The old model rewarded volume: capture as many form fills as possible, respond fast, and sort quality later. When an Agent engages at the moment of intent, it qualifies during the conversation. Low-fit leads get filtered out before they reach the team, and high-fit prospects arrive with context—needs, timeline, stakeholders—instead of just a name and email.

You measure outcomes, not response time. When first response is instant, different metrics matter. I anchor on three questions:

1) Is the Agent doing the work? Completion rate, qualification rate, and contact capture rate indicate whether conversations reach clear outcomes and produce usable handoffs to the team.

2) Is the work producing pipeline? Meetings booked and pipeline created through Agent-handled conversations are the leading indicators of revenue, not how fast someone followed up.

3) Are buyers having a good experience? Conversation-level satisfaction matters more than ever because the Agent is the first interaction prospects have with your company. The experience it delivers is the first impression you make.

These three questions reveal whether the motion is working. Time-to-first-response can’t.

Sales orgs built hiring plans, workflows, and performance metrics around beating intent decay. That made sense when the lag was unavoidable. It isn’t anymore.

An Agent is always on. It engages the moment a prospect arrives on your site, qualifies them in real time, and routes them to the right outcome without waiting for someone to be free. The lag the industry built itself around doesn’t exist when the conversation starts immediately.

The companies leaning into this are investing in what happens after the conversation starts: how well the Agent qualifies, where it creates pipeline, and what SDRs should actually spend time on. What matters now is not how fast you respond, but what the conversation produces.

Speed-to-lead made sense when the delay was structural. It isn’t anymore. If you’re re-architecting go-to-market, instrument Agent Analytics, revisit SDR charters, and tighten CRM integration so every qualified handoff is instant, traceable, and revenue-linked.

Inspired by this post on The Intercom Blog.

May 26, 2026