Tag: vibe coding

How Snapbar Turned Crisis Into an AI-Native Photo Experience Revolution

What does it take to reinvent a 14-year-old company, not once, but twice? I ask that question often when I look at mature product organizations, because the hardest transformations rarely start with a clean slate. They start with real customers, legacy expectations, operational muscle memory, and a market that suddenly refuses to behave the way it used to.

Snapbar is a useful case study in that kind of transformation. The company began as a wedding photo booth side hustle, grew into a national events company, and then watched COVID wipe out the entire business overnight. As a product leader, I find that moment especially important because it separates teams that are attached to the current expression of their product from teams that understand the deeper customer need underneath it.

The deeper need was never just a physical photo booth. It was identity, participation, memory, brand engagement, and a shareable experience that people could take with them. When in-person events disappeared, Snapbar went from physical photo booths to a cross-platform virtual product built on WebRTC in spring 2020. That was not a cosmetic pivot. It was a first-principles rebuild under pressure.

I have seen many teams talk about innovation when conditions are favorable. Snapbar’s story is more interesting because the team had to innovate when the existing business model was unavailable. That kind of constraint can be clarifying. It forces product teams to ask: What job are we really doing for customers, and what parts of our current solution are merely historical artifacts?

The next reinvention came from generative AI. Pushed by declining repeat business, Snapbar dove deep into Stable Diffusion, custom LoRA fine-tunes on H100/H200 GPUs, and eventually a reasoning-model-powered generative image and video pipeline. What stands out to me is not simply that the team adopted generative AI. It is that they connected AI capabilities to a domain they already understood deeply: photography, events, brand activations, and experiential marketing.

This distinction matters. In product management, technology FOMO can lead teams to bolt AI onto workflows without a clear strategic advantage. Snapbar appears to have moved differently. They used 14 years of industry knowledge to identify where AI could change the experience itself, not just automate a back-office task or generate a novelty output.

The product evolution is a strong example of applied AI. Snapbar integrated Stable Diffusion 1.5 as their first generative AI model and ran custom LoRA fine-tunes on H100/H200 GPUs to produce brand-quality outputs nobody else in their space could match. That level of execution shows the difference between experimenting with a model and building a differentiated product system around it.

I also appreciate the way the team moved from negative prompts to long-form prompts written for reasoning models. In brand environments, creative control and safety control are not optional. A brand activation must feel imaginative, but it also has to remain on-message, inclusive, and predictable enough for a live event setting. Better prompt engineering becomes part of the product’s trust layer.

One of the most important product details is the meta-prompting pre-processing pipeline designed to ensure user likenesses, including non-obvious details like disabilities, are accurately represented in generated images. That is not a minor implementation detail. It reflects a more mature view of AI risk management, representation, and customer experience.

From my perspective, this is where product strategy and ethical technology intersect. Generative AI systems can easily flatten people into generic outputs. A thoughtful product team has to decide what fidelity means, what consent means, and how much control users and brands should have over the final artifact. Snapbar’s approach suggests that representation is not just a model-quality problem; it is a product-design problem.

Just Now Possible spotlights Snapbar’s journey from photo booths to AI-powered brand experiences, framing reinvention, creativity, and applied AI as the center of the conversation.

The company’s experiential marketing platform lets brands “world build” at conferences, trade shows, and live events by bringing fans into branded creative worlds. That phrase matters because it reframes the photo booth from a capture device into a participatory brand system. The user is not merely photographed. The user becomes part of a designed world.

I see this as a broader shift in product experience. Static brand impressions are giving way to co-created moments. Snapbar added participatory user inputs through Mad Lib-style prompts and prompt injection, turning photo experiences into co-creation moments between brands and their audiences. That is a more durable engagement loop than simply asking someone to pose in front of a branded backdrop.

The operational story is just as relevant for product and engineering leaders. Snapbar used Claude Code and Codex to build and ship features rapidly as a small bootstrapped team, and developed a four-pillar agent orchestration framework: context, tools, verification, and workflows. I like that framing because it treats AI-assisted development as a system of work, not a magic shortcut.

In my own product leadership work, I keep coming back to the same lesson: AI workflows only become reliable when the team defines the surrounding operating model. Context determines whether the agent understands the problem. Tools determine what it can actually do. Verification determines whether the output is trustworthy. Workflows determine whether the capability compounds across the organization.

Snapbar is now building customer-facing “vibe coding” using the Claude Agent SDK so brands can configure and create experiences themselves within Snapbar’s platform. That is a meaningful product move. It shifts creation closer to the customer while keeping the workflow inside a controlled product environment. For brand teams, that could reduce dependency on custom service work while still preserving creative flexibility.

This is the kind of AI strategy I find most compelling: not a generic claim that AI will transform everything, but a specific path from domain expertise to product capability to customer empowerment. Snapbar did not abandon its past. It converted years of event, photography, and brand knowledge into a new interface for generative AI.

The core lesson for product teams is clear. Reinvention does not always mean discarding the original business. Sometimes it means identifying the durable customer need, rebuilding the delivery mechanism, and then using new technology to expand what the experience can become. Snapbar’s journey from wedding photo booths to virtual WebRTC experiences to AI world building shows how a team can preserve its market intuition while changing nearly everything about the product surface.

For product leaders evaluating generative AI opportunities, I would take three practical lessons from this story. First, start with the customer experience, not the model. Second, treat brand safety, representation, and verification as product requirements from the beginning. Third, use agentic AI internally only when the team has a clear framework for context, tools, verification, and workflows.

Snapbar’s story resonates because it is not about chasing a trend. It is about a team using necessity, curiosity-led self-education, and disciplined product thinking to build something that feels native to the generative AI era. That is the difference between adopting AI and becoming AI-native.

Inspired by this post on Product Talk.

July 9, 2026

Reliable AI Coding Requires Four Kinds of Control

Reliable AI coding is not primarily a matter of finding a better prompt or a more capable model. It is a workflow-design problem: teams must control what the product should do, what the repository currently does, what the model can see, and what the agent is allowed to change.

Managing those four kinds of state turns an AI coding session from an open-ended conversation into a bounded engineering process. The payoff is faster iteration without treating plausible output, confident status messages, or large context windows as substitutes for evidence.

Reliability depends on the surrounding system

A large language model generates an answer token by token from the input available to it. That input can include more than the visible request: an application may add system instructions, conversation history, project files, enabled tools, skills, and other supporting context. As Shivam.Consulting Blog’s guide to how ChatGPT works explains, the surrounding application therefore helps shape the result even when two products use the same underlying model.

This mechanism has an important operational consequence. An agent can produce code that looks convincing without possessing a stable model of the intended product, the complete repository, or the runtime environment. Fluency indicates that the output fits learned patterns; it does not establish that the implementation satisfies the requirement.

A dependable workflow consequently controls four connected states. Product state covers requirements, constraints, permissions, edge cases, and acceptance criteria. Repository state covers the actual code, data model, dependencies, tests, and uncommitted changes. Model state covers the instructions and evidence present in the context window. Execution state covers tools, filesystem access, commands, network activity, and other permissions. A failure in any one can appear to be a coding error even when the code is not the original cause.

Tool selection should reflect that distinction. Shivam.Consulting Blog’s vibe-coding playbook recommends managed app builders when the purpose is to explore an interaction or answer an early product question, while positioning developer-oriented coding agents as more appropriate for existing repositories, multi-file changes, tests, and review workflows. The useful dividing line is not whether a tool can generate code. It is whether the environment exposes enough control and evidence for the consequence of the change.

Convert product intent into a bounded change contract

Many unreliable sessions begin before an agent edits a file. If the requested behavior, non-goals, affected users, data rules, and observable success conditions remain ambiguous, the model must fill the gaps. Each follow-up correction can then preserve a different assumption, creating a chain of locally plausible patches without a coherent final design.

A stronger starting point is a compact change contract written outside the chat. It should identify the outcome, relevant current behavior, permitted scope, important invariants, expected edge cases, and the evidence that will demonstrate completion. For a defect, that evidence begins with a reproducible failing case. For a feature, it includes examples of accepted and rejected behavior. The contract should also record explicit non-goals so that an agent does not broaden a narrow request while attempting to be helpful.
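
A change contract can be as small as a structured record that the team checks for gaps before any implementation begins. The sketch below is one hypothetical shape for it: the class name, field names, and example project are illustrative assumptions, not a format taken from the playbook.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeContract:
    """Hypothetical record of a bounded change; all field names are illustrative."""
    outcome: str
    current_behavior: str
    permitted_scope: list[str]  # path prefixes the agent may edit
    invariants: list[str] = field(default_factory=list)
    edge_cases: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    completion_evidence: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


contract = ChangeContract(
    outcome="Archived projects are hidden from the default project list",
    current_behavior="The list shows every project regardless of status",
    permitted_scope=["app/projects/", "tests/projects/"],
    invariants=["Archived projects remain reachable by direct URL"],
    non_goals=["No redesign of the list layout"],
    completion_evidence=["test_hides_archived fails before the change and passes after"],
)

print(contract.missing_sections())  # ['edge_cases']
```

Here the empty edge-case section is surfaced before the agent starts, which is the point at which filling the gap is cheapest.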

Blast radius deserves separate attention. The vibe-coding playbook uses data, controller, and view as a practical three-layer model. A request involving permissions, sorting, filtering, workflow state, or reporting may cross all three even if it appears in the interface as a small change. Reviewing the planned impact across storage, logic, and presentation helps reveal missing migrations, inconsistent validation, stale queries, and user-interface states before implementation begins.

The same source proposes separate plan-review-fix and implement-review-fix loops. Combined with the change contract, these become distinct gates rather than one continuous conversation. The plan gate asks whether the proposed files, layers, and tests match the requirement. The implementation gate asks whether the resulting diff and observed behavior match the approved plan. Separating the gates makes it easier to reject a mistaken approach before it accumulates code.

This structure also clarifies the human role. The agent can explore the repository, propose a plan, implement a bounded change, and help investigate failures. Product and engineering owners remain responsible for deciding what behavior is correct, which tradeoffs are acceptable, and what evidence is sufficient to ship.

Treat context as a limited working set, not permanent memory

A long conversation can feel comprehensive while becoming less dependable. Shivam.Consulting Blog’s context-rot analysis reports research showing that model performance can deteriorate as input length grows and that information at different positions may receive unequal attention. The article’s practical conclusion is more useful than any advertised context-window maximum: available capacity should not be confused with reliable attention.

Context should therefore be curated as a task-specific working set. Durable facts belong in versioned project documents; the active session should receive only the instructions, files, decisions, and evidence needed for the current change. Old tool output, abandoned plans, duplicate explanations, and superseded requirements consume attention without improving the task.

Shivam.Consulting Blog’s guide to Claude Code workflows describes a layered memory pattern: broad preferences in global instructions, project-specific conventions in repository-level files, and reference material loaded when relevant. It also presents stored commands as a way to make recurring procedures explicit, and sub-agents as a way to isolate context or perform independent work. The transferable principle is architectural rather than product-specific: stable policy, project knowledge, task instructions, and transient evidence should not be mixed into one ever-growing transcript.

A clean session boundary can be a reliability control. When a conversation has accumulated contradictory instructions or repeated failed fixes, the next step should not automatically be another patch request. A new session can begin from a short handoff containing the approved change contract, current repository state, attempted approaches, observed failures, and unresolved questions. This preserves useful evidence without carrying the entire history forward.

Sub-agents require the same discipline. Parallelism is valuable when work can be partitioned into independent questions, such as locating relevant code, examining tests, or reviewing a proposed diff. It is less useful when several agents can modify overlapping files or make incompatible architectural assumptions. Each delegated task needs a narrow scope, an expected output, and a rule for whether it may write or only report.

Require evidence, limited authority, and a recovery path

An agent’s statement that a problem is fixed is a claim to verify, not completion evidence. Verification should return to the original reproducer or acceptance criteria, then examine the diff and run the smallest relevant checks. Broader tests can follow when the change crosses modules, alters shared behavior, or affects data. This sequence distinguishes a real correction from a patch that merely changes the visible symptom.

Review should inspect both behavior and change shape. A diff may pass a narrow test while introducing unrelated refactoring, weakening validation, swallowing errors, or duplicating logic. Unexpected file changes, new dependencies, disabled checks, and unusually broad edits are signals to pause. If the evidence is inconclusive, the workflow should return to diagnosis rather than asking the same context-saturated agent to keep editing.
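
Some of these change-shape signals can be checked mechanically before a human reads the diff. The following is a minimal sketch under stated assumptions: the function name, the manifest list, the file-count threshold, and the example paths are all hypothetical, and the list of changed paths is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Illustrative assumption: a few common dependency manifest file names.
DEPENDENCY_MANIFESTS = {"package.json", "package-lock.json", "requirements.txt", "pyproject.toml"}


def review_change_shape(changed_paths, permitted_prefixes, max_files=10):
    """Return human-readable warnings about the shape of a diff."""
    warnings = []
    for path in changed_paths:
        # Files outside the agreed scope suggest the agent broadened the request.
        if not any(path.startswith(prefix) for prefix in permitted_prefixes):
            warnings.append(f"outside permitted scope: {path}")
        # Manifest edits usually mean a dependency was added or changed.
        if PurePosixPath(path).name in DEPENDENCY_MANIFESTS:
            warnings.append(f"dependency manifest changed: {path}")
    if len(changed_paths) > max_files:
        warnings.append(f"unusually broad edit: {len(changed_paths)} files")
    return warnings


changed = ["app/projects/list.py", "package.json", "app/billing/invoice.py"]
warnings = review_change_shape(changed, ["app/projects/", "tests/projects/"])
for line in warnings:
    print(line)
```

A check like this only flags reasons to pause. It does not replace reading the diff for weakened validation, swallowed errors, or duplicated logic.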

Reliability also depends on limiting what an agent can do. Shivam.Consulting Blog’s Claude Code risk guide describes escalating exposure as an agent moves from reading a project folder to reading elsewhere, fetching external material, writing files, executing generated code, and installing third-party packages or extensions. Although permission models vary by product, the general control is consistent: grant the least authority required for the current step and review the exact path or command before approval.

Folder boundaries should match the task boundary. Credentials, customer information, confidential documents, and unrelated projects should not be placed within an agent’s working scope. One-time approval is preferable when an operation is unusual or its future use would be difficult to predict. Commands that delete, overwrite, upload, install, or execute deserve more scrutiny than read-only inspection because their impact is larger or harder to reverse.
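
The difference in scrutiny between read-only inspection and state-changing commands can be sketched as a small classifier. This is an illustration only: the command lists are deliberately incomplete assumptions, the function name is hypothetical, and no real agent's permission model is being described.

```python
import shlex

# Illustrative, deliberately incomplete lists (assumptions, not a vetted policy).
READ_ONLY = {"ls", "cat", "grep", "head", "tail", "wc"}
STATE_CHANGING = {"rm", "mv", "cp", "dd", "curl", "wget", "scp", "npm", "pip", "sh", "bash"}
SHELL_METACHARACTERS = set("|;&<>$`")


def scrutiny_level(command):
    """Classify a proposed shell command as 'low', 'medium', or 'high' scrutiny."""
    # Pipes, redirects, and substitutions can hide a second command, so escalate.
    if SHELL_METACHARACTERS & set(command):
        return "high"
    tokens = shlex.split(command)
    if not tokens:
        return "high"
    program = tokens[0]
    if program in STATE_CHANGING:
        return "high"
    if program in READ_ONLY:
        return "low"
    return "medium"  # unknown program: approve once after reading the exact command


for cmd in ["cat README.md", "rm -rf build", "cat notes.txt | sh", "make test"]:
    print(cmd, "->", scrutiny_level(cmd))
```

Even a "low" result assumes the folder boundary is already correct: a read-only command can still expose credentials if they sit inside the agent's working scope.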

Reversibility completes the control system. The safety guide emphasizes backups and version control because an AI coding interface may not provide a dependable undo operation. A clean checkpoint before implementation, small commits, reviewable diffs, protected secrets, and a tested rollback path reduce the cost of both model errors and human approval mistakes. For higher-risk work, the agent should operate in a disposable branch, isolated environment, or similarly constrained workspace rather than directly against valuable state.

These safeguards are mutually reinforcing. A bounded contract limits scope; curated context reduces instruction drift; verification exposes incorrect claims; least privilege limits blast radius; and version control makes recovery practical. Removing any one of them shifts too much trust onto probabilistic output.

Key takeaways
- Control product state, repository state, model context, and execution authority as separate parts of one workflow.
- Write a change contract with scope, non-goals, invariants, edge cases, and acceptance evidence before implementation.
- Keep context task-specific; store durable knowledge in files and start a clean session when history becomes contradictory or noisy.
- Treat an agent’s completion report as a hypothesis until the original reproducer, relevant tests, observed behavior, and diff support it.
- Match permissions and isolation to the risk of the operation, and create a recovery point before allowing changes.

As coding agents gain more tools and autonomy, reliable teams will distinguish themselves less by how much work they delegate than by how clearly they define authority, evidence, and recovery. The durable advantage will come from workflows in which faster generation is paired with tighter control.

July 3, 2026

From AI Builder to Agent Swarm: A Product Delivery Model

AI-native product delivery has two distinct layers: a product professional who turns uncertainty into testable artifacts, and an agent workflow that divides complex work among specialized AI components. Treating either layer as the whole model misses the more useful opportunity.

Together, the AI Builder role described by Product School and the parallel-agent architecture discussed by Pendo suggest an operating model for moving from customer evidence to evaluated software. The central lesson is not simply to add more AI. It is to assign clear responsibilities, preserve evidence across handoffs, and expand automation only where it improves a measurable constraint.

Key takeaways

- The AI Builder is the human integration layer, connecting discovery, prototyping, evaluation, and delivery inside the product trio.
- Parallel agents are a system design choice, useful when specialized paths can improve latency, answer quality, or resilience.
- Evaluations, analytics, observability, and controlled releases form the shared control system for both layers.
- Fan-out should respond to uncertainty and business importance rather than becoming the default for every task.

One delivery system, with human and machine responsibilities

The Product School article presents the AI Builder as a hybrid product professional rather than a renamed product manager or an isolated prototyper. In its account, this person uses AI across analysis, prototyping, evaluation, and shipping, with the aim of shortening the distance between a customer problem and a runnable experiment.

The Pendo article addresses a different layer. It describes workflows in which research, reasoning, tool use, and formatting can be assigned to specialized agents and then reconciled. Its focus is not ownership of the product problem, but the computational structure used to complete work.

Read together, the articles separate two ideas that are often blurred. An AI-native team still needs a person or group to choose the problem, define acceptable behavior, interpret customer evidence, and decide whether an experiment justifies investment. Agents can perform bounded tasks within that process, but parallel execution does not establish product relevance on its own.

| Layer | Primary responsibility | Typical artifacts | Control question |
| --- | --- | --- | --- |
| AI Builder and product trio | Translate customer and business uncertainty into experiments | Prototypes, evaluation criteria, instrumented experiences, delivery recommendations | Is the team learning about an outcome that matters? |
| Agent workflow | Execute and reconcile specialized tasks | Retrieved context, candidate responses, tool results, rankings, formatted outputs | Does orchestration improve the target measure enough to justify its complexity? |
| Delivery platform | Provide access, measurement, release controls, and safeguards | Tool interfaces, traces, feature flags, budgets, analytics, fallbacks | Can the workflow be observed, governed, and changed safely? |

This division of responsibility also clarifies the meaning of vibe coding in the Pendo account. Prompts, examples, and constraints are used to shape an intended experience before the team commits to extensive code or rigid rules. The AI Builder supplies the product judgment and experiment design around that activity; an agent architecture supplies one possible execution mechanism.

Parallelism should target a constraint, not become a default

Pendo reports three proposed benefits of parallel agents. Independent specialists can work concurrently to reduce latency, diverse candidate paths can be compared to improve quality, and risky or failure-prone operations can be isolated behind fallbacks. The article names fan-out/fan-in, race-and-rerank, specialist swarms, consensus, and self-consistency checks as patterns for producing and reconciling candidates.

Those benefits depend on the shape of the task. Parallel research may help when several sources or interpretations must be examined independently. A race-and-rerank pattern may help when multiple plausible outputs can be scored against explicit criteria. Guarded fallbacks may improve resilience when a tool can fail without invalidating the entire experience. By contrast, multiplying agents around a simple, deterministic step adds coordination, cost, and more places to inspect when something goes wrong.
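
A fan-out/fan-in step with a simple rerank and a guarded timeout can be sketched with Python's standard `asyncio` module. Everything specific here is an assumption for illustration: the specialist names, the canned answers, the delays, and the toy scoring rule stand in for real model or tool calls and for real evaluation criteria.

```python
import asyncio


async def specialist(name, delay, answer):
    """Stand-in for an agent call; a real system would call a model or tool here."""
    await asyncio.sleep(delay)
    return {"agent": name, "answer": answer}


def score(candidate):
    """Toy scoring rule (assumption): prefer answers that cite a source."""
    return 1.0 if "source:" in candidate["answer"] else 0.0


async def fan_out_fan_in(timeout):
    # Fan out: start every specialist concurrently.
    tasks = [
        asyncio.create_task(specialist("research", 0.01, "Plan A (source: pricing doc)")),
        asyncio.create_task(specialist("reasoning", 0.02, "Plan B")),
        asyncio.create_task(specialist("slow_tool", 5.0, "Plan C (source: crm)")),
    ]
    # Guarded fallback: a slow path is dropped instead of blocking the response.
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    # Fan in: rerank whatever finished in time and keep the best candidate.
    candidates = [task.result() for task in done]
    return max(candidates, key=score)


best = asyncio.run(fan_out_fan_in(timeout=0.2))
print(best["agent"])  # research
```

The coordination cost is visible even in this small sketch: a timeout, cancellation, and a scoring rule all had to be designed, which is why the pattern pays off only when the task genuinely has independent paths.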

The Product School article provides the missing selection mechanism: the workflow begins with a high-signal use case and explicit evaluation criteria. That makes orchestration a response to observed limitations in an experiment rather than an architectural commitment made in advance. A prototype can begin with the smallest credible workflow, reveal whether the bottleneck is grounding, reasoning, tool reliability, or response time, and introduce specialization at that point.

Pendo proposes a similar progression at the system level: begin with retrieval, add a planner-executor split, and introduce parallel specialists where accuracy or latency problems appear. It also recommends placing budgets on fan-out, caching results, using smaller models when confidence is high, and widening the workflow when uncertainty rises. These are recommendations from the source, not independently reported benchmarks, but they establish a useful product principle: additional computation should be purchased in proportion to uncertainty and consequence.
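
One way to read "in proportion to uncertainty and consequence" is as a small budgeting function. The sketch below is a hypothetical illustration: the scores, the 0.3 threshold, the maximum width, and the function name are assumptions, not values from either source.

```python
def plan_fan_out(uncertainty, consequence, max_width=5):
    """Choose how many parallel candidates to run and which model tier to use.

    Both inputs are scores in [0, 1]; the thresholds are illustrative assumptions.
    """
    for value in (uncertainty, consequence):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must be between 0 and 1")
    # Width grows with the product, so wide fan-out needs both
    # high uncertainty and high stakes.
    width = 1 + int((max_width - 1) * uncertainty * consequence)
    # Smaller model when confidence is high (uncertainty is low).
    model_tier = "small" if uncertainty < 0.3 else "large"
    return {"width": width, "model_tier": model_tier}


print(plan_fan_out(0.1, 0.2))  # {'width': 1, 'model_tier': 'small'}
print(plan_fan_out(0.9, 0.9))  # {'width': 4, 'model_tier': 'large'}
```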

Evaluation is the bridge from discovery to dependable delivery

The strongest overlap between the two articles is evaluation. Product School describes AI Builders converting interviews and behavioral analytics into instrumented experiments, benchmarking quality before production, and using A/B testing to feed results back into strategy. Pendo similarly calls for offline evaluations before rollout, production experiments afterward, and agent-level analytics to identify regressions across individual workflow steps.

This creates a continuous evidence path rather than a handoff between discovery and engineering. A customer problem informs a prototype; the prototype produces evaluation cases; those cases become release gates; production behavior supplies new evidence for the next iteration. CI/CD can move changes through the delivery system, while evaluations determine whether an AI behavior is ready to move with them.
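
The step where evaluation cases become a release gate can be sketched as a function a CI job might call. This is a simplified illustration: the case format, the substring check, the example prompts, and `fake_generate` (a stand-in for the AI behavior under test) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical evaluation case derived from a prototype session."""
    name: str
    prompt: str
    must_include: str


def run_gate(cases, generate, min_pass_rate=1.0):
    """Run every case and report whether the pass rate clears the gate."""
    results = {
        case.name: case.must_include.lower() in generate(case.prompt).lower()
        for case in cases
    }
    pass_rate = sum(results.values()) / len(results)
    return pass_rate >= min_pass_rate, results


def fake_generate(prompt):
    """Stand-in for the AI behavior under test (canned answers)."""
    canned = {
        "How do I get a refund?": "Refunds are available within 30 days.",
        "I want to talk to a human": "Please try again later.",
    }
    return canned[prompt]


cases = [
    EvalCase("refund_policy", "How do I get a refund?", "30 days"),
    EvalCase("escalation", "I want to talk to a human", "support team"),
]
ready, results = run_gate(cases, fake_generate)
print(ready, results)  # False {'refund_policy': True, 'escalation': False}
```

The pipeline can be healthy while the gate stays closed, which mirrors the split between CI/CD moving a change and evaluations deciding whether the behavior is ready.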

A staged adoption path

1. Select a bounded use case. Product School suggests beginning with a high-signal application such as generative-AI prototyping or an in-app guide, rather than attempting to transform the whole delivery process at once.
2. Define the evidence before expanding the build. Specify evaluation criteria, analytics, and the customer or business outcome the experiment is intended to illuminate.
3. Establish grounded context. Both articles emphasize retrieval-oriented workflows. Product School also discusses prompts, context windows, and data contracts as product surfaces that require deliberate design.
4. Start with minimal orchestration. A single workflow or planner-executor arrangement provides a baseline against which a specialist or parallel design can be judged.
5. Add parallel paths selectively. Introduce research, tool-calling, reasoning, or validation specialists only where evaluation results reveal a material limitation.
6. Release behind controls. The sources point to feature flags, A/B testing, observability, anomaly detection, fallbacks, and post-launch review as ways to expose failures and limit their impact.

The Model Context Protocol appears in both accounts as a way to standardize access to tools and data. Product School frames MCP integrations as part of an AI Builder’s toolbox, while Pendo argues that standardized access keeps agent roles separate from authentication, quotas, and observability. The combined implication is organizational as well as technical: shared interfaces can let product teams experiment with workflow roles without embedding every platform concern in every prompt.

The operating model changes what a product team owns

Product School places the AI Builder inside the product trio, working with design and engineering from the beginning. Pendo argues that product trios can own complete AI workflows rather than limiting their attention to prompts. These views converge on broader product accountability: the team owns the behavior, evidence, cost, risk, and release mechanism as one product surface.

That ownership requires clearer boundaries, not fewer disciplines. Product judgment determines which outcome deserves attention. Design shapes the customer interaction and failure experience. Engineering and platform work make tool access, observability, quotas, and release controls dependable. The AI Builder connects these concerns through runnable artifacts, while specialized agents remain replaceable components within the evaluated workflow.

The resulting measure of maturity is not the number of agents deployed or the speed of prototype generation. It is whether the team can trace a customer need through an experiment, an evaluation, a controlled release, and a learning decision. As tools become easier to compose, that chain of evidence will be the durable advantage in AI-native product delivery.

June 8, 2026

Package Hack Wake-Up Call: My Playbook for Securing Cowork, Coding Agents, and Secrets

I love being a builder. It feels like a superpower I can’t stop using, and lately I’ve been channeling it into better workflows, faster experimentation, and sharper product thinking.

I tinker with my Claude Code workflows to make every day more effortless. I’m having a blast creating AI-generated interview snapshots and opportunity solution trees for Vistaly. I also spend time digging into traces and iterating on the AI coaches I use for our discovery courses.

Then the recent wave of malicious software spreading through the open-source community popped my bubble. It hit companies big and small—names like OpenAI, PostHog, and Zapier. As I dug in, I realized what many cybersecurity experts have long known: this is a deep rabbit hole. If I want to build responsibly, I have to get significantly better at protecting my devices, credentials, and code. And if you’re building with AI or modern tooling, you likely do, too.

Here’s why. We all rely on open-source software. Most modern applications assemble tried-and-true components—parsing a PDF, handling dates across time zones, visualizing spreadsheet data, connecting to an API—rather than reinventing them. The same is true for agent skills and MCP servers; they accelerate how we get value from models. This is overwhelmingly a good thing. But it also creates an attack surface that bad actors exploit.

We don’t need to abandon third-party code. We do need to understand the mechanisms attackers use and consistently defend against them.

Figure: What dev teams should do when one malicious worm compromises hundreds of packages. Agenda: how it spreads, how to guard against it, AI tool risks, and concrete steps to mitigate.

On May 11th, I started seeing tweets about a TanStack hack. At the time, I didn’t know what TanStack was; it turns out to be a popular set of JavaScript libraries used by many React sites. At first, I didn’t pay much attention. Then I learned the packages were compromised by a worm (malicious software that self-replicates) and that it spread quickly. Within hours, dozens of packages were implicated; by day’s end, it was in the hundreds. That’s when I knew I had to lean in.

If you’ve explored safe development practices with coding agents before, you’ve seen the basics of package safety. A package is a bundle of reusable code shared through registries, and nearly every app you use depends on them. The unfortunate twist with this specific hack, known as the Mini Shai-Hulud worm, is that it shows prior “safe enough” heuristics aren’t sufficient. Popularity and trust signals don’t guarantee safety. We have to do more.

So here’s what I’ll cover today: how malicious software typically works, a practical framework for guarding against it, the specific risks of using Cowork to write and run code, and concrete steps to mitigate that risk. My goal is simple: help you keep building—despite the risks—while protecting your data and your business.

Quick disclaimer: I’m not a security expert. I’m sharing my personal journey and what I’ve learned through research and hands-on work. Please use your best judgment when applying any of this.

Figure: Package hacks share a three-step playbook: get in, sweep for secrets, and phone home. New entry points include packages, MCP servers, agent skills, and app extensions.

An agent recently scoured over 230,000 malicious software incidents and found that most malicious software follows a similar pattern. First, it needs an entry point onto your computer. Once installed, it scours your device for sensitive data, and then it uses your network connection to send that data to its own servers. The Mini Shai-Hulud worm spreads via malicious package install scripts that run at download time, then searches the device for credentials (including package publishing rights), poisons additional packages to continue replicating, and uses multiple channels—including the victim’s own GitHub public repos—to distribute secrets.

In practice, most attacks boil down to three steps: 1) It finds an entry point to your device. 2) It searches your device for sensitive data. 3) It sends that data to its own server. The good news: this pattern also tells us how to defend. We can harden entry points, minimize what code and agents can access, and constrain outgoing network traffic.
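
To make the “harden entry points” idea concrete, here’s a minimal sketch of one check you could run yourself. It walks a `node_modules` folder and lists every package that declares an npm install hook (`preinstall`, `install`, or `postinstall`), since those are the scripts that run automatically during installation. The function name `find_install_scripts` and the output format are my own assumptions for illustration, and a hook showing up here doesn’t mean a package is malicious; it only tells you where to look.

```python
import json
from pathlib import Path

# npm lifecycle hooks that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(node_modules: Path) -> dict[str, dict[str, str]]:
    """Return {package_name: {hook: command}} for packages declaring install hooks.

    Hypothetical helper for illustration; it only reports, it does not judge.
    """
    findings: dict[str, dict[str, str]] = {}
    if not node_modules.is_dir():
        return findings
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skip it
        if not isinstance(data, dict):
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            continue
        hooks = {h: str(scripts[h]) for h in INSTALL_HOOKS if h in scripts}
        if hooks:
            name = str(data.get("name") or manifest.parent)
            findings[name] = hooks
    return findings


if __name__ == "__main__":
    report = find_install_scripts(Path("node_modules"))
    for name, hooks in sorted(report.items()):
        for hook, command in hooks.items():
            print(f"{name}: {hook} -> {command}")
```

One way to use this: install with `npm install --ignore-scripts` so the hooks don’t fire, run the scan, and review what would have executed before deciding whether to allow it.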

Keep in mind that install scripts aren’t the only entry vector. Any code that runs on your machine could contain malicious payloads: third-party packages, agent skills, MCP servers, browser or desktop extensions—the list is long. As coding agents and “vibe coding” tools become mainstream, more non-engineers are exposed to the same risks engineers have managed for years.

You might be at elevated risk if you do any of the following: you download and use third-party skills or MCP servers; you let Claude Code, Codex, or other coding agents write scripts that run locally and use third-party packages; you use an IDE like VS Code or Cursor with third-party extensions; or you install third-party extensions in tools like Obsidian. This isn’t an exhaustive list, but if any of these apply, it’s worth tightening your approach.

Relying on third-party code? This visual highlights four common risk zones—agent skills/MCP servers, coding agents, IDE extensions, and Obsidian plugins—and urges a review of downloads, local scripts, and add-ons.

The “safest” approach would be to avoid installing third-party software on your local device entirely. That’s not realistic. We all depend on third-party components in our stack. So I’ll start with one of the most common paths for non-engineers writing and running code today: Cowork.

Evaluating Cowork’s safety was eye-opening. Cowork offers meaningful protection—more than running code directly on your machine—but it isn’t bulletproof. There’s a notable gap you should understand.

Here’s how Cowork helps. It runs code inside a virtual machine, which isolates the execution environment from your real device—a quarantine room for code. While Cowork doesn’t fully control what comes into the room (that part is on you), if malicious code gets in, it’s contained and cannot reach the rest of your filesystem. Cowork also limits outbound network traffic from the virtual machine, which helps disrupt data exfiltration. However, it’s not foolproof.

Because Claude can install packages inside Cowork, it remains susceptible to malicious code like the Mini Shai-Hulud worm. And GitHub is on the allow list so Cowork can read and write to your repos. Since the Mini Shai-Hulud worm uses GitHub to publish secrets, this creates exposure. The crucial mitigation: if you never give Cowork access to sensitive data, there’s nothing for an attacker to steal.

A quick visual from a security deep dive on package hacks shows how Cowork handles threats: entry points are contained, data is only safe when kept outside, and network traffic is partly limited—making shared data the gap to watch.

Your responsibility is straightforward but critical: your data is only safe if it stays outside the virtual machine. When you mount folders into Cowork, those folders become accessible to any code running inside the VM. That includes malicious scripts. Before sharing, ask two questions: do the folders contain any credentials or secrets, and do they include proprietary data that would be harmful if accessed?
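
If you want a quick way to answer the first question, here’s a rough sketch of a pre-mount check. It flags files whose names usually hold credentials and files whose contents match a few well-known token shapes. The file names, patterns, size limit, and the `scan_folder` helper are all assumptions I picked for illustration, and the list is far from exhaustive, so treat a clean result as “nothing obvious found,” not as proof the folder is safe.

```python
import re
from pathlib import Path

# Illustrative, non-exhaustive heuristics; adjust to your own stack.
SENSITIVE_NAMES = {".env", ".npmrc", ".netrc", "id_rsa", "id_ed25519", "credentials"}
SENSITIVE_SUFFIXES = {".pem", ".key", ".p12"}
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}
MAX_BYTES = 1_000_000  # skip large files to keep the scan quick


def scan_folder(root: Path) -> list[tuple[str, str]]:
    """Return (relative_path, reason) pairs for files that look sensitive."""
    findings: list[tuple[str, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.name in SENSITIVE_NAMES or path.suffix in SENSITIVE_SUFFIXES:
            findings.append((rel, "sensitive file name"))
            continue
        try:
            if path.stat().st_size > MAX_BYTES:
                continue
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(text):
                findings.append((rel, label))
    return findings


if __name__ == "__main__":
    # "project-to-share" is a placeholder for the folder you plan to mount.
    for rel, reason in scan_folder(Path("project-to-share")):
        print(f"{rel}: {reason}")
```

The second question, whether the folder holds proprietary data, still needs a human read. No pattern match can tell you what would hurt your business if it leaked.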

It’s common for code to need credentials. That’s why Cowork includes connectors to third-party sources like Google Drive and Slack. Credentials configured for these connectors never enter the VM—they remain outside the quarantine room—so they’re not exposed to malicious code. But if your code requires additional credentials inside the VM, scope them tightly and assume they could be compromised.

You can also use custom MCP servers you create yourself with Cowork. Those credentials stay outside the VM as well, provided the MCP servers are remote (hosted on a web server, not downloaded locally). It’s more work than dropping in a local server, but it keeps secrets out of reach from VM-executed code.

Beyond credentials, scrutinize the actual content you share with Cowork, including anything accessed through connectors. Least privilege is the rule: grant only what’s absolutely necessary for the task, and nothing more.

Amid a wave of package supply chain attacks, this Product Talk visual launches a 3-part guide to safer AI building: Cowork safety today, Claude Code config next week, and off-device development coming soon.

What about skills? Cowork supports skills, and you can add third-party skills inside the quarantine room. If you’re not placing your own data in that room, you can afford more risk. The moment you add sensitive or proprietary data, be selective. Skills can include third-party code, and bad actors use skill directories to distribute malicious payloads. Personally, I never use third-party skills as-is. If one looks useful, I read through the files, then ask Claude to recreate it so I understand what it does and maintain control. If I were to use third-party skills, I’d do it in Cowork and keep their data access to the minimum necessary.

Overall, Cowork is a solid, “safe-ish” option if you’re disciplined about what you share. The challenge is that utility often requires access to real data—exactly what we’re trying to protect. In an upcoming deep dive, I’ll outline strategies to keep malicious code out in the first place. While I’ll focus on local development, the same patterns can extend to Cowork with a bit of setup.

One more important clarification: don’t confuse Cowork with the Code tab in the Claude Desktop app. Cowork runs code inside a virtual machine. The Code tab does not. If you ask Claude to write and execute code from the Code tab, that code runs on your local device and you’re fully responsible for security. There is one exception: the Code tab can run code in Anthropic’s cloud; I’ll cover that approach when we get into moving development off the local machine.

To summarize Cowork’s protections against the attacker’s three-step pattern: First, on entry, installs and scripts still run, but they’re contained inside an isolated virtual machine instead of your real device. Second, access to sensitive data is limited to the specific folders you mount, leaving the rest of your filesystem (including unrelated credentials) out of reach. Third, data exfiltration is partially constrained because Anthropic limits outbound network traffic from the VM, which is helpful but not absolute. By contrast, local Code tab sessions offer no isolation, no filesystem restrictions, and no network limits, so any malicious install scripts run directly on your machine with full access and open egress.

My takeaways so far: I still love building with AI, but I’m doing it more cautiously. Cowork offers meaningful containment when used deliberately. I still prefer the flexibility of Claude Code, and I’ve reconfigured my setup to reduce risk. Even so, “safer” isn’t “safe,” which is why I’m increasingly shifting development off my local device to more controlled environments. I’ll share the practical details—tools, configs, and scripts—in the next installments.

If this perspective is useful, let me know. I want builders to move fast—and safely—through this new era of agentic AI. Until then, stay safe out there.

Inspired by this post on Product Talk.

June 3, 2026
Beyond the Product Builder Hype: How AI, org design, and joy shape PM success

I recently spent time with the debate behind the "product builder" trend—asking whether it’s the future of product management or just another wave of tech FOMO. The conversation featuring Teresa Torres and Petra Wille is a useful prompt, but what matters most is how we translate these ideas into healthy product practices inside our own organizations.

Here’s my take: the product builder movement is neither a mandate nor a fad—it’s a tool. The right question isn’t "should product managers code?" but whether leaning into building advances outcomes for our customers and our teams. In practice, that means letting interest and skill—not pressure—set the pace.

Petra captured it perfectly: "Just because I can do it — is it something I enjoy doing? And do I have enough experience to really get into the flow?" Those two tests—joy and depth—are underrated filters. I’ve seen PMs light up when prototyping or vibe coding a thin slice, and I’ve also seen well-meaning dabbling create hidden complexity that slows everyone down later.

Org design determines whether this works. It’s not about the tools—it’s about clarity of roles, healthy interfaces between product, design, and engineering, and explicit guardrails for where experiments stop and production begins. AI has raised the stakes: "AI can make unskilled work look polished. That’s a feature and a bug — executives see the shine, engineers inherit the mess." If you’ve ever watched a glossy demo turn into weeks of refactors, you know exactly what this looks like.

To avoid that trap, I deliberately separate the three layers where AI is changing product work: personal productivity, team process, and product strategy. Treating these as different stacks keeps expectations clean: a prompt that accelerates personal workflows isn’t the same as an AI-enhanced process that reshapes delivery, and neither automatically produces durable product advantage. Don’t conflate them.

Discovery remains stubbornly human. "Why discovery still requires talking to your customers (sorry)" is more than a friendly nudge. AI can broaden our search space and sharpen analysis, but it doesn’t replace qualitative conversations or the judgment that comes from pattern recognition across real customer contexts. Continuous discovery and disciplined customer interviews are still the most reliable compasses we have.

Where does "vibe coding" fit? It’s great for roughing out concepts, de-risking slices, and communicating intent when words or static mocks won’t cut it. Tools like Claude Code make this faster than ever, and familiar stacks like Ruby on Rails lower the bar for spinning up functional prototypes. But remember the design system trap: AI can make bad decisions look good on the surface. If you don’t control for architecture, accessibility, data contracts, and handoff quality, your team pays the integration tax later.

In well-set-up orgs, the output-oriented muscle memory gets rewired. When AI frees up time, strong teams reinvest it into better problem framing, sharper opportunity solution trees, and tighter product strategy—rather than simply chasing more output. That’s a leadership challenge, not a tooling problem, and it shows up quickly in how teams make trade-offs.

Here’s how I operationalize this with empowered product teams: we articulate clear boundaries for prototypes versus shippable code, define decision rights for when PMs or designers "build," and align on review gates that protect quality without stifling speed. We also make the three AI layers explicit in roadmapping and retros, so improvements to personal workflows don’t get mistaken for strategic advantage.

My distilled guidance echoes the episode’s throughline. The product builder trend isn’t a mandate — it’s a tool. Let enjoyment and skill guide who on your team leans into it. Organizational readiness determines whether AI empowers your team or creates chaos. Don’t conflate personal efficiency, process change, and product impact—they require different responses. Discovery fundamentals haven’t changed; AI helps you go deeper, not skip the work. And the real takeaway on product builders: not everyone has to build, but everyone can if they want to.

If you want to hear the full discussion that sparked these reflections, listen on Spotify or Apple Podcasts. Then tell me: where will you apply builder energy in your team—and where will you deliberately say no?

Resources & Links: Follow Teresa Torres: https://ProductTalk.org. Follow Petra Wille: https://Petra-Wille.com. Mentioned in this episode: Claude Code, Vibe coding, Ruby on Rails.

One more quote I loved because it centers autonomy and craft: "It’s a tool in our toolbox. We can decide who on our team has fun with it, wants to do it, wants to contribute." That’s the mindset that sustains both momentum and morale.

Inspired by this post on Product Talk.

May 12, 2026
Vibe Coding Unleashed: How Parallel Agents Build KPI Driver Trees in Under Two Hours

I’ve been exploring what I call the next level of vibe coding: orchestrating agentic AI to build complex product artifacts in minutes, not days. The breakthrough comes from ditching linear handoffs and embracing true parallelism—letting specialized agents tackle the work simultaneously while I steer the orchestration. In product management contexts where speed and clarity matter, this shift changes everything.

Building a KPI Driver Tree in two hours becomes possible when you stop building sequentially and start building with parallel agents.

For product leaders, a KPI Driver Tree is the fastest way to make strategy legible. It ties high-level outcomes to the levers we can actually pull (features, channels, pricing, onboarding, activation, and retention mechanics) so we can prioritize with confidence. Done well, it anchors OKRs in outcomes rather than outputs, clarifies measurement, and aligns the team around a shared, testable model of growth.
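
For readers who think in data structures, a driver tree is just a tree whose root is the outcome and whose leaves are the levers. Here’s a minimal sketch; the `Driver` class and every metric name in the example are hypothetical and only there to show the shape.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Driver:
    """One node in a KPI driver tree (illustrative structure, not a standard)."""

    name: str
    metric: str = ""
    children: list[Driver] = field(default_factory=list)

    def add(self, child: Driver) -> Driver:
        """Attach a child driver and return it so calls can be chained."""
        self.children.append(child)
        return child

    def levers(self) -> list[Driver]:
        """Leaf nodes: the inputs a team can act on directly."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.levers()]

    def render(self, depth: int = 0) -> str:
        """Indented text view of the tree, two spaces per level."""
        label = f"{self.name} [{self.metric}]" if self.metric else self.name
        lines = ["  " * depth + label]
        lines += [child.render(depth + 1) for child in self.children]
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example tree; swap in your own outcome and levers.
    root = Driver("Net revenue retention", "NRR")
    activation = root.add(Driver("Activation", "% of signups reaching first value"))
    activation.add(Driver("Onboarding checklist completion"))
    retention = root.add(Driver("Retention", "90-day retention"))
    retention.add(Driver("Weekly active usage"))
    print(root.render())
    print([lever.name for lever in root.levers()])
```

Keeping the levers as leaves makes the prioritization question explicit: anything that isn’t a leaf is something you measure, not something you directly change.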

Here’s how I operationalize it with agentic AI and AI workflows. I spin up a small team of specialized parallel agents: a Metrics Librarian (taxonomy and definitions), a Data Modeler (event and table design), a Research Synthesizer (voice of customer and causal hypotheses), a UX Prototyper (visualizing the tree and flows), and a QA/Evaluator (logic and consistency checks). An Orchestrator coordinates these agents, resolves conflicts, and composes outputs into a single, production-ready artifact—while I set constraints, review deltas, and decide.

In a typical two-hour sprint, all agents run at once. While the Metrics Librarian finalizes the KPI ontology, the Data Modeler validates instrumentable events and joins, and the UX Prototyper renders an interactive driver tree for a unified analytics platform. Meanwhile, the Synthesizer maps qualitative insights to quantitative levers, and the Evaluator stress-tests assumptions. Because we’re not waiting for sequential handoffs, we converge on a coherent driver tree and its initial measurement plan in one pass.
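
The fan-out itself can be sketched in a few lines. This is a simplified illustration using Python’s `asyncio`, not the actual tooling behind the sprint: `run_agent` is a hypothetical stand-in that sleeps instead of calling a model, and the brief text is made up. The point is only the pattern of launching every role at once and collecting results by role.

```python
import asyncio

# Role names taken from the agent team described above.
ROLES = [
    "Metrics Librarian",
    "Data Modeler",
    "Research Synthesizer",
    "UX Prototyper",
    "QA/Evaluator",
]


async def run_agent(role: str, brief: str) -> dict[str, str]:
    """Hypothetical stand-in for a real agent call; replace the body with your client."""
    await asyncio.sleep(0.01)  # simulates model or tool latency
    return {"role": role, "output": f"{role} draft for: {brief}"}


async def orchestrate(brief: str, roles: list[str]) -> dict[str, str]:
    """Run every agent concurrently and collect outputs keyed by role."""
    results = await asyncio.gather(
        *(run_agent(role, brief) for role in roles),
        return_exceptions=True,  # one failing agent should not sink the others
    )
    merged: dict[str, str] = {}
    for role, result in zip(roles, results):
        if isinstance(result, BaseException):
            merged[role] = f"FAILED: {result}"
        else:
            merged[role] = result["output"]
    return merged


if __name__ == "__main__":
    outputs = asyncio.run(orchestrate("KPI driver tree for onboarding", ROLES))
    for role, output in outputs.items():
        print(f"{role}: {output}")
```

`asyncio.gather` returns results in the same order as the tasks it was given, which is why zipping against `roles` is safe. Conflict resolution and composition, the Orchestrator’s real job, would happen on the merged dictionary and is left out here.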

The payoff isn’t just speed; it’s higher-quality decisions. Parallel agents reduce context loss, expose trade-offs earlier, and allow me to compare multiple viable paths side-by-side. This accelerates continuous discovery, keeps the work aligned with product strategy, and gives product managers a clear, living map of how inputs roll up to outcomes. It’s the closest I’ve found to running a product trio at machine speed.

Guardrails matter. I pair this approach with strong data governance, privacy-by-design, and eval-driven development so every agent’s output is testable and auditable. Clear prompts, scoped corpora, and consistent acceptance criteria keep the Orchestrator honest, while lightweight Agent Analytics helps me see where reasoning falters and where to improve the system.

If your team is still tackling analytics artifacts sequentially—requirements, then instrumentation, then visualization—consider switching mental models. Treat the driver tree as the backbone, empower parallel agents to co-create around it, and reserve human judgment for the critical calls. This is vibe coding for product management: creative, fast, and grounded in measurable outcomes.

Inspired by this post on Pendo – Best Practices.

February 5, 2026