Tag: privacy-by-design

  • Package Hack Wake-Up Call: My Playbook for Securing Cowork, Coding Agents, and Secrets

    Package Hack Wake-Up Call: My Playbook for Securing Cowork, Coding Agents, and Secrets

    I love being a builder. It feels like a superpower I can’t stop using, and lately I’ve been channeling it into better workflows, faster experimentation, and sharper product thinking.

    I tinker with my Claude Code workflows to make every day more effortless. I’m having a blast creating AI-generated interview snapshots and opportunity solution trees for Vistaly. I also spend time digging into traces and iterating on the AI coaches I use for our discovery courses.

    Then the recent wave of malicious software spreading through the open-source community popped my bubble. It hit companies big and small—names like OpenAI, PostHog, and Zapier. As I dug in, I realized what many cybersecurity experts have long known: this is a deep rabbit hole. If I want to build responsibly, I have to get significantly better at protecting my devices, credentials, and code. And if you’re building with AI or modern tooling, you likely do, too.

    Here’s why. We all rely on open-source software. Most modern applications assemble tried-and-true components—parsing a PDF, handling dates across time zones, visualizing spreadsheet data, connecting to an API—rather than reinventing them. The same is true for agent skills and MCP servers; they accelerate how we get value from models. This is overwhelmingly a good thing. But it also creates an attack surface that bad actors exploit.

    We don’t need to abandon third-party code. We do need to understand the mechanisms attackers use and consistently defend against them.

    Infographic titled 'When Trusted Packages Go Rogue' summarizing a talk on package hacks: worm spread, defense framework, risks from AI coding tools, and practical mitigation steps, with security-themed icons.
    When one malicious worm compromises hundreds of packages, what should dev teams do? This visual teaser maps the agenda—how it spreads, how to guard against it, AI tool risks, and concrete steps to mitigate.

    On May 11th, I started seeing tweets about a TanStack hack. At that time, I didn’t know what TanStack was. But apparently, it’s a popular set of JavaScript libraries that are used by a lot of React sites. At first, I didn’t pay much attention. Then I learned the packages were compromised by a worm—malicious software that self-replicates—and it spread quickly. Within hours, dozens of packages were implicated; by day’s end, it was in the hundreds. That’s when I knew I had to lean in.

    If you’ve explored safe development practices with coding agents before, you’ve seen the basics of package safety. A package is a bundle of reusable code shared through registries, and nearly every app you use depends on them. The unfortunate twist with this specific hack, known as the Mini Shai-Hulud worm, is that it shows prior “safe enough” heuristics aren’t sufficient. Popularity and trust signals don’t guarantee safety. We have to do more.

    So here’s what I’ll cover today: how malicious software typically works, a practical framework for guarding against it, the specific risks of using Cowork to write and run code, and concrete steps to mitigate that risk. My goal is simple: help you keep building—despite the risks—while protecting your data and your business.

    Quick disclaimer: I’m not a security expert. I’m sharing my personal journey and what I’ve learned through research and hands-on work. Please use your best judgment when applying any of this.

    Infographic showing a 3‑step pattern in malicious software: enter via package or script, search a device for sensitive data, then exfiltrate to an attacker, with icons and expanding entry points.
    Package hacks share a simple playbook: get in, sweep for secrets, and phone home. This visual breaks down the 3 steps and flags new entry points—from packages to MCP servers, agent skills, and app extensions.

    An agent recently scoured over 230,000 malicious software incidents and found that most malicious software follows a similar pattern. First, it needs an entry point onto your computer. Once installed, it scours your device for sensitive data, and then it uses your network connection to send that data to its own servers. The Mini Shai-Hulud worm spreads via malicious package install scripts that run at download time, then searches the device for credentials (including package publishing rights), poisons additional packages to continue replicating, and uses multiple channels—including the victim’s own GitHub public repos—to distribute secrets.

    In practice, most attacks boil down to three steps: 1) It finds an entry point to your device. 2) It searches your device for sensitive data. 3) It sends that data to its own server. The good news: this pattern also tells us how to defend. We can harden entry points, minimize what code and agents can access, and constrain outgoing network traffic.

    Keep in mind that install scripts aren’t the only entry vector. Any code that runs on your machine could contain malicious payloads: third-party packages, agent skills, MCP servers, browser or desktop extensions—the list is long. As coding agents and “vibe coding” tools become mainstream, more non-engineers are exposed to the same risks engineers have managed for years.

    You might be at elevated risk if you do any of the following: you download and use third-party skills or MCP servers; you let Claude Code, Codex, or other coding agents write scripts that run locally and use third-party packages; you use an IDE like VS Code or Cursor with third-party extensions; or you install third-party extensions in tools like Obsidian. This isn’t an exhaustive list, but if any of these apply, it’s worth tightening your approach.

    Infographic titled 'Are You at Risk?' listing third-party code exposure points: agent skills and MCP servers, coding agents on local devices, IDE extensions (VS Code, Cursor), and Obsidian plugins.
    Relying on third-party code? This visual highlights four common risk zones—agent skills/MCP servers, coding agents, IDE extensions, and Obsidian plugins—and urges a review of downloads, local scripts, and add-ons.

    The “safest” approach would be to avoid installing third-party software on your local device entirely. That’s not realistic. We all depend on third-party components in our stack. So I’ll start with one of the most common paths for non-engineers writing and running code today: Cowork.

    Evaluating Cowork’s safety was eye-opening. Cowork offers meaningful protection—more than running code directly on your machine—but it isn’t bulletproof. There’s a notable gap you should understand.

    Here’s how Cowork helps. It runs code inside a virtual machine, which isolates the execution environment from your real device—a quarantine room for code. While Cowork doesn’t fully control what comes into the room (that part is on you), if malicious code gets in, it’s contained and cannot reach the rest of your filesystem. Cowork also limits outbound network traffic from the virtual machine, which helps disrupt data exfiltration. However, it’s not foolproof.

    Because Claude can install packages inside Cowork, it remains susceptible to malicious code like the Mini Shai-Hulud worm. And GitHub is on the allow list so Cowork can read and write to your repos. Since the Mini Shai-Hulud worm uses GitHub to publish secrets, this creates exposure. The crucial mitigation: if you never give Cowork access to sensitive data, there’s nothing for an attacker to steal.

    Infographic titled 'Does Cowork Keep You Safe?' with three points: entry point contained, data safe only if kept outside, and partially limited network traffic, highlighting risks in package attacks.
    A quick visual from a security deep dive on package hacks shows how Cowork handles threats: entry points are contained, data is only safe when kept outside, and network traffic is partly limited—making shared data the gap to watch.

    Your responsibility is straightforward but critical: your data is only safe if it stays outside the virtual machine. When you mount folders into Cowork, those folders become accessible to any code running inside the VM. That includes malicious scripts. Before sharing, ask two questions: do the folders contain any credentials or secrets, and do they include proprietary data that would be harmful if accessed?

    It’s common for code to need credentials. That’s why Cowork includes connectors to third-party sources like Google Drive and Slack. Credentials configured for these connectors never enter the VM—they remain outside the quarantine room—so they’re not exposed to malicious code. But if your code requires additional credentials inside the VM, scope them tightly and assume they could be compromised.

    You can also use custom MCP servers you create yourself with Cowork. Those credentials stay outside the VM as well, provided the MCP servers are remote (hosted on a web server, not downloaded locally). It’s more work than dropping in a local server, but it keeps secrets out of reach from VM-executed code.

    Beyond credentials, scrutinize the actual content you share with Cowork, including anything accessed through connectors. Least privilege is the rule: grant only what’s absolutely necessary for the task, and nothing more.

    Infographic titled 'Keep Building. Stay Safe.' outlining a 3-part series for AI builders: 1 Cowork Safety, 2 Claude Code Config, 3 Off-Device Development, with teal security, AI, and cloud icons and a 'Product Talk' label.
    Amid a wave of package-supply attacks, this Product Talk visual launches a 3-part guide to safer AI building—starting with Cowork safety today, then Claude code config next week, and off-device development coming soon.

    What about skills? Cowork supports skills, and you can add third-party skills inside the quarantine room. If you’re not placing your own data in that room, you can afford more risk. The moment you add sensitive or proprietary data, be selective. Skills can include third-party code, and bad actors use skill directories to distribute malicious payloads. Personally, I never use third-party skills as-is. If one looks useful, I read through the files, then ask Claude to recreate it so I understand what it does and maintain control. If I were to use third-party skills, I’d do it in Cowork and keep their data access to the minimum necessary.

    Overall, Cowork is a solid, “safe-ish” option if you’re disciplined about what you share. The challenge is that utility often requires access to real data—exactly what we’re trying to protect. In an upcoming deep dive, I’ll outline strategies to keep malicious code out in the first place. While I’ll focus on local development, the same patterns can extend to Cowork with a bit of setup.

    One more important clarification: don’t confuse Cowork with the Code tab in the Claude Desktop app. Cowork runs code inside a virtual machine. The Code tab does not. If you ask Claude to write and execute code from the Code tab, that code runs on your local device and you’re fully responsible for security. There is one exception: the Code tab can run code in Anthropic’s cloud; I’ll cover that approach when we get into moving development off the local machine.

    To summarize Cowork’s protections against the attacker’s three-step pattern: installs and scripts still run, but they’re contained inside an isolated virtual machine instead of your real device; access to sensitive data is strongly limited to the specific folders you mount, leaving the rest of your filesystem (including unrelated credentials) out of reach; data exfiltration is partially constrained because Anthropic limits outbound network traffic from the VM—helpful, but not absolute. By contrast, local Code tab sessions offer no isolation, no filesystem restrictions, and no network limits—so any malicious install scripts run directly on your machine with full access and open egress.

    My takeaways so far: I still love building with AI, but I’m doing it more cautiously. Cowork offers meaningful containment when used deliberately. I still prefer the flexibility of Claude Code, and I’ve reconfigured my setup to reduce risk. Even so, “safer” isn’t “safe,” which is why I’m increasingly shifting development off my local device to more controlled environments. I’ll share the practical details—tools, configs, and scripts—in the next installments.

    If this perspective is useful, let me know. I want builders to move fast—and safely—through this new era of agentic AI. Until then, stay safe out there.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Turn Customer Data into Real Experiences: What I Look For in a Brussels Strategy Partner

    Turn Customer Data into Real Experiences: What I Look For in a Brussels Strategy Partner

    I focus every day on turning raw customer signals into meaningful product experiences that create measurable outcomes. Human37 is a Brussels-based customer data strategy agency helping organizations turn data into real customer experiences. That statement sets a useful standard for the kind of partner I look for: one that helps us move beyond reports and into shipped value customers can feel.

    What matters most to me is the bridge between discovery and delivery—how insights inform product strategy and roadmaps without slowing execution. The strongest partners operationalize behavioral analytics within a unified analytics platform, connect qualitative learning with quantitative evidence, and make journey mapping a living artifact rather than a slide. Tools like Amplitude analytics can accelerate this work, but the real differentiator is the operating model that converts data into decisions and decisions into outcomes.

    When I evaluate a customer data strategy partner, I look for five things: rigorous data governance and privacy-by-design; clean event taxonomy and robust identity resolution; clear experimentation workflows that tie to activation and retention analysis; practical enablement for product teams (not just analysts); and a bias for product-led growth rooted in real user behavior. If a partner can’t articulate how insights ladder to user activation and long-term value, they’re not ready to guide the roadmap.

    Here’s how I sequence the work to turn signals into experiences: first, define the outcomes that matter and the driver trees behind them; second, instrument events and unify identities to power trustworthy behavioral analytics; third, map critical paths with journey mapping to expose friction and moments of delight; fourth, run focused experiments linked to product strategy, not vanity metrics; finally, scale what works with in-product experiences and lifecycle messaging that compounds retention.

    The payoff is speed and clarity: faster time-to-insight, more confident bets, and fewer handoffs between data teams and product builders. If you’re exploring European partners, a Brussels-based agency with a sharp customer data strategy capability can help you move from analysis to action. The litmus test is simple—can they help your team ship experiences that customers notice and your metrics confirm?


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • My Playbook for Safe AI Analytics in Financial Services: Compliance, Trust, and Real Workflows

    My Playbook for Safe AI Analytics in Financial Services: Compliance, Trust, and Real Workflows

    I spend a lot of time helping financial services teams adopt AI analytics without compromising on risk, compliance, or customer trust. The stakes are high: regulations are evolving, data sensitivity is non‑negotiable, and a single misstep can erode confidence. That’s why my approach centers on governed AI, rigorous data governance, and measurable business value—not flashy demos.

    Learn how Amplitude delivers safe, governed AI analytics for financial services—aligned to compliance, built for trust, and ready for real workflows.

    In practice, “safe and governed” means clear lines of accountability and controls that hold up under audit. I look for privacy-by-design principles, role-based access controls, robust audit trails, and granular data permissions that keep sensitive data segregated. Strong AI risk management also requires model oversight—documented policies, human-in-the-loop review where needed, and explainability for high-impact decisions. Above all, the platform must meet regulatory compliance expectations and support the organization’s risk posture without slowing teams down.

    Real workflows are where the value shows up. In financial services, that can mean using behavioral analytics to understand user intent, applying anomaly detection to surface suspicious patterns earlier, and empowering product managers and analysts to iterate safely within a unified analytics platform. When these capabilities are built into the core analytics motion, I see faster detection of issues, clearer attribution of outcomes, and more confident decision-making—all while staying within governance guardrails.

    When I evaluate a solution, my checklist is simple and strict: does it enforce strong data governance by default; does it provide transparent, auditable AI behaviors; can it scale securely to meet enterprise requirements; does it tie insights directly to product and growth outcomes; and will it help risk, compliance, and product teams work together instead of at cross purposes? If the answer is yes across that list, the platform earns a place in the enterprise toolbelt.

    Done right, governed AI analytics give financial services teams the confidence to move faster with less risk. You gain sharper insights from behavioral data, earlier warning from anomalies, and the trust that comes from controls that are aligned to compliance and resilient under scrutiny. That’s the path to durable advantage: responsible AI that accelerates learning, protects customers, and translates directly into better products and performance.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Unlocking Session Replay at Scale: How Amplitude Elevates UX, Observability, and Trust

    Unlocking Session Replay at Scale: How Amplitude Elevates UX, Observability, and Trust

    I build products to translate noisy interaction data into clear, actionable decisions. Few capabilities deliver that clarity like session replay. It closes the gap between what analytics tells us and what users actually experience, empowering product, design, and SRE teams to learn faster, resolve issues sooner, and improve customer trust.

    Lew Gordon is a Senior Staff Engineer at Amplitude focusing on Session Replay. He was formerly an engineer at Twilio.

    In my practice, session replay complements Amplitude analytics and behavioral analytics by adding rich context to the unified analytics platform—turning charts into stories we can act on. When I can see the precise clicks, hesitations, and error states behind a spike or a drop, prioritization becomes straightforward and the path to product-market fit becomes easier to navigate.

    Operationally, replay deepens observability. I correlate console errors, network traces, and layout shifts with user intent, then tie those signals to Web Vitals, performance budgets, and SRE workflows. The result is a tighter feedback loop from incident to insight—one that shortens mean time to resolution and raises the bar on reliability without guesswork.

    Privacy-by-design is non-negotiable. I start with strong data governance: selective capture and redaction, explicit consent and retention policies, role-based access, and environment-aware sampling. These controls keep sensitive data protected while still providing the fidelity product and engineering need to diagnose issues and improve experiences responsibly.

    Strategically, I deploy replay where it moves the needle most: onboarding and activation moments, high-friction conversion flows, and critical paths with outsized revenue or trust impact. I track signals like rage clicks, dead clicks, scroll depth, and error states to inform product strategy and reduce UX debt, while linking improvements to activation and retention analysis, time to resolution, and DORA metrics.

    At scale, success requires platform scalability: efficient indexing, low-latency retrieval, and smooth playback across browsers and devices—all while maintaining tight CPU, memory, and bandwidth budgets. When integrated with CI/CD and experimentation, replay becomes a force multiplier for continuous discovery and confident, rapid iteration.

    My takeaway: session replay is not just a debugging tool—it’s a shared language across product, engineering, and design. With the right guardrails and operating model, it elevates decision quality, accelerates learning, and builds the trust customers feel with every interaction.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • AI Data Security for Product Teams: Protect Sensitive Product Data Without Slowing Innovation

    AI Data Security for Product Teams: Protect Sensitive Product Data Without Slowing Innovation

    Protecting product data has never felt more urgent. Every week, my teams experiment with gen ai prototypes and LLM-powered capabilities, and I’m accountable for ensuring our innovation never compromises cybersecurity, privacy, or customer trust. The goal is not to slow down—it's to build in the right guardrails so speed and safety reinforce each other.

    Understand AI data security risks in product teams, what product data is most exposed, and how to use AI tools responsibly without slowing innovation.

    When I assess AI risk with product managers, I start with how data moves. The biggest threats usually come from prompt and context leaks, unsafe logging of sensitive inputs or outputs, permissive access controls, unmanaged third-party model usage (shadow AI), and unclear data-retention policies. For LLMs for product managers, I emphasize that every step in AI workflows—from collection to processing to storage—must assume adversarial conditions.

    In my experience, the product data most exposed includes customer PII and payment identifiers, internal strategy documents and roadmaps, analytics and behavioral telemetry tied to users, feature flags and configuration values, embeddings and vector stores that can reveal sensitive patterns, and the prompts or contexts themselves. Even “harmless” evaluation datasets can contain inferred identities. Treat all of this as high-value assets in your data governance model.

    I apply privacy-by-design from the first discovery conversation: minimize data by default, redact or tokenize before any external model call, and separate identities from content wherever possible. A retrieval-first pipeline helps keep raw customer data within our boundary while still enabling relevant context. We combine deterministic safeguards (policy-based redaction, allow/deny lists) with runtime observability to detect anomalous prompts, outputs, or access patterns.

    To keep velocity high, we operationalize risk rather than debate it ad hoc. A lightweight risk scoring rubric classifies each capability (e.g., internal-only, customer-facing, regulated data adjacent) and dictates controls: redaction requirements, human-in-the-loop thresholds, eval-driven development gates, and incident response readiness. These controls live in CI/CD so product teams get fast, automated feedback without waiting on meetings.

    Partnership is essential. I bring Security, Legal, and Data partners into the product trios early to align on regulatory compliance and threat modeling while scoping solutions that meet outcome goals. We maintain a shared catalog of approved providers and architectures, document data flows, and version our policies just like code—so everyone can see what changed and why.

    Vendor diligence is non-negotiable. I ask LLM providers about data retention and training usage, encryption at rest and in transit, key management, regional data controls, audit posture (SOC 2, ISO 27001, HIPAA where needed), and support for private networking. We restrict scopes with least-privilege access and instrument robust observability for threat detection and response across the full path, not just the API call.

    Culture makes the biggest difference. I coach teams on prompt hygiene, secret handling, and context window management; we publish redaction patterns, approved libraries, and clear do/don’t examples. When incidents happen, we treat them as learning opportunities, run blameless reviews, and update our playbooks, guardrails, and training materials accordingly.

    The outcome I aim for is confidence with speed: we ship AI features that customers love while protecting the data they entrust to us. With a clear risk model, strong data governance, and embedded controls, product teams can innovate boldly—without compromising on security or trust.


    Inspired by this post on Product School.


    Book a consult png image
  • How I Safely Deploy Amplitude AI in Healthcare: Governed Analytics, PHI-Safe Workflows, Real ROI

    How I Safely Deploy Amplitude AI in Healthcare: Governed Analytics, PHI-Safe Workflows, Real ROI

    Healthcare leaders ask me the same question every week: how do we unlock AI-driven insights without risking patient trust or regulatory missteps? My approach is pragmatic and proven—connect business goals to measurable behavioral analytics, wrap everything in clear governance, and keep protected health information (PHI) out of the analytics layer by default. In other words, we earn the right to scale by making safety, compliance, and transparency visible in every step of the workflow with Amplitude AI.

    At the core, I anchor our rollout on "governed analytics"—curated events, certified metrics, and role-based access that make audits straightforward and decision-making fast. When product, data, security, and compliance share a single source of truth in Amplitude analytics, we reduce rework, eliminate ambiguous definitions, and ship improvements with confidence. This is where AI Strategy meets operational excellence: a unified analytics platform that balances velocity with verification.

    From there, I establish "PHI-safe workflows" by drawing a hard boundary around what data enters analytics. Behavioral signals flow in; identifiers stay in clinical systems. I lean on privacy-by-design, data minimization, and clear data governance so we can demonstrate regulatory compliance before a single end user is exposed to a new AI-powered experience. That alignment builds trust with legal and security, shortens review cycles, and operationalizes AI risk management without slowing innovation.

    Insights must be "trusted insights"—reliable enough to drive care pathways, staffing decisions, and patient communications. I emphasize repeatable instrumentation, observability of data quality, and transparent lineage so teams can trace outcomes back to inputs. In practice, that means we agree on event contracts, enforce change control, and verify that behavioral analytics reflect real-world adoption and efficacy across patient and provider journeys.

    To move decisively from legal review to production, I run a two-speed rollout. First, we validate in a sandbox with synthetic or de-identified data to pressure-test prompts, dashboards, and alerting. Then we graduate to controlled pilots with strict guardrails, documented data flows, and pre-agreed risk mitigations. By the time we scale, stakeholders have evidence, not just assurances—accelerating approvals and reducing last-minute scope churn.

    One pattern I rely on is connecting AI outcomes to product metrics that matter: activation, time-to-first-value, task completion rates, and variance in outcomes across segments. With Amplitude analytics, we can spot drop-offs, attribute improvements to specific design or model changes, and quantify impact in language that resonates with executives and clinicians alike. That rigor is what transforms AI from a promising prototype into a dependable operating capability.

    Success looks like faster time-to-insight, fewer compliance iterations, and audit-ready documentation built into normal workflows. It also looks like teams who are confident enough in their data to run A/B testing and continuous discovery—because they know their dashboards reflect reality. When governance, safety, and clarity are designed in, product-led growth becomes compatible with healthcare’s unique regulatory and ethical obligations.

    "See how to adopt AI in healthcare safely with Amplitude, using governed analytics, PHI-safe workflows, and trusted insights that help teams move from legal review to real usage." That’s the journey I guide teams through—measurable, compliant, and humane—so we can deliver AI that clinicians trust, patients respect, and leaders can scale.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Bad Advice from Your AI Clone? Ethics, IP, and How Product Leaders Protect Quality

    Bad Advice from Your AI Clone? Ethics, IP, and How Product Leaders Protect Quality

    What happens when an AI starts giving advice in your voice—advice you’d never actually give? I’ve been thinking a lot about that question, and this conversation hit home for me as a product leader navigating the fast-evolving reality of AI “clones.”

    Listen to this episode on: https://open.spotify.com/episode/7DNDIlIimwbbMOytArewRp?ref=producttalk.org | https://podcasts.apple.com/kh/podcast/bad-advice/id1794203808?i=1000756914818&ref=producttalk.org. Prefer video? Watch on YouTube: https://www.youtube.com/embed/RF4BwaeMMlg?feature=oembed

    The episode examines AI “clones” built from podcast transcripts and public content—where the experimentation feels exciting, where it crosses ethical lines, and what happens when mediocre AI outputs get attributed to real people. The tension is real: when a bot confidently answers in your style but misses the nuance, “it’s not me” becomes more than a disclaimer—it’s a reputational defense.

    We dig into the messy parts: IP ownership of open-sourced transcripts, the role of pirated books in LLM training sets, rising inference costs, and the uncomfortable economic question: if anyone can prompt “act like Teresa,” how do creators make a living? In my own decision-making, I look for clear consent, guardrails that prevent impersonation, and transparent UX that never confuses a synthetic perspective with a human expert.

    This isn’t anti-AI. It’s a nuanced conversation about quality, consent, and remembering there are real humans behind the ideas.

    Here’s how I translate the key takeaways into practice. Using AI for perspective is fine—equating it to the real person isn’t. Free-feeling AI outputs still rely on someone’s work. Expertise is more than past content—it’s context, judgment, and evolution. If someone’s work influences you, find a way to support them. These principles help teams benefit from gen ai without eroding trust or the creator ecosystem.

    “Technically possible” doesn’t mean “ethically okay.” My AI Strategy playbook includes privacy-by-design, clear data governance on training materials, and a bright line between inspiration and impersonation. When we ship AI features, we label synthetic outputs, avoid mimicking living experts without permission, and create paths to compensate or promote the humans whose thinking underpins the experience.

    I’ve also tested the “act like X” pattern to stress-test product quality. Even when outputs sound plausible, they rarely capture the expert’s mental models, trade-offs, or the evolution of their thinking—especially in complex product discovery work. That gap is the difference between average AI text and expert product management leadership.

    If you listen, consider a few reflection prompts: Have you ever used AI to “act like” someone you admire? Could you tell whether the output matched that person’s actual thinking? How do you decide what’s ethically okay when using public content in LLMs? And how can we support creators while still embracing new tools?

    Resources & Links you may find helpful: Follow Teresa Torres: https://ProductTalk.org; Follow Petra Wille: https://Petra-Wille.com; Delphi.ai (AI bot platform discussed): https://www.delphi.ai/?ref=producttalk.org; Lenny’s Podcast: https://www.lennysnewsletter.com/podcast?ref=producttalk.org; ChatGPT: https://chatgpt.com/?ref=producttalk.org; Petra’s Coaching Packages: https://www.petra-wille.com/coaching-packages?ref=producttalk.org; Teresa’s Product Talk: https://www.producttalk.org/; Teresa’s book Continuous Discovery Habits: https://www.producttalk.org/continuous-discovery-habits/; Lenny’s open-sourced podcast transcripts: https://www.dropbox.com/scl/fo/yxi4s2w998p1gvtpu4193/AMdNPR8AOw0lMklwtnC0TrQ?rlkey=j06x0nipoti519e0xgm23zsn9&e=1&st=ahz0fj11&dl=0&ref=producttalk.org

    Have thoughts on this episode or practices that have worked in your org? Share them below—I’m keen to learn how other teams are balancing innovation with integrity.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Building Foundational AI Platforms That Ignite Innovation and Redefine Analytics Strategy

    Building Foundational AI Platforms That Ignite Innovation and Redefine Analytics Strategy

    I’ve spent my career building and scaling product platforms, and I’ve seen firsthand how the right AI Strategy can unlock disproportionate impact. Foundational AI platforms are the engine room of modern analytics—when they’re done well, they compress time-to-insight, improve quality, and empower empowered product teams to deliver outcomes that matter.

    Across leading analytics ecosystems, including Amplitude analytics, the winning pattern is consistent: invest in a unified analytics platform that abstracts complexity while enabling rapid iteration. By standardizing data governance and privacy-by-design, teams gain the freedom to experiment confidently without sacrificing compliance or security.

    For me, “foundational AI platforms” means pragmatic building blocks that product and engineering can trust: evaluation harnesses for models, retrieval pipelines that surface the right context, feature stores that ensure consistency, and CI/CD with robust observability. When these AI workflows are in place, behavioral analytics, anomaly detection, and A/B testing stop being one-off projects and become repeatable capabilities.

    The payoff isn’t just efficiency—it’s strategic differentiation. Internal innovation accelerates when teams can go from idea to live experiment in days, not quarters. That speed shapes the future of AI analytics: richer insights woven directly into product experiences, LLMs for product managers to prototype faster, and analytics that feel conversational, contextual, and deeply actionable.

    Execution still makes or breaks the vision. I align product strategy around outcomes vs output OKRs, pair product trios with forward-deployed engineers, and use a clear build vs buy rubric for platform components. The goal is platform scalability without reinventing the wheel—own the parts that differentiate, integrate the rest, and keep your interfaces painfully simple.

    If you’re leading this journey, start by mapping your critical use cases to platform capabilities, close gaps in data governance, and stand up an eval-driven development loop. Within one or two quarters, you should see a measurable lift in deployment frequency, a sharper signal on performance, and a culture that ships with confidence. That’s how foundational AI platforms empower internal innovation and help define the future of AI analytics.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Multi‑Agent Systems Demystified: Why One AI Isn’t Enough—and How I Ship Faster With Many

    Multi‑Agent Systems Demystified: Why One AI Isn’t Enough—and How I Ship Faster With Many

    In my day-to-day building AI products, I’ve learned a simple truth: a single model can be brilliant, but a coordinated team of specialized agents is what consistently ships outcomes customers trust. That’s the promise of multi-agent systems—multiple AIs with distinct roles collaborating inside robust AI workflows to deliver accuracy, speed, and resilience you can’t get from a lone model.

    Think of a multi-agent system as a well-run product trio for machines: a planner decomposes the job, specialists execute focused tasks, a reviewer checks quality, and an orchestrator keeps everyone aligned. This agentic AI approach mirrors how high-performing teams work—divide complex problems, play to strengths, and create tight feedback loops.

    When does one AI stop being enough? Whenever tasks require tool use, domain retrieval, multi-step reasoning, or policy adherence under real-world constraints. In those moments, specialized agents shine—one for search using a retrieval-first pipeline, another for reasoning, another for action execution, and a final one for validation. The result is better accuracy with manageable latency and cost.

    The core architecture I rely on starts with a planner that breaks a goal into steps, followed by execution agents equipped with tools and grounded context. I pair this with context window management to keep prompts lean and relevant, and I insert a verifier (or critic) to catch logic slips and policy violations before results reach customers. A lightweight orchestrator coordinates handoffs and retries to keep the whole flow resilient.

    To make this production-grade, I treat observability as non-negotiable. Agent Analytics helps me see which agents are adding value versus adding latency, where failures cluster, and how prompts drift over time. From there, eval-driven development gives me measurable confidence: I codify representative tasks, run offline and shadow evaluations, and only promote changes that move accuracy and safety in the right direction.

    Governance is equally critical. I design privacy-by-design from the start, restrict data movement with strong data governance, and enforce policy constraints inside the workflow rather than after the fact. This includes red-teaming failure modes, rate-limiting tools, and capturing immutable traces for audits and post-incident reviews—habits borrowed from SRE culture that map well to AI systems.

    On the practical side, prompt engineering remains foundational, but it’s the system design that converts clever prompts into reliable outcomes. Tool access, retrieval quality, memory strategy, and error handling matter more than wordsmithing alone. I’ve found that small prompt improvements are amplified when the surrounding workflow is sound—and are overwhelmed when it isn’t.

    If you’re just starting, begin with a narrow use case and a minimal set of agents—planner, executor, and verifier—then expand. Use continuous discovery with real users to learn where the workflow fails in the wild, and iterate with tight release cycles. Treat every agent like a microservice with clear contracts, test coverage, and metrics, and you’ll unlock compounding gains without losing control.

    The payoff is tangible: faster shipping cycles, fewer regressions, and outcomes customers can actually rely on. When stakes are high and ambiguity is real, one AI is often a talented soloist—but a disciplined ensemble of agents is how I deliver dependable, scalable value at product velocity.


    Inspired by this post on Product School.


    Book a consult png image
  • AI Agent Deployment Mastery: My Proven Checklist to Ship Safely, Faster, and at Scale

    AI Agent Deployment Mastery: My Proven Checklist to Ship Safely, Faster, and at Scale

    Shipping AI agents is not like shipping a typical feature. The system learns, reasons, and takes action in unpredictable environments, and when it’s customer-facing, the stakes are high. Over the past few years, I’ve refined a practical checklist that helps my teams move quickly without breaking trust. It balances speed with safety, and ambition with accountability—exactly what you need to scale agentic AI in production.

    This checklist was forged in real launches—some smooth, some humbling. Early on, I watched an otherwise brilliant agent confidently offer a refund policy we didn’t have. That one incident made it clear: AI agents require a higher bar for guardrails, evals, and observability. Today, I won’t greenlight an AI rollout without these steps being explicit, owned, and testable.

    Start with outcomes, not output. I define the job-to-be-done, the target users, and the measurable business impact using outcomes vs output OKRs and driver trees. Success is not “ship an agent,” it’s “reduce first-response time by 40% with no drop in CSAT,” or “increase qualified demo bookings by 20% at a lower cost per acquisition.” Clear outcomes give the agent a purpose and the team a north star.

    Prepare the knowledge the agent will use. A retrieval-first pipeline beats raw prompting for most enterprise cases. I inventory sources of truth, set access controls, and enforce data governance from day one. That includes PII handling, redaction, retention policies, and privacy-by-design. If the agent can’t reliably retrieve the right fact at the right time, the rest doesn’t matter.

    Choose models and prompts with discipline. I align model selection with context window management, cost, latency, and tool-use requirements. Then I build prompts and tools together, not in isolation, and I keep temperature, stop conditions, and function-calling explicit. Most importantly, I use eval-driven development: golden datasets, task-specific metrics (accuracy, helpfulness, latency, cost), and target thresholds that must be met before widening rollout.

    Manage AI risk upfront. I treat jailbreaks, toxicity, and data leakage as product risks, not just security issues. I implement layered defenses—input/output filtering, policy checks, rate limits, and abuse monitoring—and define escalation paths and human-in-the-loop handoffs for ambiguous cases. Every risky capability needs an owner, a playbook, and a test.

    Build the pipeline that lets you iterate safely. Prompts, tools, policies, and retrieval configs go through the same CI/CD rigor as code. I use feature flags for progressive delivery, canary cohorts to limit blast radius, and clear rollback procedures. Observability isn’t optional; I track latency, token usage, cost, failure modes, and user outcomes. I also watch DORA metrics and deployment frequency to ensure we’re improving the engine, not just the output.

    Constrain autonomy intentionally. Agent behavior design matters as much as model choice. I set step limits, define tool whitelists, separate read vs write permissions, and specify decision checkpoints. When the agent is uncertain or confidence drops below a threshold, it hands off to a human or a deterministic workflow. Guardrails aren’t barriers; they’re bumpers that keep you on the track.

    Instrument what users experience, not just what models produce. I track activation, task success, self-serve completion rates, and time-to-value. I pair Agent Analytics with journey analytics so I can see where the agent helps or hurts. I also invest in UX trust cues—transparent explanations, undo paths, and in-app guides—so users feel in control. When the agent changes behavior through learning, the interface should make that understandable.

    If you’re shipping a voice AI agent, test in realistic conditions. I set targets for ASR accuracy, barge-in responsiveness, TTS prosody, and end-to-end latency. I predefine safe transfer logic for complex calls and ensure compliance for call recording and data retention. Voice amplifies both the magic and the mistakes; operational excellence is non-negotiable.

    Plan the business rollout like a product, not a press release. I align pricing (often consumption SaaS pricing), packaging, and SLAs with actual unit economics—tokens, inference, and retrieval. I equip solutions engineering with playbooks and reference architectures, wire up CRM integration for attribution, and put feedback loops into Intercom or the support stack so we learn from every interaction.

    Run operations like an SRE team. I define incident severity for AI-specific failures (e.g., harmful output, runaway cost, degraded retrieval), add alerting, and keep runbooks current. I schedule postmortems that feed directly into eval baselines and backlog priorities. Continuous discovery isn’t a ceremony; it’s the safety net that keeps improvements compounding.

    Close the loop on compliance and governance. From day zero, I document data flows, vendor scopes, and audit logs. I verify regulatory compliance and adopt privacy-by-design so I’m not retrofitting later. Transparency, user consent, and opt-outs aren’t just legal checkboxes; they’re trust-building tools that differentiate your product.

    The result of this checklist is speed with confidence. It gives my teams a common language to debate trade-offs, a clear path to production, and the guardrails to scale safely. If you’re preparing to deploy an agent, adapt these steps to your stack and your customers. Your future self—and your users—will thank you.


    Inspired by this post on Product School.


    Book a consult png image
  • Becoming AI Native: A Practical Playbook to Transform Strategy, Teams, Data, and Tech

    Becoming AI Native: A Practical Playbook to Transform Strategy, Teams, Data, and Tech

    AI Native is more than a feature set—it’s an operating system for the entire business. In my role leading product, I’ve seen that companies win when they treat AI as a first-class citizen across strategy, architecture, workflows, and go-to-market. In this narrative, I unpack what “AI Native: What It Means and How to Get There” looks like in practice, sharing the frameworks I use to align vision, technology, and teams around measurable customer outcomes.

    When I say AI Native, I mean a company where core value creation, customer experience, and internal operations are powered by AI end-to-end. It’s not just bolting on a chatbot. It’s rethinking product strategy, data foundations, and execution so we can deliver differentiated experiences faster, at lower cost, and with higher reliability. This shift demands clarity on where AI truly creates leverage—and the courage to say no where it doesn’t.

    The starting point is strategy. I ground teams in outcomes vs output OKRs and a crisp value proposition: Which customer jobs-to-be-done benefit most from generative AI? Where can we unlock 10x improvements in speed, accuracy, or personalization? We prioritize a small number of high-signal use cases, size impact, and design Minimum Viable Experiments (MVEs) to de-risk assumptions before scaling. This is where build vs buy decisions matter—use foundation models and platforms for commodity needs, and invest your scarce engineering time where differentiation lives.

    Next comes architecture and data. AI Native products thrive on a retrieval-first pipeline, strong context window management, and model-agnostic abstraction so we can swap providers as needs evolve. I emphasize privacy-by-design, robust data governance, and observability across prompts, embeddings, latency, and cost. These guardrails let us move quickly without compromising trust, especially in regulated or enterprise settings.

    Execution shifts as well. I organize empowered product teams and product trios around the highest-value workflows, not components. Continuous discovery pairs with CI/CD, feature flags, and telemetry so we can test safely in production. Eval-driven development is non-negotiable: we design offline and online evaluations that mirror real user success criteria—accuracy, helpfulness, safety, and business outcomes—then wire those evals into the build pipeline to prevent regressions.

    On the intelligence layer, we increasingly rely on AI workflows and agentic AI to orchestrate multi-step tasks—retrieval, reasoning, tool use, and verification—with human-in-the-loop where appropriate. Clear system prompts, tool definitions, and fallbacks keep behavior predictable. This is where product craft meets prompt engineering and LLMs for product managers: the best teams codify patterns, share prompts in a living library, and standardize on a lightweight AI product toolbox.

    Risk and reliability are part of the product, not an afterthought. I run AI risk management as a continuous program spanning red teaming, content filters, PII handling, audit trails, and incident response. We tie policies to concrete controls and create simple dashboards leaders can trust. The goal is to ship boldly with safety, maintainability, and scale in mind.

    Becoming AI Native also changes how we grow. We lean into product-led growth with clear in-app guides, product tours, and activation paths that teach users where AI shines. CRM integration ensures sales and success teams have context to coach customers. Pricing experiments—often usage- or value-based—align revenue with the impact customers feel, while retention analysis helps us double down on the use cases that drive compounding value.

    To make this real, I use a 90-day plan. Days 0–30: align on strategy, top use cases, and risk posture; stand up data pipelines and a basic retrieval-first stack; define evaluation metrics. Days 31–60: ship MVEs behind feature flags, run head-to-head evals, and instrument observability; start a cross-functional community of practice. Days 61–90: scale the winning use cases, formalize governance, and publish a roadmap tied to outcomes—not just features—with clear SLAs and success metrics.

    The destination is a durable advantage: faster iteration cycles, smarter experiences, and a product strategy that compounds with every interaction. If you’re ready to make the leap, start small, measure obsessively, and build the muscle to ship, learn, and adapt. That’s the heart of becoming AI Native—and it’s well within reach.


    Inspired by this post on Product School.


    Book a consult png image
  • Stop Drowning in Dashboards: Real-Time Digital Analytics for Finserv Contact Centers

    Stop Drowning in Dashboards: Real-Time Digital Analytics for Finserv Contact Centers

    I’ve sat in enough finserv contact center reviews to know the pattern: wall-to-wall dashboards, weekly exports, and colorful charts that still leave teams asking, “So what should we do next?” The truth is, more dashboards rarely create better decisions. What we need is digital analytics that translates signals into action—fast, precise, and privacy-safe.

    When I say digital analytics, I mean a unified analytics platform that captures real-time behavioral data across voice, chat, IVR, email, and in-app journeys, then operationalizes it for agents, supervisors, and automated workflows. See how real-time behavioral analytics helps finserv contact centers lower costs, improve resolution speed, and deliver better member experiences.

    Dashboards tend to be lagging, siloed, and optimized for reporting, not resolving. They spotlight vanity metrics, bury journey-level friction, and rarely surface the “next best action” that actually moves a member request toward resolution. By the time a trend shows up in a weekly readout, the expensive part—handle time, repeat contacts, churn risk—has already accumulated.

    Real-time digital analytics flips that script. Instead of passively describing performance, it continuously detects intent, risk, and friction as interactions unfold—then powers targeted responses. For example, it can route high-risk transactions to specialized agents, prompt dynamic guidance during an escalated call, or trigger a proactive message that deflects a repeat contact. In practice, that means fewer transfers, faster resolution speed, and measurable reductions in operating costs.

    For finserv specifically, the payoff is immediate. Agent Analytics surfaces coaching opportunities (e.g., where scripts stall or compliance steps get missed). Retention analysis identifies members at churn risk after a negative experience. Journey analytics exposes where authentication fails or balance inquiries overwhelm queues, so you can intelligently deflect to self-service. And when a potential fraud signal appears mid-session, real-time insights can prioritize routing and alerting without sacrificing compliance.

    Implementation should be iterative and outcomes-driven. Start by instrumenting the top five journeys that drive the most cost or dissatisfaction (lost card, fraud dispute, loan status, password reset, payment issue). Tie each to clear outcomes vs output OKRs—think first-contact resolution, repeat-contact reduction, containment rate, and average time-to-resolution—so every analytic signal earns its keep. Then activate insights inside the workflow: agent assist prompts, smart routing, and targeted follow-ups that close the loop.

    Governance matters just as much as speed. In a regulated environment, privacy-by-design and data governance are non-negotiable. Build data access controls, audit trails, and consent management into your operating model from day one. Align analytics with regulatory compliance requirements to ensure that what you measure and automate is defensible, explainable, and safe for members and the business.

    To accelerate learning, pair digital analytics with controlled experiments. Use A/B testing on IVR flows, authentication steps, and post-call follow-ups to quantify what truly reduces transfers and repeat contacts. Define a minimum detectable effect (MDE) upfront so tests are fast and conclusive. Run continuous discovery with cross-functional product trios (operations, data, compliance) to turn insights into shippable improvements every sprint.

    On the stack side, focus on connecting systems you already trust. CRM integration ensures that context follows the member, while tools like Amplitude analytics, Pendo, or Intercom can instrument key digital touchpoints. Whether you choose build vs buy, the principle is the same: consolidate signals into a unified analytics platform, then push decisions and guidance back into the tools agents and members already use.

    The cultural shift is from reporting to decisioning. Instead of celebrating more charts, celebrate faster resolutions and fewer escalations. Replace static executive reports with alerting and action playbooks. Make it trivial for supervisors to see what changed, why it mattered, and which play to run next. That’s how you convert data into durable operating advantage.

    The mandate is clear: stop drowning in dashboards. Move to digital analytics that captures behavior in real time, respects compliance, and powers operational decisions where they matter most—in the member journey. When you do, cost curves flatten, resolution speed climbs, and member trust compounds.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image