Tag: agentic AI

  • How Incident.io’s AI SRE Diagnoses, Hypothesizes, and Fixes Outages in Slack at Record Speed

    How Incident.io’s AI SRE Diagnoses, Hypothesizes, and Fixes Outages in Slack at Record Speed

    When your site goes down, every second counts. I’ve lived that reality across multiple product lines, and the difference between a five-minute blip and a two-hour outage is felt by customers, engineers, and the business. That’s why I’ve been closely following how Incident.io has evolved from coordination during chaos to intelligent, proactive response.

    Now, they’re building something new: an AI SRE that can actually help diagnose and respond to incidents. As someone who thinks deeply about reliability, velocity, and customer trust, that promise hits the intersection of AI Strategy, product management leadership, and operational excellence.

    I recently spent time with Lawrence Jones, Founding Engineer at Incident.io and Ed Dean Product Lead for AI at Incident.io, digging into how their team is teaching AI to think like a site reliability engineer. They shared how they went from simple prototypes that summarized incidents to a multi-agent system that forms hypotheses, tests them, and even drafts fixes—all from within Slack.

    Here’s what stood out to me first: AI’s biggest impact comes from compressing time—identifying causes minutes instead of hours. In practice, that means fewer cycles lost to paging the wrong on-call, clearer paths to root cause, and faster recovery—without cutting humans out of the decision loop.

    Equally important is deciding where automation belongs. The team’s approach aligns with how I evaluate high-risk workflows: Identify which parts of debugging can safely be automated. Combine retrieval, tagging, and re-ranking to find relevant context fast. Use post-incident “time travel” evals to measure how well their AI performed. Balance human trust and AI confidence inside high-stakes workflows. The human remains accountable; the AI accelerates context, options, and execution.

    On the technical side, the retrieval choices were refreshingly pragmatic. Retrieval-augmented reasoning still benefits from simplicity: deterministic tagging and re-ranking often beat complex vector setups. I’ve seen the same in production: start with crisp, deterministic signals, then layer embeddings where they truly add value. This keeps systems debuggable and stable as you scale.

    The interface choices matter just as much as the models. “Slack as the interface for human-AI collaboration” puts the agent where incidents already live, reducing friction and increasing adoption. Under the hood, they’ve been pragmatic with “PGVector and Postgres for retrieval experiments”, using “RAG (Retrieval-Augmented Generation)” and “Multi-agent orchestration” to chain context gathering, hypothesis formation, and action proposals. The north star is compelling: “AI as your company’s immune system”.

    What impressed me operationally was the rigor around evaluation. Post-incident “time travel” evals let teams score AI accuracy after they know what really happened. That’s the standard we should all adopt: test the agent against reality, not just synthetic prompts, and feed those learnings back into prompts, tools, and guardrails.

    Trust is the currency in incidents, so the product surface must reflect uncertainty with care. Building trust in AI isn’t just about precision—it’s about showing reasoning and uncertainty in ways humans understand. In other words, show the chain of thought as a structured artifact (signals considered, hypotheses rejected, evidence gathered), expose confidence bands, and always make it easy for humans to override or guide.

    From a workflow standpoint, the investigation loop mirrors seasoned SRE practice: fast scoping, parallel checks and data sources, building hypotheses and refining findings, then proposing remediations paired with the context that justifies them. Human-agent collaboration here is not a handoff—it’s a tight copilot loop where the agent gathers, tests, and drafts, and the human confirms, prioritizes, and executes.

    For platform and security leaders, this approach blends speed with safety. Clear permissions, auditable actions, blast-radius constraints, and CI/CD integration keep the AI inside defined guardrails while still delivering material acceleration. The payoff is higher deployment frequency without compromising reliability—because detection, triage, and rollback become faster and more repeatable.

    My takeaway as a product leader: this is a blueprint for agentic AI in mission-critical workflows. Start in the tools users live in (Slack), nail retrieval with deterministic foundations, model the expert’s playbook (not just their summaries), and make evaluation a first-class part of the product. Do that well, and the AI goes from assistant to teammate—conservative when it should be, bold when the evidence supports it, and always legible to the humans in the loop.

    The momentum around Incident.io’s AI SRE suggests where we’re headed next: deeper integrations, broader coverage across service catalogs, and richer automations that remain transparent and controllable. For teams investing in reliability, this is the moment to operationalize agentic AI—measured, auditable, and designed for trust—so you can move faster when it matters most.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Turn Claude Code Into a Trusted Teammate: My 3-Layer Memory System You Can Copy

    Turn Claude Code Into a Trusted Teammate: My 3-Layer Memory System You Can Copy

    "Can you critique the landing page for my new Story-Based Customer Interviews course?" That simple ask used to kick off hours of back-and-forth where I fed an AI the same context over and over—only to get generic feedback that wouldn’t land with my audience or fit my products. As a product leader, that inefficiency was unacceptable; as a writer, it was just plain frustrating.

    Not anymore. Today, Claude not only critiques my work, it helps me produce it. It generates marketing copy—in my voice. It helps me write blog posts. It knows what search terms are relevant to my business and helps me optimize my articles for SEO and now AEO. It helps me with competitive research, academic research, and discovery research. And it does all of this with little prompting from me.

    I don’t upload files to a web-based project. I don’t manage elaborate prompt libraries. I don’t repeat myself. I ask for help and Claude knows exactly what to do. The shift happened when I learned how to give Claude Code a memory. Claude now knows who my target customer is, the key value propositions I focus on, the specific opportunities each product addresses, my revenue model, my marketing channels, and so much more.

    Dark-mode slide with monospaced white text outlining an SEO plan: add CLAUDE.md to an AI glossary as the entry point, with bullets on article focus, audience, and search architecture for Give Claude Code a Memory.
    A dark-themed strategy slide for the post Stop Repeating Yourself: Give Claude Code a Memory, showing how to lead with a CLAUDE.md glossary page, write clearly for nontechnical readers, and link glossary and article to boost discovery and engagement.

    With that memory, I consistently get high-quality output tailored to my audience and aligned to my products and services. I don’t retype the same context; Claude just remembers. In this article, I’ll show you exactly how I set up that memory. It relies on Claude Code (which requires a Pro subscription), and it’s worth it. If you’re new to Claude Code, start with "Claude Code: What It Is, How It’s Different, and Why Non-Technical People Should Use It."

    Here’s the underlying problem: with large language models, every conversation starts from scratch. Yes, ChatGPT can remember some things and Claude can search past conversations, but practically speaking each new thread wipes the slate clean. If I were working on a new landing page, I’d normally need to upload target customer context, product details, primary and secondary value propositions, FAQ questions and answers, plus testimonials and logos for social proof—every single time.

    Dark-theme screenshot of the Claude interface with a large prompt field, model selector set to Sonnet 4.5, and quick-action buttons for Write, Learn, Code, Life stuff, and Claude’s choice on the home screen.
    Start fast with Claude’s home screen: Sonnet 4.5 is ready, and quick actions for writing, learning, and coding sit beneath a clean prompt box—ideal for showing how memory cuts repetition and streamlines daily development.

    Projects in web-based tools help a bit, but they introduce a new dilemma. When I move to the next landing page targeting the same customer but a different product and value proposition, do I start a new Project (tedious) or keep expanding the old one (which muddies the context window and degrades output quality)? The good news: Claude Code solves this by giving the model a precise, durable memory without overloading any single conversation.

    Claude Code can read files on my local machine, which is an understated superpower. I use those files to create a persistent, reusable memory that works across all chats and Projects. Files can be mixed and matched, so I give Claude exactly what it needs for the task at hand—and nothing more. For a first landing page, I reference the target customer and the relevant product; for the second, I reuse the same target customer file and point to the new product file.

    Screenshot of a macOS Notes window in dark mode showing an AI-assisted review of producttalk.org, listing Fetch and Read steps and a "Homepage Evaluation" for a first-time B2C visitor.
    Dark-mode Notes screenshot captures Claude Code in action: it fetches producttalk.org, reads context files, and delivers a concise homepage evaluation—showing how memory streamlines repeated analysis tasks.

    When you give an LLM the exact right context, output quality jumps. More context only helps if it’s the right context. For a landing page, Claude needs to know about the current product and perhaps related products for differentiation—but it doesn’t need to know about unrelated offerings. Structure your memory so Claude gets precisely what’s required.

    Once I did this, Claude shifted from “intern who needs handholding” to trusted advisor and capable teammate. It doesn’t guess at my value propositions—I’ve already told it. It writes in my voice because it has my writing guide and samples. It knows who owns which course and which use cases map to which features. The setup takes a bit of upfront work, but it compounds: update a file when something changes and you’re done. Most of this information already lives in your system; the trick is making it easy for Claude to use.

    Diagram of the Claude Code interface with a terminal-style dashboard. Arrows show Global Preferences (~/.claude/CLAUDE.md), Project Preferences (Project/CLAUDE.md), and Custom Files feeding memory into the coding chat.
    See how Claude Code stops repetition: global and project CLAUDE.md files, plus custom reference docs, flow into the editor so the assistant remembers your preferences and context while you code and run commands.

    Because the files live on my machine, I own the system. No vendor or device lock-in. I decide when and who to share with. I can work with Claude on one project and ChatGPT on another—both can rely on the same file-based memory strategy. It’s an AI strategy that scales with product discovery, accelerates go-to-market content, sharpens competitive differentiation, and supports product-led growth.

    Here’s how I design the memory: I use three layers. Claude Code already encourages global preferences and Project-specific instructions, but the third layer—reference context—is where the real power lives.

    Dark-mode screenshot of a macOS editor showing a 'Claude Code Preferences' markdown file with sections on writing conventions, planning protocol, and feedback for collaborating with Claude.
    Peek inside a markdown playbook for Claude Code: concise rules for writing, multi-level planning, and clear feedback that turn repeated reminders into reusable memory and smoother, faster coding sessions.

    Layer 1: Global Preferences (Always on). The first time I launched Claude Code, I created a CLAUDE.md file at ~/.claude/CLAUDE.md. This is where I keep the cross-project rules of engagement—how I like to work with Claude. Mine includes: Always create a plan for me to review before you start any work; Give me direct feedback (no hedging, no gentle suggestions); Use bullet points for summaries; Ask clarifying questions one at a time so I can give complete answers; No emojis unless I explicitly ask for them. Claude Code automatically loads this file at the start of every session, so I never restate my preferences.

    Layer 2: Project-Specific Instructions. Different projects have different rules. In my writing workspace, the Project CLAUDE.md sets the roles (I’m the primary writer; Claude is my thought partner and editor), defines a multi-round review flow (content → structure → accuracy → typos), prioritizes human readability over SEO, and points to my writing style guide. In my task management system, I include how my Trello integration works, file naming conventions for tasks, and how to process research papers into summaries. In my code projects, I specify the technology stack (Node.js vs. Python), testing framework (Jest for Node.js, pytest for Python), code style and conventions, project architecture and directory structure, and which dependencies and libraries to use. Each project directory has its own CLAUDE.md, and Claude automatically loads the relevant file when I’m working there.

    Dark-themed text editor screenshot of a markdown file titled 'Claude Instructions,' featuring sections for session setup, working relationship, editor responsibilities, and research and development guidelines.
    Peek inside a markdown playbook for collaborating with Claude—covering session setup, roles, editorial standards, and research steps—to show how saved instructions create consistent results without repeating yourself.

    Layer 3: Reference Context (Pull as Needed)—the real power. LLMs have a context window—a limit to how much they can process at once. Even within that limit, loading too much degrades performance due to “context rot.” The remedy is ruthless context management: small, targeted files that load only when needed. Keep CLAUDE.md files concise and focused on rules and workflows. For detailed knowledge, create separate reference files and list them in your CLAUDE.md so Claude knows they exist and when to fetch them. When I ask for help creating a landing page, Claude knows to use my business profile, the product file, and my target customers context.

    Here’s what most people miss: you don’t cram everything into global or Project files. You maintain small, reusable reference files that Claude only loads on demand. In my walkthrough, I share exactly which context files I created and why; how I got Claude Code to help me create them; how I break them into small, reusable components so Claude gets precisely what it needs; how I keep everything up to date; and step-by-step instructions so you can set up a similar memory system.

    Diagram of three markdown files (business-profile.md, story-based-customer-interviews.md, target-customers.md) feeding into a Claude Code IDE panel, showing context files powering an AI assistant.
    Three project notes funnel into Claude Code, turning reusable context into working output. This visual shows how saving key docs as memory lets the AI pick up where you left off and skip repetitive prompting across tasks.

    Let’s dive in.


    Inspired by this post on Product Talk.


    Book a consult png image
  • AI at Home, Impact at Work: Experiments That Supercharged My Product Leadership

    AI at Home, Impact at Work: Experiments That Supercharged My Product Leadership

    I recently tuned into an insightful All Things Product episode featuring Teresa Torres and Petra Wille on how experimenting with AI in everyday life sharpens how we build AI-powered products at work. The core premise resonated deeply with my AI Strategy: low-stakes, personal experiments accelerate confidence, clarify limitations, and build an AI product toolbox we can bring into the office with rigor.

    If you want to dive in, you can listen on Spotify or Apple Podcasts. I found the conversation especially relevant for product trios and anyone shaping LLMs for product managers in high-stakes environments.

    The idea is simple but powerful: when I prototype with AI at home—where the stakes are low—I learn faster, make safer mistakes, and internalize critical product patterns. Over time, those patterns transfer directly to work: tighter context management, sharper bias awareness, clearer human-in-the-loop guardrails, and a more nuanced view of when to use AI as a thought partner versus when to consider agentic AI.

    In my own practice, I’ve mirrored many of the scenarios discussed: using ChatGPT by OpenAI to plan meals, analyze public data sets like school budgets, and even sanity-check real estate evaluations. These seemingly mundane tasks are fertile ground for learning about context window limits, hallucination (artificial intelligence), AI bias, and privacy-by-design trade-offs. Each experiment helps me craft better prompts, structure data for clarity, and decide when a human review step is non-negotiable—core habits for AI risk management.

    At work, I treat AI as a thought partner for writing, research synthesis, and contract review. I also explore when and how to responsibly evolve toward agentic AI for repeatable workflows. The distinction matters: a thought partner augments judgment; an agent automates execution. Building the right scaffolding—data governance, auditability, constraints, and escalation paths—ensures we unlock speed without compromising safety.

    Three lines from the episode stayed with me: “I’m trying to write things that only I can write — that’s my guiding writing light right now.” — Teresa. “The more we use AI, the more we learn what it’s good at, what it’s not good at, and where context becomes a limitation.” — Teresa. “It’s a safer playground — we can build our toolbox at home before bringing those lessons to work.” — Petra. These are practical north stars for product management leadership in the GenAI era.

    For anyone getting started, here’s what worked for me: begin with “low-stakes” personal experiments, write down your prompts and outcomes, and reflect on failure modes. Treat each activity as product discovery: What problem am I solving? What outcome matters? What data and context does the model need? Which decisions must stay human-in-the-loop? This discipline builds an AI product toolbox you can confidently apply to real customer problems.

    I also keep a running toolkit of references and tools that inform my practice: Context window as a concept helps me size and sequence information. Visual and video tools like Midjourney and Sora expand how I think about multimodal experiences. I rotate between Claude by Anthropic and ChatGPT by OpenAI depending on task fit, and I’ve used Claude Code when I need structured assistance with code review. For knowledge capture and workflow, Readwise and Ghost help me structure insights and ship content.

    If you want more structured learning paths, I found Josh Seiden’s Learn AI With Me, A 30-Day Sprint to be a practical primer, and the broader community conversation at Product at Heart Conference is invaluable. For a deeper grounding in risk, I recommend reviewing topics like Hallucination (artificial intelligence), AI bias, and Agentic AI—and revisiting the complementary episode, Context is King.

    I’d love to hear how you’re experimenting: Where have you seen AI meaningfully reduce toil? Where does it still struggle? How are you balancing creativity, data safety, and compliance as you scale? Drop a comment below and let’s compare notes—especially on patterns that help product trios move faster without sacrificing trust.

    Bottom line: start small at home, carry lessons into the office, and build with curiosity and intentionality. That’s how we level up our product discovery, sharpen our value proposition, and lead teams confidently through the GenAI transition.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Mastering AI Evals: The Essential Product Manager Skill to Ship Safer, Smarter AI

    Mastering AI Evals: The Essential Product Manager Skill to Ship Safer, Smarter AI

    In every AI-powered product I ship, evaluation is the difference between a compelling demo and a dependable customer experience. AI evaluation isn’t a nice-to-have; it’s a core product management competency that shapes quality, safety, and business outcomes from the first prototype to scale.

    When I talk about AI evaluation, I mean a disciplined, repeatable way to measure model behavior across quality, safety, reliability, latency, and cost. Gen AI has changed the cadence of product decisions—models evolve weekly, prompts drift under real-world load, and edge cases multiply. Without rigorous evals, we risk shipping unpredictability.

    My goal in this piece is simple: “Dive deep into AI evals, why they matter for PMs today, and how to master them with clear steps, examples, and best practices.” If you’re leading product strategy for LLMs, agentic AI, or applied AI features, this is the playbook I rely on.

    Why this matters now: customers don’t judge AI by benchmarks, they judge by trust—did it help me, was it safe, was it fast? Strong AI evals let me set outcomes vs output OKRs, quantify risk, and make transparent trade-offs between accuracy, latency, and cost. They also give engineering and design clear guardrails to move fast without breaking user trust.

    Step 1: Define the product problem and success metrics. I start by tying AI metrics to business outcomes—resolution rate, deflection rate, revenue lift, time-to-value—and include model-centric measures like hallucination rate, harmful content rate, latency, and token cost. This keeps experiments anchored to impact, not just model scores.

    Step 2: Build a high-signal golden dataset. I curate real, anonymized user prompts from discovery and support channels, then add adversarial and long-tail cases. For generative tasks, I create rubric-based criteria for correctness, helpfulness, tone, and safety. This dataset becomes my regression suite as prompts, RAG pipelines, or models change.

    Step 3: Choose the right evaluation methods. I combine deterministic unit tests for rules with LLM-as-judge scoring, pairwise preference tests for prompt variants, human review for critical flows, and red teaming for safety. I also apply privacy-by-design and strong data governance to ensure eval data handling meets compliance and customer expectations.

    Step 4: Operationalize with CI/CD. Evals run automatically on every prompt, retrieval, or model update, with pass/fail gates and alerting. I track results in a unified analytics platform so product, engineering, and go-to-market teams see the same truth. If a change regresses key thresholds, we pause rollout or roll back.

    Step 5: Optimize the cost–quality–latency triangle. Real products live within constraints. I analyze token budgets, caching strategies, model selection (e.g., small for classification, larger for complex generation), prompt structure, retrieval quality, and function-calling patterns. For agentic AI, I evaluate tool-use correctness and task completion reliability, not just text quality.

    Step 6: Close the loop with experimentation. Offline evals get me confidence; online A/B testing validates business impact. I design tests with a clear minimum detectable effect (MDE), guard for novelty bias, and instrument activation, retention, and satisfaction in Amplitude or Pendo. Agent analytics help me pinpoint where users succeed or get stuck.

    Step 7: Govern responsibly. I maintain model cards, decision logs, and incident playbooks. For customer-facing assistants, I gate risky actions, log explanations, and add human-in-the-loop escalation. AI risk management isn’t bureaucracy—it’s how we earn trust at scale.

    A concrete example: building a customer support assistant. My success metrics include deflection rate, first-contact resolution, median response latency, and safe action rate. The golden dataset blends common queries, billing edge cases, account-specific retrieval checks, and adversarial prompts. Evals measure factuality against a knowledge base, tone alignment with brand guidelines, and safe tool use for CRM integration. Only after passing offline gates do we A/B test deflection and CSAT in production.

    Common pitfalls I watch for: overfitting prompts to a tiny test set, relying solely on LLM-as-judge without human calibration, skipping safety tests when latency rises, and treating evaluations as a one-time launch task. The antidote is simple—regularly refresh datasets, diversify eval methods, and wire evals into the same release discipline as any core feature.

    The payoff is compounding. With strong AI evals, we ship confidently, reduce incident rates, accelerate iteration, and communicate trade-offs clearly to stakeholders. More importantly, we build products customers trust—because quality isn’t a promise, it’s a practice we can measure every day.


    Inspired by this post on Product School.


    Book a consult png image
  • Innovation Strategy in the Age of AI: Proven Playbooks, Real-World Examples, and What Works Now

    Innovation Strategy in the Age of AI: Proven Playbooks, Real-World Examples, and What Works Now

    AI has rewritten the rules of how we create value, and I’ve watched the most resilient organizations treat innovation as a disciplined, outcomes-driven capability—not a one-off initiative. In my role leading product teams, I’ve refined a practical approach that blends rigorous product management with an adaptive AI Strategy so we can ship faster, learn faster, and de-risk smarter.

    Learn what an innovation strategy is, how to build one, which types to use, and see real examples that drive meaningful change.

    At its core, an innovation strategy is the intentional system that aligns vision, portfolio bets, and execution mechanics to measurable business outcomes. I anchor this in outcomes vs output OKRs, ensuring every experiment, feature, and GTM motion ties to a clear value proposition and reinforces hard-won product-market fit lessons rather than chasing novelty.

    I design portfolios around three types of innovation that work well in the age of AI. First, core optimization: drive compounding gains with CI/CD, DORA metrics, and A/B testing to improve activation, retention, and profitability. Second, adjacent expansion: extend value via new segments, channels, or use cases—often enabled by product-led growth tactics like in-app guides and product tours. Third, transformational bets: leverage gen ai and agentic AI to create step-change capabilities while proactively addressing AI risk management, data governance, and privacy-by-design.

    Building the strategy starts with empowered product teams and product trios who run continuous product discovery to validate problems before validating solutions. I keep discovery tight with a minimum detectable effect (MDE), instrument the journey with a unified analytics platform, and thread learnings into product roadmapping and sprint planning so we prioritize the smallest, fastest path to decision-quality data.

    On the AI front, my operating model combines an AI product toolbox (prompt patterns, evaluation harnesses, and safety rails) with LLMs for product managers to accelerate research, prototyping, and content generation. We standardize CustomGPT workflows where appropriate, define CRM integration and data boundaries early, and adopt a clear build/partner/buy decision tree to protect focus and speed without compromising risk posture.

    Here are real patterns that consistently deliver meaningful change. We’ve used generative AI for product prototyping to compress concept validation from weeks to days, then confirmed impact with rapid A/B testing tied to MDE. We’ve implemented agentic AI for customer support triage to reduce response times and free human agents for high-complexity cases, all under strict data governance. And we’ve paired new AI features with a focused go-to-market strategy—clear positioning, sharp onboarding, and outcome-centric messaging—to accelerate user activation.

    Measurement makes or breaks innovation. I combine deployment frequency and DORA metrics on the engineering side with activation, retention analysis, and value-moment telemetry on the product side. QBRs vs OKRs alignment keeps leadership focused on outcomes, while experiment scorecards ensure we learn even when results are neutral. The goal is to increase the rate of validated learning across the portfolio, not just ship more.

    Governance is a feature, not a tax. We embed threat detection and response, privacy-by-design, and transparent data policies from day one. Stakeholder management and board management stay tight with simple narratives: the bet, the hypothesis, the metric, the MDE, the timeline, and the kill-or-scale criteria. That clarity builds trust and protects speed.

    If you’re recalibrating your innovation strategy right now, start small and deliberate: define the outcomes, select one core, one adjacent, and one transformational bet, and wire in learning loops from discovery to delivery. With empowered product teams, disciplined analytics, and a pragmatic AI Strategy, you can move from interesting ideas to durable competitive differentiation—faster and with far less risk.


    Inspired by this post on Product School.


    Book a consult png image
  • AI vs. Product Managers by 2035: What Will Change—and How to Future‑Proof Your Career

    AI vs. Product Managers by 2035: What Will Change—and How to Future‑Proof Your Career

    Will AI replace product managers, or simply transform their role? Discover what AI can and cannot do, plus insights from PMs on the future of work.

    I’m asked this question in nearly every leadership meeting now, and my answer is consistent: AI won’t replace great product managers by 2035—but it will radically reshape how we operate. The PMs who thrive will pair sharp product judgment with an intentional AI Strategy and a practical AI product toolbox, unlocking speed, clarity, and scale without sacrificing vision.

    Here’s what AI already does well for us today. With LLMs for product managers, I can synthesize customer feedback at scale, draft PRDs and acceptance criteria, transform notes into user stories, and even auto-generate experiment plans with a minimum detectable effect (MDE) calculation. When I connect these models to Amplitude analytics, Pendo, Intercom, and HubSpot through a unified analytics platform and CRM integration, I accelerate discovery, prioritize confidently, and tighten the loop between signal and action. CustomGPT workflows now handle routine backlog grooming, competitive landscaping, and early concept testing, freeing my team to focus on higher-order decisions.

    By 2035, I expect agentic AI to operate as an execution co-pilot: autonomously scheduling A/B testing, launching targeted in-app guides and product tours, monitoring user activation and onboarding funnels, and raising anomalies via Agent Analytics long before a dashboard review. These systems will propose playbooks, draft UX writing and tooltip design, and recommend next-best actions—then wait for human approval when stakes are high. Think of it as the ultimate forward deployed engineer for operational work, working within clear guardrails.

    What AI cannot do—and is unlikely to master soon—is the essence of product leadership. It won’t craft a resonant value proposition for a new segment, define points of parity vs. competitive differentiation, or set outcomes vs output OKRs that align messy stakeholder incentives. It won’t navigate board management, reconcile conflicting narratives from sales and engineering, or make ethically grounded trade-offs under uncertainty. That’s where privacy-by-design, data governance, and AI risk management converge with human judgment, context, and accountability.

    As the tooling matures, the PM role will tilt from artifact production to decision quality. We’ll spend less time writing and more time deciding: which bets to place, which risks to accept, and where to concentrate our empowered product teams. Product discovery deepens, product positioning sharpens, and product roadmapping and sprint planning become faster and more adaptable—because the busywork is handled, not because the thinking is outsourced.

    Practically, I’m evolving team design and rituals now. We operate as product trios, pair PMs with forward deployed engineers, and embed gen ai into daily workflows. We standardize prompts, set review thresholds, and instrument everything for observability. Our stakeholder management improves because we bring clearer narrative artifacts—and because we can test assumptions earlier and share evidence in real time.

    If you’re building your own AI Strategy, start with three tracks. First, foundations: instrument data pipelines, establish data governance, and codify privacy-by-design. Second, acceleration: deploy CustomGPT workflows for research synthesis, PRD drafting, retention analysis, and experiment design, while keeping humans in the loop for decisions. Third, automation with guardrails: let agentic AI run low-risk playbooks (in-app guides, content suggestions, ops checks) and require human approval for anything customer-facing and irreversible.

    Future-proofing your career is about skill stacking. Double down on first principles decision making, storytelling, and cross-functional influence, and pair that with hands-on fluency in gen ai, prompt engineering, model evaluation, and risk controls. Learn how to frame trade-offs, architect outcomes vs output OKRs, and translate strategy into experiments that AI can help execute. The combination—human judgment plus machine speed—is the new competitive advantage.

    So, will AI replace product managers by 2035? No. It will transform average PMs into good ones and great PMs into force multipliers. The ones who lead will embrace AI as leverage, cultivate empowered product teams, and stay relentlessly focused on customer outcomes. The future belongs to product creators who can wield intelligent tools without surrendering accountability for the product’s direction and impact.


    Inspired by this post on Product School.


    Book a consult png image
  • RAG for Product Managers: Transform Strategy, Speed Discovery, and Win with Confidence

    RAG for Product Managers: Transform Strategy, Speed Discovery, and Win with Confidence

    I’ve watched Retrieval-Augmented Generation (RAG) shift from a buzzword to a practical advantage that changes how my team discovers insights, makes roadmap bets, and competes. When I ground large language models in our own product, customer, and market data, I make faster decisions with more confidence—and I spend far less time debating opinions and more time shipping outcomes.

    Think RAG for product managers is just AI hype? Wait until you see the use cases and ways it’s reshaping your work and product strategy.

    RAG connects the power of LLMs with the credibility of your internal knowledge: user research, support tickets, win/loss notes, specs, QBRs, and analytics. Instead of generic answers, I get contextual, citeable responses that reflect our reality. That means cleaner product discovery, sharper product positioning, and a clearer value proposition grounded in customer truth.

    Day to day, I use RAG to accelerate product discovery by synthesizing interviews and feedback across channels; to de-risk roadmapping by surfacing evidence behind feature requests; and to power go-to-market strategy with crisp messaging that maps to points of parity and true competitive differentiation. It’s equally effective for onboarding new PMs, increasing stakeholder alignment, and unblocking empowered product teams when signals are noisy or fragmented.

    Execution still matters. I treat RAG like any critical system: prioritize data governance, privacy-by-design, and AI risk management. I integrate with our CRM and support stack so the model learns from live customer context, and I instrument everything with product analytics to track impact. When the outputs are measurable, RAG moves from novelty to operating system.

    To start, I focus on a narrow, high-signal slice of the workflow—like summarizing support patterns or synthesizing discovery for a single segment—then iterate. I pair PMs with design and engineering in tight product trios, define quality criteria up front, and review answers with subject-matter experts. As quality rises, I scale to roadmapping and product-led growth experiments, always validating with users before I automate.

    The payoff is real: faster decisions, clearer narratives, and fewer surprises. RAG won’t replace the craft of product management, but it will amplify it—giving us an edge in both speed and accuracy. If you’re serious about LLMs for product managers and want results you can defend, RAG is a strategic bet worth making now.


    Inspired by this post on Product School.


    Book a consult png image
  • From Chaos to Consistency: How I Built a Scalable AI Content Design Agent with RAG

    From Chaos to Consistency: How I Built a Scalable AI Content Design Agent with RAG

    It’s Monday morning, and my Slack and email are already overflowing with content requests: “Can you review this flow?”; “Can you rewrite this screen?”; “Can you name this feature?” I’m not freshly back from holiday—this is just a regular work week kicking off. If you’ve ever been a solo content designer supporting multiple teams, you’ll recognize the pressure. The pipeline for content in product design is always full, and the demand for expertise never stops.

    Fixing this isn’t just a matter of better time management or incremental process tweaks. To truly scale, I needed to extend my reach by bringing AI into the design process—without sacrificing judgment, standards, or quality. That Monday morning, I realized I had to scale my skills, my judgment, and our systems, not just my calendar.

    Building AI is fundamentally about building systems. I wanted to use AI to scale myself without devaluing critical thinking or flooding the product with generic, verbose content. I also knew a useful AI tool must do more than spit out microcopy—it has to plug into a system we can continually shape. As a content designer, the system is always the starting point. Strong design systems create strong content standards; then AI agents can produce content that meets those standards at speed, freeing me from the bulk of standardized work. That’s not a threat—it’s an advantage. To instruct AI well, our systems must be well constructed.

    I often think about this work like a bakery. You need a recipe before you can make a loaf of bread. Most interface content churns out the same loaf, day in and day out. It’s better for the master bakers to focus on the unique, custom bakes—and how the recipe needs to change. With that mindset, I set out to build an AI content design agent.

    Screenshot of a content design assistant interface titled VERBI, showing a chat input field, quick-start prompts like 'Can you write this?', and links to view permissions and agent setup in draft mode.
    Inside the Content Design Agent workspace, a clean chat UI titled VERBI pairs a central prompt box with chips for writing, editing, and reviews, plus clear controls to view permissions and open the agent setup for product teams.

    When I started this project back in May 2025, many LLMs still had frustrating limitations. Google Gemini let me build a custom Gem agent, but I couldn’t share it with other users. ChatGPT could be customized, but only with static files: I couldn’t point it to live, updatable URL sources. I settled on Glean for three simple reasons: everyone at the company had access; Glean could access all internal documentation and treat URLs as sources of truth; and its then-new Agents feature made AI search customizable. Configuring an agent in Glean is straightforward—you choose a trigger, a set of prompts, and a set of actions—but first I needed to get the inputs right.

    AI agents need focus. We had a wealth of internal information at Intercom, but not all of it was current or reliable. I curated exactly what the agent could access and assembled a tightly governed knowledge collection in Glean. Only essential information made the cut: the Intercom style guide—our definitive house style, including regularly-broken rules like “always write in US English” and “use sentence case everywhere”; tone of voice guidance for how we show up across mediums; a product glossary with hundreds of feature names and writing conventions; a monetization glossary for prices, plans, and add-ons; product marketing messaging guides with positioning for every feature and launch; core research insights across the product; and fin.ai and intercom.com/suite as the official, most up-to-date messaging sources.

    This is classic RAG (retrieval-augmented generation) in action, ensuring every answer is grounded in approved sources of truth. With the collection in place, I instructed the agent to prioritize these resources above anything else.

    Screenshot of a no-code workflow builder for a Content Design Agent, with cards for Trigger, Company search, and Respond, plus a sidebar checklist titled The basics to start from scratch.
    Step into a clean, no-code builder that shows how to assemble a Content Design Agent: kick off with a chat-trigger, run a company search, then respond with expert guidance, all guided by a simple starter checklist.

    Then came the fun part—building and branding the agent. “Content Design Assistant” felt bland, so I named it VERBI, a nod to its “verbal” design job. When people interact with VERBI, they usually begin with a question, but the intent varies widely. I defined a set of task prompts to guide expectations and outputs: “Can you write this?”; “Can you edit this?”; “Can you review this?”; “Can you name this?”; “Give me options”; “Give me guidance”; “Give me strategy”; “Give me research.” This mirrors the real breadth of content design, from creation to critique to discovery.

    To manage responses, VERBI needed three things: start with a specific task prompt; understand how to draw on the right resources each time; and connect with other systems. With task prompts defined, I wrote a detailed system prompt covering the essentials. Role: you are a content designer, supporting product designers. Employer: Intercom (consisting of Fin AI Agent and our next-gen Helpdesk). Resources: content design collection, research collection, Storybook design system. Tone of voice: follow a specific tone for our UI, adjust the tone for everything else. Components: for UI, use the specific guidelines in our design system only. Use cases: writing, editing, critiquing, naming, researching, and more.

    One connection mattered most: our design system, recently rebranded as “Surge.” Surge contains detailed content guidelines for every component in our product UI, from accordions and banners to tabs and tooltips. That granularity took months of human effort to codify, and it paid off. Designers no longer guess how to write for a toggle, a button, or a tooltip—and now VERBI understands and enforces those rules, too. A great content design assistant isn’t just a clever system prompt; it needs deep, component-level guidance to retrieve.

    Design system documentation page for a Badge component, with a left navigation of UI elements and a main panel showing content guidelines, examples of statuses, and a color‑coded table of label types.
    UI documentation showcases the Badge component’s content rules, teaching how to name statuses, define types, and apply color so labels read clearly. A handy visual for building a content design agent and ensuring consistent product messaging.

    Accessing the design system wasn’t simple at first. It lives in Storybook, which Glean couldn’t access directly. I started by scraping guidance from Storybook into an HTML file with Cursor and uploading it to VERBI—a functional but clunky workaround that required re-scraping every few days. Then our IT team stepped in. They used the Glean Indexing API to turn Storybook into a live data source. Now VERBI connects to Storybook directly. Ask it something ultra-specific, like the correct date format for Japan, and it returns the right answer. That integration elevated the agent from helpful to indispensable—human-level precision, 24/7, at scale.

    With prompts and resources in place, I launched VERBI and pressure-tested it. It was accurate and well-informed most of the time, but like any AI agent, it had quirks. I needed it to act as a gatekeeper, not a brainstorming partner that might bend rules or invent new ones. So I added a few explicit guardrails to the system prompt. Stopping sycophancy: “Inform, challenge, and assist. Never placate. Don’t agree by default. If something’s wrong, say so. Challenge assumptions.” Halting hallucinations: “If you don’t find the information required in our resources, say you don’t know the answer. Don’t guess and don’t give answers based on general knowledge.” Avoiding verbosity: “Keep answers short and to the point. Cut the fluff. Skip all niceties and social padding. Only give longer answers if the user asks you to.” These constraints keep responses crisp, correct, and consistent. Like any living system, the prompt needs occasional tune-ups, but the maintenance is minor compared to the upside.

    Where we are now: VERBI has been triggered 700+ times since launch. The benefits are tangible. For me, quality scales without constant policing; repetitive questions about naming, style, or punctuation have dropped significantly. I reclaim time because the agent drafts and checks V1 content across teams, enabling me to focus on higher-impact work. For the design team, iteration is faster, confidence is higher, and strategic clarity improves because shared language and grounded guidelines make decisions easier and more consistent.

    I used to spend too much time mopping up basic content mistakes and untangling spaghetti-like UI copy prone to human error. VERBI removes those errors at the source. The real advantage is speed: we get from blank slate to a high-quality first draft quickly, which means we can spend our energy deciding whether the content is right, not just “good enough.” Design is the whole interface—words, visuals, interactions—so reviews now happen with real content, never “copy TBD.” Our principle to sweat the details applies equally whether work is human-made or AI-assisted.

    Knee-jerk critiques of AI-driven content design often assume teams generate content from nothing and ship it. In reality, great AI is the outcome of great human decisions and strong systems. Its value is pulling us together faster—getting us to a complete, standards-compliant design we can review as a team before sharing it with the world. That’s how AI helps us win: by turning chaos into consistency, and consistency into velocity.


    Inspired by this post on The Intercom Blog.


    Book a consult png image
  • What I Learned from Trainline’s Agentic AI: Building a Trusted Travel Assistant at Scale

    What I Learned from Trainline’s Agentic AI: Building a Trusted Travel Assistant at Scale

    Over the past year, I’ve been shipping agentic AI into production and coaching product teams on what it really takes to make these systems trustworthy in the wild. One story that crystallizes the playbook comes from Trainline’s move to an agentic architecture for travel assistance—an approach that mirrors what I’ve seen work in high-stakes, real-time customer experiences.

    Trainline—the world’s leading rail and coach platform—helps millions of travelers get from point A to point B. Now, they’re using AI to make every step of the journey smoother.

    I studied how "David Eason (Principal Product Manager) Billie Bradley (Product Manager), and Matt Farrelly (Head of AI and Machine Learning)" approached the build of "Travel Assistant, an AI-powered travel companion that helps customers navigate disruptions, find real-time answers, and travel with confidence." Their work exemplifies the kind of end-to-end thinking required to move beyond demos into dependable, on-the-go assistance.

    They share how they: Identified underserved traveler needs beyond ticketing; Built a fully agentic system from day one, combining orchestration, tools, and reasoning loops; Designed layered guardrails for safety, grounding, and human handoff; Expanded from 450 to 700,000 curated pages of information for retrieval; Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time; Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go.

    I align strongly with their core takeaways: "AI assistants need both scalable reasoning and deep domain context to be useful." "Tool design and guardrails are as critical as prompt design in agent systems." "LLM-as-judge evals make it possible to measure open-ended systems without massive labeling costs." And perhaps most importantly, "Even legacy companies can move fast when they embrace experimentation and tight PM–engineering collaboration."

    From an AI strategy perspective, starting "fully agentic" was the right call. When the problem space is dynamic—disruptions, route changes, fare conditions—reasoning loops and orchestration aren’t luxuries; they’re table stakes. Tool selection becomes product design: you need the right retrieval interfaces, constraint-aware planners, and API contracts that are resilient to partial failures. Layered guardrails for safety, grounding, and human handoff reduce hallucination risk while preserving responsiveness—critical when users are standing on a platform waiting for an answer.

    The retrieval scale-up—"Expanded from 450 to 700,000 curated pages of information for retrieval"—is a classic inflection point. I’ve seen teams stall here when they treat content growth as a pure indexing problem. The winning move is curation and structure: normalize sources, encode policy-level constraints, and align retrieval chunks to decision boundaries the agent actually uses. That’s how you keep precision high while coverage explodes.

    Evaluation is where most open-ended assistants fail quietly, which is why I was encouraged to see "Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time." In practice, LLM-as-judge gives you scalable, scenario-based scoring without prohibitive labeling, while a user context simulator surfaces regressions tied to persona, itinerary state, and device constraints. The combination closes the loop between model behavior, tool layer changes, and UX outcomes.

    On product delivery, the decision to have the system "Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go" shows mature prioritization. For travel, trust accrues in seconds: fast-enough responses, graceful degradation when upstream data lags, and explicit handoff when confidence dips. This is where guardrails meet UX writing—clear, bounded language signals competence even when the system defers.

    Finally, the organizational pattern matters. The teams that win in agentic AI are cross-functional, experimentation-driven, and ruthless about instrumentation. Tight PM–engineering collaboration, explicit safety thresholds, and an eval stack that mirrors real user journeys are what turn promising architectures into dependable products.

    It’s a behind-the-scenes look at how an established company is embracing new AI architectures to serve customers at scale.

    If you’re building agentic AI in production, borrow these moves: invest early in tool and guardrail design, scale retrieval with curation not just volume, adopt LLM-as-judge plus context simulation for continuous evaluation, and treat latency and reliability as core product requirements—not afterthoughts. That’s how you ship AI assistance that customers trust when it matters most.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Why We’re Building Our Next AI R&D Hub in Berlin—and Hiring 100 to Power Fin’s Growth

    Why We’re Building Our Next AI R&D Hub in Berlin—and Hiring 100 to Power Fin’s Growth

    I’m excited to share that we’re opening our next R&D hub in Berlin to support significant investment in our AI customer service platform, Intercom, and market-leading AI Agent, Fin. We intend to hire 100 people in Berlin over the year ahead across engineering, AI, data science, product, and design. This move reflects our AI Strategy, our commitment to product management leadership, and our focus on building enduring product-led growth.

    We believe that in a short number of years, the vast majority of customer service will be done by AI. Fin is already the world’s best Customer Service Agent. At Pioneer, our recent summit for AI customer service leaders in NYC, we talked about how Fin will become a true end-to-end Customer Agent, extending far beyond service. We showcased how companies like WHOOP, Anthropic, and Lightspeed are already pushing Fin in ways that help them grow their business.

    This market opportunity is massive and expanding at unprecedented pace. Our ambition is to earn our place as one of the most successful AI businesses during this wave of AI disruption, and we want more brilliant people on our team to pursue this as aggressively as possible. If you’re motivated by Generative AI, LLMs, and building real products that scale, you’ll find both challenge and impact here.

    We are already on track to be one of the fastest growing private software companies. Fin is the primary contributor to this, and is months away from passing $100m in ARR. So far, more than 7000 businesses have transformed their customer service with Fin, including German companies like electricity provider Ostrom, smart home technology provider tado°, and grocery delivery company Flink, along with global leaders like Vanta, Clay, Lovable, and Miro.

    Why Berlin? We’re drawn to the city’s rare blend of deep technical talent and rich creative culture—within a vibrant, globally connected ecosystem close to our R&D hubs in Dublin and London. It’s a place where top-tier engineers and designers thrive, and where ambitious builders from around the world want to relocate and create category-defining products.

    Orange gradient area chart with a white line and circular markers showing steady growth from about 26% to nearly 70% across monthly labels from May 2023 to Sep 2025, on a light grid with percentage ticks.
    Momentum is building: this month-by-month chart shows a consistent rise from the mid-20s to nearly 70% between May 2023 and Sep 2025—signaling strong progress as we expand engineering, AI, and automation at our new Berlin R&D hub.

    We needed a new location that would sustain the high ambition and standards held by our world-class AI teams in Dublin and London. Berlin has emerged as one of Europe’s hottest centers for AI talent, with a high density of AI-focused startups, applied research labs, and practitioners who bring exceptional literacy, optimism, and ambition. It’s the right accelerator for our AI hiring and a place to bring in brilliant minds to shape the future of our product and business.

    While Intercom’s reach is global with our headquarters in San Francisco, our R&D leadership remains anchored in Dublin, where half of the executive team sits—making Berlin both geographically and strategically an ideal next location for our growth.

    This isn’t our first time expanding our footprint; we previously bet on London and are delighted with how that’s been working. When we shared our Berlin news internally, the energy was palpable, with many teammates volunteering to help spin up the hub successfully—including colleagues who helped make London a big success, like Danny. That level of ownership and momentum is exactly what we aim to cultivate in Berlin.

    We’re looking for people who thrive in a high-intensity, high-ambition, high-standards environment and want to help build one of the world’s best AI companies. For builders like that, the opportunity for impact, growth, and career progression is extraordinary. As with London and Dublin before it, the early Berlin cohort will have a disproportionate influence on team norms, culture, and long-term outcomes. We are in the middle of a huge disruptive wave with AI, and Fin is one of the leading examples of commercially successful AI applications. Joining Intercom is an opportunity to be part of this disruptive wave, and help us build out our vision for Fin becoming the world’s best Customer Agent.

    Four panelists seated on a dark stage during an AI engineering discussion, with on-screen titles above them, at an event announcing a new R&D hub in Berlin.
    On a minimalist stage, four speakers share insights on AI research, automation, and engineering as part of a panel tied to Berlin expansion and the launch of a new European R&D hub.

    There are plenty of AI companies to join, but our technology and culture set us apart. Any AI product is only as good as the AI layer powering it. Ours is industry-leading, built by a highly talented, ambitious, and technical team of over 40 machine learning scientists, engineers, and designers in Europe who continuously optimize Fin’s performance through cutting-edge research, experimentation, and innovation. Fin’s average resolution rate increases 1% every month. That kind of steady, compounding improvement is exactly what great customer support AI strategy looks like in practice.

    We also build in public and share our progress and learnings with the AI community at large. Recently, our Chief AI Officer Fergal Reid and SVP of Engineering Jordan Neill joined leaders from Cognition, Harvey, and Perplexity in San Francisco to share real lessons, challenges, and breakthroughs from building frontier AI products. Our AI team regularly publishes their insights on the AI research blog; from optimizing inference speed and availability, to building our own proprietary models that outperform general purpose models for CX.

    Our AI group and the broader R&D org they operate within work at extraordinary scale and speed. We recognize that moving fast can’t be taken for granted—you must fight for it—and we’re doing just that, embracing the capabilities AI tooling brings us to achieve 2x the throughput. One example of this mindset in practice is us “Betting on the future of frontend at Intercom,” making a technology choice that optimizes for our teams’ ability to build high-quality product, fast.

    Our design and product teams are world-class and forward-thinking; they’re embracing AI to evolve how they work, as shared in our 3-point framework for AI-driven design and recently presented by Emmet Connolly, our SVP of Design, at this year’s Hatch conference in Berlin. As a product leader, I’m grateful to work alongside brilliant product and design thinkers—it gives me confidence that we’re solving the right problems, solving them well, and driving real impact.

    Tech conference collage with a speaker on stage beside four panels: AGI teaser on a tablet, code editor, webcam demo with hand tracking, and a simulation. Banner reads Hatch Conference 2025 Main Stage.
    From live demos to hands-on coding, this snapshot captures the momentum we're bringing to our Berlin R&D hub – AI experiments, hand-tracking prototypes, and simulation tools powering our next wave of engineering.

    We plan to open our Berlin office space in December or January. To get the office started, we’re hiring Senior Product Engineers, Machine Learning Scientists, Product Managers, Senior Product Designers, Engineering Managers, and Data Scientists immediately. If your craft sits at the intersection of LLMs for product managers, agentic AI, and empowered product teams, you’ll be right at home.

    You can learn more about our open roles, company, culture, and locations on our careers site, or feel free to reach out to me, Jordan, Fergal, or Brian directly on LinkedIn if you have any questions.

    Some of our engineering team will also be at LeadDev Berlin on November 3rd—come say hi if you’re attending.

    I’m looking forward to continuing to build Intercom as one of our generation’s best AI companies—and I’m excited for our expansion into Berlin to be a major contribution to that success.


    Inspired by this post on The Intercom Blog.


    Book a consult png image
  • Context Is King: My Playbook to Prep Product Teams for High-Impact AI Collaboration

    Context Is King: My Playbook to Prep Product Teams for High-Impact AI Collaboration

    Context is king in AI-powered product work—and I felt that deeply while digging into “Context is King – All Things Product Podcast with Teresa Torres & Petra Wille.” The conversation affirmed a truth I see daily: AI becomes a powerful teammate only when we give it the right context, just as we do with empowered product teams. When we treat AI like a colleague joining mid-flight—without our company history, industry nuances, or strategy—we instantly unlock better outcomes.

    Listen to this episode on: Spotify | Apple Podcasts

    Here’s what stood out and how I’m applying it. First, most AI outputs fail without proper context. That’s not a model problem; it’s a leadership problem. Thinking of AI like onboarding a new intern is the right mental model—start with the minimum viable context, then iterate. Practical first steps matter: decision logs, clear success metrics, and structured documentation. The art is balancing enough context to guide performance without overloading the system. The parallels are striking: the way we create strategic context for product trios and teams is the same way we’ll empower agentic AI systems.

    In my teams, we prepare for AI collaboration by operationalizing context. We keep decision logs to capture the why behind choices, use outcome-based success metrics (not just output), and maintain machine-readable documentation that LLMs for product managers can parse reliably. We define guardrails up front—constraints, customer segments, privacy-by-design considerations, and the non-goals that often trip up gen ai. This foundation turns AI from a novelty into a force multiplier for product discovery and product roadmapping and sprint planning.

    I use a simple “context pack” to onboard AI agents and teammates alike: 1) business goals and outcomes, 2) constraints and guardrails, 3) canonical artifacts (like PRDs, journey maps, interview notes), 4) domain vocabulary and definitions, and 5) operating procedures (how we make decisions, when to escalate, what good looks like). Start small, then refine as the AI demonstrates capability. This mirrors great onboarding—and it works just as well for agentic AI as it does for humans.

    Not all context is helpful. More isn’t better; the minimum effective context is. I resist the urge to dump our entire Confluence on an AI system. Instead, I progressively reveal relevant details—just like I would with a new PM on a complex problem space. This keeps signals high, noise low, and performance measurable against clear success metrics.

    If your org isn’t adopting AI yet, don’t wait. You can become AI-ready now by documenting strategic intent, decision rationale, and definitions in structured, searchable, machine-readable ways. Treat this as core AI Strategy work that strengthens empowered product teams—regardless of tooling—while building your AI product toolbox for tomorrow.

    For those who want to explore further, these resources and mentions are a strong complement to the episode’s themes.

    Follow Teresa Torres: https://ProductTalk.org

    Follow Petra Wille: https://Petra-Wille.com

    Agentic AI

    Teresa’s new podcast, Just Now Possible in Youtube, Apple Podcast, and Spotify

    Petra’s Coaching Packages

    ChatGPT

    Henrik Kniberg’s talk at Product at Heart on treating AI agents like interns

    Teresa’s webinars on how she built the Product Talk Interview Coach: Behind the Scenes: Building the Product Talk Interview Coach and How I Designed & Implemented Evals for Product Talk’s Interview Coach

    Josh Seiden’s blog series about AI

    Teresa’s new blog posts: 15 Ways to Use AI at Home (and Fill Your AI Product Toolbox) and 21 Ways to Use AI at Work (And Build Your AI Product Toolbox)

    Petra's new blog post: Why Context, Not Just Data, Will Define AI-Ready Product Teams

    Have thoughts on this episode or how you’re preparing your teams to collaborate with AI? Leave a comment below—let’s compare playbooks and level up together.


    Inspired by this post on Product Talk.


    Book a consult png image
  • Beyond Digital: How AI Transformation Builds Adaptive, Intelligent Organizations That Win

    Beyond Digital: How AI Transformation Builds Adaptive, Intelligent Organizations That Win

    Digital transformation rewired our systems; AI transformation rewires how we learn, decide, and compete. “AI transformation goes beyond automation to create adaptive, intelligent organizations. Discover why it’s the next imperative and how to measure success.” That statement captures what I experience daily: we’re moving from scripted workflows to living systems that improve with every interaction.

    When I talk about AI transformation, I’m not describing a tool rollout. I’m describing an operating model where data, models, and product strategy converge to create compounding advantage. In practice, that means agentic AI orchestrating tasks, robust data governance and privacy-by-design from day one, and empowered product teams that ship, measure, and iterate at high tempo.

    The imperative is strategic, not merely technical. Markets are compressing cycle times, and customers now expect intelligent experiences by default. Organizations that master AI Strategy and product-led growth will set the pace—using AI for competitive differentiation rather than feature parity.

    This shift changes how I build teams and backlogs. I lean on product trios, forward deployed engineers, and tight product discovery loops to reduce uncertainty early. We design for resilience and learning: human-in-the-loop feedback, clear escalation paths, and telemetry that turns every interaction into a hypothesis test.

    Governance is a first-class feature. AI risk management, data governance, and threat detection and response sit alongside performance metrics in the same dashboard. We codify guardrails—policy, provenance, and permissions—so innovation scales safely and sustainably.

    Measurement is where transformation becomes real. I anchor on outcomes vs output OKRs tied to customer value and revenue impact. At the product layer, I track activation, time-to-value, retention, and adoption by persona. For ML quality, I monitor precision/recall, coverage, hallucination rate, and model drift. In experimentation, A/B testing with a thoughtful minimum detectable effect (MDE) prevents false wins, while Amplitude analytics, Pendo, and Intercom instrumentation expose where guidance or UX writing can unlock activation.

    The fastest wins often start in service and sales. A customer support ai strategy can deflect tickets with high-resolution answers while escalating edge cases to humans with full context. CRM integration with HubSpot and a ChatGPT connector enables reps to generate next-best-actions, summarize calls, and personalize outreach—measurably lifting conversion and lowering cost-to-serve.

    On the build side, LLMs for product managers and gen ai for product prototyping accelerate discovery cycles. I use CustomGPT workflows to validate value propositions quickly, then harden successful flows with engineering. Throughout, product positioning and a crisp value proposition ensure that what we ship is understandable, differentiated, and priced to match ROI—consumption SaaS pricing when usage scales value.

    If you’re getting started, begin with a single, high-frequency journey, instrument it deeply, and publish transparent OKRs. Pair empowered product teams with clear governance, and iterate toward agentic AI experiences. The payoff isn’t a one-time launch; it’s a continuously learning system—and a culture—that compounds advantage release after release.


    Inspired by this post on Pendo – Perspectives.


    Book a consult png image