Category: AI Strategy

  • Mastering AI Evals: The Essential Product Manager Skill to Ship Safer, Smarter AI

    In every AI-powered product I ship, evaluation is the difference between a compelling demo and a dependable customer experience. AI evaluation isn’t a nice-to-have; it’s a core product management competency that shapes quality, safety, and business outcomes from the first prototype to scale.

    When I talk about AI evaluation, I mean a disciplined, repeatable way to measure model behavior across quality, safety, reliability, latency, and cost. Gen AI has changed the cadence of product decisions—models evolve weekly, prompts drift under real-world load, and edge cases multiply. Without rigorous evals, we risk shipping unpredictability.

    My goal in this piece is simple: dive deep into AI evals, explain why they matter for PMs today, and show how to master them with clear steps, examples, and best practices. If you’re leading product strategy for LLMs, agentic AI, or applied AI features, this is the playbook I rely on.

    Why this matters now: customers don’t judge AI by benchmarks; they judge it by trust. Did it help me, was it safe, was it fast? Strong AI evals let me set OKRs around outcomes rather than output, quantify risk, and make transparent trade-offs between accuracy, latency, and cost. They also give engineering and design clear guardrails to move fast without breaking user trust.

    Step 1: Define the product problem and success metrics. I start by tying AI metrics to business outcomes—resolution rate, deflection rate, revenue lift, time-to-value—and include model-centric measures like hallucination rate, harmful content rate, latency, and token cost. This keeps experiments anchored to impact, not just model scores.

    Step 2: Build a high-signal golden dataset. I curate real, anonymized user prompts from discovery and support channels, then add adversarial and long-tail cases. For generative tasks, I create rubric-based criteria for correctness, helpfulness, tone, and safety. This dataset becomes my regression suite as prompts, RAG pipelines, or models change.
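
    A golden dataset used as a regression suite can be sketched roughly as follows. This is a minimal illustration, not a specific tool: the `GoldenCase` fields, the substring checks, and the tag names are all assumptions, and a real rubric for tone or helpfulness would need richer scoring than string matching.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenCase:
    """One curated prompt plus simple, checkable expectations (hypothetical schema)."""
    prompt: str
    must_include: list[str] = field(default_factory=list)      # facts the answer needs
    must_not_include: list[str] = field(default_factory=list)  # unsafe or off-brand phrases
    tag: str = "common"  # e.g. "common", "adversarial", "long_tail"


def passes(case: GoldenCase, answer: str) -> bool:
    """True when every required phrase is present and no forbidden phrase appears."""
    text = answer.lower()
    has_required = all(s.lower() in text for s in case.must_include)
    has_forbidden = any(s.lower() in text for s in case.must_not_include)
    return has_required and not has_forbidden


def pass_rate_by_tag(cases, answer_fn):
    """Run answer_fn over every case and report the pass rate for each tag."""
    totals, passed = {}, {}
    for case in cases:
        totals[case.tag] = totals.get(case.tag, 0) + 1
        if passes(case, answer_fn(case.prompt)):
            passed[case.tag] = passed.get(case.tag, 0) + 1
    return {tag: passed.get(tag, 0) / count for tag, count in totals.items()}
```

    Reporting by tag keeps a strong score on common queries from hiding a regression on adversarial or long-tail cases.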

    Step 3: Choose the right evaluation methods. I combine deterministic unit tests for rules with LLM-as-judge scoring, pairwise preference tests for prompt variants, human review for critical flows, and red teaming for safety. I also apply privacy-by-design and strong data governance to ensure eval data handling meets compliance and customer expectations.

    Step 4: Operationalize with CI/CD. Evals run automatically on every prompt, retrieval, or model update, with pass/fail gates and alerting. I track results in a unified analytics platform so product, engineering, and go-to-market teams see the same truth. If a change regresses key thresholds, we pause rollout or roll back.
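
    A pass/fail gate of this kind might look like the sketch below. The metric names and threshold values are placeholders chosen for illustration, not recommended targets; the returned code is what a CI job could use to block a rollout.

```python
# Hypothetical thresholds: "min" means the metric must be at least the limit,
# "max" means it must not exceed the limit.
THRESHOLDS = {
    "golden_pass_rate": ("min", 0.95),
    "hallucination_rate": ("max", 0.02),
    "harmful_content_rate": ("max", 0.0),
    "p95_latency_seconds": ("max", 3.0),
}


def gate_failures(metrics, thresholds=THRESHOLDS):
    """Return a human-readable message for every threshold the run violates."""
    failures = []
    for name, (direction, limit) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric fails the gate rather than passing silently.
            failures.append(f"{name}: no value reported")
        elif direction == "min" and value < limit:
            failures.append(f"{name}: {value} is below the minimum {limit}")
        elif direction == "max" and value > limit:
            failures.append(f"{name}: {value} is above the maximum {limit}")
    return failures


def exit_code(metrics, thresholds=THRESHOLDS):
    """0 when every gate passes, 1 otherwise, suitable as a CI job exit status."""
    return 1 if gate_failures(metrics, thresholds) else 0
```

    Treating a missing metric as a failure is a deliberate choice here: an eval that silently stopped running should not look like a pass.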

    Step 5: Optimize the cost–quality–latency triangle. Real products live within constraints. I analyze token budgets, caching strategies, model selection (e.g., small for classification, larger for complex generation), prompt structure, retrieval quality, and function-calling patterns. For agentic AI, I evaluate tool-use correctness and task completion reliability, not just text quality.
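
    One way to make the triangle concrete is to pick the cheapest option that still clears the quality and latency bars. The sketch below assumes each candidate configuration already has an offline eval score, a measured p95 latency, and an estimated cost; the `Candidate` fields and the numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A model and prompt configuration with measured results (hypothetical fields)."""
    name: str
    quality: float               # offline eval score between 0 and 1
    p95_latency_seconds: float
    cost_per_1k_requests: float


def cheapest_acceptable(candidates, min_quality, max_latency_seconds):
    """Return the lowest-cost candidate meeting both constraints, or None if none do."""
    acceptable = [
        c for c in candidates
        if c.quality >= min_quality and c.p95_latency_seconds <= max_latency_seconds
    ]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: c.cost_per_1k_requests)
```

    For example, given a small model scoring 0.88, a mid-size model scoring 0.93, and a large model scoring 0.96, a quality floor of 0.90 selects the mid-size model as long as it is cheaper than the large one and within the latency limit.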

    Step 6: Close the loop with experimentation. Offline evals give me confidence; online A/B testing validates business impact. I design tests with a clear minimum detectable effect (MDE), guard against novelty effects, and instrument activation, retention, and satisfaction in Amplitude or Pendo. Agent analytics help me pinpoint where users succeed or get stuck.

    Step 7: Govern responsibly. I maintain model cards, decision logs, and incident playbooks. For customer-facing assistants, I gate risky actions, log explanations, and add human-in-the-loop escalation. AI risk management isn’t bureaucracy—it’s how we earn trust at scale.

    A concrete example: building a customer support assistant. My success metrics include deflection rate, first-contact resolution, median response latency, and safe action rate. The golden dataset blends common queries, billing edge cases, account-specific retrieval checks, and adversarial prompts. Evals measure factuality against a knowledge base, tone alignment with brand guidelines, and safe tool use for CRM integration. Only after passing offline gates do we A/B test deflection and CSAT in production.

    Common pitfalls I watch for: overfitting prompts to a tiny test set, relying solely on LLM-as-judge without human calibration, skipping safety tests when latency rises, and treating evaluations as a one-time launch task. The antidote is simple—regularly refresh datasets, diversify eval methods, and wire evals into the same release discipline as any core feature.
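
    Human calibration of an LLM judge can be quantified with an agreement statistic. The sketch below uses Cohen's kappa, which is one common choice rather than anything the workflow above prescribes; the label values are illustrative.

```python
from collections import Counter


def cohens_kappa(human_labels, judge_labels):
    """Agreement between two raters on the same items, corrected for chance."""
    if len(human_labels) != len(judge_labels) or not human_labels:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human_labels)
    observed = sum(h == j for h, j in zip(human_labels, judge_labels)) / n
    human_counts = Counter(human_labels)
    judge_counts = Counter(judge_labels)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(human_counts[label] * judge_counts[label] for label in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

    A kappa near 1 suggests the judge tracks human reviewers closely; a value near 0 means its agreement is no better than chance, which is a signal to revise the judge prompt or rubric before trusting its scores.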

    The payoff is compounding. With strong AI evals, we ship confidently, reduce incident rates, accelerate iteration, and communicate trade-offs clearly to stakeholders. More importantly, we build products customers trust—because quality isn’t a promise, it’s a practice we can measure every day.


    Inspired by this post on Product School.


  • Innovation Strategy in the Age of AI: Proven Playbooks, Real-World Examples, and What Works Now

    AI has rewritten the rules of how we create value, and I’ve watched the most resilient organizations treat innovation as a disciplined, outcomes-driven capability—not a one-off initiative. In my role leading product teams, I’ve refined a practical approach that blends rigorous product management with an adaptive AI Strategy so we can ship faster, learn faster, and de-risk smarter.

    Learn what an innovation strategy is, how to build one, which types to use, and see real examples that drive meaningful change.

    At its core, an innovation strategy is the intentional system that aligns vision, portfolio bets, and execution mechanics to measurable business outcomes. I anchor this in OKRs written around outcomes rather than output, ensuring every experiment, feature, and GTM motion ties to a clear value proposition and reinforces hard-won product-market fit lessons rather than chasing novelty.

    I design portfolios around three types of innovation that work well in the age of AI. First, core optimization: drive compounding gains with CI/CD, DORA metrics, and A/B testing to improve activation, retention, and profitability. Second, adjacent expansion: extend value via new segments, channels, or use cases, often enabled by product-led growth tactics like in-app guides and product tours. Third, transformational bets: leverage gen AI and agentic AI to create step-change capabilities while proactively addressing AI risk management, data governance, and privacy-by-design.

    Building the strategy starts with empowered product teams and product trios who run continuous product discovery to validate problems before validating solutions. I keep discovery tight with a minimum detectable effect (MDE), instrument the journey with a unified analytics platform, and thread learnings into product roadmapping and sprint planning so we prioritize the smallest, fastest path to decision-quality data.

    On the AI front, my operating model combines an AI product toolbox (prompt patterns, evaluation harnesses, and safety rails) with LLMs for product managers to accelerate research, prototyping, and content generation. We standardize CustomGPT workflows where appropriate, define CRM integration and data boundaries early, and adopt a clear build/partner/buy decision tree to protect focus and speed without compromising risk posture.

    Here are real patterns that consistently deliver meaningful change. We’ve used generative AI for product prototyping to compress concept validation from weeks to days, then confirmed impact with rapid A/B testing tied to MDE. We’ve implemented agentic AI for customer support triage to reduce response times and free human agents for high-complexity cases, all under strict data governance. And we’ve paired new AI features with a focused go-to-market strategy—clear positioning, sharp onboarding, and outcome-centric messaging—to accelerate user activation.

    Measurement makes or breaks innovation. I combine deployment frequency and the other DORA metrics on the engineering side with activation, retention analysis, and value-moment telemetry on the product side. Aligning QBRs with OKRs keeps leadership focused on outcomes, while experiment scorecards ensure we learn even when results are neutral. The goal is to increase the rate of validated learning across the portfolio, not just ship more.

    Governance is a feature, not a tax. We embed threat detection and response, privacy-by-design, and transparent data policies from day one. Stakeholder management and board management stay tight with simple narratives: the bet, the hypothesis, the metric, the MDE, the timeline, and the kill-or-scale criteria. That clarity builds trust and protects speed.

    If you’re recalibrating your innovation strategy right now, start small and deliberate: define the outcomes, select one core, one adjacent, and one transformational bet, and wire in learning loops from discovery to delivery. With empowered product teams, disciplined analytics, and a pragmatic AI Strategy, you can move from interesting ideas to durable competitive differentiation—faster and with far less risk.


    Inspired by this post on Product School.


  • 11 Unconventional Product Management Moves That Supercharge Strategy, Teams, and Impact

    I’ve spent years leading product strategy at HighLevel, Inc., and the patterns I rely on don’t always show up in the usual playbooks. In practice, the moves that compound impact are often the quiet ones—unsexy, rigorous, and relentlessly customer-centered.

    These product management best practices challenge the norm. Read and you’ll sharpen your strategy and elevate your impact beyond just features.

    What follows are the 11 under-discussed habits I return to when the stakes are high and the path is foggy. They help me ship meaningful outcomes, develop empowered product teams, and align our go-to-market strategy without getting trapped in feature theater.

    Best practice 1 — Anchor goals to outcomes, not output. I frame “outcomes vs output OKRs” so teams focus on behavior change and business results, not ticket counts. Activation rate, retained revenue, and cycle time beat launch volume every time.

    Best practice 2 — Run discovery with product trios. I put design, engineering, and product in the same room early, often with forward deployed engineers. This trio model accelerates product discovery, uncovers risks faster, and builds shared ownership.

    Best practice 3 — Decide from first principles, then apply the try-do-consider framework. I separate points of parity from true differentiation and protect our value proposition. The result: clearer choices, less rework, and a strategy that compounds.

    Best practice 4 — Be statistically honest with A/B testing. I size experiments by minimum detectable effect (MDE), guard against peeking, and follow through with retention analysis. This discipline prevents false positives from steering the roadmap.
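
    Sizing an experiment from an MDE can be sketched with the standard normal-approximation formula for comparing two conversion rates. This assumes a two-sided, fixed-horizon test with equal traffic per arm and an MDE expressed as an absolute lift; the default `alpha` and `power` values are common conventions, not figures from this post.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    if not 0 < baseline_rate < 1 or mde_abs <= 0 or baseline_rate + mde_abs >= 1:
        raise ValueError("rates must stay strictly between 0 and 1, with a positive MDE")
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)
```

    Because the sample size scales with the inverse square of the MDE, halving the effect you want to detect roughly quadruples the traffic required. The calculation also assumes the result is read once at the planned sample size, which is why peeking early undermines it.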

    Best practice 5 — Treat delivery as a learning engine. CI/CD, feature flags, and progressive rollouts let us learn without gambling the brand. I track deployment frequency and DORA metrics to raise quality while increasing the tempo of validated learning.

    Best practice 6 — Build a unified analytics backbone. I connect product telemetry to a unified analytics platform and CRM integration so we can see the full funnel. Amplitude analytics, Pendo, and Intercom help us tie behaviors to value realization and inform prioritization.

    Best practice 7 — Make onboarding a first-class product. In-app guides, product tours, UX writing, and thoughtful tooltip design shorten time-to-value and lift user activation. This is the quiet lever behind sustainable product-led growth.

    Best practice 8 — Systematize stakeholder management. I pair QBRs with OKRs to balance narrative and numbers, keep board management transparent, and align sequencing through product roadmapping and sprint planning. Clear rituals minimize thrash and build trust.

    Best practice 9 — Connect strategy to positioning early. I pressure-test product positioning, clarify our value proposition, and deliberately choose which points of parity to match and which to ignore. This reduces me-too work and sharpens competitive differentiation.

    Best practice 10 — Use AI as a responsible force multiplier. I use LLMs in day-to-day product work and gen AI for product prototyping while enforcing privacy-by-design, AI risk management, and strong data governance. The goal is leverage without compromising trust.

    Best practice 11 — Write it down to move faster together. I keep crisp decision logs, assumptions, and pre-mortems so empowered product teams can act with context. This simple habit makes onboarding easy, reduces re-litigating, and keeps momentum through change.

    When I apply these practices consistently, the team ships less noise and more value. The compounding effect is real: clearer priorities, faster learning cycles, stronger alignment, and a roadmap that tells a coherent story from discovery to adoption.


    Inspired by this post on Product School.


  • Inside Our AI-Native Product Training: Accelerating Adoption, ROI, and Measurable Growth

    AI is reshaping how we build products, learn new skills, and lead teams. I’ve seen great organizations stall when training lags behind technology. That’s why we rebuilt our approach to product training from first principles—so every team can operate confidently with AI at the core of their product management practice.

    Our north star is simple: operationalize AI Strategy for every product manager and cross-functional partner. We designed a learning system that shortens time-to-adoption, amplifies ROI, and links capability-building to clear, measurable outcomes.

    Product School transforms product teams into AI-native organizations with training that accelerates adoption, maximizes ROI, and drives measurable growth.

    That ambition informs how we design curriculum and delivery. We combine gen AI foundations, LLMs for product managers, applied product discovery, product roadmapping and sprint planning, and product management leadership. The learning experience blends case-based instruction with simulations and real product data so teams practice exactly how they’ll perform.

    To ensure knowledge becomes behavior, we embed training directly into product workflows: in-app guides, product tours, onboarding sequences, and user activation loops tied to outcomes vs output OKRs. This closes the gap between knowing and doing, and it makes capability visible in the metrics that matter.

    We focus on empowering product teams—clarifying decision rights, elevating accountability, and creating feedback loops that enable faster iteration. When teams own their roadmap and understand the AI building blocks, they move from experimentation to repeatable, scalable value creation.

    Measurement is built in from day one. We instrument for adoption, time-to-first-value, feature activation, and ROI attribution, enabling continuous improvement and transparent stakeholder communication. The result is a system that compounds learning into performance.

    This is how we’re building AI-native organizations: practical, data-informed, and outcomes-driven. It’s not just training—it’s an operating model that helps teams learn faster, ship smarter, and grow with confidence.


    Inspired by this post on Product School.


  • 9 Corporate Innovation Trends Redefining Business—and How I’m Turning Them into Wins

    Corporate innovation isn’t a side project anymore—it’s the operating system for how we build, scale, and win. In my product leadership work, I’ve watched the pace of change accelerate across every function, from engineering and data to go-to-market and customer success. The companies pulling ahead are the ones translating trends into execution with clarity, speed, and measurable outcomes.

    We researched corporate innovation to reveal top trends, types, and examples that can spark growth and keep your business ahead.

    Here’s how I’m seeing that play out right now—and the nine trends I’m actively using to guide roadmaps, prioritize bets, and ship value faster.

    Trend 1: Generative AI is moving from pilots to products. Teams are evolving beyond demos into durable capabilities powered by gen AI, LLMs, and agentic AI patterns that automate workflows end-to-end. The winners pair bold AI Strategy with AI risk management, privacy-by-design, and clear value propositions so customers trust what we ship and can see its impact on outcomes, not just outputs.

    Trend 2: Product-led growth is becoming the default go-to-market motion. I’m doubling down on onboarding, in-app guides, product tours, and activation loops that reduce time-to-value. We back this with disciplined A/B testing, well-chosen minimum detectable effect (MDE), and retention analysis to prove what actually moves the needle. PLG isn’t a tactic—it’s a cultural shift toward continuous learning and self-serve experience design.

    Trend 3: Unified analytics and experimentation are the new backbone. A unified analytics platform, instrumented with tools like Amplitude analytics, Pendo, and CRM integration via HubSpot or Intercom, gives us a single source of truth from acquisition through expansion. I push teams to connect user journeys to revenue and to operationalize insights into roadmapping and sprint planning—not monthly reports that sit on a shelf.

    Trend 4: Outcome-driven operating models are replacing feature factories. We align on outcomes vs output OKRs, empower product teams, and structure product trios to balance customer insight, technical feasibility, and commercial impact. First principles decision making helps us cut through noise, set sharper points of parity, and focus on differentiation that customers will pay for.

    Trend 5: Velocity and reliability matter more than ever in engineering. Continuous delivery via CI/CD, healthy deployment frequency, and DORA metrics are my leading indicators for a team’s ability to learn fast. I’ve seen forward deployed engineers and thoughtful developer evangelism tighten the feedback loop with customers and speed up iteration without compromising quality.

    Trend 6: Data governance and security are strategic differentiators. Trust is a product feature. I prioritize data governance, cybersecurity, and threat detection and response alongside usability. Privacy-by-design isn’t a compliance checkbox; it’s table stakes for enterprise adoption and a durable moat when paired with transparent controls and auditability.

    Trend 7: Pricing and packaging innovation is unlocking growth. We’re testing SaaS pricing models, including consumption SaaS pricing, to align value delivered with value captured. Clear articulation of the value proposition and thoughtful packaging reduce friction in sales and support product-led expansion. Pricing experiments belong in the product backlog—not just in finance spreadsheets.

    Trend 8: Customer-in-the-loop discovery is the fastest path to relevance. I treat product discovery as a continuous practice, weaving QBR-style business reviews into roadmaps and using stakeholder management to align incentives across sales, success, and product. A customer support AI strategy helps surface high-signal insights from tickets and conversations, turning support into a discovery engine.

    Trend 9: Open platforms and ecosystems amplify innovation. From API-first thinking and ChatGPT connector patterns to integrations that meet customers where they work, ecosystems drive stickiness and reduce time-to-value. The strongest roadmaps combine a focused core with extensibility that partners and customers can build on.

    How to act now: I recommend a simple try-do-consider framework. Try one high-conviction AI use case with clear guardrails. Do instrumented experiments across onboarding and activation to fuel product-led growth. Consider pricing and packaging tests tied to measurable outcomes. With disciplined learning cycles and empowered teams, these trends stop being headlines and start becoming compounding advantages.

    Innovation favors teams that ship, learn, and adapt. If these trends are on your roadmap, align them to outcomes, measure obsessively, and keep customers in the loop. That’s how we turn momentum into durable growth.


    Inspired by this post on Product School.


  • AI vs. Product Managers by 2035: What Will Change—and How to Future‑Proof Your Career

    Will AI replace product managers, or simply transform their role? Discover what AI can and cannot do, plus insights from PMs on the future of work.

    I’m asked this question in nearly every leadership meeting now, and my answer is consistent: AI won’t replace great product managers by 2035—but it will radically reshape how we operate. The PMs who thrive will pair sharp product judgment with an intentional AI Strategy and a practical AI product toolbox, unlocking speed, clarity, and scale without sacrificing vision.

    Here’s what AI already does well for us today. With LLMs for product managers, I can synthesize customer feedback at scale, draft PRDs and acceptance criteria, transform notes into user stories, and even auto-generate experiment plans with a minimum detectable effect (MDE) calculation. When I connect these models to Amplitude analytics, Pendo, Intercom, and HubSpot through a unified analytics platform and CRM integration, I accelerate discovery, prioritize confidently, and tighten the loop between signal and action. CustomGPT workflows now handle routine backlog grooming, competitive landscaping, and early concept testing, freeing my team to focus on higher-order decisions.

    By 2035, I expect agentic AI to operate as an execution co-pilot: autonomously scheduling A/B testing, launching targeted in-app guides and product tours, monitoring user activation and onboarding funnels, and raising anomalies via Agent Analytics long before a dashboard review. These systems will propose playbooks, draft UX writing and tooltip design, and recommend next-best actions—then wait for human approval when stakes are high. Think of it as the ultimate forward deployed engineer for operational work, working within clear guardrails.

    What AI cannot do—and is unlikely to master soon—is the essence of product leadership. It won’t craft a resonant value proposition for a new segment, define points of parity vs. competitive differentiation, or set outcomes vs output OKRs that align messy stakeholder incentives. It won’t navigate board management, reconcile conflicting narratives from sales and engineering, or make ethically grounded trade-offs under uncertainty. That’s where privacy-by-design, data governance, and AI risk management converge with human judgment, context, and accountability.

    As the tooling matures, the PM role will tilt from artifact production to decision quality. We’ll spend less time writing and more time deciding: which bets to place, which risks to accept, and where to concentrate our empowered product teams. Product discovery deepens, product positioning sharpens, and product roadmapping and sprint planning become faster and more adaptable—because the busywork is handled, not because the thinking is outsourced.

    Practically, I’m evolving team design and rituals now. We operate as product trios, pair PMs with forward deployed engineers, and embed gen AI into daily workflows. We standardize prompts, set review thresholds, and instrument everything for observability. Our stakeholder management improves because we bring clearer narrative artifacts, and because we can test assumptions earlier and share evidence in real time.

    If you’re building your own AI Strategy, start with three tracks. First, foundations: instrument data pipelines, establish data governance, and codify privacy-by-design. Second, acceleration: deploy CustomGPT workflows for research synthesis, PRD drafting, retention analysis, and experiment design, while keeping humans in the loop for decisions. Third, automation with guardrails: let agentic AI run low-risk playbooks (in-app guides, content suggestions, ops checks) and require human approval for anything customer-facing and irreversible.

    Future-proofing your career is about skill stacking. Double down on first principles decision making, storytelling, and cross-functional influence, and pair that with hands-on fluency in gen AI, prompt engineering, model evaluation, and risk controls. Learn how to frame trade-offs, write OKRs around outcomes rather than output, and translate strategy into experiments that AI can help execute. The combination of human judgment and machine speed is the new competitive advantage.

    So, will AI replace product managers by 2035? No. It will transform average PMs into good ones and great PMs into force multipliers. The ones who lead will embrace AI as leverage, cultivate empowered product teams, and stay relentlessly focused on customer outcomes. The future belongs to product creators who can wield intelligent tools without surrendering accountability for the product’s direction and impact.


    Inspired by this post on Product School.


  • From Chaos to Consistency: How I Built a Scalable AI Content Design Agent with RAG

    It’s Monday morning, and my Slack and email are already overflowing with content requests: “Can you review this flow?”; “Can you rewrite this screen?”; “Can you name this feature?” I’m not freshly back from holiday—this is just a regular work week kicking off. If you’ve ever been a solo content designer supporting multiple teams, you’ll recognize the pressure. The pipeline for content in product design is always full, and the demand for expertise never stops.

    Fixing this isn’t just a matter of better time management or incremental process tweaks. To truly scale, I needed to extend my reach by bringing AI into the design process—without sacrificing judgment, standards, or quality. That Monday morning, I realized I had to scale my skills, my judgment, and our systems, not just my calendar.

    Building AI is fundamentally about building systems. I wanted to use AI to scale myself without devaluing critical thinking or flooding the product with generic, verbose content. I also knew a useful AI tool must do more than spit out microcopy—it has to plug into a system we can continually shape. As a content designer, the system is always the starting point. Strong design systems create strong content standards; then AI agents can produce content that meets those standards at speed, freeing me from the bulk of standardized work. That’s not a threat—it’s an advantage. To instruct AI well, our systems must be well constructed.

    I often think about this work like a bakery. You need a recipe before you can make a loaf of bread. Most interface content churns out the same loaf, day in and day out. It’s better for the master bakers to focus on the unique, custom bakes—and how the recipe needs to change. With that mindset, I set out to build an AI content design agent.

    [Screenshot: the VERBI content design assistant, a chat UI with a central prompt box, quick-start prompts such as “Can you write this?”, and links to view permissions and open the agent setup in draft mode.]

    When I started this project back in May 2025, many AI assistants still had frustrating limitations. Google Gemini let me build a custom Gem agent, but I couldn’t share it with other users. ChatGPT could be customized, but only with static files: I couldn’t point it to live, updatable URL sources. I settled on Glean for three simple reasons: everyone at the company had access; Glean could access all internal documentation and treat URLs as sources of truth; and its then-new Agents feature made AI search customizable. Configuring an agent in Glean is straightforward: you choose a trigger, a set of prompts, and a set of actions. But first I needed to get the inputs right.

    AI agents need focus. We had a wealth of internal information at Intercom, but not all of it was current or reliable. I curated exactly what the agent could access and assembled a tightly governed knowledge collection in Glean. Only essential information made the cut: the Intercom style guide (our definitive house style, including regularly broken rules like “always write in US English” and “use sentence case everywhere”); tone of voice guidance for how we show up across mediums; a product glossary with hundreds of feature names and writing conventions; a monetization glossary for prices, plans, and add-ons; product marketing messaging guides with positioning for every feature and launch; core research insights across the product; and fin.ai and intercom.com/suite as the official, most up-to-date messaging sources.

    This is classic RAG (retrieval-augmented generation) in action, ensuring every answer is grounded in approved sources of truth. With the collection in place, I instructed the agent to prioritize these resources above anything else.
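
    The retrieve-then-ground pattern can be illustrated with a toy, in-memory version. This sketch does not use Glean or its APIs: the keyword-overlap scoring, the `collection` dictionary, and the function names are all stand-ins for whatever retrieval the platform performs internally.

```python
import re


def tokens(text):
    """Lowercased word set used for a crude keyword-overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, collection, k=2, min_overlap=2):
    """Return up to k (source, passage) pairs from the approved collection."""
    query_tokens = tokens(query)
    scored = []
    for source, passage in collection.items():
        overlap = len(query_tokens & tokens(passage))
        if overlap >= min_overlap:
            scored.append((overlap, source, passage))
    # Highest overlap first; source name breaks ties so results are stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(source, passage) for _, source, passage in scored[:k]]


def grounded_prompt(query, collection):
    """Build a prompt limited to retrieved sources, or None when nothing matches."""
    hits = retrieve(query, collection)
    if not hits:
        return None  # caller should answer "I don't know" instead of guessing
    context = "\n".join(f"[{source}] {passage}" for source, passage in hits)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
```

    Returning `None` when nothing relevant is found gives the caller an explicit point to decline rather than fall back on general knowledge.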

    [Screenshot: a no-code workflow builder for the Content Design Agent, with cards for Trigger, Company search, and Respond, plus a sidebar checklist titled The basics to start from scratch.]

    Then came the fun part—building and branding the agent. “Content Design Assistant” felt bland, so I named it VERBI, a nod to its “verbal” design job. When people interact with VERBI, they usually begin with a question, but the intent varies widely. I defined a set of task prompts to guide expectations and outputs: “Can you write this?”; “Can you edit this?”; “Can you review this?”; “Can you name this?”; “Give me options”; “Give me guidance”; “Give me strategy”; “Give me research.” This mirrors the real breadth of content design, from creation to critique to discovery.

    To manage responses, VERBI needed three things: start with a specific task prompt; understand how to draw on the right resources each time; and connect with other systems. With task prompts defined, I wrote a detailed system prompt covering the essentials. Role: you are a content designer, supporting product designers. Employer: Intercom (consisting of Fin AI Agent and our next-gen Helpdesk). Resources: content design collection, research collection, Storybook design system. Tone of voice: follow a specific tone for our UI, adjust the tone for everything else. Components: for UI, use the specific guidelines in our design system only. Use cases: writing, editing, critiquing, naming, researching, and more.

    One connection mattered most: our design system, recently rebranded as “Surge.” Surge contains detailed content guidelines for every component in our product UI, from accordions and banners to tabs and tooltips. That granularity took months of human effort to codify, and it paid off. Designers no longer guess how to write for a toggle, a button, or a tooltip—and now VERBI understands and enforces those rules, too. A great content design assistant isn’t just a clever system prompt; it needs deep, component-level guidance to retrieve.

    Design system documentation page for a Badge component, with a left navigation of UI elements and a main panel showing content guidelines, examples of statuses, and a color‑coded table of label types.
    UI documentation showcases the Badge component’s content rules, teaching how to name statuses, define types, and apply color so labels read clearly. A handy visual for building a content design agent and ensuring consistent product messaging.

    Accessing the design system wasn’t simple at first. It lives in Storybook, which Glean couldn’t access directly. I started by scraping guidance from Storybook into an HTML file with Cursor and uploading it to VERBI—a functional but clunky workaround that required re-scraping every few days. Then our IT team stepped in. They used the Glean Indexing API to turn Storybook into a live data source. Now VERBI connects to Storybook directly. Ask it something ultra-specific, like the correct date format for Japan, and it returns the right answer. That integration elevated the agent from helpful to indispensable—human-level precision, 24/7, at scale.

    With prompts and resources in place, I launched VERBI and pressure-tested it. It was accurate and well-informed most of the time, but like any AI agent, it had quirks. I needed it to act as a gatekeeper, not a brainstorming partner that might bend rules or invent new ones. So I added a few explicit guardrails to the system prompt. Stopping sycophancy: “Inform, challenge, and assist. Never placate. Don’t agree by default. If something’s wrong, say so. Challenge assumptions.” Halting hallucinations: “If you don’t find the information required in our resources, say you don’t know the answer. Don’t guess and don’t give answers based on general knowledge.” Avoiding verbosity: “Keep answers short and to the point. Cut the fluff. Skip all niceties and social padding. Only give longer answers if the user asks you to.” These constraints keep responses crisp, correct, and consistent. Like any living system, the prompt needs occasional tune-ups, but the maintenance is minor compared to the upside.
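
    As a rough illustration of how a system prompt with these sections and guardrails could be assembled, here is a minimal Python sketch. The `AgentPrompt` class, its field names, and the rendered layout are hypothetical and are not taken from VERBI or from Glean; only the quoted role, resources, and guardrail wording come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPrompt:
    """Hypothetical container for the sections of a system prompt."""
    role: str
    employer: str
    resources: list[str]
    guardrails: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Flatten the sections into one plain-text system prompt."""
        lines = [
            f"Role: {self.role}",
            f"Employer: {self.employer}",
            "Resources: " + ", ".join(self.resources),
            "Guardrails:",
        ]
        # One bullet per guardrail so each rule can be tuned independently.
        for name, rule in self.guardrails.items():
            lines.append(f"- {name}: {rule}")
        return "\n".join(lines)


prompt = AgentPrompt(
    role="You are a content designer, supporting product designers.",
    employer="Intercom",
    resources=[
        "content design collection",
        "research collection",
        "Storybook design system",
    ],
    guardrails={
        "Sycophancy": "Inform, challenge, and assist. Never placate.",
        "Hallucination": "If you don't find the information required in "
                         "our resources, say you don't know the answer.",
        "Verbosity": "Keep answers short and to the point.",
    },
)
system_prompt = prompt.render()
print(system_prompt)
```

    Keeping each guardrail as a named entry is one way to make the occasional tune-ups small and reviewable, since a change touches a single rule rather than the whole prompt.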

    Where we are now: VERBI has been triggered 700+ times since launch. The benefits are tangible. For me, quality scales without constant policing; repetitive questions about naming, style, or punctuation have dropped significantly. I reclaim time because the agent drafts and checks V1 content across teams, enabling me to focus on higher-impact work. For the design team, iteration is faster, confidence is higher, and strategic clarity improves because shared language and grounded guidelines make decisions easier and more consistent.

    I used to spend too much time mopping up basic content mistakes and untangling spaghetti-like UI copy prone to human error. VERBI removes those errors at the source. The real advantage is speed: we get from blank slate to a high-quality first draft quickly, which means we can spend our energy deciding whether the content is right, not just “good enough.” Design is the whole interface—words, visuals, interactions—so reviews now happen with real content, never “copy TBD.” Our principle to sweat the details applies equally whether work is human-made or AI-assisted.

    Knee-jerk critiques of AI-driven content design often assume teams generate content from nothing and ship it. In reality, great AI is the outcome of great human decisions and strong systems. Its value is pulling us together faster—getting us to a complete, standards-compliant design we can review as a team before sharing it with the world. That’s how AI helps us win: by turning chaos into consistency, and consistency into velocity.


    Inspired by this post on The Intercom Blog.


  • What I Learned from Trainline’s Agentic AI: Building a Trusted Travel Assistant at Scale

    What I Learned from Trainline’s Agentic AI: Building a Trusted Travel Assistant at Scale

    Over the past year, I’ve been shipping agentic AI into production and coaching product teams on what it really takes to make these systems trustworthy in the wild. One story that crystallizes the playbook comes from Trainline’s move to an agentic architecture for travel assistance—an approach that mirrors what I’ve seen work in high-stakes, real-time customer experiences.

    Trainline—the world’s leading rail and coach platform—helps millions of travelers get from point A to point B. Now, they’re using AI to make every step of the journey smoother.

    I studied how "David Eason (Principal Product Manager), Billie Bradley (Product Manager), and Matt Farrelly (Head of AI and Machine Learning)" approached the build of "Travel Assistant, an AI-powered travel companion that helps customers navigate disruptions, find real-time answers, and travel with confidence." Their work exemplifies the kind of end-to-end thinking required to move beyond demos into dependable, on-the-go assistance.

    They share how they: Identified underserved traveler needs beyond ticketing; Built a fully agentic system from day one, combining orchestration, tools, and reasoning loops; Designed layered guardrails for safety, grounding, and human handoff; Expanded from 450 to 700,000 curated pages of information for retrieval; Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time; Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go.

    I align strongly with their core takeaways: "AI assistants need both scalable reasoning and deep domain context to be useful." "Tool design and guardrails are as critical as prompt design in agent systems." "LLM-as-judge evals make it possible to measure open-ended systems without massive labeling costs." And perhaps most importantly, "Even legacy companies can move fast when they embrace experimentation and tight PM–engineering collaboration."

    From an AI strategy perspective, starting "fully agentic" was the right call. When the problem space is dynamic—disruptions, route changes, fare conditions—reasoning loops and orchestration aren’t luxuries; they’re table stakes. Tool selection becomes product design: you need the right retrieval interfaces, constraint-aware planners, and API contracts that are resilient to partial failures. Layered guardrails for safety, grounding, and human handoff reduce hallucination risk while preserving responsiveness—critical when users are standing on a platform waiting for an answer.

    The retrieval scale-up—"Expanded from 450 to 700,000 curated pages of information for retrieval"—is a classic inflection point. I’ve seen teams stall here when they treat content growth as a pure indexing problem. The winning move is curation and structure: normalize sources, encode policy-level constraints, and align retrieval chunks to decision boundaries the agent actually uses. That’s how you keep precision high while coverage explodes.

    Evaluation is where most open-ended assistants fail quietly, which is why I was encouraged to see "Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time." In practice, LLM-as-judge gives you scalable, scenario-based scoring without prohibitive labeling, while a user context simulator surfaces regressions tied to persona, itinerary state, and device constraints. The combination closes the loop between model behavior, tool layer changes, and UX outcomes.
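
    To make the shape of such an eval loop concrete, here is a minimal Python sketch. It is not Trainline's implementation: the `Scenario` fields, the `run_evals` helper, and the pass threshold are assumptions, and `stub_judge` is a keyword check standing in for a real LLM judge call so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    """One simulated user context (all fields are hypothetical)."""
    persona: str
    itinerary_state: str
    question: str
    must_mention: str  # rubric hint, used only by the stub judge below


Judge = Callable[[Scenario, str], int]  # returns a score from 1 to 5


def stub_judge(scenario: Scenario, answer: str) -> int:
    """Stand-in for an LLM judge: 5 if the required fact appears, else 1."""
    return 5 if scenario.must_mention.lower() in answer.lower() else 1


def run_evals(assistant: Callable[[Scenario], str],
              scenarios: list[Scenario],
              judge: Judge,
              pass_score: int = 4) -> float:
    """Return the fraction of scenarios whose judged score meets pass_score."""
    passed = sum(judge(s, assistant(s)) >= pass_score for s in scenarios)
    return passed / len(scenarios)


def toy_assistant(s: Scenario) -> str:
    """Deliberately naive assistant so that one scenario fails."""
    if s.itinerary_state == "delayed":
        return "Your train is delayed; you may be eligible for a refund."
    return "Your train is on time."


scenarios = [
    Scenario("commuter", "delayed", "Can I get money back?", "refund"),
    Scenario("tourist", "on_time", "Is my train running?", "on time"),
    Scenario("commuter", "cancelled", "What now?", "alternative"),
]
pass_rate = run_evals(toy_assistant, scenarios, stub_judge)
print(f"pass rate: {pass_rate:.2f}")
```

    In a real setup the judge would be a model prompted with a rubric, and the scenario list would be generated by the context simulator. The useful property is that the same scenarios can be re-run after every prompt, tool, or model change.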

    On product delivery, the fact that the team "Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go" shows mature prioritization. For travel, trust accrues in seconds: fast-enough responses, graceful degradation when upstream data lags, and explicit handoff when confidence dips. This is where guardrails meet UX writing: clear, bounded language signals competence even when the system defers.

    Finally, the organizational pattern matters. The teams that win in agentic AI are cross-functional, experimentation-driven, and ruthless about instrumentation. Tight PM–engineering collaboration, explicit safety thresholds, and an eval stack that mirrors real user journeys are what turn promising architectures into dependable products.

    It’s a behind-the-scenes look at how an established company is embracing new AI architectures to serve customers at scale.

    If you’re building agentic AI in production, borrow these moves: invest early in tool and guardrail design, scale retrieval with curation not just volume, adopt LLM-as-judge plus context simulation for continuous evaluation, and treat latency and reliability as core product requirements—not afterthoughts. That’s how you ship AI assistance that customers trust when it matters most.


    Inspired by this post on Product Talk.


  • Why We’re Building Our Next AI R&D Hub in Berlin—and Hiring 100 to Power Fin’s Growth

    Why We’re Building Our Next AI R&D Hub in Berlin—and Hiring 100 to Power Fin’s Growth

    I’m excited to share that we’re opening our next R&D hub in Berlin to support significant investment in our AI customer service platform, Intercom, and market-leading AI Agent, Fin. We intend to hire 100 people in Berlin over the year ahead across engineering, AI, data science, product, and design. This move reflects our AI Strategy, our commitment to product management leadership, and our focus on building enduring product-led growth.

    We believe that in a short number of years, the vast majority of customer service will be done by AI. Fin is already the world’s best Customer Service Agent. At Pioneer, our recent summit for AI customer service leaders in NYC, we talked about how Fin will become a true end-to-end Customer Agent, extending far beyond service. We showcased how companies like WHOOP, Anthropic, and Lightspeed are already pushing Fin in ways that help them grow their business.

    This market opportunity is massive and expanding at an unprecedented pace. Our ambition is to earn our place as one of the most successful AI businesses during this wave of AI disruption, and we want more brilliant people on our team to pursue this as aggressively as possible. If you’re motivated by generative AI, LLMs, and building real products that scale, you’ll find both challenge and impact here.

    We are already on track to be one of the fastest-growing private software companies. Fin is the primary contributor to this, and is months away from passing $100m in ARR. So far, more than 7,000 businesses have transformed their customer service with Fin, including German companies like electricity provider Ostrom, smart home technology provider tado°, and grocery delivery company Flink, along with global leaders like Vanta, Clay, Lovable, and Miro.

    Why Berlin? We’re drawn to the city’s rare blend of deep technical talent and rich creative culture—within a vibrant, globally connected ecosystem close to our R&D hubs in Dublin and London. It’s a place where top-tier engineers and designers thrive, and where ambitious builders from around the world want to relocate and create category-defining products.

    Orange gradient area chart with a white line and circular markers showing steady growth from about 26% to nearly 70% across monthly labels from May 2023 to Sep 2025, on a light grid with percentage ticks.
    Momentum is building: this month-by-month chart shows a consistent rise from the mid-20s to nearly 70% between May 2023 and Sep 2025—signaling strong progress as we expand engineering, AI, and automation at our new Berlin R&D hub.

    We needed a new location that would sustain the high ambition and standards held by our world-class AI teams in Dublin and London. Berlin has emerged as one of Europe’s hottest centers for AI talent, with a high density of AI-focused startups, applied research labs, and practitioners who bring exceptional literacy, optimism, and ambition. It’s the right accelerator for our AI hiring and a place to bring in brilliant minds to shape the future of our product and business.

    While Intercom’s reach is global with our headquarters in San Francisco, our R&D leadership remains anchored in Dublin, where half of the executive team sits—making Berlin both geographically and strategically an ideal next location for our growth.

    This isn’t our first time expanding our footprint; we previously bet on London and are delighted with how that’s been working. When we shared our Berlin news internally, the energy was palpable, with many teammates volunteering to help spin up the hub successfully—including colleagues who helped make London a big success, like Danny. That level of ownership and momentum is exactly what we aim to cultivate in Berlin.

    We’re looking for people who thrive in a high-intensity, high-ambition, high-standards environment and want to help build one of the world’s best AI companies. For builders like that, the opportunity for impact, growth, and career progression is extraordinary. As with London and Dublin before it, the early Berlin cohort will have a disproportionate influence on team norms, culture, and long-term outcomes. We are in the middle of a huge disruptive wave with AI, and Fin is one of the leading examples of commercially successful AI applications. Joining Intercom is an opportunity to be part of this disruptive wave, and help us build out our vision for Fin becoming the world’s best Customer Agent.

    Four panelists seated on a dark stage during an AI engineering discussion, with on-screen titles above them, at an event announcing a new R&D hub in Berlin.
    On a minimalist stage, four speakers share insights on AI research, automation, and engineering as part of a panel tied to Berlin expansion and the launch of a new European R&D hub.

    There are plenty of AI companies to join, but our technology and culture set us apart. Any AI product is only as good as the AI layer powering it. Ours is industry-leading, built by a highly talented, ambitious, and technical team of over 40 machine learning scientists, engineers, and designers in Europe who continuously optimize Fin’s performance through cutting-edge research, experimentation, and innovation. Fin’s average resolution rate increases 1% every month. That kind of steady, compounding improvement is exactly what great customer support AI strategy looks like in practice.

    We also build in public and share our progress and learnings with the AI community at large. Recently, our Chief AI Officer Fergal Reid and SVP of Engineering Jordan Neill joined leaders from Cognition, Harvey, and Perplexity in San Francisco to share real lessons, challenges, and breakthroughs from building frontier AI products. Our AI team regularly publishes their insights on the AI research blog, from optimizing inference speed and availability to building our own proprietary models that outperform general-purpose models for CX.

    Our AI group and the broader R&D org they operate within work at extraordinary scale and speed. We recognize that moving fast can’t be taken for granted—you must fight for it—and we’re doing just that, embracing the capabilities AI tooling brings us to achieve 2x the throughput. One example of this mindset in practice is us “Betting on the future of frontend at Intercom,” making a technology choice that optimizes for our teams’ ability to build high-quality product, fast.

    Our design and product teams are world-class and forward-thinking; they’re embracing AI to evolve how they work, as shared in our 3-point framework for AI-driven design and recently presented by Emmet Connolly, our SVP of Design, at this year’s Hatch conference in Berlin. As a product leader, I’m grateful to work alongside brilliant product and design thinkers—it gives me confidence that we’re solving the right problems, solving them well, and driving real impact.

    Tech conference collage with a speaker on stage beside four panels: AGI teaser on a tablet, code editor, webcam demo with hand tracking, and a simulation. Banner reads Hatch Conference 2025 Main Stage.
    From live demos to hands-on coding, this snapshot captures the momentum we're bringing to our Berlin R&D hub – AI experiments, hand-tracking prototypes, and simulation tools powering our next wave of engineering.

    We plan to open our Berlin office space in December or January. To get the office started, we’re hiring Senior Product Engineers, Machine Learning Scientists, Product Managers, Senior Product Designers, Engineering Managers, and Data Scientists immediately. If your craft sits at the intersection of LLMs for product managers, agentic AI, and empowered product teams, you’ll be right at home.

    You can learn more about our open roles, company, culture, and locations on our careers site, or feel free to reach out to me, Jordan, Fergal, or Brian directly on LinkedIn if you have any questions.

    Some of our engineering team will also be at LeadDev Berlin on November 3rd—come say hi if you’re attending.

    I’m looking forward to continuing to build Intercom as one of our generation’s best AI companies—and I’m excited for our expansion into Berlin to be a major contribution to that success.


    Inspired by this post on The Intercom Blog.


  • Context Is King: My Playbook to Prep Product Teams for High-Impact AI Collaboration

    Context Is King: My Playbook to Prep Product Teams for High-Impact AI Collaboration

    Context is king in AI-powered product work—and I felt that deeply while digging into “Context is King – All Things Product Podcast with Teresa Torres & Petra Wille.” The conversation affirmed a truth I see daily: AI becomes a powerful teammate only when we give it the right context, just as we do with empowered product teams. When we treat AI like a colleague joining mid-flight—without our company history, industry nuances, or strategy—we instantly unlock better outcomes.

    Listen to this episode on: Spotify | Apple Podcasts

    Here’s what stood out and how I’m applying it. First, most AI outputs fail without proper context. That’s not a model problem; it’s a leadership problem. Thinking of AI like onboarding a new intern is the right mental model—start with the minimum viable context, then iterate. Practical first steps matter: decision logs, clear success metrics, and structured documentation. The art is balancing enough context to guide performance without overloading the system. The parallels are striking: the way we create strategic context for product trios and teams is the same way we’ll empower agentic AI systems.

    In my teams, we prepare for AI collaboration by operationalizing context. We keep decision logs to capture the why behind choices, use outcome-based success metrics (not just output), and maintain machine-readable documentation that LLMs can parse reliably. We define guardrails up front: constraints, customer segments, privacy-by-design considerations, and the non-goals that often trip up generative AI. This foundation turns AI from a novelty into a force multiplier for product discovery, roadmapping, and sprint planning.

    I use a simple “context pack” to onboard AI agents and teammates alike: 1) business goals and outcomes, 2) constraints and guardrails, 3) canonical artifacts (like PRDs, journey maps, interview notes), 4) domain vocabulary and definitions, and 5) operating procedures (how we make decisions, when to escalate, what good looks like). Start small, then refine as the AI demonstrates capability. This mirrors great onboarding—and it works just as well for agentic AI as it does for humans.

    Not all context is helpful. More isn’t better; the minimum effective context is. I resist the urge to dump our entire Confluence on an AI system. Instead, I progressively reveal relevant details—just like I would with a new PM on a complex problem space. This keeps signals high, noise low, and performance measurable against clear success metrics.

    If your org isn’t adopting AI yet, don’t wait. You can become AI-ready now by documenting strategic intent, decision rationale, and definitions in structured, searchable, machine-readable ways. Treat this as core AI Strategy work that strengthens empowered product teams—regardless of tooling—while building your AI product toolbox for tomorrow.

    For those who want to explore further, these resources and mentions are a strong complement to the episode’s themes.

    Follow Teresa Torres: https://ProductTalk.org

    Follow Petra Wille: https://Petra-Wille.com

    Agentic AI

    Teresa’s new podcast, Just Now Possible, on YouTube, Apple Podcasts, and Spotify

    Petra’s Coaching Packages

    ChatGPT

    Henrik Kniberg’s talk at Product at Heart on treating AI agents like interns

    Teresa’s webinars on how she built the Product Talk Interview Coach: Behind the Scenes: Building the Product Talk Interview Coach and How I Designed & Implemented Evals for Product Talk’s Interview Coach

    Josh Seiden’s blog series about AI

    Teresa’s new blog posts: 15 Ways to Use AI at Home (and Fill Your AI Product Toolbox) and 21 Ways to Use AI at Work (And Build Your AI Product Toolbox)

    Petra's new blog post: Why Context, Not Just Data, Will Define AI-Ready Product Teams

    Have thoughts on this episode or how you’re preparing your teams to collaborate with AI? Leave a comment below—let’s compare playbooks and level up together.


    Inspired by this post on Product Talk.


  • Beyond Digital: How AI Transformation Builds Adaptive, Intelligent Organizations That Win

    Beyond Digital: How AI Transformation Builds Adaptive, Intelligent Organizations That Win

    Digital transformation rewired our systems; AI transformation rewires how we learn, decide, and compete. “AI transformation goes beyond automation to create adaptive, intelligent organizations. Discover why it’s the next imperative and how to measure success.” That statement captures what I experience daily: we’re moving from scripted workflows to living systems that improve with every interaction.

    When I talk about AI transformation, I’m not describing a tool rollout. I’m describing an operating model where data, models, and product strategy converge to create compounding advantage. In practice, that means agentic AI orchestrating tasks, robust data governance and privacy-by-design from day one, and empowered product teams that ship, measure, and iterate at high tempo.

    The imperative is strategic, not merely technical. Markets are compressing cycle times, and customers now expect intelligent experiences by default. Organizations that master AI Strategy and product-led growth will set the pace—using AI for competitive differentiation rather than feature parity.

    This shift changes how I build teams and backlogs. I lean on product trios, forward deployed engineers, and tight product discovery loops to reduce uncertainty early. We design for resilience and learning: human-in-the-loop feedback, clear escalation paths, and telemetry that turns every interaction into a hypothesis test.

    Governance is a first-class feature. AI risk management, data governance, and threat detection and response sit alongside performance metrics in the same dashboard. We codify guardrails—policy, provenance, and permissions—so innovation scales safely and sustainably.

    Measurement is where transformation becomes real. I anchor on outcomes vs output OKRs tied to customer value and revenue impact. At the product layer, I track activation, time-to-value, retention, and adoption by persona. For ML quality, I monitor precision/recall, coverage, hallucination rate, and model drift. In experimentation, A/B testing with a thoughtful minimum detectable effect (MDE) prevents false wins, while Amplitude analytics, Pendo, and Intercom instrumentation expose where guidance or UX writing can unlock activation.
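
    To show how the minimum detectable effect drives experiment size, here is a small Python sketch using the standard normal-approximation formula for comparing two proportions. The function name, the 20% baseline, and the defaults (two-sided alpha of 0.05, 80% power) are illustrative assumptions, not figures from any product mentioned here.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute lift of mde_abs.

    Uses n = (z_alpha + z_power)^2 * (p1*(1-p1) + p2*(1-p2)) / mde_abs^2,
    the usual normal approximation for a two-sided two-proportion test.
    """
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde_abs ** 2)


# Hypothetical activation rate of 20%, testing two candidate MDEs.
n_two_points = sample_size_per_arm(baseline=0.20, mde_abs=0.02)
n_one_point = sample_size_per_arm(baseline=0.20, mde_abs=0.01)
print(n_two_points, n_one_point)
```

    Because the required sample grows with the inverse square of the MDE, halving the effect you want to detect roughly quadruples the traffic you need. Choosing the MDE before launch is what keeps a small, noisy lift from being declared a win.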

    The fastest wins often start in service and sales. A customer support AI strategy can deflect tickets with high-resolution answers while escalating edge cases to humans with full context. CRM integration with HubSpot and a ChatGPT connector enables reps to generate next-best-actions, summarize calls, and personalize outreach, measurably lifting conversion and lowering cost-to-serve.

    On the build side, LLMs and generative AI for product prototyping accelerate discovery cycles. I use CustomGPT workflows to validate value propositions quickly, then harden successful flows with engineering. Throughout, product positioning and a crisp value proposition ensure that what we ship is understandable, differentiated, and priced to match ROI, with consumption-based SaaS pricing where value scales with usage.

    If you’re getting started, begin with a single, high-frequency journey, instrument it deeply, and publish transparent OKRs. Pair empowered product teams with clear governance, and iterate toward agentic AI experiences. The payoff isn’t a one-time launch; it’s a continuously learning system—and a culture—that compounds advantage release after release.


    Inspired by this post on Pendo – Perspectives.


  • 3 Hidden Hurdles Blocking Effective AI Agents—and How I Turn Them into Business Wins

    3 Hidden Hurdles Blocking Effective AI Agents—and How I Turn Them into Business Wins

    AI agents promise leverage at scale, yet too many proofs of concept stall before they create measurable value. Over the past several launches, I’ve seen the same patterns repeat across IT and operations. The mandate is clear: “Discover three key challenges IT and ops teams face when building and managing AI agents that drive real business wins.” Here’s how I frame the work, where teams get stuck, and the playbook I use to move from demo to durable outcomes.

    Hurdle 1: fragmented data and weak data governance. Agentic AI is only as strong as the data it can reliably access. In most organizations, knowledge is scattered across CRMs, ticketing tools, wikis, and data lakes—each with different schemas, permissions, and freshness guarantees. Without privacy-by-design and consistent access patterns, agents hallucinate, miss context, or violate policies. This isn’t a model problem—it’s an information architecture problem.

    My approach starts with an integration-first mindset: anchor the agent to authoritative systems via CRM integration, unify retrieval across knowledge sources, and enforce role-based access at query time. I pair this with data contracts, lineage, and content freshness SLAs so the agent never acts on stale or restricted information. A unified analytics platform and strong data governance let me monitor coverage, drift, and security posture as the knowledge footprint grows.
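
    One way to picture role-based access and a freshness SLA enforced at query time is a filter applied before any document reaches the agent. The sketch below is a simplified assumption: the `Document` fields, the `retrievable` function, the 30-day window, and the sample documents are all hypothetical, and a real system would apply the same checks inside the search index rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Document:
    """Hypothetical knowledge item with access and freshness metadata."""
    doc_id: str
    allowed_roles: frozenset[str]
    updated_at: datetime


def retrievable(docs: list[Document], user_roles: set[str],
                max_age: timedelta, now: datetime) -> list[Document]:
    """Keep only documents the user may see and that meet the freshness SLA."""
    return [
        d for d in docs
        if d.allowed_roles & user_roles      # at least one shared role
        and now - d.updated_at <= max_age    # not older than the SLA allows
    ]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
docs = [
    Document("refund-policy", frozenset({"support"}),
             datetime(2025, 1, 20, tzinfo=timezone.utc)),
    Document("pricing-2022", frozenset({"support", "sales"}),
             datetime(2022, 6, 1, tzinfo=timezone.utc)),   # stale
    Document("salary-bands", frozenset({"hr"}),
             datetime(2025, 1, 30, tzinfo=timezone.utc)),  # restricted
]
visible = retrievable(docs, {"support"}, timedelta(days=30), now)
print([d.doc_id for d in visible])
```

    Filtering before retrieval, rather than asking the model to ignore restricted or stale content, means the agent cannot act on information it never received.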

    Hurdle 2: reliability, observability, and AI risk management. Even well-fed agents can behave unpredictably without tight control loops. Teams often lack Agent Analytics, standardized evals, and guardrails to catch prompt injection, tool abuse, or subtle regressions. The result is fragile behavior that erodes trust with IT, security, and front-line operators.

    I build a reliability stack that looks a lot like SRE for agentic AI: scenario-based evaluations before release, production tracing of every step and tool call, red-teaming for threat detection and response, and policy enforcement at runtime. Hallucination mitigation, input validation, and fallbacks (including human-in-the-loop) are non-negotiable. We track latency, cost, accuracy, and safety incidents in one Agent Analytics view so we can ship confidently and iterate quickly.
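
    As a minimal sketch of tracing every tool call together with a human fallback, the Python below wraps a tool in a decorator that records status and latency. The `traced` decorator, the in-memory `TRACE` list, and the `lookup_order` tool are hypothetical; a production system would send these records to a tracing backend instead of a list.

```python
import functools
import time
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []  # in-memory stand-in for a tracing backend


def traced(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Record the name, status, and latency of every call to a tool."""
    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        record: dict[str, Any] = {"tool": tool.__name__, "status": "ok"}
        try:
            return tool(*args, **kwargs)
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise  # let the caller decide how to fall back
        finally:
            record["latency_s"] = time.perf_counter() - start
            TRACE.append(record)
    return wrapper


@traced
def lookup_order(order_id: str) -> dict[str, str]:
    """Hypothetical tool; rejects malformed input instead of guessing."""
    if not order_id.startswith("ord_"):
        raise ValueError("malformed order id")
    return {"order_id": order_id, "state": "shipped"}


def answer_with_fallback(order_id: str) -> str:
    """Answer from the tool result, or hand off to a human on failure."""
    try:
        order = lookup_order(order_id)
        return f"Order {order['order_id']} is {order['state']}."
    except ValueError:
        return "Escalating to a human agent."


replies = [answer_with_fallback("ord_42"), answer_with_fallback("42")]
print(replies)
print([(r["tool"], r["status"]) for r in TRACE])
```

    Because failed calls are recorded as well as successful ones, the same trace can feed latency, error-rate, and escalation metrics in a single analytics view.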

    Hurdle 3: workflow integration and organizational adoption. The best agent can still fail if it can’t take action in real systems or if change management is an afterthought. Agents must fit the way people actually work—permission models, SLAs, audit trails, and existing approval paths—instead of creating shadow processes that confuse teams.

    I integrate agents directly into systems of record and daily tools—ticketing, CRM, knowledge bases—so outcomes are auditable and reversible. I define clear RACI, rollout guardrails, and metrics in product roadmapping and sprint planning (e.g., first-contact resolution, time-to-resolution, deflection, cost per task). We ship narrowly scoped capabilities first, pair them with in-app guides and product tours, and expand privileges as confidence and KPIs improve. This is product management leadership, not just prompt engineering.

    In practice, the pattern is consistent. For customer support, we anchored the agent to the CRM, knowledge base, and incident runbooks with strict access controls, then layered policy checks for regulated data. With unified analytics, we measured precision/recall of suggested actions, tracked cost and latency, and flagged risky prompts. The result: higher accuracy, cleaner handoffs, and faster time-to-value without sacrificing compliance.

    If your agents aren’t delivering, start here: fix the data plane, instrument the control plane, and design for real workflows. Do this well and you’ll move beyond flashy demos to durable productivity gains and competitive differentiation—while keeping security, governance, and stakeholders on your side.


    Inspired by this post on Pendo – Perspectives.

