Tag: AI risk management

  • Stop Falling for Hollywood Demos: The Unfiltered Truth of Live AI Voice for Support

    I’ve sat through countless AI demos, and I’ve learned there are really two kinds: the “Hollywood demo,” which is polished to perfection, and the “real-world demo,” which shows the product raw—imperfections and all. The former dazzles, but the latter is where you discover what’s actually ready for prime time.

    Hollywood demos look great, but they deserve a closer look to make sure what you see is what you’ll get. When I’m evaluating an AI Agent for customer service, I always look past the polish. I’m assessing how well it will handle real-world scenarios: the messy, complex conversations your team deals with every day. That’s especially true on voice, the toughest channel to get right.

    Voice is one of the toughest tests of any AI system. It’s not just “chat with speech.” An AI Agent needs to be able to listen, respond, and adapt in real time. Timing, tone, and turn-taking are all part of the product; they shape the experience as much as accuracy or reasoning.

    An edited video might sound seamless, but it can’t show how a system behaves in a real support environment—like when a conversation takes an unexpected turn or when it pauses briefly to reason or retrieve data. Those small moments—latency, clarifications, interruptions—are when you see what the AI Agent is really capable of. A real-world demo lets you see and hear how the system actually behaves under real conditions, not in a controlled environment that’s been smoothed out with editing.

    That’s why the live Fin Voice demo at Pioneer stood out. The team called Fin live on stage to show the real thing (with real latency and interruptions) so people could understand the product they’d be deploying to their own customers. As a product leader, I appreciate that level of transparency because it mirrors how customers will experience the system in production.

    When Paul Adams, Chief Product Officer, demoed Fin Voice at Pioneer, the goal was to show the product exactly as customers experience it. In 90 seconds, Fin verified his identity, retrieved account data, managed an interruption, offered options, completed the workflow, and sent a follow-up email. That’s the kind of end-to-end outcome I look for—fast verification, accurate retrieval, natural pacing, and a closed loop.

    Latency. You could hear brief pauses while Fin fetched subscription details and checked backend systems. That wasn’t lag—it was work happening in real time. In voice AI, thoughtful latency that signals reasoning is far better than synthetic speed that collapses under real load.

    Natural conversation flow. Fin detected when Paul finished speaking, handled interruptions gracefully, and replied in short, human-like turns. That turn-taking behavior is essential for trust and comprehension in voice customer support.

    Awareness and tone. Subtle changes in pacing when Paul laughed or hesitated showed sensitivity to context. Tone control is not a “nice to have” in voice—it’s a core UX capability.

    Unscripted conversation design. No rigid IVR menus or fixed paths. Paul spoke naturally, and Fin adapted to resolve his query. That adaptability is what differentiates a true AI Agent from a glorified decision tree.

    Those details are the real test. A voice AI Agent that performs well in a live demo is one that will perform well for you and your customers too.

    Voice has been one of the most demanding, and rewarding, areas of development for Fin. Since launch, we’ve been expanding what it can do so support leaders can customize how Fin sounds, behaves, and aligns with their brand.

    Voice and tone customization: Choose from multiple natural voices, set greetings, and fine-tune how Fin communicates with customers.

    Escalation and conversational guidance: Teach Fin to use your terminology, ask clarifying follow-ups, and escalate when needed.

    Deployment controls: Manage rollouts, test safely in internal environments, and fine-tune before going live.

    Flexible integrations: Connect to any telephony system via call forwarding, and link Fin Voice to backend systems or APIs to take action.

    Multilingual capability: Fin Voice now supports 28 languages natively.

    Alongside these features, we’ve made big improvements to Fin’s answer quality—the foundation of a great voice experience. When people call, they’re looking for accurate, immediate answers they can trust.

    So we’ve focused on three key areas: latency, which is down roughly 30–40% since launch; clarification flow, so Fin asks smart follow-up questions to reduce back-and-forth and improve resolution rates; and voice-specific answer structure, so Fin delivers information in shorter sentences with pacing designed for listening.

    Together, these improvements mean customers get the highest-quality answers as quickly as possible, resulting in more resolutions and better experiences.

    Running a live demo always carries risk because things can go wrong. But that’s also why it matters—because that’s how customers experience it too. Support leaders stake their reputation on the systems they choose, so the only way to understand what you’re putting in front of your customers is to see it under real conditions.

    When you see Fin in a demo, you’re seeing the same system that runs in production. Real-world demos take more effort and don’t always go perfectly, but they show what’s real—and that’s exactly what you need to evaluate before you deploy voice AI at scale.


    Inspired by this post on The Intercom Blog.


  • How Incident.io’s AI SRE Diagnoses, Hypothesizes, and Fixes Outages in Slack at Record Speed

    When your site goes down, every second counts. I’ve lived that reality across multiple product lines, and the difference between a five-minute blip and a two-hour outage is felt by customers, engineers, and the business. That’s why I’ve been closely following how Incident.io has evolved from coordination during chaos to intelligent, proactive response.

    Now, they’re building something new: an AI SRE that can actually help diagnose and respond to incidents. As someone who thinks deeply about reliability, velocity, and customer trust, that promise hits the intersection of AI Strategy, product management leadership, and operational excellence.

    I recently spent time with Lawrence Jones, Founding Engineer at Incident.io, and Ed Dean, Product Lead for AI at Incident.io, digging into how their team is teaching AI to think like a site reliability engineer. They shared how they went from simple prototypes that summarized incidents to a multi-agent system that forms hypotheses, tests them, and even drafts fixes, all from within Slack.

    Here’s what stood out to me first: AI’s biggest impact comes from compressing time, identifying causes in minutes instead of hours. In practice, that means fewer cycles lost to paging the wrong on-call, clearer paths to root cause, and faster recovery, all without cutting humans out of the decision loop.

    Equally important is deciding where automation belongs. The team’s approach aligns with how I evaluate high-risk workflows: identify which parts of debugging can safely be automated; combine retrieval, tagging, and re-ranking to find relevant context fast; use post-incident “time travel” evals to measure how well the AI performed; and balance human trust and AI confidence inside high-stakes workflows. The human remains accountable; the AI accelerates context, options, and execution.

    On the technical side, the retrieval choices were refreshingly pragmatic. Retrieval-augmented reasoning still benefits from simplicity: deterministic tagging and re-ranking often beat complex vector setups. I’ve seen the same in production: start with crisp, deterministic signals, then layer embeddings where they truly add value. This keeps systems debuggable and stable as you scale.
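
    To make “deterministic tagging and re-ranking” concrete, here is a minimal sketch of the idea. It is not Incident.io’s implementation: the `Doc` type, the `TAG_RULES` keyword map, and the ranking rule (tag overlap first, recency as tie-breaker) are all assumptions I chose for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """A piece of incident context, e.g. a past incident or a recent deploy (hypothetical)."""
    doc_id: str
    text: str
    tags: set[str] = field(default_factory=set)
    age_hours: float = 0.0


# Hypothetical keyword-to-tag rules; a real system would maintain these per service.
TAG_RULES = {
    "postgres": "database",
    "timeout": "latency",
    "deploy": "release",
    "5xx": "errors",
}


def tag_text(text: str) -> set[str]:
    """Deterministic tagging: a tag applies when its keyword appears in the text."""
    lowered = text.lower()
    return {tag for keyword, tag in TAG_RULES.items() if keyword in lowered}


def retrieve(query: str, docs: list[Doc], top_k: int = 3) -> list[Doc]:
    """Keep docs sharing a tag with the query, then re-rank by overlap and recency."""
    query_tags = tag_text(query)
    candidates = [d for d in docs if d.tags & query_tags]
    # More shared tags ranks first; among equals, newer documents rank first.
    candidates.sort(key=lambda d: (-len(d.tags & query_tags), d.age_hours))
    return candidates[:top_k]


docs = [
    Doc("inc-101", "Postgres connection pool exhausted", {"database"}, age_hours=300),
    Doc("dep-202", "Deploy of checkout changed Postgres timeout",
        {"database", "latency", "release"}, age_hours=2),
    Doc("inc-303", "CDN certificate expired", {"tls"}, age_hours=50),
]
results = retrieve("Checkout requests timeout talking to postgres", docs)
print([d.doc_id for d in results])  # ['dep-202', 'inc-101']
```

    Every ranking decision here can be explained by pointing at a rule, which is the debuggability benefit I care about. Embeddings could later be layered in as one more signal inside the sort key.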

    The interface choices matter just as much as the models. “Slack as the interface for human-AI collaboration” puts the agent where incidents already live, reducing friction and increasing adoption. Under the hood, they’ve been pragmatic with “PGVector and Postgres for retrieval experiments”, using “RAG (Retrieval-Augmented Generation)” and “Multi-agent orchestration” to chain context gathering, hypothesis formation, and action proposals. The north star is compelling: “AI as your company’s immune system”.

    What impressed me operationally was the rigor around evaluation. Post-incident “time travel” evals let teams score AI accuracy after they know what really happened. That’s the standard we should all adopt: test the agent against reality, not just synthetic prompts, and feed those learnings back into prompts, tools, and guardrails.
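
    One way to picture a “time travel” eval is as a replay: take the hypotheses the agent ranked during the incident and score them against the cause confirmed afterward. The sketch below is my own simplification, not the team’s eval harness; `IncidentRecord`, the cause labels, and the top-k hit-rate metric are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class IncidentRecord:
    incident_id: str
    ai_hypotheses: list[str]  # ranked causes the agent proposed during the incident
    confirmed_cause: str      # label assigned in the post-incident review


def top_k_hit(record: IncidentRecord, k: int) -> bool:
    """True when the confirmed cause appears among the agent's first k hypotheses."""
    return record.confirmed_cause in record.ai_hypotheses[:k]


def hit_rate(records: list[IncidentRecord], k: int) -> float:
    """Share of past incidents where the agent's top-k list contained the real cause."""
    if not records:
        return 0.0
    return sum(top_k_hit(r, k) for r in records) / len(records)


# Made-up history, purely to show the calculation.
history = [
    IncidentRecord("inc-1", ["bad_deploy", "db_failover"], "bad_deploy"),
    IncidentRecord("inc-2", ["dns_outage", "bad_deploy"], "bad_deploy"),
    IncidentRecord("inc-3", ["db_failover"], "expired_cert"),
    IncidentRecord("inc-4", ["cache_eviction", "traffic_spike"], "traffic_spike"),
]
print(hit_rate(history, k=1))  # 0.25
print(hit_rate(history, k=2))  # 0.75
```

    Tracking this number across prompt and tool changes is what turns the learnings into a feedback loop rather than a one-off retro.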

    Trust is the currency in incidents, so the product surface must reflect uncertainty with care. Building trust in AI isn’t just about precision—it’s about showing reasoning and uncertainty in ways humans understand. In other words, show the chain of thought as a structured artifact (signals considered, hypotheses rejected, evidence gathered), expose confidence bands, and always make it easy for humans to override or guide.

    From a workflow standpoint, the investigation loop mirrors seasoned SRE practice: fast scoping, parallel checks and data sources, building hypotheses and refining findings, then proposing remediations paired with the context that justifies them. Human-agent collaboration here is not a handoff—it’s a tight copilot loop where the agent gathers, tests, and drafts, and the human confirms, prioritizes, and executes.

    For platform and security leaders, this approach blends speed with safety. Clear permissions, auditable actions, blast-radius constraints, and CI/CD integration keep the AI inside defined guardrails while still delivering material acceleration. The payoff is higher deployment frequency without compromising reliability—because detection, triage, and rollback become faster and more repeatable.

    My takeaway as a product leader: this is a blueprint for agentic AI in mission-critical workflows. Start in the tools users live in (Slack), nail retrieval with deterministic foundations, model the expert’s playbook (not just their summaries), and make evaluation a first-class part of the product. Do that well, and the AI goes from assistant to teammate—conservative when it should be, bold when the evidence supports it, and always legible to the humans in the loop.

    The momentum around Incident.io’s AI SRE suggests where we’re headed next: deeper integrations, broader coverage across service catalogs, and richer automations that remain transparent and controllable. For teams investing in reliability, this is the moment to operationalize agentic AI—measured, auditable, and designed for trust—so you can move faster when it matters most.


    Inspired by this post on Product Talk.


  • AI at Home, Impact at Work: Experiments That Supercharged My Product Leadership

    I recently tuned into an insightful All Things Product episode featuring Teresa Torres and Petra Wille on how experimenting with AI in everyday life sharpens how we build AI-powered products at work. The core premise resonated deeply with my AI Strategy: low-stakes, personal experiments accelerate confidence, clarify limitations, and build an AI product toolbox we can bring into the office with rigor.

    If you want to dive in, you can listen on Spotify or Apple Podcasts. I found the conversation especially relevant for product trios and anyone shaping LLMs for product managers in high-stakes environments.

    The idea is simple but powerful: when I prototype with AI at home—where the stakes are low—I learn faster, make safer mistakes, and internalize critical product patterns. Over time, those patterns transfer directly to work: tighter context management, sharper bias awareness, clearer human-in-the-loop guardrails, and a more nuanced view of when to use AI as a thought partner versus when to consider agentic AI.

    In my own practice, I’ve mirrored many of the scenarios discussed: using ChatGPT by OpenAI to plan meals, analyze public data sets like school budgets, and even sanity-check real estate evaluations. These seemingly mundane tasks are fertile ground for learning about context window limits, hallucination, AI bias, and privacy-by-design trade-offs. Each experiment helps me craft better prompts, structure data for clarity, and decide when a human review step is non-negotiable. These are core habits for AI risk management.

    At work, I treat AI as a thought partner for writing, research synthesis, and contract review. I also explore when and how to responsibly evolve toward agentic AI for repeatable workflows. The distinction matters: a thought partner augments judgment; an agent automates execution. Building the right scaffolding—data governance, auditability, constraints, and escalation paths—ensures we unlock speed without compromising safety.

    Three lines from the episode stayed with me: “I’m trying to write things that only I can write — that’s my guiding writing light right now.” — Teresa. “The more we use AI, the more we learn what it’s good at, what it’s not good at, and where context becomes a limitation.” — Teresa. “It’s a safer playground — we can build our toolbox at home before bringing those lessons to work.” — Petra. These are practical north stars for product management leadership in the GenAI era.

    For anyone getting started, here’s what worked for me: begin with “low-stakes” personal experiments, write down your prompts and outcomes, and reflect on failure modes. Treat each activity as product discovery: What problem am I solving? What outcome matters? What data and context does the model need? Which decisions must stay human-in-the-loop? This discipline builds an AI product toolbox you can confidently apply to real customer problems.

    I also keep a running toolkit of references and tools that inform my practice: Context window as a concept helps me size and sequence information. Visual and video tools like Midjourney and Sora expand how I think about multimodal experiences. I rotate between Claude by Anthropic and ChatGPT by OpenAI depending on task fit, and I’ve used Claude Code when I need structured assistance with code review. For knowledge capture and workflow, Readwise and Ghost help me structure insights and ship content.

    If you want more structured learning paths, I found Josh Seiden’s Learn AI With Me, A 30-Day Sprint to be a practical primer, and the broader community conversation at Product at Heart Conference is invaluable. For a deeper grounding in risk, I recommend reviewing topics like hallucination, AI bias, and agentic AI, and revisiting the complementary episode, Context is King.

    I’d love to hear how you’re experimenting: Where have you seen AI meaningfully reduce toil? Where does it still struggle? How are you balancing creativity, data safety, and compliance as you scale? Drop a comment below and let’s compare notes—especially on patterns that help product trios move faster without sacrificing trust.

    Bottom line: start small at home, carry lessons into the office, and build with curiosity and intentionality. That’s how we level up our product discovery, sharpen our value proposition, and lead teams confidently through the GenAI transition.


    Inspired by this post on Product Talk.


  • Mastering AI Evals: The Essential Product Manager Skill to Ship Safer, Smarter AI

    In every AI-powered product I ship, evaluation is the difference between a compelling demo and a dependable customer experience. AI evaluation isn’t a nice-to-have; it’s a core product management competency that shapes quality, safety, and business outcomes from the first prototype to scale.

    When I talk about AI evaluation, I mean a disciplined, repeatable way to measure model behavior across quality, safety, reliability, latency, and cost. Gen AI has changed the cadence of product decisions—models evolve weekly, prompts drift under real-world load, and edge cases multiply. Without rigorous evals, we risk shipping unpredictability.

    My goal in this piece is simple: “Dive deep into AI evals, why they matter for PMs today, and how to master them with clear steps, examples, and best practices.” If you’re leading product strategy for LLMs, agentic AI, or applied AI features, this is the playbook I rely on.

    Why this matters now: customers don’t judge AI by benchmarks; they judge by trust. Did it help me, was it safe, was it fast? Strong AI evals let me set outcomes vs output OKRs, quantify risk, and make transparent trade-offs between accuracy, latency, and cost. They also give engineering and design clear guardrails to move fast without breaking user trust.

    Step 1: Define the product problem and success metrics. I start by tying AI metrics to business outcomes—resolution rate, deflection rate, revenue lift, time-to-value—and include model-centric measures like hallucination rate, harmful content rate, latency, and token cost. This keeps experiments anchored to impact, not just model scores.

    Step 2: Build a high-signal golden dataset. I curate real, anonymized user prompts from discovery and support channels, then add adversarial and long-tail cases. For generative tasks, I create rubric-based criteria for correctness, helpfulness, tone, and safety. This dataset becomes my regression suite as prompts, RAG pipelines, or models change.

    Step 3: Choose the right evaluation methods. I combine deterministic unit tests for rules with LLM-as-judge scoring, pairwise preference tests for prompt variants, human review for critical flows, and red teaming for safety. I also apply privacy-by-design and strong data governance to ensure eval data handling meets compliance and customer expectations.

    Step 4: Operationalize with CI/CD. Evals run automatically on every prompt, retrieval, or model update, with pass/fail gates and alerting. I track results in a unified analytics platform so product, engineering, and go-to-market teams see the same truth. If a change regresses key thresholds, we pause rollout or roll back.
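
    A pass/fail gate can be as small as a table of thresholds checked against the latest eval run. The sketch below assumes hypothetical metric names and threshold values; they are placeholders, not recommendations, and the real numbers should come from your own baselines.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Gate:
    metric: str
    threshold: float
    higher_is_better: bool


# Hypothetical thresholds for illustration only.
GATES = (
    Gate("answer_accuracy", 0.90, higher_is_better=True),
    Gate("hallucination_rate", 0.02, higher_is_better=False),
    Gate("p95_latency_seconds", 2.5, higher_is_better=False),
)


def failed_gates(results: Mapping[str, float], gates: Sequence[Gate] = GATES) -> list[str]:
    """Return a human-readable reason for every gate the eval run fails."""
    failures = []
    for gate in gates:
        value = results.get(gate.metric)
        if value is None:
            # A missing metric fails closed so a broken eval job cannot pass silently.
            failures.append(f"{gate.metric}: missing")
            continue
        if gate.higher_is_better:
            ok = value >= gate.threshold
        else:
            ok = value <= gate.threshold
        if not ok:
            failures.append(f"{gate.metric}: {value} vs threshold {gate.threshold}")
    return failures


run = {"answer_accuracy": 0.93, "hallucination_rate": 0.05, "p95_latency_seconds": 2.0}
print(failed_gates(run))  # ['hallucination_rate: 0.05 vs threshold 0.02']
```

    In a pipeline, a non-empty list would fail the build and trigger the alert; an empty list lets the rollout continue.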

    Step 5: Optimize the cost–quality–latency triangle. Real products live within constraints. I analyze token budgets, caching strategies, model selection (e.g., small for classification, larger for complex generation), prompt structure, retrieval quality, and function-calling patterns. For agentic AI, I evaluate tool-use correctness and task completion reliability, not just text quality.

    Step 6: Close the loop with experimentation. Offline evals give me confidence; online A/B testing validates business impact. I design tests with a clear minimum detectable effect (MDE), guard against novelty bias, and instrument activation, retention, and satisfaction in Amplitude or Pendo. Agent analytics help me pinpoint where users succeed or get stuck.
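
    The MDE becomes practical once it is turned into a sample size. The sketch below uses a common normal-approximation formula for comparing two proportions with a two-sided test; the function name and defaults (5% significance, 80% power) are my assumptions, and a dedicated stats tool may give slightly different numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute change of `mde`.

    baseline: current conversion rate (e.g. 0.20 for 20%)
    mde: smallest absolute change worth detecting (e.g. 0.02 for +2 points)
    """
    p1, p2 = baseline, baseline + mde
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde == 0:
        raise ValueError("baseline and baseline + mde must be in (0, 1), and mde non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


print(sample_size_per_arm(baseline=0.20, mde=0.02))
print(sample_size_per_arm(baseline=0.20, mde=0.01))
```

    Halving the MDE roughly quadruples the required traffic, which is why I settle the MDE with stakeholders before the test starts rather than after.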

    Step 7: Govern responsibly. I maintain model cards, decision logs, and incident playbooks. For customer-facing assistants, I gate risky actions, log explanations, and add human-in-the-loop escalation. AI risk management isn’t bureaucracy—it’s how we earn trust at scale.

    A concrete example: building a customer support assistant. My success metrics include deflection rate, first-contact resolution, median response latency, and safe action rate. The golden dataset blends common queries, billing edge cases, account-specific retrieval checks, and adversarial prompts. Evals measure factuality against a knowledge base, tone alignment with brand guidelines, and safe tool use for CRM integration. Only after passing offline gates do we A/B test deflection and CSAT in production.

    Common pitfalls I watch for: overfitting prompts to a tiny test set, relying solely on LLM-as-judge without human calibration, skipping safety tests when latency rises, and treating evaluations as a one-time launch task. The antidote is simple—regularly refresh datasets, diversify eval methods, and wire evals into the same release discipline as any core feature.
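
    For human calibration of an LLM judge, one option is to have reviewers label a sample of the same outputs and compute chance-corrected agreement. The sketch below uses Cohen’s kappa as that measure; the choice of kappa and the labels are my assumptions, not something prescribed above.

```python
from collections import Counter


def cohen_kappa(human: list[str], judge: list[str]) -> float:
    """Agreement between human and judge labels, corrected for chance agreement."""
    if len(human) != len(judge) or not human:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(human)
    observed = sum(h == j for h, j in zip(human, judge)) / n
    human_counts, judge_counts = Counter(human), Counter(judge)
    # Probability both raters pick the same label if they labeled independently.
    expected = sum(human_counts[label] * judge_counts[label] for label in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for eight responses, purely to show the calculation.
human_labels = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
judge_labels = ["pass", "pass", "fail", "pass", "pass", "fail", "fail", "pass"]
print(round(cohen_kappa(human_labels, judge_labels), 3))  # 0.467
```

    Here raw agreement is 75%, but kappa is much lower because both raters say “pass” most of the time. That gap is the reason I do not trust raw agreement alone when deciding whether a judge prompt is ready.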

    The payoff is compounding. With strong AI evals, we ship confidently, reduce incident rates, accelerate iteration, and communicate trade-offs clearly to stakeholders. More importantly, we build products customers trust—because quality isn’t a promise, it’s a practice we can measure every day.


    Inspired by this post on Product School.


  • AI vs. Product Managers by 2035: What Will Change—and How to Future‑Proof Your Career

    Will AI replace product managers, or simply transform their role? Discover what AI can and cannot do, plus insights from PMs on the future of work.

    I’m asked this question in nearly every leadership meeting now, and my answer is consistent: AI won’t replace great product managers by 2035—but it will radically reshape how we operate. The PMs who thrive will pair sharp product judgment with an intentional AI Strategy and a practical AI product toolbox, unlocking speed, clarity, and scale without sacrificing vision.

    Here’s what AI already does well for us today. With LLMs for product managers, I can synthesize customer feedback at scale, draft PRDs and acceptance criteria, transform notes into user stories, and even auto-generate experiment plans with a minimum detectable effect (MDE) calculation. When I connect these models to Amplitude analytics, Pendo, Intercom, and HubSpot through a unified analytics platform and CRM integration, I accelerate discovery, prioritize confidently, and tighten the loop between signal and action. CustomGPT workflows now handle routine backlog grooming, competitive landscaping, and early concept testing, freeing my team to focus on higher-order decisions.

    By 2035, I expect agentic AI to operate as an execution co-pilot: autonomously scheduling A/B testing, launching targeted in-app guides and product tours, monitoring user activation and onboarding funnels, and raising anomalies via Agent Analytics long before a dashboard review. These systems will propose playbooks, draft UX writing and tooltip design, and recommend next-best actions—then wait for human approval when stakes are high. Think of it as the ultimate forward deployed engineer for operational work, working within clear guardrails.

    What AI cannot do—and is unlikely to master soon—is the essence of product leadership. It won’t craft a resonant value proposition for a new segment, define points of parity vs. competitive differentiation, or set outcomes vs output OKRs that align messy stakeholder incentives. It won’t navigate board management, reconcile conflicting narratives from sales and engineering, or make ethically grounded trade-offs under uncertainty. That’s where privacy-by-design, data governance, and AI risk management converge with human judgment, context, and accountability.

    As the tooling matures, the PM role will tilt from artifact production to decision quality. We’ll spend less time writing and more time deciding: which bets to place, which risks to accept, and where to concentrate our empowered product teams. Product discovery deepens, product positioning sharpens, and product roadmapping and sprint planning become faster and more adaptable—because the busywork is handled, not because the thinking is outsourced.

    Practically, I’m evolving team design and rituals now. We operate as product trios, pair PMs with forward deployed engineers, and embed gen AI into daily workflows. We standardize prompts, set review thresholds, and instrument everything for observability. Our stakeholder management improves because we bring clearer narrative artifacts, and because we can test assumptions earlier and share evidence in real time.

    If you’re building your own AI Strategy, start with three tracks. First, foundations: instrument data pipelines, establish data governance, and codify privacy-by-design. Second, acceleration: deploy CustomGPT workflows for research synthesis, PRD drafting, retention analysis, and experiment design, while keeping humans in the loop for decisions. Third, automation with guardrails: let agentic AI run low-risk playbooks (in-app guides, content suggestions, ops checks) and require human approval for anything customer-facing and irreversible.
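
    The third track reduces to a routing rule that can be written down and audited. This is a minimal sketch under my own assumptions: the `ProposedAction` fields and the action names are hypothetical, and a production policy would likely consider more attributes (cost, blast radius, customer tier).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    AUTO_RUN = "auto_run"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class ProposedAction:
    name: str
    customer_facing: bool
    reversible: bool


def route(action: ProposedAction) -> Decision:
    """Require a human for anything that is both customer-facing and irreversible."""
    if action.customer_facing and not action.reversible:
        return Decision.NEEDS_APPROVAL
    return Decision.AUTO_RUN


# Hypothetical actions an agent might propose.
actions = [
    ProposedAction("publish_in_app_guide", customer_facing=True, reversible=True),
    ProposedAction("run_ops_check", customer_facing=False, reversible=True),
    ProposedAction("send_pricing_change_email", customer_facing=True, reversible=False),
]
for action in actions:
    print(action.name, "->", route(action).value)
```

    Keeping the rule this explicit makes it easy to review with legal and security, and easy to tighten later by adding conditions rather than rewriting prompts.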

    Future-proofing your career is about skill stacking. Double down on first principles decision making, storytelling, and cross-functional influence, and pair that with hands-on fluency in gen AI, prompt engineering, model evaluation, and risk controls. Learn how to frame trade-offs, architect outcomes vs output OKRs, and translate strategy into experiments that AI can help execute. The combination of human judgment and machine speed is the new competitive advantage.

    So, will AI replace product managers by 2035? No. It will transform average PMs into good ones and great PMs into force multipliers. The ones who lead will embrace AI as leverage, cultivate empowered product teams, and stay relentlessly focused on customer outcomes. The future belongs to product creators who can wield intelligent tools without surrendering accountability for the product’s direction and impact.


    Inspired by this post on Product School.


  • What I Learned from Trainline’s Agentic AI: Building a Trusted Travel Assistant at Scale

    Over the past year, I’ve been shipping agentic AI into production and coaching product teams on what it really takes to make these systems trustworthy in the wild. One story that crystallizes the playbook comes from Trainline’s move to an agentic architecture for travel assistance—an approach that mirrors what I’ve seen work in high-stakes, real-time customer experiences.

    Trainline—the world’s leading rail and coach platform—helps millions of travelers get from point A to point B. Now, they’re using AI to make every step of the journey smoother.

    I studied how "David Eason (Principal Product Manager), Billie Bradley (Product Manager), and Matt Farrelly (Head of AI and Machine Learning)" approached the build of "Travel Assistant, an AI-powered travel companion that helps customers navigate disruptions, find real-time answers, and travel with confidence." Their work exemplifies the kind of end-to-end thinking required to move beyond demos into dependable, on-the-go assistance.

    They share how they: Identified underserved traveler needs beyond ticketing; Built a fully agentic system from day one, combining orchestration, tools, and reasoning loops; Designed layered guardrails for safety, grounding, and human handoff; Expanded from 450 to 700,000 curated pages of information for retrieval; Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time; Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go.

    I align strongly with their core takeaways: "AI assistants need both scalable reasoning and deep domain context to be useful." "Tool design and guardrails are as critical as prompt design in agent systems." "LLM-as-judge evals make it possible to measure open-ended systems without massive labeling costs." And perhaps most importantly, "Even legacy companies can move fast when they embrace experimentation and tight PM–engineering collaboration."

    From an AI strategy perspective, starting "fully agentic" was the right call. When the problem space is dynamic—disruptions, route changes, fare conditions—reasoning loops and orchestration aren’t luxuries; they’re table stakes. Tool selection becomes product design: you need the right retrieval interfaces, constraint-aware planners, and API contracts that are resilient to partial failures. Layered guardrails for safety, grounding, and human handoff reduce hallucination risk while preserving responsiveness—critical when users are standing on a platform waiting for an answer.

    The retrieval scale-up—"Expanded from 450 to 700,000 curated pages of information for retrieval"—is a classic inflection point. I’ve seen teams stall here when they treat content growth as a pure indexing problem. The winning move is curation and structure: normalize sources, encode policy-level constraints, and align retrieval chunks to decision boundaries the agent actually uses. That’s how you keep precision high while coverage explodes.

    Evaluation is where most open-ended assistants fail quietly, which is why I was encouraged to see that the team "Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time." In practice, LLM-as-judge gives you scalable, scenario-based scoring without prohibitive labeling, while a user context simulator surfaces regressions tied to persona, itinerary state, and device constraints. The combination closes the loop between model behavior, tool layer changes, and UX outcomes.

    On product delivery, the way the team "Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go" shows mature prioritization. For travel, trust accrues in seconds: fast-enough responses, graceful degradation when upstream data lags, and explicit handoff when confidence dips. This is where guardrails meet UX writing: clear, bounded language signals competence even when the system defers.

    Finally, the organizational pattern matters. The teams that win in agentic AI are cross-functional, experimentation-driven, and ruthless about instrumentation. Tight PM–engineering collaboration, explicit safety thresholds, and an eval stack that mirrors real user journeys are what turn promising architectures into dependable products.

    It’s a behind-the-scenes look at how an established company is embracing new AI architectures to serve customers at scale.

    If you’re building agentic AI in production, borrow these moves: invest early in tool and guardrail design, scale retrieval with curation not just volume, adopt LLM-as-judge plus context simulation for continuous evaluation, and treat latency and reliability as core product requirements—not afterthoughts. That’s how you ship AI assistance that customers trust when it matters most.


    Inspired by this post on Product Talk.


  • Beyond Digital: How AI Transformation Builds Adaptive, Intelligent Organizations That Win

    Digital transformation rewired our systems; AI transformation rewires how we learn, decide, and compete. “AI transformation goes beyond automation to create adaptive, intelligent organizations. Discover why it’s the next imperative and how to measure success.” That statement captures what I experience daily: we’re moving from scripted workflows to living systems that improve with every interaction.

    When I talk about AI transformation, I’m not describing a tool rollout. I’m describing an operating model where data, models, and product strategy converge to create compounding advantage. In practice, that means agentic AI orchestrating tasks, robust data governance and privacy-by-design from day one, and empowered product teams that ship, measure, and iterate at high tempo.

    The imperative is strategic, not merely technical. Markets are compressing cycle times, and customers now expect intelligent experiences by default. Organizations that master AI Strategy and product-led growth will set the pace—using AI for competitive differentiation rather than feature parity.

    This shift changes how I build teams and backlogs. I lean on product trios, forward deployed engineers, and tight product discovery loops to reduce uncertainty early. We design for resilience and learning: human-in-the-loop feedback, clear escalation paths, and telemetry that turns every interaction into a hypothesis test.

    Governance is a first-class feature. AI risk management, data governance, and threat detection and response sit alongside performance metrics in the same dashboard. We codify guardrails—policy, provenance, and permissions—so innovation scales safely and sustainably.

    Measurement is where transformation becomes real. I anchor on OKRs that measure outcomes rather than output, tied to customer value and revenue impact. At the product layer, I track activation, time-to-value, retention, and adoption by persona. For ML quality, I monitor precision/recall, coverage, hallucination rate, and model drift. In experimentation, A/B testing with a minimum detectable effect (MDE) chosen before the test starts guards against false wins, while Amplitude analytics, Pendo, and Intercom instrumentation expose where guidance or UX writing can unlock activation.
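
    To show how an MDE turns into a concrete commitment, here is a rough sketch of the standard two-proportion sample-size approximation. The function name, the default significance level, and the default power are my assumptions; an experimentation platform may use a different formula.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm to detect an absolute lift of `mde` over `baseline`."""
    if not 0 < baseline < 1 or not 0 < baseline + mde < 1 or mde == 0:
        raise ValueError("rates must lie strictly between 0 and 1 and mde must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of the two Bernoulli variances
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


# Example: a 10% baseline activation rate and a 2-point absolute MDE.
users_needed = sample_size_per_arm(baseline=0.10, mde=0.02)
```

    Halving the MDE roughly quadruples the required sample, which is why the MDE is worth agreeing on before the test runs rather than after the numbers come in.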

    The fastest wins often start in service and sales. A customer support AI strategy can deflect tickets with high-resolution answers while escalating edge cases to humans with full context. CRM integration with HubSpot and a ChatGPT connector enables reps to generate next-best-actions, summarize calls, and personalize outreach, measurably lifting conversion and lowering cost-to-serve.

    On the build side, LLMs for product managers and generative AI for product prototyping accelerate discovery cycles. I use CustomGPT workflows to validate value propositions quickly, then harden successful flows with engineering. Throughout, product positioning and a crisp value proposition ensure that what we ship is understandable, differentiated, and priced to match ROI, with consumption-based SaaS pricing where value scales with usage.

    If you’re getting started, begin with a single, high-frequency journey, instrument it deeply, and publish transparent OKRs. Pair empowered product teams with clear governance, and iterate toward agentic AI experiences. The payoff isn’t a one-time launch; it’s a continuously learning system—and a culture—that compounds advantage release after release.


    Inspired by this post on Pendo – Perspectives.


  • 3 Hidden Hurdles Blocking Effective AI Agents—and How I Turn Them into Business Wins

    3 Hidden Hurdles Blocking Effective AI Agents—and How I Turn Them into Business Wins

    AI agents promise leverage at scale, yet too many proofs of concept stall before they create measurable value. Over the past several launches, I’ve seen the same patterns repeat across IT and operations. The mandate is clear: “Discover three key challenges IT and ops teams face when building and managing AI agents that drive real business wins.” Here’s how I frame the work, where teams get stuck, and the playbook I use to move from demo to durable outcomes.

    Hurdle 1: fragmented data and weak data governance. Agentic AI is only as strong as the data it can reliably access. In most organizations, knowledge is scattered across CRMs, ticketing tools, wikis, and data lakes—each with different schemas, permissions, and freshness guarantees. Without privacy-by-design and consistent access patterns, agents hallucinate, miss context, or violate policies. This isn’t a model problem—it’s an information architecture problem.

    My approach starts with an integration-first mindset: anchor the agent to authoritative systems via CRM integration, unify retrieval across knowledge sources, and enforce role-based access at query time. I pair this with data contracts, lineage, and content freshness SLAs so the agent never acts on stale or restricted information. A unified analytics platform and strong data governance let me monitor coverage, drift, and security posture as the knowledge footprint grows.
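
    As a rough illustration of enforcing role-based access and freshness SLAs at query time, the sketch below filters retrieved documents before the agent ever sees them. The `Doc` shape, the `FRESHNESS_SLA` table, and the `usable_docs` function are hypothetical names with illustrative values, not a real product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Doc:
    doc_id: str
    source: str                    # e.g. "crm" or "wiki"
    allowed_roles: frozenset[str]  # roles permitted to read this document
    updated_at: datetime


# Hypothetical freshness SLAs per source system.
FRESHNESS_SLA = {"crm": timedelta(hours=1), "wiki": timedelta(days=30)}


def usable_docs(candidates: list[Doc], user_roles: set[str], now: datetime) -> list[Doc]:
    """Keep only documents the caller may read and that are within their source's SLA."""
    kept = []
    for doc in candidates:
        if not (doc.allowed_roles & user_roles):
            continue  # role-based access, checked on every query
        sla = FRESHNESS_SLA.get(doc.source)
        if sla is None or now - doc.updated_at > sla:
            continue  # unknown source or stale content: fail closed
        kept.append(doc)
    return kept
```

    Failing closed on unknown sources is a deliberate choice in this sketch: a source without a declared SLA is treated as not yet trustworthy.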

    Hurdle 2: reliability, observability, and AI risk management. Even well-fed agents can behave unpredictably without tight control loops. Teams often lack Agent Analytics, standardized evals, and guardrails to catch prompt injection, tool abuse, or subtle regressions. The result is fragile behavior that erodes trust with IT, security, and front-line operators.

    I build a reliability stack that looks a lot like SRE for agentic AI: scenario-based evaluations before release, production tracing of every step and tool call, red-teaming for threat detection and response, and policy enforcement at runtime. Hallucination mitigation, input validation, and fallbacks (including human-in-the-loop) are non-negotiable. We track latency, cost, accuracy, and safety incidents in one Agent Analytics view so we can ship confidently and iterate quickly.
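
    Here is one way the "trace every step and tool call, with fallbacks" idea could look in miniature. The `Trace` class and its fields are assumptions made for illustration; a production setup would more likely sit on an existing tracing library.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Trace:
    steps: list[dict[str, Any]] = field(default_factory=list)

    def call(self, name: str, tool: Callable[..., Any], *args: Any,
             fallback: Any = None) -> Any:
        """Run one tool call, record status and latency, return `fallback` on error."""
        start = time.perf_counter()
        try:
            result, status = tool(*args), "ok"
        except Exception as exc:  # broad on purpose: any tool failure should degrade gracefully
            result, status = fallback, f"error: {type(exc).__name__}"
        self.steps.append({
            "tool": name,
            "status": status,
            "latency_s": time.perf_counter() - start,
        })
        return result
```

    Because every call lands in `steps`, latency and failure counts come from the same record that explains what the agent actually did.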

    Hurdle 3: workflow integration and organizational adoption. The best agent can still fail if it can’t take action in real systems or if change management is an afterthought. Agents must fit the way people actually work—permission models, SLAs, audit trails, and existing approval paths—instead of creating shadow processes that confuse teams.

    I integrate agents directly into systems of record and daily tools—ticketing, CRM, knowledge bases—so outcomes are auditable and reversible. I define clear RACI, rollout guardrails, and metrics in product roadmapping and sprint planning (e.g., first-contact resolution, time-to-resolution, deflection, cost per task). We ship narrowly scoped capabilities first, pair them with in-app guides and product tours, and expand privileges as confidence and KPIs improve. This is product management leadership, not just prompt engineering.

    In practice, the pattern is consistent. For customer support, we anchored the agent to the CRM, knowledge base, and incident runbooks with strict access controls, then layered policy checks for regulated data. With unified analytics, we measured precision/recall of suggested actions, tracked cost and latency, and flagged risky prompts. The result: higher accuracy, cleaner handoffs, and faster time-to-value without sacrificing compliance.
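
    For readers who want the metric spelled out, this is a small sketch of precision and recall for suggested actions, scored against a set of actions a human reviewer approved. The function name and the action labels in the example are hypothetical.

```python
def precision_recall(suggested: set[str], approved: set[str]) -> tuple[float, float]:
    """Precision and recall of agent-suggested actions against reviewer-approved ones."""
    true_positives = len(suggested & approved)
    precision = true_positives / len(suggested) if suggested else 0.0
    recall = true_positives / len(approved) if approved else 0.0
    return precision, recall


# Example: the agent suggested three actions; reviewers approved four.
precision, recall = precision_recall(
    {"refund", "reset_password", "escalate"},
    {"refund", "escalate", "update_address", "close_ticket"},
)
```

    Precision answers "how often is a suggestion right?", while recall answers "how many of the right actions did the agent surface?". Tracking both keeps a cautious agent from looking better than it is.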

    If your agents aren’t delivering, start here: fix the data plane, instrument the control plane, and design for real workflows. Do this well and you’ll move beyond flashy demos to durable productivity gains and competitive differentiation—while keeping security, governance, and stakeholders on your side.


    Inspired by this post on Pendo – Perspectives.


  • Turning Community Noise into Action: My Product Lessons from Zencity’s AI That Listens

    Turning Community Noise into Action: My Product Lessons from Zencity’s AI That Listens

    I’m constantly looking for ways to turn messy, multi-source signals into decisions leaders can trust. Recently, I dug into how Zencity powers government decision-making with community voices—and it’s a masterclass in building AI products that are both responsible and useful.

    Noa Reikhav, Head of Product, Zencity; Andrew Therriault, VP of Data Science, Zencity; and Shota Papiashvili, SVP of R&D, Zencity share a comprehensive view of how they designed an AI that listens and acts without sacrificing rigor.

    How do you use AI to help city leaders truly hear their residents?

    I was struck by the clarity of their platform vision—“They share how Zencity brings together survey data, 311 calls, social media, and local news into a unified platform that helps cities understand what people care about—and act on it.” That single line captures the essence of a unified analytics platform done right.

    You’ll hear how the team built their AI assistant and workflow engine by being thoughtful about their data layers, how they combined deterministic systems with LLM-driven synthesis, and how they keep accuracy and trust at the core of every AI decision.

    It’s a fascinating look at how modern AI infrastructure can turn noisy, messy civic data into clear, actionable insight.

    Here are the takeaways that resonated with me most, and they align closely with how I approach AI Strategy and product management leadership: data architecture defines what AI can do; guardrails and transparency matter more than flashy outputs; agentic systems become powerful when grounded in real, multi-tenant data; and AI in the public sector can make democracy more responsive, if built responsibly.

    The team’s layered data model is the backbone that enables trustworthy synthesis: raw data → elements → highlights → insights → briefs. As a product leader, I love how each layer introduces meaning and structure while preserving traceability. It’s the difference between a demo-friendly prototype and a durable platform.
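
    To make "preserving traceability" concrete, here is a sketch of how each layer could keep a pointer to what it was derived from. I don't know Zencity's actual schema; the `Node` type, its field names, and the `raw_sources` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    layer: str                             # "raw", "element", "highlight", "insight", or "brief"
    text: str
    derived_from: tuple["Node", ...] = ()  # the lower-layer items this one was built from


def raw_sources(node: Node) -> set[str]:
    """Walk down the layers and return the ids of the raw records behind a node."""
    if node.layer == "raw":
        return {node.node_id}
    found: set[str] = set()
    for parent in node.derived_from:
        found |= raw_sources(parent)
    return found
```

    With a structure like this, any sentence in a brief can be traced back to the calls, survey responses, or articles that support it.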

    Why context is everything when building AI for civic use. That’s not a platitude—it’s a requirement. Community conversations are hyper-local, emotionally charged, and policy-laden. Without context and rigorous data governance, you risk misclassification, bias, and broken trust.

    How the team designed their AI assistant using MCP servers to safely negotiate data access. This is a smart pattern for privacy-by-design: let the assistant request access, let the system adjudicate, and make the boundary explicit and auditable. In multi-tenant environments, that clarity is the difference between scaling confidently and shipping risk.

    Balancing agentic flexibility with deterministic trust. I’ve found this to be the most practical framing for real-world agentic AI: give the system room to explore, but bind its outputs to deterministic rails where it matters—taxonomy, citations, permissions, and evaluation criteria.
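
    A small sketch of what "deterministic rails" can mean in code: the model is free to write the summary, but its topic and citations are checked against fixed rules before anything is shown. The `TAXONOMY` set, the `Synthesis` shape, and the `check` function are assumptions for illustration only.

```python
from dataclasses import dataclass

TAXONOMY = {"roads", "public_safety", "parks"}  # hypothetical fixed topic taxonomy


@dataclass
class Synthesis:
    topic: str
    summary: str
    citations: list[str]  # ids of the source items the model claims to rely on


def check(output: Synthesis, retrieved_ids: set[str]) -> list[str]:
    """Return rule violations; an empty list means the output may be shown."""
    problems = []
    if output.topic not in TAXONOMY:
        problems.append(f"unknown topic: {output.topic}")
    if not output.citations:
        problems.append("no citations")
    unknown = [c for c in output.citations if c not in retrieved_ids]
    if unknown:
        problems.append(f"citations not in retrieved set: {unknown}")
    return problems
```

    The checks are deliberately boring. They are plain set lookups with no model in the loop, so the same input gets the same verdict every time.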

    Evaluating accuracy when latency matters: how they think about evals, citations, and model-as-judge systems. I appreciate the pragmatism here. In production, you don’t have the luxury of slow truth-finding. You need tight feedback loops, interpretable citations, and layered evals to keep both precision and speed.

    Using workflows like annual budgeting or crisis communication to deliver AI-generated briefs to the right people at the right time. This is where product-market fit shows up: not in features, but in end-to-end workflows aligned to real decision cycles and stakeholders.

    Why government workflows are the ultimate “jobs to be done” framework. When the job is a public process—with deadlines, accountability, and high scrutiny—you don’t just need insights; you need timely, contextualized briefs that match the cadence of the work.

    From my lens, the magic isn’t any single model. It’s the orchestration: deterministic systems paired with LLM-driven synthesis, strong guardrails, transparent citations, and a routing layer that gets the right brief to the right role at the right moment. That’s how you turn community noise into legitimate signal, and signal into action.

    If you’re building AI for regulated, high-stakes environments, take note: invest in your data layers, make context a first-class citizen, embrace privacy-by-design with clear access negotiation, and treat evaluation as a living system. Do that, and you’ll earn the trust that makes your AI assistant—and your organization—indispensable.


    Inspired by this post on Product Talk.


  • Urgent Alert: Spot Fraudulent Job Offers Impersonating Pendo—and Protect Your Career

    Urgent Alert: Spot Fraudulent Job Offers Impersonating Pendo—and Protect Your Career

    In my role leading product management, I take brand trust and cybersecurity seriously—especially when it affects people’s livelihoods. Over the past few weeks, I’ve seen a troubling uptick in brand impersonation and social engineering targeting candidates. It’s a reminder that protecting our community isn’t just a technical problem; it’s a product management leadership and stakeholder management responsibility.

    We want to warn you about recent instances of fraudulent job offers purporting to be from Pendo and/or its affiliate companies.

    If you receive an unexpected outreach claiming to be from Pendo with a fast-track offer, requests for payment, or a push to move conversations to informal channels, treat it as a red flag. Scammers often spoof logos, clone profiles, and use vague role descriptions to create urgency. Their goal is to extract personal data, money, or access—classic social engineering tactics that undermine data governance and privacy-by-design principles.

    Here’s how I advise candidates to protect themselves while keeping their job search momentum. Validate every opportunity through the company’s official careers page and confirm the recruiter’s identity through corporate channels. Check that email addresses and domains match publicly listed corporate information, and be wary of communication conducted exclusively through messaging apps. Never pay fees, buy equipment up front, or share sensitive data like Social Security numbers or banking information before a formal, verified offer is in place.

    If something feels off, pause and verify. Contact the company via the channels listed on its website, ask for a video meeting with the recruiter using an official corporate account, and request written details on the role and interview process. If it’s fraudulent, report it to the company, the platform where the outreach occurred, and—when appropriate—local authorities. Acting quickly helps with threat detection and response and protects other candidates from harm.

    From a product and security perspective, this is a cross-functional issue that benefits from AI risk management discipline. Strong signals include clear public guidance on recruiting practices, a dedicated reporting mailbox for suspected scams, and hardened email authentication (SPF, DKIM, DMARC). Pair these with privacy-by-design reviews for hiring workflows, recruiter verification checklists, and ongoing education for talent teams. These measures reduce attack surface while reinforcing brand integrity.

    If you believe you’ve shared information with a fraudulent recruiter, take immediate steps: change any reused passwords, enable two-factor authentication, place fraud alerts or freezes with credit bureaus as appropriate, and monitor accounts for suspicious activity. Document all communications; they can help security teams and platforms act faster.

    Recruitment fraud is emotionally taxing and can erode confidence in the process. Don’t let scammers slow your momentum. Stay vigilant, verify before you trust, and share this warning so others can avoid similar traps. If you’re ever unsure about a message that appears to come from Pendo, pause, validate through official channels, and prioritize your safety first.


    Inspired by this post on Pendo – Best Practices.


  • Build the Cake, Then the Frosting: 3 Elements of a High‑Performing AI Strategy That Wins

    Build the Cake, Then the Frosting: 3 Elements of a High‑Performing AI Strategy That Wins

    Over the past few years leading product at HighLevel, I’ve watched too many teams rush to demo flashy agents before they’ve built a reliable foundation. The metaphor I use in every AI roadmap review still hits home: “Think of AI readiness as a three-layer cake. Most companies are trying to build the fancy frosting (the agent interface) without bothering to bake the actual cake underneath.” If we want durable impact, we have to bake first, frost later.

    When I design an AI Strategy, I anchor on three elements that map directly to that cake: a data and instrumentation foundation, a governance and risk layer, and finally the agent experience itself. This sequence isn’t theory—it’s how we de-risk delivery, accelerate product-market fit, and create competitive differentiation without compromising trust.

    Layer 1 — Data and instrumentation: The base of the cake is clean, well-instrumented data flowing through a unified analytics platform. I start with a clear event schema, rigorous data quality checks, and tight CRM integration so we can connect outcomes to users, accounts, and journeys. Privacy-by-design is non-negotiable: we minimize PII, define retention, and ensure consent flows are explicit. With this in place, generative AI features have the context they need: retrieval works, grounding holds, and feedback loops from production inform continuous improvement.

    On top of that, I build measurement in from day one: activation, retention, task success, latency, and satisfaction. Every AI interaction is observable. We run A/B testing with a well-defined minimum detectable effect, pair quant with qualitative review, and feed human-in-the-loop judgments back into ranking and prompt libraries. This is how we avoid “demo-ware” and deliver real, repeatable value.

    Layer 2 — Governance and risk: Before scaling, I formalize AI risk management and data governance. That includes model evaluation against safety and quality thresholds, red-teaming for jailbreaks, and threat detection and response for prompt injection and data exfiltration. We establish policy for model and provider selection, versioning, and rollback; we log prompts, responses, and decisions for auditability; and we define escalation paths when the system is unsure. These controls don’t slow us down; they create the confidence needed for faster iteration and board-level alignment.

    I also align legal, security, and product early on a taxonomy of risks—bias, hallucinations, privacy, IP leakage—so we can write tests and guardrails once and reuse them across features. The result is fewer surprises in customer pilots and a far smoother path through enterprise procurement.

    Layer 3 — The agent experience: Only now do we invest in the frosting—the agent interface and workflows. Here I focus on clear jobs-to-be-done, crisp UX writing, and transparent system behavior. We design agentic AI flows that show reasoning steps when helpful, ask for clarification when confidence is low, and gracefully hand off to humans in customer support scenarios. Product tours, in-app guides, and tooltips reduce the learning curve and accelerate user activation.

    Crucially, we measure the interface, not just the model. Agent Analytics tracks intents, tool use, fallbacks, and user corrections so we can tune prompts, tools, and policies. This closes the loop from experience back to data and governance, and it directly informs product roadmapping and sprint planning. When the cake is baked this way, go-to-market becomes easier: we can prove ROI with hard numbers, fine-tune pricing, and scale adoption with product-led growth tactics.
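
    As a rough sketch of what such an analytics view might compute, the function below rolls per-turn events up into intent counts, tool usage, and fallback and correction rates. The event fields (`intent`, `tools`, `fallback`, `user_corrected`) and the `summarize` function are hypothetical and not tied to any specific product.

```python
from collections import Counter


def summarize(events: list[dict]) -> dict:
    """Aggregate per-turn agent events into a few rates; the event fields are hypothetical."""
    total = len(events)
    if total == 0:
        return {"turns": 0}
    return {
        "turns": total,
        # Most frequent intents, as (intent, count) pairs.
        "top_intents": Counter(e["intent"] for e in events).most_common(3),
        # How often each tool was invoked across all turns.
        "tool_calls": Counter(t for e in events for t in e.get("tools", [])),
        # Share of turns that fell back to a human or a canned response.
        "fallback_rate": sum(e.get("fallback", False) for e in events) / total,
        # Share of turns where the user corrected the agent.
        "correction_rate": sum(e.get("user_corrected", False) for e in events) / total,
    }
```

    A rising correction rate on one intent is usually a better tuning signal than an aggregate accuracy number, because it points at the specific prompt, tool, or policy to fix.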

    If your AI roadmap feels stuck, start with an honest readiness audit against these three elements. Shore up instrumentation and data pipelines, codify governance, then refine the agent interface with real user telemetry. Bake first. Frost last. That’s how we ship AI that customers trust—and keep winning after the first demo high fades.


    Inspired by this post on Pendo – Best Practices.

