Tag: agentic AI

  • Inside ShowMe’s Playbook: Orchestrating Voice, Video & Multi‑Agent AI Sales Reps that Close

    What happens when you treat an AI agent not as a chatbot, but as a full teammate on your sales team – one that can jump on video calls, demo your product, make phone calls, and follow up over days?

    I recently dug into this question with the team behind ShowMe, an AI-native startup building digital sales reps for inbound teams. Founded in April 2025, ShowMe has engineered a multi‑agent system that combines conversation agents for live voice and video interactions, evaluator agents that score every call for quality and sentiment, and creator agents that ingest customer documentation to build tailored playbooks. A workflow layer orchestrates the entire lead‑to‑close journey across days, not minutes—exactly the kind of agentic AI approach I expect to see become standard in revenue workflows.

    What stood out to me first was the origin story: a glaring conversion gap on a previous website, and the realization that a purpose‑built AI could fill it. The initial MVP was refreshingly pragmatic—start with a voice agent, pair it with product videos, and back it with a simple RAG knowledge base. That retrieval‑first pipeline let the team ship quickly, validate real user behavior, and then scale sophistication where it mattered.

    Then came a pivotal affordance shift: adding a realistic avatar via HeyGen. It wasn’t just eye candy; it changed how prospects engaged. The video-call UX established trust and made the AI’s capabilities legible at a glance. Prospects behaved as if they were with a human rep—interrupting, probing, and asking for demos—because the surface area invited that behavior.

    On the architecture side, the team decomposed a single sales conversation into multiple specialized sub‑agents—greeting, qualifying, pitching—to manage latency, memory constraints, and model limitations. Deterministic workflows handle the happy paths reliably, while a smart orchestrator is emerging to break out of rigid paths when context demands it. Confidence scoring and frustration detection kick in for real‑time human handoff decisions, a must for revenue‑critical moments where a missed nuance can cost pipeline.
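
    To make the handoff rule concrete, here is a minimal sketch of how confidence scoring and frustration detection might feed a real-time escalation decision. The thresholds, field names, and the `should_hand_off` function are hypothetical illustrations, not ShowMe's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values would be tuned against reviewed calls.
MIN_CONFIDENCE = 0.6
MAX_FRUSTRATION = 0.7


@dataclass
class TurnSignals:
    confidence: float   # agent's confidence in its last answer, 0..1
    frustration: float  # estimated prospect frustration, 0..1


def should_hand_off(signals: TurnSignals) -> bool:
    """Escalate to a human when confidence is low or frustration is high."""
    return (
        signals.confidence < MIN_CONFIDENCE
        or signals.frustration > MAX_FRUSTRATION
    )
```

    Using an "or" rather than a combined score keeps the rule easy to audit: either signal alone is enough to pull in a human.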

    Training the system to sell like your team is where it gets powerful. ShowMe ingests sales transcripts and training materials to teach company‑specific sales skills, then uses creator agents to assemble tailored playbooks. Conversation agents stay focused on live interactions, while evaluator agents continuously score calls for quality and sentiment. The result: repeatable, compliant, and brand‑consistent selling—without flattening personalization.

    Quality isn’t an afterthought—it’s operationalized. Early deployments run with customer-driven evaluation loops where 100% of conversations are reviewed, tapering to about 5% over time as confidence increases. Feedback becomes automated tests to prevent prompt regression, and production quality is proven with POCs, A/B rollouts, dashboards, and CRM logging. This is eval-driven development applied to go‑to‑market: measurable, auditable, and continuously improving.
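
    The tapering review loop can be sketched as a sampling schedule. This is an illustration only: the linear ramp, the eight-week window, and the function names are my assumptions; the source only says review starts at 100% and settles at about 5%.

```python
import random


def review_rate(weeks_live: int, start: float = 1.0,
                floor: float = 0.05, ramp_weeks: int = 8) -> float:
    """Linearly taper the share of conversations sent for human review."""
    if weeks_live >= ramp_weeks:
        return floor
    progress = max(weeks_live, 0) / ramp_weeks
    return start - (start - floor) * progress


def needs_review(weeks_live: int, rng: random.Random) -> bool:
    """Sample one conversation for review at the current rate."""
    return rng.random() < review_rate(weeks_live)
```

    In practice the taper would more likely be gated on measured quality than on calendar time, but the shape is the same.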

    I also appreciate how they treat the agent as a coworker, not a widget. Onboarding happens via Slack, weekly reporting aligns with sales leadership rhythms, and tight CRM integration keeps data flowing both ways. That mindset unlocks adoption because it fits how sales teams actually operate—and it creates real Agent Analytics you can manage.

    From a product perspective, several pragmatic details matter. Real‑time voice and avatar demos rely on latency tricks and a library of video clips to keep interactions snappy. The conversation agent evolved from a basic Q&A bot into guided sales discovery, balancing personalization with the ever-present risks of hallucination. Guardrails, human‑in‑the‑loop, and clearly defined handoff rules are non‑negotiables in high‑stakes sales workflows.

    Looking ahead, the roadmap makes sense: move toward self‑serve PLG setup, add smarter orchestration that adapts beyond deterministic flows, and expand into adjacent roles like customer success. For product leaders building in gen AI, the pattern here is instructive: start with inbound value, design AI workflows that align to proven sales motions, and use rigorous evals to earn the right to automate more.

    If you want to go deeper into the build, the live demos, and the full multi‑agent orchestration, listen to this episode on: Spotify | Apple Podcasts. For more on the stack, explore ShowMe and the avatar platform HeyGen.


    Inspired by this post on Product Talk.


  • Implementing AI Agents That Scale: My Playbook for One‑Person Departments with Amplitude

    Over the past few years, I’ve led cross-functional teams to deploy agentic AI in production, and I’ve learned that success rarely hinges on the model alone. It comes from methodically designing the right workflows, instrumenting every step, and building a feedback loop that compounds. Learn how companies like Replit are consolidating workflows, creating one-person departments, and building systems for scale with Amplitude.

    When I talk about AI agents, I’m describing software that behaves like a focused teammate—owning a clear job to be done end-to-end. In practice, that means consolidating fragmented tasks into a single accountable “one-person department,” then giving it the context, tools, and analytics to perform reliably. This is how agentic AI moves beyond demos into durable business impact.

    I start with outcomes, not algorithms. I map a driver tree from business goals (e.g., lower response time, higher activation, better retention) to the specific moments an agent can influence. This outcome-first alignment keeps scope tight, informs guardrails, and grounds the value proposition in measurable change instead of vanity metrics.

    Next, I define the workflow the agent will fully own. I look for high-volume, rules-adjacent processes—think lead qualification, support triage, or billing inquiries—where clear decision criteria already exist but human time is the bottleneck. I document triggers, inputs, decision points, and handoffs, then design the ideal-state flow the agent will run autonomously, with transparent escalation paths to humans.

    On architecture, I favor a retrieval-first pipeline to keep responses accurate and current. I scope the knowledge base, implement context window management, and standardize tools the agent can call (search, CRM actions, ticket updates). For teams new to this, I coach “LLMs for product managers” fundamentals so we make sensible trade-offs between speed and reliability rather than chasing model-of-the-week headlines.

    Instrumentation is where the system becomes self-improving. I use Amplitude analytics and an Agent Analytics schema to track intent detection, tool usage, resolution rate, time-to-resolution, deflection, and escalation causes. A unified analytics platform lets me connect agent outcomes to core product metrics—activation, retention, and conversion—so we can see the real revenue and experience impact, not just local efficiency gains.

    To validate impact, I run A/B testing when traffic allows, setting a minimum detectable effect (MDE) upfront to avoid inconclusive reads. In lower-volume scenarios, I lean on eval-driven development: curated test sets for edge cases, scenario-based regression suites, and error taxonomies that accelerate iteration. Feature flags let us stage capabilities safely (shadow mode, assistive, autonomous) while we monitor deltas before full rollout.
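
    To show why setting the MDE upfront matters, here is a rough sample-size sketch for a two-arm test on a conversion rate. It uses the standard normal-approximation formula for comparing two proportions; the function name and default values are illustrative assumptions, not part of any Amplitude API.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)
```

    With a 10% baseline and a two-point MDE this lands in the high thousands per arm, and halving the MDE roughly quadruples it. That is the arithmetic behind falling back to eval suites when traffic is thin.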

    Reliability and trust are designed in from the start. I apply AI risk management practices—privacy-by-design, data governance, and policy-aligned prompt templates—paired with observability to trace decisions. Clear escalation policies, incident management runbooks, and human-in-the-loop checkpoints ensure the agent fails safe, not silently.

    Shipping cadence matters. I use CI/CD to increase deployment frequency, keep prompts and tools versioned, and gate risky changes with targeted rollouts. As patterns stabilize, we scale horizontally to new use cases, sharing core capabilities (retrieval, analytics, guardrails) as a platform. This is how “one-person departments” multiply without multiplying overhead.

    Change management closes the loop. I partner with product trios and frontline teams to co-design prompts, set acceptance criteria, and define what “good” looks like in plain language. In-app guides and product tours introduce the agent’s role and limits, and structured feedback channels feed directly into our discovery and iteration rhythm.

    The throughline of this playbook is simple: treat agents like real teammates with a job description, operating procedures, and performance reviews. With disciplined workflow design, a retrieval-first pipeline, and outcome-level instrumentation in Amplitude, agentic AI stops being a science project and starts compounding into durable product-led growth.


    Inspired by this post on Amplitude – Perspectives.


  • LLMs vs AI Agents: Hard‑Won Lessons Product Teams Need to Nail for Real‑World Impact

    When people ask me about "LLM vs AI Agents: What Product Teams Must Get Right," I start with a simple truth: an LLM is a powerful prediction engine, while an AI agent is a productized workflow that plans, takes actions with tools, remembers, and closes the loop on an outcome. That difference sounds academic until you’re on the hook for reliability, cost, and customer trust.

    In my role, I’ve shipped LLM copilots that delight users and piloted agents that automate complex workflows. The pattern that never fails is this: start assistive, then graduate to autonomy. Copilots accelerate people; agents own outcomes. When we respect that gradient, adoption climbs, incidents fall, and we earn the right to expand scope.

    The first decision point is use-case fit. If the task benefits from human judgment, high-context nuance, or brand voice, I frame it as a copilot with strong guardrails and crisp UX. If the task is well-bounded, tool-heavy, and verifiable, I consider an agent, but only after we can measure end‑to‑end task success with eval-driven development.

    Architecture matters. I reach for a retrieval-first pipeline to keep responses grounded in authoritative data, then add tool use for actions (search, write, schedule, transact) with deterministic scaffolding to prevent thrashing. Good prompt engineering is table stakes, but context window management and a clean memory strategy (short‑term scratchpad, long‑term facts, and policy) separate demos from durable systems.

    Agents amplify both value and risk. I build safety in layers: role and scope definition, tool whitelists, unit limits, human‑in‑the‑loop checkpoints at irreversible steps, and privacy-by-design data governance. We log every decision token-for-token because auditability isn’t optional once agents touch customers, money, or data.
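
    Two of those layers, the tool list and the human checkpoint at irreversible steps, can be sketched in a few lines. The tool names and the `authorize` function are hypothetical; the point is only that the check sits in front of every tool call.

```python
from typing import Callable

# Hypothetical tool names for illustration.
ALLOWED_TOOLS = {"search", "draft_email", "issue_refund"}
IRREVERSIBLE_TOOLS = {"issue_refund"}  # these require human approval


def authorize(tool: str, approve: Callable[[str], bool]) -> bool:
    """Allow a call only if the tool is listed and, when irreversible, approved."""
    if tool not in ALLOWED_TOOLS:
        return False
    if tool in IRREVERSIBLE_TOOLS:
        return approve(tool)  # human-in-the-loop checkpoint
    return True
```

    Here `approve` stands in for whatever approval surface a team already uses, such as a review queue or a chat prompt.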

    Measurement is non‑negotiable. For LLM features, I track time‑to‑first‑token, response latency, groundedness, and user satisfaction. For agents, I add Agent Analytics: task success rate, number of steps per task, tool error rate, loop detection, guardrail triggers, escalation to human, cost per successful task, and containment rate. If we can’t see it, we can’t ship it.
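
    As a rough illustration, several of those agent metrics fall out of a simple per-task log. The `TaskRun` record and its fields are assumptions for this sketch, and I am treating containment rate as the share of tasks that finished without escalation to a human.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    succeeded: bool
    steps: int
    escalated: bool
    cost_usd: float


def summarize(runs: list[TaskRun]) -> dict[str, float]:
    """Aggregate per-task logs into a handful of agent metrics."""
    if not runs:
        raise ValueError("no runs to summarize")
    total = len(runs)
    successes = sum(r.succeeded for r in runs)
    escalations = sum(r.escalated for r in runs)
    total_cost = sum(r.cost_usd for r in runs)
    return {
        "task_success_rate": successes / total,
        "avg_steps_per_task": sum(r.steps for r in runs) / total,
        "escalation_rate": escalations / total,
        "containment_rate": (total - escalations) / total,
        # Failed runs still cost money, so divide total spend by successes.
        "cost_per_successful_task": (
            total_cost / successes if successes else float("inf")
        ),
    }
```

    Dividing total spend by successful tasks, rather than by all tasks, is deliberate: it makes retries and failures show up in the unit cost.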

    My delivery playbook mirrors modern software ops. We use feature flags, gated betas, and canary rollouts; we version prompts like code; we set incident management paths for model outages and tool drift; and we rehearse fallbacks so the experience degrades gracefully, not catastrophically. Dull operations build dazzling products.

    On roadmapping, I thin‑slice value. We introduce a minimum viable copilot that handles a single, frequent job-to-be-done with high success. Only after continuous discovery confirms product‑market fit do we grant more autonomy, one capability at a time. OKRs framed around outcomes rather than output keep us honest: if the customer’s job gets done faster, cheaper, and with fewer errors, we scale; if not, we fix fundamentals before adding scope.

    Build vs buy is rarely binary. I tend to buy the undifferentiated heavy lifting—observability, prompt versioning, red‑teaming, and policy enforcement—while building the proprietary workflows, data modeling, and UX that encode our defensible advantage. The litmus test: if it’s part of our unique value proposition, we own it; if not, we integrate the best‑in‑class and move.

    Go‑to‑market must be as rigorous as the tech. We position clearly (assistant vs agent), price to value with transparent consumption SaaS pricing, and communicate risk posture in plain language. Customers don’t buy models; they buy confidence that a job gets done reliably within their constraints.

    Common failure modes repeat: shipping autonomy before instrumentation, treating prompts as magic instead of software, skipping data governance, and ignoring the human experience. The antidote is disciplined AI Strategy rooted in empowered product teams, tight feedback loops, and relentless evaluation.

    If you take nothing else: choose the right paradigm for the job (copilot first, agent when proven), ground with a retrieval-first pipeline, instrument with eval-driven development and Agent Analytics, and operationalize like a mission‑critical system. Do that, and you’ll turn LLM capabilities into durable product outcomes.


    Inspired by this post on Product School.


  • Can AI Agents Master Enterprise Analytics? My Proven Task Framework and Amplitude Insights

    Every week, product and data leaders ask me the same question: can AI agents truly shoulder enterprise analytics without sacrificing trust, governance, or speed? I’ve spent the past year putting agentic AI through its paces in real product workflows, and I’ve distilled what works into a practical, task-driven evaluation approach you can adopt immediately.

    Learn how to evaluate AI analytics agents with a task-based framework across analytics tasks. See how Amplitude’s Global Agent scores.

    When I say “enterprise analytics,” I’m talking about far more than chatty dashboards. The bar includes consistent metric definitions, privacy-by-design, RBAC and data governance, audit trails, low-latency decision support, and repeatable outcomes across retention analysis, funnels, cohorts, A/B testing, instrumentation planning, and anomaly detection—ideally within a unified analytics platform.

    My task-based framework evaluates eight capability pillars I expect from an enterprise-ready Agent Analytics solution: task coverage and depth across common product analytics workflows; data fidelity and governance (lineage, access controls, PII handling); instruction-following and reasoning transparency; evaluation rigor and reliability (repeatability, error modes, regressions); security and compliance posture; latency and cost efficiency; integration into existing product strategy workflows (e.g., CRM integration, CI/CD-linked instrumentation, experiment platforms); and human-in-the-loop controls for approvals and guardrails.

    Operationally, I define canonical tasks that reflect day-to-day product management: codify a North Star metric; perform retention analysis by cohort; generate and explain a funnel with drop-off drivers; recommend an event taxonomy and tracking plan; analyze an A/B test with minimum detectable effect (MDE) considerations; and propose a driver tree that maps inputs to outcomes. Each task comes with ground-truth datasets, acceptance criteria, and edge cases to stress the agent—an eval-driven development practice I’ve found indispensable.

    I then score maturity across four levels. L0: a pure chat UI that summarizes existing charts. L1: a retrieval-first pipeline that grounds responses in your analytics catalog and metric store. L2: a tool-using agent that is schema-aware, can write safe SQL, and reconciles results to canonical definitions. L3: a governance-aware autonomous workflow that executes analytics tasks end-to-end with approvals, audit logs, feature flags, and rollback plans. Most teams discover they’re between L1 and L2; reaching L3 requires serious investment in data governance and eval automation.

    Risk management is non-negotiable. I require strict data governance and privacy-by-design controls, including scoped credentials, PII redaction, policy-aware retrieval, and comprehensive observability (query traces, prompt/response logs, lineage). Feature flags and approval gates prevent unintended metric redefinitions. Red-teaming tasks expose prompt injection, schema drift, and hallucination failure modes before they hit production stakeholders.

    Where do agents shine today? Rapid exploration, SQL generation from schema context, summarizing experimentation results, and turning natural-language questions into actionable charts. Where do they struggle? Ambiguous metric semantics, under-specified experiment designs, and edge-case-heavy analyses where ground truth depends on organizational nuance. The cure is disciplined product management: codify definitions, maintain a living analytics taxonomy, and continuously harden your eval suite.

    In the context of product analytics stacks, Amplitude analytics is a common anchor for product teams, and many are evaluating “Amplitude’s Global Agent” to accelerate insight generation. In my framework, I look for how well it grounds to canonical metrics, handles retention and funnel tasks, explains trade-offs, and respects governance boundaries—before I consider expanded autonomy. I share the full task matrix and scoring rubric so you can replicate the assessment in your environment.

    If you’re getting started, pick your top ten high-frequency analytics tasks and define crisp success metrics for each (accuracy, explainability, latency, and reusability). Build a small eval harness with golden datasets, assertions, and regression tests. Favor a retrieval-first pipeline tied to your taxonomy and metric store, add human-in-the-loop approvals for sensitive actions, then pilot with a cross-functional tiger team. Measure time-to-insight, analyst hours saved, and stakeholder trust—then iterate.
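
    A small eval harness along those lines might look like the sketch below. The golden cases, their expected values, and the `run_evals` function are made up for illustration; a real harness would pull cases from your own metric store and call your actual agent.

```python
from typing import Callable

# Hypothetical golden cases: a question and its known-correct numeric answer.
GOLDEN_CASES = [
    {"question": "7-day retention for the March cohort", "expected": 0.42},
    {"question": "signup-to-activation conversion", "expected": 0.18},
]


def run_evals(agent: Callable[[str], float], cases: list[dict],
              tolerance: float = 0.01) -> dict:
    """Run every golden case and report the pass rate plus failing questions."""
    failures = []
    for case in cases:
        answer = agent(case["question"])
        if abs(answer - case["expected"]) > tolerance:
            failures.append(case["question"])
    passed = len(cases) - len(failures)
    return {"pass_rate": passed / len(cases), "failures": failures}
```

    Run on every prompt or tool change, the failure list doubles as a regression report.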

    Enterprise analytics isn’t a single feature; it’s a system of definitions, workflows, and governance. With a task-based, eval-driven approach, agentic AI can become a reliable partner—not just a novel interface. If you’re evaluating options, apply this framework first, then expand scope as reliability and trust climb.


    Inspired by this post on Amplitude – Best Practices.


  • Multi‑Agent Systems Demystified: Why One AI Isn’t Enough—and How I Ship Faster With Many

    In my day-to-day building AI products, I’ve learned a simple truth: a single model can be brilliant, but a coordinated team of specialized agents is what consistently ships outcomes customers trust. That’s the promise of multi-agent systems—multiple AIs with distinct roles collaborating inside robust AI workflows to deliver accuracy, speed, and resilience you can’t get from a lone model.

    Think of a multi-agent system as a well-run product trio for machines: a planner decomposes the job, specialists execute focused tasks, a reviewer checks quality, and an orchestrator keeps everyone aligned. This agentic AI approach mirrors how high-performing teams work—divide complex problems, play to strengths, and create tight feedback loops.

    When does one AI stop being enough? Whenever tasks require tool use, domain retrieval, multi-step reasoning, or policy adherence under real-world constraints. In those moments, specialized agents shine—one for search using a retrieval-first pipeline, another for reasoning, another for action execution, and a final one for validation. The result is better accuracy with manageable latency and cost.

    The core architecture I rely on starts with a planner that breaks a goal into steps, followed by execution agents equipped with tools and grounded context. I pair this with context window management to keep prompts lean and relevant, and I insert a verifier (or critic) to catch logic slips and policy violations before results reach customers. A lightweight orchestrator coordinates handoffs and retries to keep the whole flow resilient.
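
    Stripped to its skeleton, that planner, executor, verifier loop can be sketched as follows. The agents are passed in as plain callables, which is an assumption made to keep the example self-contained; in a real system each would wrap a model call with its own prompt and tools.

```python
from typing import Callable

Planner = Callable[[str], list[str]]    # goal -> ordered steps
Executor = Callable[[str], str]         # step -> output
Verifier = Callable[[str, str], bool]   # (step, output) -> acceptable?


def orchestrate(goal: str, plan: Planner, execute: Executor,
                verify: Verifier, max_retries: int = 2) -> list[str]:
    """Run each planned step, retrying until the verifier accepts the output."""
    results = []
    for step in plan(goal):
        for _attempt in range(max_retries + 1):
            output = execute(step)
            if verify(step, output):
                results.append(output)
                break
        else:
            # Retries exhausted without a verified output: fail loudly.
            raise RuntimeError(f"step failed verification: {step}")
    return results
```

    Raising on exhausted retries, instead of returning a partial result, is one way to make sure unverified output never reaches a customer.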

    To make this production-grade, I treat observability as non-negotiable. Agent Analytics helps me see which agents are adding value versus adding latency, where failures cluster, and how prompts drift over time. From there, eval-driven development gives me measurable confidence: I codify representative tasks, run offline and shadow evaluations, and only promote changes that move accuracy and safety in the right direction.

    Governance is equally critical. I design privacy-by-design from the start, restrict data movement with strong data governance, and enforce policy constraints inside the workflow rather than after the fact. This includes red-teaming failure modes, rate-limiting tools, and capturing immutable traces for audits and post-incident reviews—habits borrowed from SRE culture that map well to AI systems.

    On the practical side, prompt engineering remains foundational, but it’s the system design that converts clever prompts into reliable outcomes. Tool access, retrieval quality, memory strategy, and error handling matter more than wordsmithing alone. I’ve found that small prompt improvements are amplified when the surrounding workflow is sound—and are overwhelmed when it isn’t.

    If you’re just starting, begin with a narrow use case and a minimal set of agents—planner, executor, and verifier—then expand. Use continuous discovery with real users to learn where the workflow fails in the wild, and iterate with tight release cycles. Treat every agent like a microservice with clear contracts, test coverage, and metrics, and you’ll unlock compounding gains without losing control.

    The payoff is tangible: faster shipping cycles, fewer regressions, and outcomes customers can actually rely on. When stakes are high and ambiguity is real, one AI is often a talented soloist—but a disciplined ensemble of agents is how I deliver dependable, scalable value at product velocity.


    Inspired by this post on Product School.


  • Context Engineering Playbook: 5 Proven Ways to Slash Context Rot and Scale Smarter AI

    I've been getting a lot of questions about why I'm diving so deep into Claude Code, so I want to take a step back and provide some context.

    Last March, when I started building my first AI product—the Interview Coach—I felt like I had to figure it all out on my own. I had never built an AI product before, and I didn't have a team I could lean on. It was equal parts energizing and intimidating.

    I had a blast digging in, experimenting, and learning what I needed to learn to ship that first AI product. But I also started to wonder, "How are product teams going to learn this stuff?"

    As an industry, we are being asked to leverage a new technology that is foreign to us. We are all experimenting and learning what's just now possible. It's moving so fast, it's exhausting just following the news, let alone trying to learn and develop new skills.

    My mission has always been to help teams make better product decisions. That still drives me today.

    After releasing the Interview Coach, I asked myself two questions: "How am I going to rapidly develop my skill set?" and "How can I help others do the same?" I landed on a three-part plan: First, I'm going to collect and share stories about how other teams are learning and building AI products—that's why I launched Just Now Possible. Second, I'm going to push the boundaries on how I can use AI in my day-to-day life, and I'm going to write about it. Third, I'm going to keep building AI products—and I'm going to write about that, too.

    The Claude Code series was born out of number two. It’s had an interesting side effect: it’s also helping me build better AI products.

    The more I push the boundaries of what's possible with Claude Code, the more I understand how to build more robust AI products. That’s reinforced my belief that product teams need to get hands-on with this stuff in their day-to-day lives. It’s how we’re going to develop the skill sets we need to build tomorrow’s products.

    In my context rot article—where we learned how to manage the context window in Claude Code—I showed just how much day-to-day practice compounds. Today, I want to show how learning about context window management in our day-to-day lives directly maps to managing the context window in the AI products we might build. My hope is to make it crystal clear how experience in one area develops expertise in the other. Let’s dive in.

    Infographic: What is Context Engineering? Product teams engineer context in generative AI through compact prompts, curated turns, external memory, repetition, and sub-agents, all feeding a shared context window.

    A quick refresher on context window management. In the context rot article, we covered what the context window is and what goes into it; how to offload conversational context to the file system; how to use the /compact and /clear commands; why to repeat critical information as the context window fills up, so that tokens "lost in the middle" or at the beginning of the input still get attention; and how to use agents to get access to more context windows.

    It turns out these exact same skills are being used by developers to manage the context window in production products. If you haven't read the context rot article, start there: "Context Rot: Why AI Gets Worse the Longer You Talk (And How to Fix It)."

    What is Context Engineering? Context engineering is the work that we do to manage the context window in the AI products and services that we build. It's how we give the large language model the context it needs to do the job well. It's also how we manage and mitigate context rot in our products and services, so that we can get the highest performance from the underlying model.

    Today, we are going to look at five different strategies that product teams are currently using in their context engineering efforts. You are going to see that each of these strategies ties back to a strategy you might already be using in your day-to-day AI usage (especially if you followed the advice in the context rot article).

    Here's how product teams are putting this into practice right now: designing compact system prompts by breaking big tasks into smaller tasks; building external memory/state structures to keep the context window clean; curating what goes into each turn; repeating critical information as context grows; and using sub-agents to grow the context window.
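
    Two of these strategies, curating each turn and repeating critical information, fit in a short sketch. The `build_context` function and its parameters are hypothetical; it assumes context is assembled as a flat list of strings before each model call.

```python
def build_context(system_prompt: str, critical_facts: list[str],
                  history: list[str], max_turns: int = 4) -> list[str]:
    """Assemble one turn's context: prompt, recent turns, then a reminder."""
    # Curate: keep only the most recent turns instead of the full transcript.
    recent = history[-max_turns:] if max_turns > 0 else []
    # Repeat: restate critical facts at the end, where they are hard to miss.
    reminder = "Reminder: " + "; ".join(critical_facts)
    return [system_prompt, *recent, reminder]
```

    Older turns dropped here would live in external memory, such as a file or a database, so they can be retrieved on demand rather than carried in every call.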

    Along the way, I’ll share practical guardrails and instrumentation ideas so you can track quality with eval-driven development, reduce context rot, and scale performance predictably.

    Why this matters for product trios: these strategies clarify the handoffs between prompt engineering, external memory design, and orchestration, which strengthens collaboration across PM, design, and engineering. Whether you’re exploring gen AI prototypes, hardening a retrieval-first pipeline, or evolving toward agentic AI, context engineering is the backbone of reliable, high-performing experiences.

    If you build or lead “LLMs for product managers” initiatives, consider this your field guide. In upcoming posts, I’ll break down each strategy with concrete examples and templates you can adapt to your stack, so your team can move from experiments to durable, scalable AI workflows with confidence.


    Inspired by this post on Product Talk.


  • From Chaos to Clarity with Claude Code: My Hands-On Playbook for Product Leaders

    I’ve been pushing hard to operationalize AI for real product work, and this episode zeroes in on the moment Claude Code stops feeling like a demo and starts behaving like a dependable teammate. If you’ve ever wondered how to go from clever prompts in the browser to durable, repeatable workflows on your machine, this walkthrough is for you.

    Listen on: Spotify | Apple Podcasts.

    My first honest reaction to installing and configuring the desktop agent was the all-too-relatable “this tool thinks everything is a code repo” reality. That framing helped me reset expectations fast: instead of treating it like a magical universal assistant, I began designing guardrails, context, and repeatable routines—exactly how I’d onboard a new team member.

    The shift from Claude-in-the-browser to Claude Code on my machine was the unlock. Locally, it can finally work with my files, folders, and workflows. That meant I could ground it in real artifacts—project docs, meeting notes, product specs, and historical decisions—so responses weren’t just plausible; they were contextual and verifiable.

    On setup, I now treat /init and CLAUDE.md files as my product requirements. I define roles, boundaries, and canonical sources up front, then run in a deliberate “walled garden.” The “treat it like an intern” model works beautifully: scope access intentionally, expand privileges as trust grows, and keep a tight audit trail of what it can touch and why.

    Surprisingly, task management became my ideal on-ramp. It’s easy to validate, the feedback loops are tight, and the ROI is immediate. I export calendar windows rather than granting full calendar access, then let the agent map priorities into Trello, reconcile time blocks, and surface trade-offs. Fast wins build confidence—mine and the agent’s.

    Model switching matters more than I expected. When speed is king and “good enough” will do, Haiku keeps the loop snappy. When stakes are higher—complex synthesis, nuanced product strategy, or gnarly ambiguity—I step up to Claude Opus 4.5. Being intentional about when to optimize for latency versus depth is a quiet superpower.
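
    If you were to encode that habit as a rule rather than a manual choice, it might look like this sketch. The model labels are placeholders, not real API model identifiers, and the task categories are my own illustrative assumptions.

```python
FAST_MODEL = "haiku"  # placeholder label for the low-latency tier
DEEP_MODEL = "opus"   # placeholder label for the high-depth tier

# Hypothetical task kinds that warrant the slower, deeper model.
DEEP_TASKS = {"synthesis", "strategy"}


def pick_model(task_kind: str, high_stakes: bool = False) -> str:
    """Default to the fast model; step up for deep or high-stakes work."""
    if high_stakes or task_kind in DEEP_TASKS:
        return DEEP_MODEL
    return FAST_MODEL
```

    Defaulting to the fast tier keeps the everyday loop snappy and makes the step up a deliberate exception.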

    Web tasks can still spiral. When that happens, I pause its autonomy, toggle to fewer steps, and ask, “What are you doing?” Paired with Claude’s Web fetch tool, this makes the agent explain its chain-of-thought planning without exposing hidden reasoning, so I can spot brittle assumptions, prune distractions, and re-ground the task.

    Content retrieval has become a killer workflow. I point the agent at my archives—blog posts, book drafts, transcripts, notes—and ask, “Where have I talked about this before?” It assembles a map of prior art, connects themes I’d forgotten, and prevents me from reinventing work. Over time, this evolves into a Zettelkasten-style research system that upgrades rigor and accelerates synthesis.

    I’ve also turned Claude Code into a publishing engine. From a single transcript, it drafts titles, descriptions, show notes, and chapters, then routes artifacts to Ghost for formatting. Before anything ships, I run fact-checking workflows that validate claims against transcripts and research sources. The output improves, but more importantly, the scaffolding makes quality repeatable.

    Reusable workflows compound. I rely on slash commands to trigger common jobs, break down larger efforts with sub-agents, and wire in hooks and plugins where external systems are needed. This is agentic AI at its most practical: fewer hero prompts, more reliable processes.

    Audience analytics and content prioritization are helpful with caveats. I let the agent cluster themes and flag gaps, then I pressure-test its suggestions against first-party data and strategic goals. As with any model-driven insight, triangulation beats blind faith.

    Two metaphors guide my day-to-day. First, Claude Code is like a dog—sometimes it returns with the stick, sometimes it gets lost in the woods. Second, the “intern” framing keeps me honest: don’t hand it the whole company on day one. With that mindset, my output jumped—more volume without sacrificing quality—because the workflow scaffolding got better.

    In this episode, I cover what Claude Code is and why it’s useful even if you’re not an engineer, the real difference between the browser experience and running locally, how to shape behavior with /init and CLAUDE.md files, why task management is the perfect proving ground, when to export calendar windows versus connecting directly, and when model switching makes sense: Haiku for speed, Opus for depth.

    I also dig into debugging web tasks by asking “What are you doing?”, content retrieval workflows across personal archives, building reusable slash-command systems with sub-agents, hooks, and plugins, practical publishing stacks from transcripts, fact-checking against transcripts and research sources, and using analytics to prioritize content—with a healthy respect for uncertainty.

    If you’ve been trying to make Claude Code feel less like “throwing a stick into the woods,” this is the candid, tactical tour I wish I’d had on day one. Drop your questions and experiments below—I’m eager to compare notes and refine the playbook together.


    Inspired by this post on Product Talk.


  • AI Agent Deployment Mastery: My Proven Checklist to Ship Safely, Faster, and at Scale

    AI Agent Deployment Mastery: My Proven Checklist to Ship Safely, Faster, and at Scale

    Shipping AI agents is not like shipping a typical feature. The system learns, reasons, and takes action in unpredictable environments, and when it’s customer-facing, the stakes are high. Over the past few years, I’ve refined a practical checklist that helps my teams move quickly without breaking trust. It balances speed with safety, and ambition with accountability—exactly what you need to scale agentic AI in production.

    This checklist was forged in real launches—some smooth, some humbling. Early on, I watched an otherwise brilliant agent confidently offer a refund policy we didn’t have. That one incident made it clear: AI agents require a higher bar for guardrails, evals, and observability. Today, I won’t greenlight an AI rollout without these steps being explicit, owned, and testable.

    Start with outcomes, not output. I define the job-to-be-done, the target users, and the measurable business impact using outcomes vs output OKRs and driver trees. Success is not “ship an agent,” it’s “reduce first-response time by 40% with no drop in CSAT,” or “increase qualified demo bookings by 20% at a lower cost per acquisition.” Clear outcomes give the agent a purpose and the team a north star.

    Prepare the knowledge the agent will use. A retrieval-first pipeline beats raw prompting for most enterprise cases. I inventory sources of truth, set access controls, and enforce data governance from day one. That includes PII handling, redaction, retention policies, and privacy-by-design. If the agent can’t reliably retrieve the right fact at the right time, the rest doesn’t matter.

    Choose models and prompts with discipline. I align model selection with context window management, cost, latency, and tool-use requirements. Then I build prompts and tools together, not in isolation, and I keep temperature, stop conditions, and function-calling explicit. Most importantly, I use eval-driven development: golden datasets, task-specific metrics (accuracy, helpfulness, latency, cost), and target thresholds that must be met before widening rollout.
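
    A minimal sketch of what a threshold gate over a golden dataset might look like. The metric names mirror the ones above, but the threshold values and the result schema are assumptions for illustration; set them from your own baseline.

```python
from statistics import mean

# Hypothetical thresholds; derive real ones from your own baseline runs.
THRESHOLDS = {"accuracy": 0.90, "helpfulness": 0.80}
MAX_P95_LATENCY_S = 3.0


def passes_gate(results: list[dict]) -> bool:
    """results: one dict per golden example with 'accuracy' and 'helpfulness'
    scores in [0, 1] plus 'latency_s' in seconds."""
    for metric, floor in THRESHOLDS.items():
        if mean(r[metric] for r in results) < floor:
            return False
    latencies = sorted(r["latency_s"] for r in results)
    # Simple index-based approximation of the 95th percentile.
    p95 = latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))]
    return p95 <= MAX_P95_LATENCY_S
```

    Running this in CI turns "the evals look fine" into a binary check that blocks a wider rollout when any threshold is missed.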

    Manage AI risk upfront. I treat jailbreaks, toxicity, and data leakage as product risks, not just security issues. I implement layered defenses—input/output filtering, policy checks, rate limits, and abuse monitoring—and define escalation paths and human-in-the-loop handoffs for ambiguous cases. Every risky capability needs an owner, a playbook, and a test.

    Build the pipeline that lets you iterate safely. Prompts, tools, policies, and retrieval configs go through the same CI/CD rigor as code. I use feature flags for progressive delivery, canary cohorts to limit blast radius, and clear rollback procedures. Observability isn’t optional; I track latency, token usage, cost, failure modes, and user outcomes. I also watch DORA metrics and deployment frequency to ensure we’re improving the engine, not just the output.
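
    To make the canary idea concrete, here is one common way to assign users to a cohort deterministically. The function name, the flag naming, and the 10,000-bucket scheme are assumptions for this sketch; most feature-flag services do something similar internally.

```python
import hashlib


def in_canary(user_id: str, flag: str, rollout_percent: float) -> bool:
    """Deterministically decide whether a user is in the canary cohort for a flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000    # stable bucket in 0..9999
    return bucket < rollout_percent * 100    # e.g. 5.0 percent -> buckets 0..499
```

    Because the bucket depends only on the flag and user ID, a user who sees the new prompt at 5% keeps seeing it at 50%, and rolling back is just setting the percentage to zero.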

    Constrain autonomy intentionally. Agent behavior design matters as much as model choice. I set step limits, define tool whitelists, separate read vs write permissions, and specify decision checkpoints. When the agent is uncertain or confidence drops below a threshold, it hands off to a human or a deterministic workflow. Guardrails aren’t barriers; they’re bumpers that keep you on the track.
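
    Those constraints can be expressed as a small policy check that runs before every tool call. Everything named here (the tool names, the default limits, the three outcomes) is hypothetical and only meant to show the shape of the logic.

```python
from dataclasses import dataclass, field


@dataclass
class Guardrails:
    max_steps: int = 8
    # Hypothetical tool names, split by read vs write permission.
    read_tools: set = field(default_factory=lambda: {"search_kb", "get_order"})
    write_tools: set = field(default_factory=lambda: {"issue_refund"})
    min_confidence: float = 0.7


def next_action(g: Guardrails, step: int, tool: str, confidence: float,
                write_approved: bool = False) -> str:
    """Return 'run', 'handoff', or 'deny' for a proposed tool call."""
    if step >= g.max_steps or confidence < g.min_confidence:
        return "handoff"                      # step budget spent or agent unsure
    if tool in g.read_tools:
        return "run"
    if tool in g.write_tools:
        # Writes are a decision checkpoint: they need explicit approval.
        return "run" if write_approved else "handoff"
    return "deny"                             # not on the allowlist at all
```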

    Instrument what users experience, not just what models produce. I track activation, task success, self-serve completion rates, and time-to-value. I pair Agent Analytics with journey analytics so I can see where the agent helps or hurts. I also invest in UX trust cues—transparent explanations, undo paths, and in-app guides—so users feel in control. When the agent changes behavior through learning, the interface should make that understandable.

    If you’re shipping a voice AI agent, test in realistic conditions. I set targets for ASR accuracy, barge-in responsiveness, TTS prosody, and end-to-end latency. I predefine safe transfer logic for complex calls and ensure compliance for call recording and data retention. Voice amplifies both the magic and the mistakes; operational excellence is non-negotiable.

    Plan the business rollout like a product, not a press release. I align pricing (often consumption SaaS pricing), packaging, and SLAs with actual unit economics—tokens, inference, and retrieval. I equip solutions engineering with playbooks and reference architectures, wire up CRM integration for attribution, and put feedback loops into Intercom or the support stack so we learn from every interaction.

    Run operations like an SRE team. I define incident severity for AI-specific failures (e.g., harmful output, runaway cost, degraded retrieval), add alerting, and keep runbooks current. I schedule postmortems that feed directly into eval baselines and backlog priorities. Continuous discovery isn’t a ceremony; it’s the safety net that keeps improvements compounding.

    Close the loop on compliance and governance. From day zero, I document data flows, vendor scopes, and audit logs. I verify regulatory compliance and adopt privacy-by-design so I’m not retrofitting later. Transparency, user consent, and opt-outs aren’t just legal checkboxes; they’re trust-building tools that differentiate your product.

    The result of this checklist is speed with confidence. It gives my teams a common language to debate trade-offs, a clear path to production, and the guardrails to scale safely. If you’re preparing to deploy an agent, adapt these steps to your stack and your customers. Your future self—and your users—will thank you.


    Inspired by this post on Product School.


  • Vibe Coding Unleashed: How Parallel Agents Build KPI Driver Trees in Under Two Hours

    Vibe Coding Unleashed: How Parallel Agents Build KPI Driver Trees in Under Two Hours

    I’ve been exploring what I call the next level of vibe coding: orchestrating agentic AI to build complex product artifacts in minutes, not days. The breakthrough comes from ditching linear handoffs and embracing true parallelism—letting specialized agents tackle the work simultaneously while I steer the orchestration. In product management contexts where speed and clarity matter, this shift changes everything.

    Building a KPI Driver Tree in two hours becomes possible when you stop building sequentially and start building with parallel agents.

    For product leaders, a KPI Driver Tree is the fastest way to make strategy legible. It ties high-level outcomes to the levers we can actually pull—features, channels, pricing, onboarding, activation, and retention mechanics—so we can prioritize with confidence. Done well, it connects outcomes vs output OKRs, clarifies measurement, and aligns the team around a shared, testable model of growth.
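
    For readers who have not built one, a driver tree is just a small tree of metrics where each parent is computed from its children. The sketch below assumes only two combination rules (sum and product), and the metric names and numbers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    value: float = 0.0
    op: str = "leaf"                      # "leaf", "sum", or "product"
    children: list = field(default_factory=list)

    def evaluate(self) -> float:
        """Roll child values up to this node according to its operator."""
        if self.op == "leaf":
            return self.value
        values = [child.evaluate() for child in self.children]
        if self.op == "sum":
            return sum(values)
        if self.op == "product":
            result = 1.0
            for v in values:
                result *= v
            return result
        raise ValueError(f"unknown op: {self.op}")


# Illustrative tree: revenue = active users x ARPU,
# where active users = newly activated + retained.
activated = Node("activated", op="product", children=[
    Node("signups", 1000.0), Node("activation_rate", 0.4)])
active = Node("active_users", op="sum", children=[activated, Node("retained", 600.0)])
revenue = Node("revenue", op="product", children=[active, Node("arpu", 12.0)])
```

    Calling `revenue.evaluate()` rolls the leaves up to the top-line number, and changing a single leaf (say, `activation_rate`) shows how much that lever moves the outcome.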

    Here’s how I operationalize it with agentic AI and AI workflows. I spin up a small team of specialized parallel agents: a Metrics Librarian (taxonomy and definitions), a Data Modeler (event and table design), a Research Synthesizer (voice of customer and causal hypotheses), a UX Prototyper (visualizing the tree and flows), and a QA/Evaluator (logic and consistency checks). An Orchestrator coordinates these agents, resolves conflicts, and composes outputs into a single, production-ready artifact—while I set constraints, review deltas, and decide.

    In a typical two-hour sprint, all agents run at once. While the Metrics Librarian finalizes the KPI ontology, the Data Modeler validates instrumentable events and joins, and the UX Prototyper renders an interactive driver tree for a unified analytics platform. Meanwhile, the Synthesizer maps qualitative insights to quantitative levers, and the Evaluator stress-tests assumptions. Because we’re not waiting for sequential handoffs, we converge on a coherent driver tree and its initial measurement plan in one pass.
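
    The fan-out pattern itself is simple to sketch with `asyncio.gather`. The `run_agent` function below is a stand-in (an assumption) for whatever model or tool call each specialist makes, and the role names are shorthand for the agents described above.

```python
import asyncio


async def run_agent(role: str, brief: str) -> dict:
    """Stand-in for a real agent call; replace the sleep with your model or tool invocation."""
    await asyncio.sleep(0)
    return {"role": role, "output": f"{role} draft for: {brief}"}


async def orchestrate(brief: str, roles: list[str]) -> dict:
    """Run every specialist concurrently, then merge their outputs by role."""
    results = await asyncio.gather(*(run_agent(role, brief) for role in roles))
    return {r["role"]: r["output"] for r in results}


ROLES = ["metrics_librarian", "data_modeler", "research_synthesizer",
         "ux_prototyper", "evaluator"]
```

    A call such as `asyncio.run(orchestrate("Q3 activation driver tree", ROLES))` returns one draft per role; conflict resolution and human review happen on that merged result.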

    The payoff isn’t just speed; it’s higher-quality decisions. Parallel agents reduce context loss, expose trade-offs earlier, and allow me to compare multiple viable paths side-by-side. This accelerates continuous discovery, aligns with product strategy, and gives product managers a clear, living map of how inputs roll up to outcomes. It’s the closest I’ve found to running a product trio at machine speed.

    Guardrails matter. I pair this approach with strong data governance, privacy-by-design, and eval-driven development so every agent’s output is testable and auditable. Clear prompts, scoped corpora, and consistent acceptance criteria keep the Orchestrator honest, while lightweight Agent Analytics helps me see where reasoning falters and where to improve the system.

    If your team is still tackling analytics artifacts sequentially—requirements, then instrumentation, then visualization—consider switching mental models. Treat the driver tree as the backbone, empower parallel agents to co-create around it, and reserve human judgment for the critical calls. This is vibe coding for product management: creative, fast, and grounded in measurable outcomes.


    Inspired by this post on Pendo – Best Practices.


  • Go Deep or Get Left Behind: How AI Deployment Depth Transforms Customer Service

    Go Deep or Get Left Behind: How AI Deployment Depth Transforms Customer Service

    AI adoption is everywhere. I see more teams every quarter moving from pilots to production—and increasing their budgets accordingly. But the gap between “using AI” and truly transforming with it is widening fast. Launching an AI Agent is easy; building a mature, AI-powered support operation is where the real work—and the real value—lives.

    According to new research, the "2026 Customer Service Transformation Report," the difference comes down to depth of deployment. It’s not enough to dabble. Teams that design their operations around AI are pulling away from those who treat AI like a bolt-on feature.

    This article kicks off part one of my five-part deep dive into the research. I’ll unpack the data, share what I’ve learned leading product and AI strategy, and translate it into practical steps you can apply now. If you’d like to go straight to the source, you can download the report here.

    First, the macro picture: 2,470 global support professionals across industries were surveyed to understand current AI usage, challenges, and the 2026 opportunities. The headline is clear—AI investment is now table stakes. Eighty-two percent of senior leaders say their teams invested in AI in the past year and 87% say they plan to invest in 2026. Those investments are already paying off: Over three-quarters of CS teams (77%) say AI is meeting or exceeding expectations, delivering faster response and resolution times, always-on coverage, cost savings, increased capacity, and multilingual support that scales globally.

    And yet, only 10% of organizations say they have reached a "mature" level of deployment, where AI is fully integrated into operations and working at scale. That’s the tell: most teams are skimming the surface and leaving meaningful performance gains on the table.

    Most service teams are still early in AI adoption. Only 10% report mature deployment, while 26% are scaling, 35% are in initial rollout, and 26% remain in exploration, with 3% unsure.

    When I map the data to what I’ve seen in the field, the maturity difference shows up immediately in outcomes. Teams at mature deployment don’t just automate repetitive tasks; they build AI into critical workflows, give it real responsibility, and iterate continuously. Beyond automating the bulk of their manual work, they’re using AI to proactively engage customers and perform tasks on their behalf.

    The results follow. Of the teams that have reached mature deployment, 43% report higher quality and consistency across support—nearly double the rate of those still in the initial deployment stage. That quality shift is how support evolves from a cost center to a value driver. Great experiences don’t just prevent churn; they create advocacy and become a reason customers choose you. The more you trust your AI Agent with meaningful work, the more it creates the conditions for higher-quality, more consistent support.

    One example I point to often: Lightspeed. They operate a complex product across regions and languages, with tens of thousands of monthly requests. When they adopted Fin in early 2023, they needed a solution that could scale with that complexity—and they treated the transition like a first-class change program.

    They leveraged foundational training and built custom, in-house modules aligned to their processes. They supported their team post-launch and worked closely with leadership to align on the goals and benefits of AI. In a large, distributed org, that executive alignment created ownership and momentum. Their VP of Information Systems, Yamine Gluchow, put it perfectly: "It’s not magic. If you invest in understanding, adoption, and great content, AI performance takes off."

    Mature AI Agent rollouts deliver bigger gains in customer service—outperforming initial deployments in automation, proactive engagement, and task completion (63% vs 52%, 51% vs 41%, 45% vs 28%)—showing how depth drives measurable impact.

    Their outcomes reflect that depth: an 88% involvement rate, 72% of Fin conversations resolved without human intervention, 43,000+ customer requests resolved monthly, service in 12+ languages across 100+ countries, and stable CSAT, with improvement in some markets.

    What impressed me most was the complexity Fin now resolves. A merchant in France asked about tax invoices—normally a long phone call to check back-end data and explain rules step by step. Instead, Fin handled the conversation in French, provided an accurate end-to-end explanation, and earned positive CSAT. That’s what mature deployment looks like: a system that absorbs complexity and delivers correct, efficient results at scale.

    So how do we build toward that level of maturity? In my experience, this journey requires a mindset shift and operational rigor—not just a bigger AI budget.

    Rethink how you approach support. If you were building from scratch today, you’d design around AI from day one. As Grant Lee, CEO of Gamma, puts it: "If you want to unlock the real value of AI, you have to design for it, not retrofit around it." Treat AI as infrastructure, not a feature. That shift impacts your org design, workflows, and what “good” looks like.

    Leaders are racing ahead with real AI in support. Explore the 2026 Customer Service Transformation Report to see where deployment is stalling, benchmark your team, and get practical steps to scale automation that delights.

    Secure executive sponsorship early. You won’t scale without C-suite backing. AI reshapes how support works, how teams are structured, how performance is measured, and how cost and value flow. Align your CFO on ROI, your CCO on journey design, and your CEO on customer experience as a strategic advantage. Early wins are great—but the compounding gains only come when leadership backs AI as infrastructure, not a one-off cost save.

    Assign clear ownership for AI performance. One common failure mode: no one owns the AI. Stand up an AI operations lead or support ops specialist to review resolution trends and handoffs, tune content and configuration, coordinate on systemic issues, and drive a prioritized improvement roadmap. Without this role, feedback loops break and performance plateaus.

    Treat content as critical infrastructure. Your AI Agent is only as good as the knowledge it can access. Ensure coverage for the topics it must handle, keep information accurate and current, and structure content so it’s easy for AI to consume. Make maintenance part of BAU, not a quarterly fire drill. A clean, governed, retrieval-first pipeline dramatically increases autonomous resolution.

    Build a continuous improvement system. AI performance isn’t static. Train your AI Agent by expanding its knowledge, refining behavior, and connecting new data sources to handle more scenarios autonomously. Validate changes against real scenarios before they ship. Roll out updates in a controlled way across channels and segments. Use performance data to find patterns—frequent handoffs, low-resolution topics—and decide what to improve next. I often point to the Fin Flywheel (Train → Test → Deploy → Analyze) as a practical example of turning performance data into action.
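
    The "Analyze" step of a loop like this can start very small. The sketch below assumes a hypothetical conversation log with a `topic` label and a `resolved_by_ai` flag, and it ranks topics that have enough volume but a low autonomous-resolution rate; the field names and cutoffs are illustrative, not Fin’s actual data model.

```python
from collections import Counter


def topics_to_improve(conversations: list[dict], min_volume: int = 20,
                      max_resolution: float = 0.5) -> list[tuple[str, float]]:
    """Rank topics with enough volume and a low AI-resolution rate, worst first."""
    total, resolved = Counter(), Counter()
    for c in conversations:
        total[c["topic"]] += 1
        resolved[c["topic"]] += int(c["resolved_by_ai"])
    rates = [(topic, resolved[topic] / n) for topic, n in total.items() if n >= min_volume]
    return sorted((r for r in rates if r[1] <= max_resolution), key=lambda r: r[1])
```

    The output is a short, prioritized list of where new content or configuration is most likely to move the resolution rate.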

    The big takeaway from the "2026 Customer Service Transformation Report" is encouraging: investment is widespread, and early returns are real. The bigger opportunity is to turn those early wins into durable transformation. Teams leaning into AI as infrastructure—supported by executive alignment, clear ownership, strong content, and a continuous improvement loop—are already separating from the pack.

    Next up in this series, I’ll dig into how leading teams measure success. Beyond simple cost savings, mature deployments tie AI to clear ROI and strategic impact—shifting more work into value-adding, revenue-generating territory. Follow along here, or subscribe on LinkedIn to get the next installment in your feed.


    Inspired by this post on The Intercom Blog.


  • Two People, Zero Waste: How Earmark’s Agentic AI Turns Meetings into Finished Work

    Two People, Zero Waste: How Earmark’s Agentic AI Turns Meetings into Finished Work

    I care about meetings only insofar as they create momentum and outcomes. What if your meetings could actually produce the artifacts you need—specs, tickets, slides—before the call even ends?

    I recently listened to an episode of Just Now Possible where Teresa Torres talks with Mark Barbir (CEO) and Sanden Gocka, the co-founders of Earmark, about building a productivity suite that turns unstructured conversations into finished work in real time. As a product leader, this premise hits the sweet spot of agentic AI, real-time AI workflows, and ruthless focus on outcomes over output.


    Unlike generic AI notetakers that produce summaries nobody reads, Earmark runs multiple agents in parallel during your meetings—translating engineering jargon, drafting product specs, even spinning up prototypes in Cursor or V0 while you're still talking. That’s the bar I want from AI in the room: finished work, not notes.

    What impressed me most was the clarity of their pivot. They moved from an Apple Vision Pro presentation coaching tool to a web-based meeting assistant. I’ve made similar calls: when the distribution path and daily workflow are obvious, you follow the user’s gravity. This shift unlocked a broader surface area—PMs, engineers, design partners—and made agentic workflows useful where work actually happens.

    They also turned a technical constraint into a commercial advantage. Their ephemeral (no-storage) architecture became a feature for enterprise sales. I’ve seen this repeatedly in AI risk management: privacy-by-design and clear data governance reduce friction with security reviewers and accelerate procurement. For many enterprises, “we don’t store your data” is the win condition.

    Cost discipline was another standout. They tackled the hard problem of making real-time AI affordable—from $70 per meeting down to under a dollar through prompt caching. That’s not just optimization; it’s product strategy. Choices like model selection, context window management, and retrieval-first pipeline design determine whether a feature can scale to every meeting or remains a demo.
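
    To see why caching matters so much for a real-time product, consider a toy cost model in which every call during a meeting resends the whole transcript so far. The call pattern, token counts, price, and discount below are all assumptions for illustration; they are not Earmark’s numbers or any vendor’s pricing.

```python
def meeting_input_cost(calls: int, tokens_per_interval: int, price_per_mtok: float,
                       cached_discount: float = 0.0) -> float:
    """Input-token cost when each call resends the entire transcript so far.

    cached_discount is the fraction saved on tokens already sent in an earlier
    call (0.0 means no caching). All values are illustrative.
    """
    cost = 0.0
    for i in range(1, calls + 1):
        new_tokens = tokens_per_interval
        seen_tokens = tokens_per_interval * (i - 1)   # prefix repeated from earlier calls
        billable = new_tokens + seen_tokens * (1 - cached_discount)
        cost += billable * price_per_mtok / 1_000_000
    return cost
```

    Without caching, the repeated prefix makes cost grow roughly with the square of the number of calls; with a large discount on the cached prefix, it grows much closer to linearly, which is the kind of shift that turns a demo into something you can run in every meeting.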

    On capability design, the team leaned into templates and simulated stakeholders to ship value fast. Template-based agents include Engineering Translator, Make Me Look Smart, and Acronym Explainer, while personas simulate absent team members (security architect, legal, accessibility). This is exactly how I frame early AI workflows: remove friction for the product trio, anticipate blockers, and let the agent do the tedious, error-prone first pass.

    They were refreshingly pragmatic about models. Their finding that GPT-4.1 still beats newer models for prose quality in their use case is a reminder that “best” is contextual. When the job-to-be-done is precise prose and production-grade artifacts, consistent quality trumps leaderboard buzz. Of course, they also invest in guardrails to ensure quality and manage hallucinations, another non-negotiable for enterprise adoption.

    Search and analysis across time is where many AI products stumble. They explained the limits of vector search for analysis questions across meetings and how they’re building agentic search with multiple retrieval tools (RAG, BM25, metadata queries, bespoke summaries). I couldn’t agree more: analysis requires reasoning over structure, time, and purpose—not just semantic proximity. Layered retrieval with stateful agents beats a single embedding call.
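
    A minimal sketch of layering two of those retrieval tools: a metadata filter (by date) followed by BM25 keyword ranking. This is my own simplified illustration, not Earmark’s implementation; the meeting schema is hypothetical, tokenization is naive whitespace splitting, and the scoring uses a common BM25 variant with a smoothed IDF.

```python
import math
from collections import Counter
from typing import Optional


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each (non-empty) document against the query with BM25."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def search(meetings: list[dict], query: str, after: Optional[str] = None) -> list[dict]:
    """Filter on metadata first (ISO date strings sort correctly), then rank by keywords."""
    pool = [m for m in meetings if after is None or m["date"] >= after]
    if not pool:
        return []
    scores = bm25_scores(query, [m["text"] for m in pool])
    ranked = sorted(zip(scores, range(len(pool))), reverse=True)
    return [pool[i] for score, i in ranked if score > 0]
```

    An agentic search layer would choose between tools like this one, vector search, and precomputed summaries depending on the question; a query such as "what changed in pricing since February" needs the date filter far more than it needs semantic similarity.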

    They also articulated a crisp user thesis: design for product managers as the extreme user to solve for everyone. In my experience, if you satisfy the PM’s bar for clarity, traceability, and actionability, engineers, designers, and go-to-market teams benefit immediately. That’s how you earn daily active use, not once-a-week novelty.

    For builders curious about the stack and comparables, they discuss services and tools like AssemblyAI for speech-to-text and the OpenAI API with prompt caching support, along with the integrations they built with Cursor and V0 by Vercel. They also reference Granola as a comparison point and nod to ProductPlan, where both founders previously worked. If you want to try the product, here’s Earmark, a productivity suite where the work completes itself.

    If you're a PM drowning in follow-up work or a builder curious about real-time AI architectures, this conversation offers a detailed look at what it takes to ship an AI product that people can't imagine working without. Personally, I see this as a credible path toward an AI chief of staff—their vision goes beyond automating deliverables to orchestrating judgment, compliance signals, and cross-functional readiness.

    The episode covers the founder backstory, what Earmark does, comparisons to competitors, unique features, templates and personas, technical decisions, early versions and challenges, optimizing transcript summarization, managing multiple tools and costs, challenges with context and reasoning models, innovative search and retrieval techniques, creating actionable artifacts from meetings, ensuring quality and managing hallucinations, and the future vision for an AI chief of staff. It’s a full-spectrum look at building with agentic AI, not just talking about it.



    Inspired by this post on Product Talk.


  • From Idea to Impact: My PM-Friendly Blueprint to Building Your First AI Agent Fast

    From Idea to Impact: My PM-Friendly Blueprint to Building Your First AI Agent Fast

    AI agents are quickly moving from novelty to necessity, and the fastest way to capture value is to approach them like any other high-stakes product initiative. In this guide, I share how I plan, build, and launch production-grade agents with a product mindset—balancing ambition with risk, speed with governance, and innovation with measurable outcomes.

    I start by getting crisp on the outcome. Who is the primary user, what job are they hiring the agent to do, and how will we know it’s working? I translate this into outcomes vs output OKRs, such as resolution rate, time-to-value, cost-to-serve, or qualified pipeline influenced—anchoring the roadmap before a single line of code or prompt is written.

    Next, I map the agent’s scope and boundaries. I write a simple capability canvas: the tasks the agent must perform, the tools it can use, the data it can access, and the constraints it must respect. Most successful builds follow a retrieval-first pipeline: connect trusted knowledge sources, enrich with metadata, and manage a lean context window to keep responses relevant and cost-efficient. From the start, I bake in privacy-by-design, data governance, and AI risk management so compliance isn’t an afterthought.

    Model selection comes after the workflow is clear. I choose an LLM for the job (latency, cost, multilingual needs, and tool-use fidelity) and pair it with the right connectors and actions—think CRM integration, ticketing, search, or internal APIs. For voice experiences, I define a voice AI agent persona, turn-taking rules, and barge-in behavior. This is where agentic AI patterns shine: structured planning, tool invocation, and verification loops create a resilient, goal-directed system.

    Prompt design is product design. I write system prompts that define role, tone, constraints, data sources, and success criteria. I add few-shot examples that mirror my top use cases and edge cases, then apply prompt engineering best practices to control style, limit speculation, and encourage citations. For voice, I tune prompts for brevity, warmth, and disfluency handling without sacrificing accuracy.

    Before launch, I build an eval-driven development workflow. I curate golden datasets from real user intents, add adversarial cases, and automate evals for accuracy, safety, grounding, and tool-use success. I set a minimum detectable effect (MDE) so A/B testing can validate improvements with confidence, and I define go/no-go thresholds to prevent regression. This becomes my continuous discovery loop for the agent.
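
    Setting an MDE only helps if you know how much traffic it implies. The sketch below uses the standard normal-approximation formula for comparing two proportions; the defaults (5% significance, 80% power) are conventional assumptions, and the function name is my own.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per arm to detect an absolute lift of `mde` over `baseline`
    with a two-sided test (normal approximation for two proportions)."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)
```

    For example, `sample_size_per_arm(0.60, 0.05)` estimates the traffic needed to detect a five-point lift on a 60% resolution rate. Halving the MDE roughly quadruples the requirement, which is why the MDE should be chosen before the experiment starts rather than after.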

    Instrumentation is non-negotiable. I wire up Agent Analytics to track task success, containment/deflection rate, handoff quality, cost per task, and user satisfaction. I supplement with a unified analytics platform and session replays to observe failure patterns. These signals feed prioritization and help me decide when to expand scope versus harden reliability.
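
    As a simple illustration of the core metrics, the sketch below computes them from a list of task records. The record fields (`succeeded`, `handed_off`, `cost_usd`) are a hypothetical schema; a real analytics platform would expose its own.

```python
def summarize(tasks: list[dict]) -> dict:
    """tasks: dicts with boolean 'succeeded' and 'handed_off' plus float 'cost_usd'."""
    n = len(tasks)
    if n == 0:
        return {"task_success": 0.0, "containment": 0.0, "cost_per_task": 0.0}
    return {
        "task_success": sum(t["succeeded"] for t in tasks) / n,
        # Containment: share of tasks finished without a human handoff.
        "containment": sum(not t["handed_off"] for t in tasks) / n,
        "cost_per_task": sum(t["cost_usd"] for t in tasks) / n,
    }
```

    Tracking these three together matters: containment can rise while task success falls, which usually means the agent is holding on to conversations it should have handed off.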

    For delivery, I rely on CI/CD with feature flags to gate risky capabilities, plus canary releases for new tools and prompts. I monitor DORA metrics to maintain deployment frequency without trading off quality. When incidents happen, I treat them like production issues: incident management playbooks, rollbacks, and clear postmortems.

    Trust is earned through safety and transparency. I enforce least-privilege access, structured logging, and red-teaming for jailbreaks, prompt injection, and data exfiltration. Threat detection and response plus clear user disclosures keep the experience responsible and compliant with regulatory requirements.

    GTM is product-led. I use in-app guides, product tours, and onboarding checklists to drive user activation and early wins. I define success moments, turn them into habit loops, and run retention analysis to find where users stall. This tight loop of messaging, measurement, and iteration accelerates product-market fit.

    Common high-ROI use cases I prioritize include customer support (automated resolution and augmented agent assist), sales and success workflows (lead qualification, QBR prep), and internal knowledge copilots (policy, process, engineering runbooks). Each starts narrow, ships fast, and scales with proven evidence from analytics and experiments.

    If you’re skimming, here’s the blueprint: clarify outcomes, design AI workflows with a retrieval-first pipeline, select the right LLM and tools, engineer robust prompts, institutionalize evals and A/B testing, instrument Agent Analytics, ship with CI/CD and feature flags, and iterate with discipline. In the walkthrough video above, I go deeper on templates, prompts, and experiments you can use to build your first agent with confidence.


    Inspired by this post on Product School.

