Tag: AI workflows

4 Costly Misconceptions About Building AI Agents—and How I Turn Them Into Wins

I’ve lost count of how many times I’ve been asked for a “quick AI agent” that can autonomously fix customer problems, write code, or run sales ops. The promise is intoxicating—and I get why. But in practice, sustainable impact comes from disciplined product thinking, not wishful automation. Drawing on my experience leading product for complex, agentic AI initiatives, I want to debunk four misconceptions I see repeatedly and share what actually works.

Misconception 1: AI agents are plug-and-play. The reality is that effective agentic AI behaves more like a new product line than a feature toggle. It needs clear job stories, domain grounding, tool access, and guardrails. I start by narrowing scope to one painful job to be done, then design AI workflows that reflect real constraints (SLAs, compliance, edge cases). From day one, I instrument with Agent Analytics and set up eval-driven development so we can see failure modes early and iterate with intent.

What consistently moves the needle is treating the agent like a teammate you onboard: define responsibilities, provide the right tools, and measure outcomes. I pair scripted validations with live evals, track containment rates and handoff quality, and balance precision/recall depending on the risk profile. This is slow to fast, not fast to broken.

Misconception 2: Bigger models make better agents. In my experience, architecture outperforms horsepower. A retrieval-first pipeline, tight context window management, and practical prompt engineering often beat an oversized model that hallucinates. Tool use matters more than model size: give the agent reliable APIs, clear schemas, and deterministic fallbacks. For LLMs for product managers, the play is to right-size the foundation model and invest in data quality, prompts, and evaluators that reflect your true acceptance criteria.

When I see erratic behavior, I don’t immediately swap models; I improve retrieval, prune irrelevant context, and clarify the agent’s planning loop. Most performance gains come from better state management and grounding rather than a pricier token budget.

Misconception 3: Agents replace teams. High-performing organizations design human-in-the-loop systems. I implement human review on high-risk actions, explicit escalation paths, and simple override mechanisms. That’s not just safety theater—it’s good product design. AI risk management and data governance are part of the product backlog, not an afterthought. In customer support ai strategy, for example, the agent drafts, a specialist approves, and the system learns from deltas to tighten future responses.

The social system matters as much as the technical one: clear role boundaries, audit trails, and feedback loops turn the agent into a force multiplier. Teams gain leverage without surrendering accountability.

Misconception 4: Shipping the agent equals success. Adoption is earned, not announced. I treat agent launches like any product-led growth motion: define activation events, remove friction with in-app guides and product tours, and A/B test prompts, tool choices, and UI affordances. We track time-to-value, task completion rate, and user trust signals (edits, undo patterns, and escalation requests). When we get those leading indicators right, retention follows.

Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.

My playbook is simple and repeatable: frame the problem narrowly, ground the agent with the right tools and data, measure with eval-driven development and Agent Analytics, then grow adoption with a disciplined go-to-market inside the product. The agents that win don’t feel like magic—they feel dependable. That’s what customers trust, and that’s what scales.

Inspired by this post on Pendo – Best Practices.

January 4, 2026
AI Context Pulling Playbook: How I Get LLMs and Teams to Collaborate for Better Product Outcomes

In my role leading product, I’ve learned that the fastest path to higher-quality deliverables from large language models (LLMs) is not a clever prompt—it’s rigorous context. I call the practice AI context pulling: a repeatable way to assemble, compress, and structure the most relevant knowledge before the model ever starts generating. Done well, it turns generative AI into a dependable partner for discovery, prioritization, and execution.
AI context pulling means I proactively gather the right artifacts (customer insights, analytics, strategy, constraints), manage context windows intentionally, and shape the model’s task with clear objectives and guardrails. This reduces hallucinations, improves alignment, and creates traceability back to sources—critical for product management leadership and stakeholder trust.
Learn a new way in which product professionals can collaborate with AI to get even better results on their projects.
Here’s the simple flow I use: first, I define the intent (e.g., “synthesize discovery interviews for a positioning brief”). Next, I inventory relevant context: top customer pains from product discovery, usage patterns from Amplitude analytics, recent support trends from Intercom, and any constraints from our product strategy. Then I run a retrieval-first pipeline to select only the most pertinent slices—favoring recency, representativeness, and canonical sources.
Because context window management matters, I compress long documents into short, source-cited summaries and keep raw excerpts handy when nuance is important. My prompts follow a consistent structure: role and objective, constraints and audience, curated context, the explicit ask, preferred output format, and a brief self-check (e.g., “cite sources and flag uncertainty”). This is prompt engineering for reliability, not theatrics.
A quick example: when drafting a one-page feature brief, I attach three items—the product strategy paragraph that sets the frame, a usage cohort analysis that highlights who’s affected, and five verbatim customer quotes. I ask the LLM to propose a problem statement, success criteria, and a shortlist of solution hypotheses, each tied to a cited piece of evidence. The result is a grounded, decision-ready artifact I can share with product trios and stakeholders.
Tooling-wise, I keep it pragmatic. A lightweight retrieval-first pipeline (embeddings, metadata filters, and recency rules) ensures the LLM pulls what matters. I version prompts and contexts together so I can run quick A/B testing on output quality. And I log decisions and sources to support eval-driven development and continuous discovery.
Common pitfalls are avoidable. Too little context yields generic answers; too much overwhelms the model. Stale docs can mislead; curate aggressively. Vague asks invite fluffy prose; specify outcomes, audiences, and formats. If the task is high risk, I bias toward smaller, well-cited outputs and expand iteratively with human review in the loop.
To measure impact, I track rework rate, review time, and stakeholder alignment on first pass. Over time, teams adopting AI context pulling report clearer artifacts, faster synthesis cycles, and more confident decisions—because every recommendation traces back to evidence. That’s how humans and LLMs truly collaborate better: we provide the right context, and the model amplifies our judgment.
If you’re ready to operationalize this, start by templatizing your most common product workflows—discovery synthesis, roadmap rationale, and release notes—and attach small, high-signal context packs. With a retrieval-first mindset and disciplined prompting, AI becomes an extension of your product craft, not a gamble.

Inspired by this post on Pendo – Perspectives.

January 4, 2026
Master Burger Prompting: Build a High-Impact AI Resume Coach with Proven LLM Structure

I’ve been refining a hands-on approach to “burger prompting” that turns prompt engineering into a reliable, repeatable system. Using an AI resume coach as the proving ground, I’ll walk through a detailed prompt structure to get the most out of your LLM and share what’s worked for me in product environments where clarity, consistency, and measurable outcomes matter.

At a high level, burger prompting follows a simple mental model: the top bun frames the role and mission, the fillings pack in context and examples, and the bottom bun locks in output format and quality guardrails. It’s deceptively simple and extremely effective for Generative AI use cases where you need predictable behavior across different inputs and user personas.

For the top bun, I establish the AI’s role, audience, and objective in one place. In the resume coach flow, I define the assistant as a structured, unbiased reviewer tasked with aligning a candidate’s resume to a specific job description. I set constraints on tone (supportive but direct), scope (resume and job description only), and safety (avoid speculative claims, defer legal or medical advice). This crisp intent statement reduces ambiguity and prevents the model from wandering outside the product’s value proposition.

The fillings are where context window management becomes crucial. I inject the job description, the candidate’s resume, a capability rubric aligned to the role, and the company’s style preferences. If the content is long, I chunk inputs and, when needed, use a retrieval-first pipeline to fetch only the most relevant snippets. I also include a brief style guide with voice, depth, and formatting expectations so the AI doesn’t drift between terse and verbose responses across sessions.

Strong examples are the meat of the burger. I include a few annotated comparisons that show what “excellent,” “good,” and “needs improvement” look like for specific competencies, from impact statements to quantification. These examples are compact and domain-specific, so the LLM sees the pattern I expect without overfitting to a single profile. I encourage transparent reasoning by asking for stepwise evaluations that reference evidence from the resume and job description, while keeping the explanations concise and user-friendly.

The bottom bun finalizes structure and guardrails. I specify an output schema that always returns a brief summary, evidence-backed strengths, concrete gaps with examples of what’s missing, and a prioritized action plan with suggested rewrites. I also request a rubric-aligned score to support eval-driven development, and I cap length to ensure scannability inside product UI. This predictable format reduces downstream parsing errors and keeps the AI workflow snappy.

To operationalize this in a product context, I run small A/B tests on the prompt variants and measure utility through user activation and completion rates. I tune the prompt with tight feedback loops, comparing structured scores against human spot checks until the variance narrows. When I see drift, I adjust the constraints, swap underperforming examples, or expand the rubric to capture overlooked signals.

Quality and trust are non-negotiable. I add guidance to avoid hallucinated credentials or inflated claims, enforce privacy-by-design around sensitive data, and encourage the assistant to cite which resume lines support each recommendation. When the model is uncertain or the resume lacks evidence, the assistant should explicitly say so and propose realistic next steps rather than guessing.

The result is an AI resume coach that feels both helpful and disciplined. With burger prompting, you get a durable prompt pattern you can reuse across adjacent AI workflows, from portfolio reviews to job description rewrites. Once you internalize the top bun, fillings, and bottom bun, you’ll find it far easier to ship prompts that scale, maintain consistency across releases, and deliver tangible, career-advancing outcomes for users.

Inspired by this post on Pendo – Best Practices.

January 4, 2026
Beyond Digital: How I Drive AI Transformation to Build Adaptive, Intelligent Organizations

Digital transformation set the foundation, but it’s no longer sufficient. In my work leading product teams, I’ve learned that real competitive advantage now comes from building systems that perceive, learn, and adapt—end to end, across the product lifecycle and the business operating model.

AI transformation goes beyond automation to create adaptive, intelligent organizations. Discover why it’s the next imperative and how to measure success.

Why is this the next imperative? Customers expect intelligent experiences, not just digitized workflows. Markets are shifting faster than roadmaps, and teams need systems that learn in production. For me, AI Strategy starts with a clear value thesis: where can intelligence amplify customer outcomes and compound business impact—whether in onboarding, customer support, or core product differentiation.

Practically, I frame AI transformation as a capability stack: data governance and privacy-by-design at the foundation; a retrieval-first pipeline to ground models in trusted context; agentic AI and AI workflows to orchestrate actions; and eval-driven development to continuously measure quality, safety, and relevance. Layered on top are operating rhythms—outcomes vs output OKRs, rapid experimentation, and incident management—that keep shipping disciplined and responsible.

I start with product discovery. Together with product trios, we target moments where intelligence removes friction or unlocks new value. We translate those opportunities into crisp outcomes (activation, time-to-first-value, resolution rate) and instrument them from day one. In customer support, for example, a customer support ai strategy might blend LLMs for product managers with retrieval-first grounding to deliver accurate, brand-safe answers and escalate seamlessly when needed.

On architecture, I prioritize context window management and robust integrations. CRM integration and event streams from tools like Intercom, HubSpot, Pendo, and a unified analytics platform provide the signals AI needs to adapt in real time. Prompt engineering patterns, guardrails, and privacy-by-design controls ensure responses remain trustworthy and compliant. When applicable, I explore agentic AI to orchestrate multi-step tasks with clear constraints and auditability.

Delivery is where transformation becomes measurable. I combine CI/CD practices with DORA metrics (deployment frequency, lead time, change failure rate, MTTR) to keep iteration fast and safe. On the product side, A/B testing with a minimum detectable effect (MDE) protects rigor, while eval-driven development tracks model accuracy, hallucination rates, and policy adherence before and after release. I tie these to business metrics like user activation, retention analysis, and support resolution time to ensure we’re shipping outcomes, not just output.

Governance is non-negotiable. AI risk management, regulatory compliance, and data governance anchor every phase—from dataset curation to prompt libraries and model routing. Threat detection and response and incident management processes are integrated so we can respond quickly when behavior drifts or new risks emerge.

Transformation also means evolving how teams work. I invest in empowered product teams, continuous discovery, and developer evangelism to spread best practices across domains. We share playbooks, reusable CustomGPT workflows, and an AI product toolbox to scale patterns like retrieval-first pipelines and safe prompt engineering across the portfolio.

The outcome is not just smarter features; it’s a more adaptive business. With clear OKRs, reliable telemetry, and responsible guardrails, AI becomes a force multiplier for product strategy and execution. If you’re moving beyond digital toward intelligence, start small, measure relentlessly, and let outcomes guide the journey.

Inspired by this post on Pendo – Perspectives.

January 4, 2026
Inside PendomoniumX London: AI’s tipping point and what product leaders should do next

I walked into PendomoniumX London energized by a simple question: are we finally past the AI hype cycle and into real product impact? From the hallway conversations to the main stage, the momentum was unmistakable—and deeply practical.

PendomoniumX’s sixth stop brought 350+ software leaders together for a day of AI transformation, real-world stories, and product innovation.

That scale and focus say a lot. Across the dialogues I joined, the center of gravity has clearly shifted from experiments to execution: building an AI Strategy that aligns with product roadmaps, turning promising prototypes into production-grade AI workflows, and measuring value in ways that reinforce product-led growth. It’s the inflection point where Generative AI moves from isolated pilots to cross-functional capabilities.

My biggest takeaway for product leaders: treat AI like any other durable capability. Start with sharp problem framing and customer outcomes, run continuous discovery to validate use cases, and sequence delivery through product roadmapping and sprint planning. Pair this with privacy-by-design and sensible governance so your teams can move fast without cutting corners.

Operationally, I’ve found it essential to design experiences that accelerate user activation—think thoughtful onboarding, in-app guides, and product tours that reduce friction while teaching new AI-powered behaviors. For teams adopting LLMs for product managers, keep your evaluation loops tight, instrument the journey end-to-end, and make sure every iteration maps to a clear value proposition customers can feel.

Events like PendomoniumX London remind me why community matters: they compress learning cycles. If you’re steering an AI portfolio, now is the moment to translate vision into repeatable systems—prioritize the right bets, make adoption effortless, and let data tell you when to double down or pivot. That’s how we turn AI transformation into durable product innovation.

Inspired by this post on Pendo – Perspectives.

January 3, 2026
Unlock Product Insights Fast: Connect MCP and Pendo to Claude, ChatGPT, and Cursor

I’ve spent the last year pushing our AI Strategy from slideware to shipped value, and one pattern keeps winning in real-world product teams: connecting agentic AI directly to trustworthy product analytics. That connection is where Model Context Protocol shines—safely bridging LLMs with the tools and data product managers rely on every day.
Model Context Protocol (MCP) gives AI agents access to your business data. Learn how MCP works, how product managers are using it, and how to connect Pendo’s MCP server to Claude, ChatGPT, or Cursor for instant product insights.
In practice, I treat MCP as a clean, auditable interface between LLMs and enterprise systems—decoupling the model choice from the data plane and enabling a retrieval-first pipeline with strong data governance. Because MCP standardizes the way agents discover resources and tools, it simplifies context window management, enforces least-privilege access, and makes it easier to evolve our stack without rewriting prompts or fragile glue code.
For product leaders, the immediate payoff is speed to insight. Instead of hopping across dashboards, I ask the agent questions in natural language—“Which onboarding step drives the biggest drop-off by segment?”—and get synthesized answers backed by traceable queries. That shift turns AI workflows into a daily habit, improving continuous discovery and accelerating product-led growth while maintaining privacy-by-design controls.
Under the hood, I think about MCP in four layers: resources (read-only data surfaces such as feature usage or retention cohorts), tools (safe operations like creating a note, exporting a segment, or proposing an in-app guide), prompts (task-scoped instructions tuned for LLMs for product managers), and observability (logs and evaluations). This structure keeps eval-driven development front and center and reduces operational risk.
Here’s how I connect Pendo analytics through MCP to my preferred assistants without compromising security or accuracy:
1) Prepare access: confirm your Pendo MCP server endpoint, authentication method, and scopes; apply least-privilege and redact any PII not required for analysis.
2) Register the server: in Claude, ChatGPT, or Cursor, add the MCP server with the provided URL and API key or token, then enable only the resources and tools your use case demands.
3) Validate the contract: prompt the agent to list available resources and describe tools; run harmless dry runs (e.g., “summarize top feature adoption trends last 30 days”) to confirm the interface behaves as expected.
4) Operationalize: standardize prompts for recurring analyses (QBRs vs OKRs, activation funnels, retention analysis), set guardrails, and log every interaction for audit. This is where prompt engineering meets governance.
5) Iterate with metrics: track answer quality, latency, and usage; expand scopes gradually and gate new tools behind human-in-the-loop until you reach reliable performance.
Once configured, I use the agent to surface weekly activation insights, identify outlier cohorts, and auto-draft product discovery notes with links back to Pendo reports. The result isn’t magic; it’s a disciplined AI product toolbox that brings the right context to the right question, fast.
If you’re starting from zero, pilot with one high-value question, one team, and one assistant. Keep the footprint small, measure outcomes, and then scale—with security, compliance, and stakeholder management baked in from day one. That’s how you turn MCP from an interesting protocol into a durable competitive advantage.

Inspired by this post on Pendo – Best Practices.

January 3, 2026
The New AI Playbook for Product Portfolio Optimization: Slash Complexity, Boost ROI

The most valuable lesson I’ve learned leading product organizations is that portfolio choices make or break outcomes. In an era of infinite requests and finite teams, the question isn’t what we could build—it’s what we must build next. That’s why I’m codifying a pragmatic, AI-driven playbook to optimize the product portfolio while staying true to outcomes, not output.

AI-powered product portfolio optimization is here. Explore strategies and tools helping product leaders manage complexity and boost ROI.

My starting point is a data backbone that connects strategy to reality. I aggregate product usage, revenue by segment, cost-to-serve, retention cohorts, and support signals into a unified analytics platform, then layer a retrieval-first pipeline so LLMs can reason over clean context. Instrumentation matters: Amplitude analytics, Pendo, and in-app guides provide the behavioral and activation signals that make prioritization measurable.

From there, I translate strategy into an objective decision system. I express outcomes vs output OKRs, align initiatives to value proposition and competitive differentiation, and classify opportunities with the Kano Model. LLMs for product managers help cluster voice-of-customer at scale; with thoughtful prompt engineering and AI workflows, I can map themes to jobs-to-be-done, quantify demand, and de-duplicate asks across stakeholders.

Execution hinges on evidence. I run A/B testing with a clear minimum detectable effect (MDE), pair it with eval-driven development for AI features, and ship through CI/CD while tracking DORA metrics. This closes the loop between product roadmapping and sprint planning and real-world performance—activation, retention analysis, and Web Vitals inform the next set of portfolio bets.

Trust is a feature, so governance is built-in. Privacy-by-design, data governance, and AI risk management guide how we store, prompt, and evaluate models. I apply guardrails to sensitive workflows and define success metrics that balance short-term ROI with long-term resilience and regulatory compliance.

The operating model matters as much as the models themselves. Product trios and empowered product teams run continuous discovery, pressure-test assumptions in QBRs vs OKRs, and make trade-offs visible. Stakeholder management becomes easier when the portfolio narrative is anchored in transparent scenarios and shared metrics.

If you’re getting started, here’s my flow: unify data, define outcomes, segment opportunities, simulate scenarios, and test fast. Use LLMs to synthesize signals you’d never humanly read, then make one focused bet per team that moves a measurable KPI. Rinse, learn, and reallocate—portfolio optimization is a living system, not an annual meeting.

Ultimately, the promise of this new playbook is simple: less noise, sharper focus, and compounding ROI. By pairing AI Strategy with disciplined product management leadership, we can manage complexity with clarity—and consistently build what matters most.

Inspired by this post on Product School.

December 29, 2025
10 AI Business Models You Need Now: Proven Playbooks Turning Algorithms into Revenue

I’ve spent the past few product cycles re-architecting roadmaps around one simple reality: AI is no longer just a feature—it’s a business model. The companies winning market share are those that treat models, data, and workflows as monetizable assets with defensible moats, not science projects.

AI business models are rewriting value creation. Learn how smart teams turn algorithms into profit engines, reshaping entire industries.

From my seat in product leadership, I evaluate AI bets through three lenses: durable value (moat and differentiation), measurable outcomes (clear ROI), and unit economics (gross margins under real-world load). With that frame, here are ten AI business models I see performing now—and how I decide when to invest.

1) API-first Model-as-a-Service. I monetize foundation or specialized models via an API, priced by tokens, requests, or time-in-context. Success hinges on latency, accuracy, and “context window management” that balances quality with cost. This is where “consumption SaaS pricing” shines and where disciplined rate-limiting, observability, and SLAs build trust.

2) Vertical AI copilots. I package domain-specific expertise (legal, healthcare, finance, field service) into workflow-native assistants that surface next-best actions. Because these copilots live where work happens, I price on outcomes—time saved, revenue recovered, or risk reduced—aligning value with customer metrics and accelerating product adoption.

3) Agentic AI automation. When autonomous agents handle multi-step tasks across tools, I lean toward per-outcome or per-job pricing. Reliability is the moat, so I invest early in eval-driven development, robust guardrails, and human-in-the-loop QA. This model compounds fast once agents can execute end-to-end workflows with transparent audit trails.

4) Copilot add-ons inside existing SaaS. I’ve seen “AI Assist” tiers deliver immediate ARPU lift and retention gains. The playbook: start with high-frequency, high-friction jobs (drafts, summaries, enrichment), then expand to proactive suggestions. This aligns tightly with product strategy and lets me stage value without overhauling the core experience.

5) Insights-as-a-Service via data network effects. I transform exhaust data into benchmarking, predictions, and prescriptive recommendations—while honoring privacy-by-design and data governance. The more customers I onboard, the stronger the patterns, and the higher the switching costs. Pricing ties to seats plus an outcomes or value metric.

6) Retrieval-first pipeline for enterprise knowledge. I land with high-accuracy answers over customer data (search, summarize, cite), then expand into workflow automations. This “retrieval-first pipeline” reduces hallucinations, boosts trust, and creates defensibility through connectors, semantic indexing, and continuous relevance tuning—an ideal fit for LLMs for product managers prioritizing reliability.

7) Open source monetization. When I bet on openness, I monetize hosting, support, enterprise controls, and compliance features. The advantage is developer love and rapid iteration; the moat is operational excellence at scale, plus integrations customers rely on. This model converts community momentum into predictable revenue.

8) Marketplaces for prompts, skills, and agents. I create a platform for third-party extensions and charge a take rate on usage. The flywheel spins when developers see distribution, customers see breadth, and I enforce strong quality bars. The roadmap focuses on governance, discovery, and safe execution policies.

9) Solutions with forward deployed engineers. For complex rollouts, I pair product with specialized implementation to guarantee outcomes. Revenue blends software plus services, accelerating time-to-value and informing the roadmap with real-world constraints. Over time, learnings fold back into scalable, self-serve capabilities.

10) AI risk, security, and compliance tooling. As AI scales, so does the need for policy enforcement, monitoring, and auditability. I monetize via platform subscriptions that address model provenance, data leakage prevention, red teaming, and reporting. Strong “AI risk management” is now a purchasing requirement, not a nice-to-have.

How do I choose among these models? I start with the customer’s biggest workflow pain, map it to the fastest path to measurable outcomes, and align pricing with value creation. Then I build defensibility through data advantage, distribution, and governance. If a model deepens trust, improves margins, and compounds learning, it earns a place on the roadmap.

Inspired by this post on Product School.

December 24, 2025
Monetizing AI with Confidence: Proven Models, Smart Pricing, and ROI You Can Defend

I’ve learned the hard way that shipping an impressive AI demo is not the same as creating a durable revenue engine. In my role leading product strategy, I focus on one goal: connect AI capabilities to measurable customer outcomes, then price and package them so both value and margins are visible and defensible.

Monetizing AI features into profit isn’t trivial. Here are some clear strategies for capturing and pricing AI products and how to monetize with returns.

First, I clarify the business model. Add-on AI packs work when the value is concentrated in a specific workflow (for example, automated summarization or AI copilot assistance). Tiered packaging helps when AI elevates the overall experience across many features. Usage-based or consumption SaaS pricing is ideal when value scales with volume—tokens, documents processed, calls handled, or agents invoked—because it aligns price to realized outcomes.

Next, I align pricing mechanics with the customer’s value story. I anchor price against the baseline they know: hours saved, conversions gained, cases deflected, or risk reduced. Then I set floors based on unit economics—model inference, vector storage, and orchestration costs—so gross margins remain healthy as usage grows. Clear guardrails (quotas, rate limits, and context window management) prevent surprise bills and keep cost-to-serve predictable.

Packaging is where monetization becomes intuitive. I gate high-cadence, high-compute features behind premium tiers, and I expose quick wins (like smart suggestions) in core tiers to accelerate activation. For enterprise, I bundle governance, audit logs, data controls, and “privacy-by-design” features to justify step-up pricing and reduce procurement friction.

To sustain ROI, I run an eval-driven development loop. I define quality metrics (accuracy, helpfulness, latency, safety) and instrument the retrieval-first pipeline so I can isolate where value is created or lost. This lets me right-size models, tune prompts, and swap components without compromising outcomes or margins—critical for LLMs for product managers who must balance experience and cost.

Measurement is non-negotiable. I track activation, time-to-first-value, weekly engaged AI users, and feature-level retention. For revenue impact, I attribute uplift through A/B testing and minimum detectable effect thresholds, measuring conversion lift, ticket deflection, and cycle-time reductions. When customers see these numbers in their own dashboards, procurement turns into partnership.

Risk and compliance are part of the product, not an afterthought. I build in AI risk management, data governance, and red-teaming from day one. Clear data boundaries, human-in-the-loop controls, and transparent disclosures protect end users and make enterprise legal teams our allies rather than blockers.

Go-to-market matters as much as the model. I use product-led growth tactics—free AI credits, transparent meters, and in-app guides—to let users feel the value before the paywall. Sales enablement centers on the value proposition: faster outcomes, higher quality, and lower total cost of ownership, not just “gen ai” for its own sake. Pricing pages should showcase tiers, usage bands, and outcomes, eliminating guesswork.

Here’s the simple playbook I follow: validate the problem with continuous discovery, instrument the workflow, pilot with generous caps, and collect willingness-to-pay signals early. Then iterate the price meter, refine units of value (documents, messages, or actions), and align SKUs to buyer personas. Over time, I introduce agentic AI capabilities as premium modules when they demonstrably reduce steps or automate entire objectives.

When AI monetization works, it feels effortless to customers because the price mirrors the outcome. When it doesn’t, it’s usually because packaging hides value, pricing ignores unit economics, or ROI isn’t visible. By grounding strategy in value metrics, consumption-aware pricing, and rigorous evaluation, I’ve found we can scale AI revenue with confidence—and keep both customers and margins happy.

Inspired by this post on Product School.

December 22, 2025
Master Burger Prompting: Build a High-Impact AI Resume Coach with Proven LLM Structures

I turned the playful idea of “burger prompting” into a rigorous framework for building an AI resume coach that delivers consistent, high‑quality guidance. In product management, repeatability matters: I want dependable LLM behavior, tight control of outputs, and measurable outcomes. This approach gives me exactly that—clear roles, crisp constraints, and an evaluation loop that raises the quality bar with each iteration.

Here’s the metaphor in practice. The top bun sets the role and goal; the middle layers stack context, examples, constraints, and tools; the patty is the core algorithm and output schema; and the bottom bun locks in the quality bar and follow-up behavior. When I apply this structure to an AI resume coach, I get results that feel expert, empathetic, and actionable—without rewriting the prompt every time.

Top bun: I define the system role and success criteria. I’ll say, “Act as an experienced hiring manager and resume coach for SaaS product roles” and specify the north star: improve clarity, impact, and ATS alignment without fabricating experience. I also name the audience (mid-career PMs, early-career candidates, or executives) so tone and calibration stay consistent across sessions.

First layer: I load precise context. That includes the candidate’s resume, the target job description, and any constraints (for example: keep bullets under 22 words, lead with impact, quantify outcomes). I also clarify non-goals (no inflated titles, no unverifiable claims). This is where I set the voice: confident, concise, and supportive, not generic or robotic.

Second layer: I attach the tools and references that anchor outputs. A skill taxonomy for product roles, a style guide for resume bullets, and a scoring rubric (impact, clarity, relevance, keyword coverage) help the model prioritize. To protect quality, I call out context window management rules—what to include or trim—and how to summarize long inputs without losing signal.

Third layer: I add exemplars. Few-shot examples of excellent resume bullets (“before” and “after”) teach the model what “great” looks like. I also include a counterexample or two to prevent bad habits (for instance, over-indexing on buzzwords). Exemplars act like taste buds; they steer nuance without overfitting.

Patty: I define the core algorithm and the output schema. The algorithm moves in stages: diagnose the resume against the job, identify 3–5 high-leverage improvements, rewrite bullets with quantified outcomes, and propose a summary that highlights relevant wins. I then specify the output sections: a brief diagnosis, rewritten bullets mapped to the job’s requirements, an ATS keyword coverage table, and a confidence score with rationale. A tight schema produces consistent, scannable outputs that are easy to evaluate—and easy to ship.

Bottom bun: I lock in the quality bar and the follow-up behavior. If inputs are incomplete, the coach must ask clarifying questions before rewriting. If claims lack evidence, it should suggest proof points (metrics, scope, stakeholders) rather than embellish. Finally, I require a self-check pass where the coach verifies that each bullet demonstrates impact, relevance, and clarity before presenting the final result.

Implementation blueprint: I create a reusable prompt template with clear system and user sections, then parameterize it for different roles (PM, design, data). If I have a library of style guides or skill matrices, I wire it into a retrieval layer so the model references the right material for each job. This setup makes the coach portable across tools and easy to maintain as the taxonomy evolves.

Evaluation and iteration: I practice eval-driven development. I assemble a small, representative test set of resumes and job descriptions, define acceptance criteria (readability score, keyword coverage, human rater alignment), and A/B test prompt variants. I track drift and tighten the schema whenever outputs start to meander. The goal isn’t just impressive demos—it’s reliable performance at scale.

Governance guardrails: A trustworthy resume coach respects privacy-by-design. I strip PII where possible, avoid storing raw resumes beyond what’s necessary, and document bias checks so advice doesn’t disadvantage non-traditional candidates. Clear data governance and risk management keep the product shippable and compliant as it grows.

When I apply burger prompting end to end, the AI resume coach becomes a repeatable system: fast, accurate, and measurably helpful. The structure teaches the model how to behave; the evals keep it honest; and the schema makes the result easy to review, refine, and ship. If you want dependable LLM outcomes, start with a great bun—and don’t skimp on the patty.

Inspired by this post on Pendo – Best Practices.

December 19, 2025
How I Make Diagnostic AI Trustworthy: Confidence Levels, Citations, and Evals That Win Trust

Trust is the true currency of diagnostic analytics. If customers can’t verify why a system reached a conclusion—or how confident it is—adoption stalls. That’s why this line resonated so strongly with my own playbook: Amplitude used confidence levels, citations, and evals to build a diagnostic AI tool accurate enough to earn customer trust.

Confidence levels are my first non-negotiable. When a model flags a root cause or prescribes a next step, I want the UI to state its certainty upfront and in plain language—ideally with calibrated ranges and a brief rationale. This simple pattern sets the right expectations, reduces over-trust, and supports AI risk management by making uncertainty visible. In practice, we pair this with clear UX writing so users understand what “High,” “Medium,” or “Low” confidence really means in their workflow.

Citations are the second pillar. Every diagnostic needs a breadcrumb trail back to source data: which metrics were analyzed, what time window was used, and how the insight was derived. Linking directly to the underlying chart, query, or dashboard reinforces data governance and shortens the path from “interesting” to “actionable.” When customers can click through to verify the evidence, they gain the confidence to make decisions—fast.

Evals complete the trio. Before and after launch, I hold the team to eval-driven development: offline benchmarks, targeted scenario tests, and live performance monitoring that mirrors real customer use. We define success criteria for precision/recall, false-positive thresholds, and latency, then wire those checks into CI/CD so regressions are caught early. Continuous evals aren’t just QA; they’re the heartbeat of an AI workflow that keeps insights reliable at scale.

Operationally, these practices compound. Confidence levels help prioritize follow-up analysis, citations accelerate collaboration across product and data teams, and evals keep quality high even as models, data, and usage evolve. Together, they form a pragmatic AI strategy that aligns product discovery with measurable outcomes and safeguards customer trust where it matters most—inside daily decisions.

If you’re building a diagnostic AI tool, start with these three building blocks and resist the urge to hide uncertainty. Make it legible. Make it verifiable. And measure it continuously. That’s how we turn powerful models into trustworthy products customers depend on.

Inspired by this post on Amplitude – Perspectives.

December 18, 2025
Beyond the Support Iceberg: Gradient Labs’ Multi‑Agent Breakthrough That Actually Gets Work Done

When a customer reports a stolen credit card, the frontline play seems straightforward—freeze it. But that’s just the visible tip of a much larger customer support iceberg. Underneath sits the real work: dispute filings, fraud investigations, merchant communications, proactive outreach, and follow-ups that unfold over days across multiple systems. Most AI support tools only touch the surface; they don’t coordinate or close the loop. That gap is exactly where my product instincts kick in—and why this story matters.

I recently listened to a conversation with Jack Taylor (Product Engineer) and Ibrahim Faruqi (AI Engineer) from Gradient Labs, an AI-native startup building agents that automate the full scope of customer support in fintech. Their approach resonated with the challenges I see every day in customer support automation: fragmented workflows, regulatory complexity, and the need for human-in-the-loop moments. Gradient Labs has architected a platform with three coordinating agents—"inbound, back office, and outbound"—all built on a shared foundation of "natural language procedures, modular skills, and configurable guardrails."

What impressed me most was how they "Let non-technical subject matter experts define agent behavior through natural language procedures—no coding required." That’s a powerful way to remove engineering bottlenecks, accelerate iteration, and keep the domain experts—those closest to fraud, disputes, and compliance—directly in control. In my experience, this design choice alone can compress lead times from weeks to hours and aligns perfectly with continuous discovery and eval-driven development.

At the heart of their platform is orchestration. They "Architected a state machine orchestrator that manages turns, triggers, and skill selection across long-running conversations." That "turn" architecture is built for the messy reality of async, multi-day support. They treat "Skills as modular agent capabilities—and how they're scoped deterministically per turn," ensuring the system stays predictable and auditable. They also confront a nuanced challenge most teams dodge: "Defining "done" for outbound agents when the customer isn't the one ending the conversation." That’s where deterministic criteria, timers, and clearly scoped outcomes matter as much as the model beneath.

Compliance is not an afterthought—it’s baked into the core. Gradient Labs "Built guardrails as binary classifiers with eval pipelines, tuning for high recall on critical regulatory checks." In regulated domains, optimizing for recall on high-stakes checks is the right call; you can tolerate a few extra reviews, but you can’t miss a potential fraud signal. More broadly, they frame "Guardrails as classification problems: balancing recall and precision for regulatory compliance." That mindset is exactly how I like to merge AI risk management with product velocity.

Crucially, they avoid the trap of fully autonomous optimism. "Ask a Human: a tool call that brings humans into the loop for approvals or missing APIs" gives the system a safety valve for novel or high-risk cases. I also appreciated the explicit "Ask A Human Tool" pattern, which cleanly integrates approvals, policy exceptions, or data gaps without derailing the workflow.

Quality doesn’t happen by accident. They "Designed an auto-eval system that samples conversations for human review to catch edge cases and build labeled datasets" and built "Auto-eval pipelines that flag conversations for manual review and feed labeled datasets." That closed-loop evaluation flow is the backbone of sustainable performance in agentic AI. Combine this with targeted instrumentation—think CSAT, first contact resolution, deflection rate, time to resolution, and escalation rate—and you get a real Agent Analytics discipline, not just logs and dashboards.

The "iceberg" metaphor is more than a catchy visual. It’s a blueprint for scoping multi-agent platforms that work across the entire customer journey. With "inbound, back office, and outbound" agents coordinating on complex tasks like fraud disputes, the system can transition cleanly from intake to investigation to resolution—without dropping context or asking customers to repeat themselves. This is what genuine customer support automation looks like when it’s grounded in real operations.

Under the hood, the team leans into robust design choices that matter at scale: the "Complexities of Natural Language Input" are managed with explicit state and skill scoping, "Deterministic Skill Execution" reduces flakiness, and "Customer-Specific Guardrails" ensure compliance remains aligned to each client’s policies. Add their focus on "APIs and Customer Tools Integration" and the result is a platform that can actually take action—not just answer questions.

If you’re building in this space, here’s how I’d apply these lessons. Start by mapping the iceberg: enumerate back-office steps, approvals, and SLAs that follow the initial customer touchpoint. Capture those steps as "natural language procedures" owned by SMEs. Implement a "state machine orchestrator" to manage "turns, triggers, and skill selection" across multi-day workflows. Treat "guardrails as classification problems" and tune for high recall on high-stakes checks. Introduce "Ask a Human" early to handle missing APIs or policy exceptions. Finally, operationalize learning with "auto-eval pipelines" and tight, eval-driven development loops. That’s how multi-agent platforms deliver measurable outcomes in fintech support.

If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. You’ll also hear a nod to the "Incident.io episode – Referenced in the conversation," and a thoughtful take on the "Future of Multi-Agent Systems."

In short: this is a shift from simple Q&A bots to agents that can coordinate, comply, and complete. It’s the kind of multi-agent platform work that moves the needle for customer support in fintech—and a compelling template for any product leader scaling agentic AI and AI workflows beyond the tip of the iceberg.

Inspired by this post on Product Talk.

December 18, 2025