I’ve spent the last few years turning AI from an intriguing demo into an operational advantage, and the clearest wins come when we treat agents as productized workflows—not toys. In practice, that means aligning agentic AI to a sharp product strategy, instrumenting everything, and scaling what works across the organization.
Learn how companies like Replit are consolidating workflows, creating one-person departments, and building systems for scale with Amplitude
When I talk about agentic AI, I’m focused on outcomes: fewer handoffs, faster cycle times, and measurable uplift in activation, retention, and NPS. The most successful rollouts start with a specific job-to-be-done, translate it into clear AI workflows, and then iterate with a tight feedback loop between data, design, and engineering.
My implementation playbook is simple and disciplined. First, choose a high-friction workflow and define success upfront. Second, make the build vs buy call on the foundation model, orchestration layer, and connectors. Third, establish AI risk management and safeguards early—before scale amplifies errors. Finally, run small, eval-driven releases and promote what performs.
Instrumentation is where the leverage compounds. With Amplitude analytics as a unified analytics platform, I design purposeful events (agent intent, tool calls, resolution state, human handoff), map funnels from user input to agent outcome, and cohort users by context to pinpoint lift. This gives me an honest read on where agents help, where they hinder, and what to tune next.
The “one-person departments” concept isn’t about doing more with less at all costs; it’s about assembling a tight loop of product management leadership, data, and automation so one operator can own a business outcome end-to-end. An agent handles the repeatable work, while the human focuses on judgment, edge cases, and continuous improvement that compounds.
As we scale, I look for platform scalability patterns: shared tools and policies, reusable prompt libraries, standardized evaluation suites, and consistent governance. That structure keeps agent performance predictable while preserving speed, and it aligns beautifully with product-led growth when agents are embedded directly in the product experience.
If you’re starting now, begin with a single, valuable workflow. Instrument it thoroughly with Amplitude analytics, make decisions from the data you see—not the demos you remember—and expand only after you’ve proven uplift. Iteration beats ambition here: agentic AI rewards teams who measure relentlessly and scale only what truly works.
Inspired by this post on Amplitude – Perspectives.
How do you help disadvantaged students take action on opportunities they don't even know exist? That question has been top of mind for me as I’ve explored how AI can augment—not replace—human mentorship. Recently, I dug into the work behind Zero Gravity, a UK-based platform using mentoring, community, and learning pathways to unlock elite career opportunities for state school students. Their approach reframed a core problem I care deeply about: the "knowing-doing gap."
I sat down with Elliot Little (Product Manager) and Dan St. Paul (Software Engineer) from Zero Gravity to unpack how they’re tackling this gap with an AI career co‑pilot. They’ve intentionally positioned the system as an orchestrator, not an automation tool—bridging the space between knowing what to do and actually doing it. As a product leader, I see this as a powerful pattern for Generative AI: use AI to coordinate steps, personalize guidance, and empower action in moments where confidence and clarity are fragile.
What resonated most was the humility of their build journey. They started with grand visions of AI mentors and synthetic avatars, then scaled back to something simpler and more effective. The first prototype—a job suitability summary—didn’t deliver the "wow moment" they expected. And they discovered that hiding the "LLM magic" backfired—students needed to feel the personalization. That insight aligns with my own experience: users must perceive the value for trust and motivation to compound.
From a UX standpoint, the team chose text chat over voice input and leaned into guided prompts rather than empty text boxes. That decision lowered cognitive load and increased completion rates—classic product management tradeoffs that privilege momentum over novelty. In my view, this is what good AI product strategy looks like: invite action with structure, then expand autonomy as confidence grows.
The technical backbone is equally thoughtful. Multi‑month journeys require rigorous context window management to avoid exploding token counts and degrading quality. I appreciated their pragmatic toolkit: context management techniques like removing stale tool calls, summarizing history, exposing tools conditionally. They also used application logic rather than complex RAG architectures to manage tool availability and context freshness. This is the kind of disciplined engineering that keeps systems reliable at scale without overcomplicating the stack.
Model selection was fit‑for‑purpose, not one‑size‑fits‑all. They’re using different models for different tasks, including "GPT-5 Nano for structured outputs, lighter models for quick replies." That modularity enables speed and cost control while preserving high‑fidelity moments where structure matters most.
Safeguarding was treated as a first‑class concern—non‑negotiable when you’re building AI for 16‑year‑olds. Their safeguarding architecture pairs moderation endpoints with external verification via Unitary. They also invested in building a failure taxonomy through internal red team/green team exercises. This is AI risk management done right: define failure modes early, test ruthlessly, and wire safety into the product surface area—not just the model layer.
Evaluation was grounded in outcomes, not demos. The team focused on whether students progressed from insight to action: applying, interviewing, and engaging with mentors. That aligns with how I run eval‑driven development—ship narrowly, measure real behavior, and iterate toward a repeatable "wow moment" that students can actually feel.
Looking ahead, I’m excited by what’s next: long‑term memory management for multi‑year student journeys. It’s a hard problem—balancing privacy, provenance, and portability—but it’s precisely where an AI career co‑pilot can compound value over time. The vision is compelling: a resilient companion that remembers goals, adapts to context, and orchestrates the right next step.
If you want to dive deeper, you can listen to the full conversation on Spotify and Apple Podcasts:
Listen to this episode on: Spotify | Apple Podcasts
Blue Dot Impact AI Safety Course – free AI safety course Elliot recommended: https://bluedot.org/
My key takeaways: build AI that augments human relationships, not replaces them; don’t hide the personalization—let learners feel it; privilege application logic over unnecessary architectural complexity; and treat safety, context, and evaluation as product features, not afterthoughts. That’s how we bridge the "knowing-doing gap" with integrity and scale.
I wanted to cut through the hype and see what’s actually changing inside customer service teams as AI agents like Fin move from pilots to production. So I analyzed 166 interviews with support leaders, managers, and frontline specialists to understand how roles, workflows, and team structures evolve once AI becomes part of everyday work.
The anecdotes were already loud: AI tools are transforming customer support. But the scale, shape, and consistency of that transformation? Less clear. I went to the source—the practitioners living it—to quantify what’s real and what’s next for customer support AI strategy.
Here’s what I gleaned from the data.
TL;DR — What’s changing
AI is reorganizing core CS operations: Nearly every team (≈95%) reported meaningful workflow changes. Triage, routing, translation, and categorization are increasingly automated. Hybrid human+AI systems are taking their place.
Frontline work is changing to AI oversight: Humans now QA, monitor, and test AI outputs. When it comes to handling queries, they step in for nuance, rather than repetition.
Structural change is widespread but uneven across companies: 83% reported new responsibilities or roles. Some built AI pods, while others retained traditional setups.
Tier 1 headcount demand is falling: 28% saw hiring freezes, slowdowns, or natural attrition at Tier 1 level as AI Agents manage more requests and improve operational efficiency.
Skill gaps are widening inside teams: Data literacy, QA, and cross-functional communication are all rising in value. For many companies, long-term role strategy is lagging behind.
Research methodology
The goal of this research is to understand how many customer service teams have changed their roles, responsibilities and ways of working due to adopting AI agents, as well as understanding how these changes manifest within their organizations.
For this study, the data chosen consists of interviews conducted by the research team, either with Intercom customers or prospects. This data was chosen because the focus of the interviews revolved around the individual experience of the participant, which gives a higher chance of information related to role changes to be present.
The data was collected using Snowflake by pulling all interviews stored in gong conducted by a member of the research team from 01-01-2025 to 14-10-2025.
After the data was pulled, a python script was used to clean the conversation corpus for each conversation retrieved. Common English stopwords (e.g. “and”, “very”, “with”, etc.) were removed, as well as all the text associated with a speaker in the conversation that was not the interview participant(s). This was done to reduce the computational power required for the conversation coding, avoid API timeouts and reduce costs.
After the corpus was cleaned, the OpenAI API was employed, alongside a prompt, to code each conversation using closed codes defined in a closed codebook.
The codes used were:
No role change mentioned: No explicit changes to roles, teams, or reporting lines are attributed to AI/Fin.
Role responsibilities changed due to AI/Fin: Duties/ownership moved between humans and AI/Fin, or scope of a role changed because AI/Fin handles tasks.
Team structure/reporting changed due to AI/Fin: Org/team boundaries, team charters, or reporting lines changed due to adopting AI/Fin.
Headcount/hiring impacted due to AI/Fin: Hiring plans, headcount, staffing coverage, or shifts/rotations changed due to AI/Fin.
Workflow/process changed due to AI/Fin: Steps, triage/escalations, routing, or playbooks changed because AI/Fin alters the process.
Other organizational changes due to AI/Fin: Other changes inside the organization due to AI/Fin that don’t involve a change in responsibilities, team structure/reporting lines, headcount or workflow/processes changes.
Data analysis
166 conversations were retrieved. More than 90% of all conversations report some sort of change either in their role, team, or processes due to implementing Fin, or a similar AI product, with only 13 participants reporting no changes.
Across these conversations, each one could have multiple types of change associated with it (M = 2.35, Med = 2, Min = 1, Max = 4, N = 166).
More specifically, after implementing Fin or a similar AI product:
94.58% participants reported having their processes and workflows disrupted
82.53% participants reported seeing their role and responsibilities change
27.71% participants reported changes in company headcount or hiring
6.02% participants reported their team structure or reporting lines changing as a result
Additionally, 16.27% participants reported a change for a different reason from the ones highlighted above (“Other organizational changes due to AI/Fin”).
Sample representativeness
The sample is representative with a confidence level of 90% and a margin of error of ±6.4% (accounting for an overall unknown population size). The individual confidence intervals for each type of change are as follows.
Workflow/process changed due to AI/Fin: 157 (94.6%), 90% CI: 91.7% – 97.5%
Role responsibilities changed due to AI/Fin: 137 (82.5%), 90% CI: 77.7% – 87.4%
Headcount/hiring impacted due to AI/Fin: 46 (27.7%), 90% CI: 22.0% – 33.4%
Other organizational changes due to AI/Fin: 27 (16.3%), 90% CI: 11.6% – 21.0%
No role change mentioned: 13 (7.8%), 90% CI: 4.4% – 11.3%
Team structure/reporting changed due to AI/Fin: 10 (6.0%), 90% CI: 3.0% – 9.1%
Thematic analysis
1) Automation and AI integration replacing manual steps (94.58%). I see AI workflows embedding into every stage of support. Manual triage, routing, translations, and repetitive responses shift to Fin or similar systems, while agents focus on human-in-the-loop oversight.
Agents’ day-to-day work now revolves around monitoring or fine-tuning AI outputs, not replying to the same questions. In many teams, conversations enter Fin first; humans only step in when nuance or exception handling is required. Testing, QA, and rollout practices have matured too—teams track Fin’s accuracy and iterate intentionally.
2) Humans shift to oversight, AI handles execution (82.53%). The role resets are unmistakable. Support agents and managers move from high-volume execution to optimization, configuration, and measurement. New roles emerge—AI specialists, automation managers, Fin owners—while responsibilities migrate toward strategic analysis and quality assurance.
Duties are redistributed: Fin takes on refunds, triage, simple messaging, even parts of the sales process. I’ve watched some careers pivot toward product/ops or AI systems strategy as managers coordinate testing and monitor adoption metrics.
3) Reductions or slower growth due to efficiency gains (27.71%). Efficiency is real. Many teams reduce Tier 1 headcount needs or slow hiring because AI absorbs simpler requests. Others reallocate people to complex work or AI management. A few still expand—adding automation engineers, implementation specialists, or technical AI leads—but not at past growth rates.
The upshot: organizations handle more volume while stabilizing or reducing staffing, especially at the frontline tier.
4) New AI teams, flatter orgs, fewer escalation layers (6.02%). I’m seeing organizational design catch up to the tech. Some companies form dedicated LLM or automation teams. Others flatten hierarchies, design around workflow complexity instead of region, or merge roles. Dedicated escalation layers shrink as Fin routes or resolves more autonomously.
Team design is getting more modular and data-driven, with clearer ownership for configuration, governance, and Agent Analytics.
5) Broader digital transformation and operational modernization (16.27%). Beyond support, companies are modernizing their operating model: automation-first, digital self-service, better data foundations, and new vendor ecosystems. Collaboration patterns between data, ops, CX, and product/engineering are tightening, with a culture of experimentation and continuous improvement taking hold.
How have customer service roles and responsibilities changed due to Fin/AI agent implementation?
Implementing Fin or a similar AI agent profoundly changes how an organization operates, with around 95% of participants reporting some level of change in their processes after implementation. These systems have significantly reshaped the workflows that customer service teams are used to. Tasks once performed manually, such as ticket triage, routing, repetitive responses, and translations are now handled by AI agents.
“This marks a clear transformation in how customer service agents work: moving away from directly resolving customer queries to focusing on more analytical and procedural work”
As a result, customer service agents’ responsibilities have shifted from performing manual tasks to monitoring and fine-tuning the AI agent whenever its output is inaccurate or incomplete. This marks a clear transformation in how customer service agents work: moving away from directly resolving customer queries to focusing on more analytical and procedural work, such as testing, QA, and performance analysis of AI outputs.
Human agents who still handle conversations tend to do so either because the AI agent cannot yet respond adequately, or because of an organizational choice to retain human involvement for sensitive or high-value interactions. Nevertheless, the need for such roles is diminishing. Around 28% of participants reported a reduction in Tier 1 staff or a hiring slowdown or a full hiring freeze, as AI agents increasingly manage simple requests and organizational attention shifts towards improving automation efficiency.
“In some cases, this has led to the creation of specialized AI teams, reorganizations around workflow complexity, or the merging and redefinition of existing roles”
However, this transformation is not uniform across companies. While some roles have disappeared (particularly escalation layers), others have emerged. Many organizations are reallocating existing staff to AI management or hiring new technical profiles such as automation engineers, implementation specialists, and AI leads. In some cases, this has led to the creation of specialized AI teams, reorganizations around workflow complexity, or the merging and redefinition of existing roles.
Around 83% of participants reported changes to their roles or responsibilities following the introduction of Fin or similar AI agents. Specifically, customer service agents who no longer handle basic queries now focus on managing AI performance, reviewing Fin tasks and improving automation outputs. Managers oversee AI evaluation and implementation, coordinate testing, and monitor AI metrics such as resolution and involvement rates. In some organizations, new dedicated roles have emerged—AI specialists, automation managers, or Fin owners—reflecting a strategic shift toward automation-first, digital self-service models.
These structural shifts are also cultural. I’m seeing teams embrace experimentation, versioning, and eval-driven development while deepening collaboration with data, operations, and product/engineering. The move from outcomes vs output OKRs is palpable: leaders are measuring containment, deflection, CSAT, and time-to-resolution with new rigor.
Overall, a widespread transformation is underway. Roles are broadening, responsibilities are diversifying, and cross-functional collaboration is becoming the norm. Given the pace of gen ai improvement and the rise of agentic AI patterns, I expect these shifts to intensify.
This evolution raises two important questions
Firstly, do customer service agents possess the skills required to succeed in these new roles? While they are experts in customer interaction and company policy, their work now demands new competencies in data analysis (e.g. reporting AI agent performance and how it changes over time), quality assurance/debugging (e.g. Fin output testing and versioning), and cross-functional communication (e.g. if help from another team is required, drafting a business case to justify the resources required could be needed).
Secondly, what long-term strategies are companies adopting to support these evolving roles? Some are reorganizing entirely around automation, while others retain traditional structures. For those undergoing transformation, it remains unclear whether these changes are part of a deliberate strategic plan aimed at achieving specific performance outcomes, or the result of experimentation without defined goals.
Ultimately, Fin’s success— and of AI in customer service more broadly— depends not only on the technology itself but on the people and strategies that shape its use. In my experience, the winners invest early in data literacy, robust QA, clear ownership, and governance; they align product, ops, and CX around a shared AI roadmap; and they measure what matters with disciplined Agent Analytics. That’s how you turn AI workflows into durable customer and business outcomes.
I’ve lost count of how many times I’ve been asked for a “quick AI agent” that can autonomously fix customer problems, write code, or run sales ops. The promise is intoxicating—and I get why. But in practice, sustainable impact comes from disciplined product thinking, not wishful automation. Drawing on my experience leading product for complex, agentic AI initiatives, I want to debunk four misconceptions I see repeatedly and share what actually works.
Misconception 1: AI agents are plug-and-play. The reality is that effective agentic AI behaves more like a new product line than a feature toggle. It needs clear job stories, domain grounding, tool access, and guardrails. I start by narrowing scope to one painful job to be done, then design AI workflows that reflect real constraints (SLAs, compliance, edge cases). From day one, I instrument with Agent Analytics and set up eval-driven development so we can see failure modes early and iterate with intent.
What consistently moves the needle is treating the agent like a teammate you onboard: define responsibilities, provide the right tools, and measure outcomes. I pair scripted validations with live evals, track containment rates and handoff quality, and balance precision/recall depending on the risk profile. This is slow to fast, not fast to broken.
Misconception 2: Bigger models make better agents. In my experience, architecture outperforms horsepower. A retrieval-first pipeline, tight context window management, and practical prompt engineering often beat an oversized model that hallucinates. Tool use matters more than model size: give the agent reliable APIs, clear schemas, and deterministic fallbacks. For LLMs for product managers, the play is to right-size the foundation model and invest in data quality, prompts, and evaluators that reflect your true acceptance criteria.
When I see erratic behavior, I don’t immediately swap models; I improve retrieval, prune irrelevant context, and clarify the agent’s planning loop. Most performance gains come from better state management and grounding rather than a pricier token budget.
Misconception 3: Agents replace teams. High-performing organizations design human-in-the-loop systems. I implement human review on high-risk actions, explicit escalation paths, and simple override mechanisms. That’s not just safety theater—it’s good product design. AI risk management and data governance are part of the product backlog, not an afterthought. In customer support ai strategy, for example, the agent drafts, a specialist approves, and the system learns from deltas to tighten future responses.
The social system matters as much as the technical one: clear role boundaries, audit trails, and feedback loops turn the agent into a force multiplier. Teams gain leverage without surrendering accountability.
Misconception 4: Shipping the agent equals success. Adoption is earned, not announced. I treat agent launches like any product-led growth motion: define activation events, remove friction with in-app guides and product tours, and A/B test prompts, tool choices, and UI affordances. We track time-to-value, task completion rate, and user trust signals (edits, undo patterns, and escalation requests). When we get those leading indicators right, retention follows.
Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.
My playbook is simple and repeatable: frame the problem narrowly, ground the agent with the right tools and data, measure with eval-driven development and Agent Analytics, then grow adoption with a disciplined go-to-market inside the product. The agents that win don’t feel like magic—they feel dependable. That’s what customers trust, and that’s what scales.
In my role leading product, I’ve learned that the fastest path to higher-quality deliverables from large language models (LLMs) is not a clever prompt—it’s rigorous context. I call the practice AI context pulling: a repeatable way to assemble, compress, and structure the most relevant knowledge before the model ever starts generating. Done well, it turns generative AI into a dependable partner for discovery, prioritization, and execution.
AI context pulling means I proactively gather the right artifacts (customer insights, analytics, strategy, constraints), manage context windows intentionally, and shape the model’s task with clear objectives and guardrails. This reduces hallucinations, improves alignment, and creates traceability back to sources—critical for product management leadership and stakeholder trust.
Learn a new way in which product professionals can collaborate with AI to get even better results on their projects.
Here’s the simple flow I use: first, I define the intent (e.g., “synthesize discovery interviews for a positioning brief”). Next, I inventory relevant context: top customer pains from product discovery, usage patterns from Amplitude analytics, recent support trends from Intercom, and any constraints from our product strategy. Then I run a retrieval-first pipeline to select only the most pertinent slices—favoring recency, representativeness, and canonical sources.
Because context window management matters, I compress long documents into short, source-cited summaries and keep raw excerpts handy when nuance is important. My prompts follow a consistent structure: role and objective, constraints and audience, curated context, the explicit ask, preferred output format, and a brief self-check (e.g., “cite sources and flag uncertainty”). This is prompt engineering for reliability, not theatrics.
A quick example: when drafting a one-page feature brief, I attach three items—the product strategy paragraph that sets the frame, a usage cohort analysis that highlights who’s affected, and five verbatim customer quotes. I ask the LLM to propose a problem statement, success criteria, and a shortlist of solution hypotheses, each tied to a cited piece of evidence. The result is a grounded, decision-ready artifact I can share with product trios and stakeholders.
Tooling-wise, I keep it pragmatic. A lightweight retrieval-first pipeline (embeddings, metadata filters, and recency rules) ensures the LLM pulls what matters. I version prompts and contexts together so I can run quick A/B testing on output quality. And I log decisions and sources to support eval-driven development and continuous discovery.
Common pitfalls are avoidable. Too little context yields generic answers; too much overwhelms the model. Stale docs can mislead; curate aggressively. Vague asks invite fluffy prose; specify outcomes, audiences, and formats. If the task is high risk, I bias toward smaller, well-cited outputs and expand iteratively with human review in the loop.
To measure impact, I track rework rate, review time, and stakeholder alignment on first pass. Over time, teams adopting AI context pulling report clearer artifacts, faster synthesis cycles, and more confident decisions—because every recommendation traces back to evidence. That’s how humans and LLMs truly collaborate better: we provide the right context, and the model amplifies our judgment.
If you’re ready to operationalize this, start by templatizing your most common product workflows—discovery synthesis, roadmap rationale, and release notes—and attach small, high-signal context packs. With a retrieval-first mindset and disciplined prompting, AI becomes an extension of your product craft, not a gamble.
AI is changing how I build products, not by replacing designers or researchers, but by amplifying the quality and speed of what our product trios can deliver. The real breakthrough isn’t a single tool; it’s the way genAI and traditional methods combine into a tighter discovery–design–delivery loop that shortens time-to-value without sacrificing rigor.
Learn how Pendo’s product design team is using genAI and traditional tools to speed up design and development.
In practice, that’s exactly the pattern I see working across my teams: we treat genAI as part of the AI product toolbox—great for rapid exploration, structured synthesis, and test preparation—while we rely on our proven techniques to validate outcomes. For early-stage concepting, I use prompt engineering to generate multiple storyboard options and interaction flows in minutes, then refine those outputs with our design system for alignment and accessibility. It’s a pragmatic “gen ai for product prototyping” approach that lets us compare more alternatives, faster, with better signal.
On the testing front, AI accelerates everything around A/B testing without diluting statistical discipline. We draft hypotheses, define success metrics, and estimate minimum detectable effect (MDE) with guardrails, then deploy variants via feature flags in CI/CD. That pairing—LLMs for product managers plus eval-driven development—keeps experiments reproducible while boosting deployment frequency. The outcome is fewer opinions, more evidence, and a tighter feedback loop from build to learn.
Research goes from weeks to days when we combine a retrieval-first pipeline for qualitative data with strong data governance. I’ll ingest interview notes, support tickets, and session transcripts to cluster themes, then pressure-test the clusters with live customer calls. Privacy-by-design and AI risk management remain non-negotiable: we redact sensitive fields, constrain context windows, and keep a human-in-the-loop for decisions that affect user experience or compliance.
Where analytics meets adoption, tools like in-app guides and product tours help us translate insights into behavior change. I’ll prototype a flow, auto-generate guidance variants, and run controlled rollouts to target segments, measuring activation and retention analysis in parallel. This is product-led growth in action: discover the friction, design the intervention, instrument the journey, and validate outcomes with unified analytics.
Organizationally, empowered product teams and continuous discovery make the difference. Our product trios work from outcomes vs output OKRs, pairing competitive differentiation with product strategy to keep bets focused. We meet weekly to review experiment readouts, model trade-offs with the Kano Model, and update product roadmapping and sprint planning based on verified learning—never vibes alone.
If you’re getting started, begin with one workflow—say, prototype generation plus structured experiment design—and measure impact across cycle time, experiment throughput, and decision quality. Layer in communities of practice to share prompt patterns, establish eval baselines, and codify what “good” looks like. The companies winning with AI aren’t chasing shiny objects; they’re building a repeatable system that turns curiosity into customer value.
I’ve been refining a hands-on approach to “burger prompting” that turns prompt engineering into a reliable, repeatable system. Using an AI resume coach as the proving ground, I’ll walk through a detailed prompt structure to get the most out of your LLM and share what’s worked for me in product environments where clarity, consistency, and measurable outcomes matter.
At a high level, burger prompting follows a simple mental model: the top bun frames the role and mission, the fillings pack in context and examples, and the bottom bun locks in output format and quality guardrails. It’s deceptively simple and extremely effective for Generative AI use cases where you need predictable behavior across different inputs and user personas.
For the top bun, I establish the AI’s role, audience, and objective in one place. In the resume coach flow, I define the assistant as a structured, unbiased reviewer tasked with aligning a candidate’s resume to a specific job description. I set constraints on tone (supportive but direct), scope (resume and job description only), and safety (avoid speculative claims, defer legal or medical advice). This crisp intent statement reduces ambiguity and prevents the model from wandering outside the product’s value proposition.
The fillings are where context window management becomes crucial. I inject the job description, the candidate’s resume, a capability rubric aligned to the role, and the company’s style preferences. If the content is long, I chunk inputs and, when needed, use a retrieval-first pipeline to fetch only the most relevant snippets. I also include a brief style guide with voice, depth, and formatting expectations so the AI doesn’t drift between terse and verbose responses across sessions.
Strong examples are the meat of the burger. I include a few annotated comparisons that show what “excellent,” “good,” and “needs improvement” look like for specific competencies, from impact statements to quantification. These examples are compact and domain-specific, so the LLM sees the pattern I expect without overfitting to a single profile. I encourage transparent reasoning by asking for stepwise evaluations that reference evidence from the resume and job description, while keeping the explanations concise and user-friendly.
The bottom bun finalizes structure and guardrails. I specify an output schema that always returns a brief summary, evidence-backed strengths, concrete gaps with examples of what’s missing, and a prioritized action plan with suggested rewrites. I also request a rubric-aligned score to support eval-driven development, and I cap length to ensure scannability inside product UI. This predictable format reduces downstream parsing errors and keeps the AI workflow snappy.
To operationalize this in a product context, I run small A/B tests on the prompt variants and measure utility through user activation and completion rates. I tune the prompt with tight feedback loops, comparing structured scores against human spot checks until the variance narrows. When I see drift, I adjust the constraints, swap underperforming examples, or expand the rubric to capture overlooked signals.
Quality and trust are non-negotiable. I add guidance to avoid hallucinated credentials or inflated claims, enforce privacy-by-design around sensitive data, and encourage the assistant to cite which resume lines support each recommendation. When the model is uncertain or the resume lacks evidence, the assistant should explicitly say so and propose realistic next steps rather than guessing.
The result is an AI resume coach that feels both helpful and disciplined. With burger prompting, you get a durable prompt pattern you can reuse across adjacent AI workflows, from portfolio reviews to job description rewrites. Once you internalize the top bun, fillings, and bottom bun, you’ll find it far easier to ship prompts that scale, maintain consistency across releases, and deliver tangible, career-advancing outcomes for users.
Digital transformation set the foundation, but it’s no longer sufficient. In my work leading product teams, I’ve learned that real competitive advantage now comes from building systems that perceive, learn, and adapt—end to end, across the product lifecycle and the business operating model.
AI transformation goes beyond automation to create adaptive, intelligent organizations. Discover why it’s the next imperative and how to measure success.
Why is this the next imperative? Customers expect intelligent experiences, not just digitized workflows. Markets are shifting faster than roadmaps, and teams need systems that learn in production. For me, AI Strategy starts with a clear value thesis: where can intelligence amplify customer outcomes and compound business impact—whether in onboarding, customer support, or core product differentiation.
Practically, I frame AI transformation as a capability stack: data governance and privacy-by-design at the foundation; a retrieval-first pipeline to ground models in trusted context; agentic AI and AI workflows to orchestrate actions; and eval-driven development to continuously measure quality, safety, and relevance. Layered on top are operating rhythms—outcomes vs output OKRs, rapid experimentation, and incident management—that keep shipping disciplined and responsible.
I start with product discovery. Together with product trios, we target moments where intelligence removes friction or unlocks new value. We translate those opportunities into crisp outcomes (activation, time-to-first-value, resolution rate) and instrument them from day one. In customer support, for example, a customer support ai strategy might blend LLMs for product managers with retrieval-first grounding to deliver accurate, brand-safe answers and escalate seamlessly when needed.
On architecture, I prioritize context window management and robust integrations. CRM integration and event streams from tools like Intercom, HubSpot, Pendo, and a unified analytics platform provide the signals AI needs to adapt in real time. Prompt engineering patterns, guardrails, and privacy-by-design controls ensure responses remain trustworthy and compliant. When applicable, I explore agentic AI to orchestrate multi-step tasks with clear constraints and auditability.
Delivery is where transformation becomes measurable. I combine CI/CD practices with DORA metrics (deployment frequency, lead time, change failure rate, MTTR) to keep iteration fast and safe. On the product side, A/B testing with a minimum detectable effect (MDE) protects rigor, while eval-driven development tracks model accuracy, hallucination rates, and policy adherence before and after release. I tie these to business metrics like user activation, retention analysis, and support resolution time to ensure we’re shipping outcomes, not just output.
Governance is non-negotiable. AI risk management, regulatory compliance, and data governance anchor every phase—from dataset curation to prompt libraries and model routing. Threat detection and response and incident management processes are integrated so we can respond quickly when behavior drifts or new risks emerge.
Transformation also means evolving how teams work. I invest in empowered product teams, continuous discovery, and developer evangelism to spread best practices across domains. We share playbooks, reusable CustomGPT workflows, and an AI product toolbox to scale patterns like retrieval-first pipelines and safe prompt engineering across the portfolio.
The outcome is not just smarter features; it’s a more adaptive business. With clear OKRs, reliable telemetry, and responsible guardrails, AI becomes a force multiplier for product strategy and execution. If you’re moving beyond digital toward intelligence, start small, measure relentlessly, and let outcomes guide the journey.
I walked into PendomoniumX London energized by a simple question: are we finally past the AI hype cycle and into real product impact? From the hallway conversations to the main stage, the momentum was unmistakable—and deeply practical.
PendomoniumX’s sixth stop brought 350+ software leaders together for a day of AI transformation, real-world stories, and product innovation.
That scale and focus say a lot. Across the dialogues I joined, the center of gravity has clearly shifted from experiments to execution: building an AI Strategy that aligns with product roadmaps, turning promising prototypes into production-grade AI workflows, and measuring value in ways that reinforce product-led growth. It’s the inflection point where Generative AI moves from isolated pilots to cross-functional capabilities.
My biggest takeaway for product leaders: treat AI like any other durable capability. Start with sharp problem framing and customer outcomes, run continuous discovery to validate use cases, and sequence delivery through product roadmapping and sprint planning. Pair this with privacy-by-design and sensible governance so your teams can move fast without cutting corners.
Operationally, I’ve found it essential to design experiences that accelerate user activation—think thoughtful onboarding, in-app guides, and product tours that reduce friction while teaching new AI-powered behaviors. For teams adopting LLMs for product managers, keep your evaluation loops tight, instrument the journey end-to-end, and make sure every iteration maps to a clear value proposition customers can feel.
Events like PendomoniumX London remind me why community matters: they compress learning cycles. If you’re steering an AI portfolio, now is the moment to translate vision into repeatable systems—prioritize the right bets, make adoption effortless, and let data tell you when to double down or pivot. That’s how we turn AI transformation into durable product innovation.
I’ve spent the last year pushing our AI Strategy from slideware to shipped value, and one pattern keeps winning in real-world product teams: connecting agentic AI directly to trustworthy product analytics. That connection is where Model Context Protocol shines—safely bridging LLMs with the tools and data product managers rely on every day.
Model Context Protocol (MCP) gives AI agents access to your business data. Learn how MCP works, how product managers are using it, and how to connect Pendo’s MCP server to Claude, ChatGPT, or Cursor for instant product insights.
In practice, I treat MCP as a clean, auditable interface between LLMs and enterprise systems—decoupling the model choice from the data plane and enabling a retrieval-first pipeline with strong data governance. Because MCP standardizes the way agents discover resources and tools, it simplifies context window management, enforces least-privilege access, and makes it easier to evolve our stack without rewriting prompts or fragile glue code.
For product leaders, the immediate payoff is speed to insight. Instead of hopping across dashboards, I ask the agent questions in natural language—“Which onboarding step drives the biggest drop-off by segment?”—and get synthesized answers backed by traceable queries. That shift turns AI workflows into a daily habit, improving continuous discovery and accelerating product-led growth while maintaining privacy-by-design controls.
Under the hood, I think about MCP in four layers: resources (read-only data surfaces such as feature usage or retention cohorts), tools (safe operations like creating a note, exporting a segment, or proposing an in-app guide), prompts (task-scoped instructions tuned for LLMs for product managers), and observability (logs and evaluations). This structure keeps eval-driven development front and center and reduces operational risk.
Here’s how I connect Pendo analytics through MCP to my preferred assistants without compromising security or accuracy:
1) Prepare access: confirm your Pendo MCP server endpoint, authentication method, and scopes; apply least-privilege and redact any PII not required for analysis.
2) Register the server: in Claude, ChatGPT, or Cursor, add the MCP server with the provided URL and API key or token, then enable only the resources and tools your use case demands.
3) Validate the contract: prompt the agent to list available resources and describe tools; run harmless dry runs (e.g., “summarize top feature adoption trends last 30 days”) to confirm the interface behaves as expected.
4) Operationalize: standardize prompts for recurring analyses (QBRs vs OKRs, activation funnels, retention analysis), set guardrails, and log every interaction for audit. This is where prompt engineering meets governance.
5) Iterate with metrics: track answer quality, latency, and usage; expand scopes gradually and gate new tools behind human-in-the-loop until you reach reliable performance.
Once configured, I use the agent to surface weekly activation insights, identify outlier cohorts, and auto-draft product discovery notes with links back to Pendo reports. The result isn’t magic; it’s a disciplined AI product toolbox that brings the right context to the right question, fast.
If you’re starting from zero, pilot with one high-value question, one team, and one assistant. Keep the footprint small, measure outcomes, and then scale—with security, compliance, and stakeholder management baked in from day one. That’s how you turn MCP from an interesting protocol into a durable competitive advantage.
I spend a lot of time with CIOs and IT leaders who are moving fast on generative AI. The momentum is real, but so are the risks. When AI touches core workflows, data, and customer experiences, we need a clear, pragmatic plan that blends AI Strategy with disciplined product management leadership and IT governance.
Learn about the risks that AI poses to IT teams, and how they can mitigate them.
Here are the four risks I see most often—and the playbook I use to de-risk delivery while preserving speed and innovation.
Risk #1: Shadow AI and data leakage. Teams experiment with unapproved tools, and sensitive data ends up in prompts, logs, or third-party services. Without strong data governance and privacy-by-design, even a small proof of concept can create outsized exposure.
How I mitigate it: start with an AI acceptable-use policy, data classification, and clear guardrails on what can be prompted. Deploy a redaction layer and secrets management before any model call. Favor a retrieval-first pipeline so models reason over vetted internal knowledge rather than raw or personal data. Conduct vendor due diligence and DPAs up front, and centralize audit logs to support regulatory compliance and incident response.
Risk #2: Hallucinations and unreliable outputs. LLMs are probabilistic; they can fabricate citations, numbers, or steps. In customer support and internal operations, this erodes trust and creates rework—especially when teams assume model answers are authoritative.
How I mitigate it: adopt eval-driven development with task-specific test sets, reference answers, and pass/fail thresholds that gate CI/CD. Ground models with retrieval, constrain outputs with schemas, and keep a human-in-the-loop for high-risk actions. A/B testing, error taxonomies, and continuous monitoring turn model behavior into measurable, improvable Web Vitals for AI reliability.
Risk #3: Expanded attack surface. Prompt injection, data exfiltration, supply chain risks in model providers, and insecure connectors can undermine existing cybersecurity controls. Traditional threat models often miss these new interaction patterns.
How I mitigate it: treat AI as a first-class asset in threat detection and response. Implement input/output filtering, allow/deny lists, content moderation, and strict isolation of tools and connectors. Red team prompts and tools regularly, rotate credentials, and codify runbooks with SRE and incident management for fast containment. Apply least privilege to agents, APIs, and vector stores, and monitor for anomalous tool-use.
Risk #4: Compliance, bias, and auditability gaps. As AI scales, questions about explainability, fairness, data residency, and retention move from theoretical to board-level. Without traceability, it’s hard to satisfy audits or respond to regulators.
How I mitigate it: embed privacy-by-design from the first sprint—data minimization, consent, purpose limitation, and retention controls. Maintain model cards, versioning, and lineage for prompts, datasets, and parameters. Centralize audit logs, set policies for high-risk use cases, and run periodic compliance reviews with security and legal. Cross-functional communities of practice keep changes aligned across product, engineering, and IT Leadership.
Operationally, I anchor AI initiatives to outcomes vs output OKRs, use empowered product teams and product trios to balance feasibility, value, and risk, and integrate model changes into CI/CD with quality gates. This creates a repeatable mechanism to ship safely, learn quickly, and scale what works.
If you’re standing up new AI workflows or hardening what you already have in production, this playbook gives you a practical path: drive adoption confidently, protect your data, and stay compliant while maintaining competitive velocity.
The bottom line: AI risk management isn’t a brake on innovation—it’s how we earn the right to go faster.
I build AI products with a simple conviction: disciplined experimentation beats intuition. Over the years, I’ve refined a practical playbook that helps my teams learn faster, reduce risk, and turn every release into a smarter next step.
Product experimentation isn’t luck; it’s a method. Learn how top AI product managers test, measure, and grow smarter with every release.
I begin every effort with a crisp hypothesis, an expected user or business outcome, and unambiguous success criteria tied to outcomes vs output OKRs. Before writing a line of code, I define primary metrics and guardrails so we know what “good” looks like—and what to stop.
When the change affects UX, pricing, or activation flows, I favor A/B testing with the statistical rigor to back decisions. We calculate the minimum detectable effect (MDE), choose appropriate randomization units, and pre-register the analysis plan to avoid p-hacking. This gives the team the confidence to scale wins and sunset underperformers quickly.
AI features demand a tailored approach, so I run eval-driven development before any user sees a variant. We curate golden datasets, score candidate prompts and models, and stress-test failure modes. This is where LLMs for product managers matters: prompt templates, context window management, and a retrieval-first pipeline are all evaluated for quality, latency, and cost-to-serve. I treat “hallucination rate,” safety violations, and bias as first-class metrics under AI risk management.
To de-risk launches, we ship behind feature flags with CI/CD, monitor DORA metrics, and roll out in stages. Product trios own problem framing to solution delivery, which shortens feedback loops and preserves accountability. If early signals drift from our hypotheses, we pause, adjust, and re-run—no sunk-cost thinking.
Measurement is non-negotiable. I instrument user journeys end-to-end with Amplitude analytics, track activation and retention analysis, and map behavior to learning objectives. We consolidate logs and events into a unified analytics platform so qualitative insights from customer research pair cleanly with quantitative trends.
Continuous discovery keeps the engine running. Weekly customer conversations, in-product feedback, and lightweight prototypes ensure we validate needs, not just solutions. The output flows into product discovery, product roadmapping and sprint planning, and a reusable AI product toolbox that scales across teams.
Finally, I protect the culture that makes experimentation work: we celebrate invalidated hypotheses, document decisions, and optimize for outcomes over output. That’s how empowered product teams sustain product-led growth—even as complexity grows.
If you’re building AI features today, adopt this playbook to maximize learning velocity, minimize risk, and compound advantage. The method is straightforward: form strong hypotheses, test with rigor, measure what matters, and let evidence—not HiPPOs—guide the roadmap.