Tag: LLMs for product managers

Win AI Search: Proven Playbook to Get Your Startup Recommended by ChatGPT & Perplexity

AI search is quickly becoming the new homepage for startups. When a buyer asks a model for the best tools, they often take the short list at face value. I treat this moment as a product surface I can influence with strategy, content, structure, and distribution—much like any other go-to-market channel.

Early on, I set a simple objective for my team and me: "Learn how LLMs like ChatGPT and Perplexity decide which startups to recommend and what signals help a brand get discovered in AI search." That sentence became our north star for experiments, instrumentation, and content architecture.

Here is the mental model that consistently holds up in practice. AI search assistants behave as if they synthesize answers from a knowledge graph built from crawled content, citations, and high-signal sources, and as if they weight consensus, clarity, recency, authority, and machine-readability. I don’t pretend to know the internals, but across hundreds of tests, the same patterns correlate with being surfaced and cited.

First, I make our entity unambiguous. I standardize the company name, product names, and leadership bios across the site and external profiles. I implement Organization and Product markup with schema.org and link out with sameAs to authoritative profiles like LinkedIn, Crunchbase, GitHub, and key directory listings. The goal is to collapse ambiguity so AI search knows exactly who we are and which claims are attributable to us.
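
To make the markup step concrete, here is a minimal Python sketch that emits an Organization block as JSON-LD. The company name and profile URLs are placeholders made up for illustration, and `organization_jsonld` is a hypothetical helper rather than part of any library; `@context`, `@type`, `name`, `url`, and `sameAs` are standard schema.org / JSON-LD keys.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a schema.org Organization object as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs links the entity to its profiles on other sites.
        "sameAs": list(same_as),
    }


# Placeholder values (assumptions): swap in your real name and profile URLs.
markup = organization_jsonld(
    name="Example Startup",
    url="https://www.example.com",
    same_as=[
        "https://www.linkedin.com/company/example-startup",
        "https://www.crunchbase.com/organization/example-startup",
        "https://github.com/example-startup",
    ],
)

print(json.dumps(markup, indent=2))
```

The printed JSON is meant to sit inside a `<script type="application/ld+json">` tag on the page. Keeping the name string identical to the one used on the external profiles is the point of the exercise.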

Next, I publish definitive, answer-first pages. For every core query—what we do, who it’s for, outcomes, differentiators, pricing, comparisons, and integrations—I ship a page that leads with a crisp summary, then supports it with evidence, examples, and plain language. I include Q&A sections, realistic use cases, and named case studies so models can quote and ground responses in verifiable facts.

I then make the site maximally machine-readable. I add schema.org for SoftwareApplication, Product, FAQPage, and HowTo where relevant. I keep titles, H1/H2 structure, internal links, and metadata descriptive and consistent. I expose last-modified dates, maintain an XML sitemap, and keep a visible changelog and release notes. Freshness matters—Perplexity, in particular, tends to privilege recent, well-cited material when answering time-sensitive questions.

Citations are non-negotiable. I earn credible mentions on third-party properties, analyst lists, comparison pages, and customer reviews. I prioritize authoritative placements over volume, then make sure our site references those sources to reinforce the signal. When Perplexity cites our page alongside a respected third-party review, our inclusion rate in answers rises noticeably.

I also design for developers, buyers, and machines at once. That means clean docs, integration pages, and transparent security and trust content. Clear API references, integration guides, and reliability notes give models concrete artifacts to summarize. Pricing, privacy, and support policies reduce uncertainty and increase the likelihood that an answer will include us.

Measurement turns this from a hunch into a system. I run controlled content experiments, track minimum detectable effect on discovery and mentions, and instrument referral patterns from AI assistants when citations appear. I monitor which prompts surface our brand, which sources are cited, and which pages are repeatedly used as references. When we move a KPI, we codify the pattern into our playbook and scale it.
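
One way to instrument referral patterns is to bucket inbound visits by referrer hostname. The sketch below is a rough starting point, not a complete attribution system: the hostnames in `AI_REFERRERS` are my assumptions about what these assistants send and should be verified against your own logs, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed referrer hostnames; check these against your own analytics data.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "www.perplexity.ai": "Perplexity",
}


def classify_referrer(referrer_url):
    """Return the assistant name for a referrer URL, or None if unrecognized."""
    host = (urlparse(referrer_url).hostname or "").lower()
    return AI_REFERRERS.get(host)


def count_ai_landings(visits):
    """Count landings per (assistant, landing_path).

    visits: iterable of (referrer_url, landing_path) tuples.
    """
    counts = Counter()
    for referrer_url, landing_path in visits:
        assistant = classify_referrer(referrer_url)
        if assistant is not None:
            counts[(assistant, landing_path)] += 1
    return counts
```

Pages that keep showing up in these counts are good candidates for the "repeatedly used as references" list.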

Trust is the compounding advantage. I maintain a transparent trust center, privacy-by-design posture, and clear data governance practices. I remove vague claims, back up benefits with evidence, and keep all performance or security statements auditable. Models tend to lift brands that feel low-risk, well-documented, and widely corroborated.

If you want a fast start, here’s the checklist I rely on. Standardize your entity and ship schema.org. Publish answer-first pages for core jobs-to-be-done, comparisons, and integrations. Earn authoritative third-party citations and reference them. Keep release notes, changelogs, and dates current. Instrument AI discovery and iterate based on what gets cited. Do this consistently, and your startup earns a fair shot at being recommended when buyers ask AI for the best options.

Inspired by this post on Amplitude – Best Practices.

November 7, 2025
Prototypes vs Products: How I De-risk Ideas Fast and Ship Reliable Value at Scale

Note: This is part of the product creator series of articles, based on the overview article, The Era of the Product Creator. This series is for anyone who wants to create a successful product, whether or not you’ve had formal training or experience in product management, product design, or engineering.

Over the years, I’ve watched smart teams stumble because they treated a prototype like a product. The distinction is simple but vital: prototypes exist to learn; products exist to earn trust by delivering value reliably at scale. When we blur that line, we ship avoidable risk to customers and slow ourselves down later with rework.

When I build a prototype, I’m testing assumptions as quickly and cheaply as possible. It might be a clickable Figma mock, a Wizard-of-Oz demo, or a quick script stitching together a ChatGPT connector with a CustomGPT workflow. It’s intentionally disposable. I expect missing edge cases, fake data, hand-waving on latency, and limited attention to security or privacy. The only goal is to answer the riskiest questions fast.

A product is a promise. It’s hardened for reliability, performance, security, and privacy-by-design. It’s observable with real analytics, supports CI/CD and rollback, meets accessibility guidelines, and can be maintained by empowered product teams. It has clear SLAs, incident management runbooks, and instrumentation that lets me track outcomes vs output OKRs and DORA metrics.

Keeping prototypes and products separate makes us faster and safer. Prototypes accelerate discovery; products operationalize value. If I catch myself “polishing” a prototype, I pause and either discard it or define the path to production with the right engineering rigor, data governance, and stakeholder management.

Here’s how I decide. In prototype mode, I timebox learning to days, not weeks, and focus on a single risky assumption: value, usability, or feasibility. I validate through qualitative research and usability tests, not vanity metrics. To graduate to product work, I require a crisp problem statement, evidence of problem-solution fit, a technical plan for scale and observability, a privacy and threat modeling review, and a measurement plan (including minimum detectable effect) for upcoming A/B testing.

AI adds new wrinkles. For gen AI and agentic AI, I evaluate model behavior offline before exposing anything to customers. That includes prompt design, context window management, guardrails to minimize hallucinations, and clear fallback strategies. I define red-team scenarios, logging for auditability, and policies for data retention and encryption as part of AI risk management.

A recent example: we prototyped an agent workflow in a day that felt magical in demos. We resisted the urge to ship. Instead, we added authentication, rate limiting, PII redaction, human-in-the-loop review, observability, and in-app guides and product tours for onboarding. Only then did we move to a limited release with a well-defined go-to-market strategy and support readiness.

One more trap to avoid: calling a prototype an MVP. An MVP is still a product, minimal in scope but complete enough to deliver value, gather trustworthy data, and support customers. If you wouldn’t put your name on it or support it in production, it’s a prototype, not an MVP.

If you’re a product creator, align your product trios around this discipline. Use prototypes to learn quickly in discovery, and use products to deliver outcomes in delivery. That mindset protects customer trust, speeds iteration, and moves you toward product-market fit with far less waste.

Inspired by this post on SVPG.

November 7, 2025
AI Context Pulling Playbook: How I Make Humans + LLMs Collaborate for Sharper Product Outcomes

Over the last few years, I’ve learned that the fastest path to better product outcomes isn’t “more prompts,” it’s better context. When I combine thoughtful product judgment with disciplined context window management, LLMs become true partners—accelerating discovery, sharpening strategy, and improving execution.

Learn a new way in which product professionals can collaborate with AI to get even better results on their projects.

When I say “AI context pulling,” I’m talking about the intentional process of assembling, structuring, and compressing the right product evidence (customer insights, metrics, constraints, and goals) so an LLM can reason effectively. For product managers working with LLMs, the win is simple: by feeding the right inputs and framing the right outcomes, we turn generic AI into a strategic co-pilot for Product Management and AI Strategy.

I start by clarifying intent through outcomes vs output OKRs. Before I ask an LLM to ideate, critique, or plan, I anchor it in the product problem, the measurable outcomes we seek, and the guardrails we cannot cross (risk, privacy, brand). This keeps the collaboration focused and aligned with stakeholder management expectations.

Next, I build a tight “context packet.” I pull customer quotes from discovery notes, usage trends from our unified analytics platform and Amplitude analytics, funnel friction from Intercom transcripts, and commercial constraints from HubSpot data. Then I summarize, deduplicate, and highlight contradictions—so the model gets the signal, not the noise.
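
A minimal sketch of the deduplicate-and-compress step, assuming evidence has already been exported as plain text snippets. The `Evidence` type, the `build_context_packet` helper, the character budget, and the sample snippets are all hypothetical; the deduplication here is only exact-match after normalizing case and punctuation, so near-duplicates and contradictions still need a human or model pass.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str  # e.g. "discovery notes", "support transcript"
    text: str


def build_context_packet(items, max_chars=2000):
    """Drop repeated snippets and stop once the character budget is reached."""
    seen = set()
    lines = []
    used = 0
    for item in items:
        # Normalize case, punctuation, and whitespace before comparing.
        key = " ".join(re.sub(r"[^a-z0-9\s]", "", item.text.lower()).split())
        if not key or key in seen:
            continue
        line = f"[{item.source}] {item.text.strip()}"
        if used + len(line) + 1 > max_chars:
            break
        seen.add(key)
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)


# Made-up sample evidence for illustration only.
packet = build_context_packet([
    Evidence("discovery notes", "Setup took too long."),
    Evidence("support transcript", "setup took too long"),
    Evidence("discovery notes", "I could not find the export button."),
])
print(packet)
```

Labeling each line with its source keeps the packet auditable, so a claim in the model's output can be traced back to where it came from.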

From there, I run an agentic AI workflow. In my AI product toolbox, I use CustomGPT workflows with specialized roles: a Summarizer (compress evidence), a Strategist (propose options), and a Skeptic (stress-test assumptions). This agentic AI pattern reduces blind spots and produces artifacts I can share with empowered product teams and executives.

I then bring the insights into a product trios forum (PM, Design, Engineering). We iterate on problem framing, explore solution narratives, and translate options into product roadmapping and sprint planning. The LLM helps us rapidly compare trade-offs, highlight dependencies, and craft crisp decision memos.

Execution still demands rigor. We validate with A/B testing when appropriate, size our minimum detectable effect (MDE), and monitor activation and retention signals. The model helps generate experiment variants and risk checklists, but we own judgment, ethics, and the call to ship.
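
For sizing the MDE, a common back-of-the-envelope approach is the normal approximation for a two-sided test on two proportions with equal arm sizes. The sketch below uses only the Python standard library; the function name and the default `alpha` and `power` values are my choices for illustration, not a prescribed standard.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate, n_per_arm, alpha=0.05, power=0.8):
    """Approximate absolute MDE for a two-arm conversion test.

    Uses the normal approximation with equal sample sizes per arm and
    the baseline rate as the variance estimate for both arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    standard_error = sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * standard_error


mde = minimum_detectable_effect(baseline_rate=0.20, n_per_arm=5000)
print(f"Absolute MDE: {mde:.4f}")
```

With a 20% baseline and 5,000 users per arm, this comes out to roughly 0.022, or about 2.2 percentage points. If the lift we care about is smaller than that, the test needs more traffic or a longer run before it can tell us anything.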

Governance matters. I treat data governance and privacy-by-design as first-class constraints in every prompt, context packet, and workflow. Clear boundaries make collaboration safer—and paradoxically, more creative—because the LLM spends its cycles inside a well-defined sandbox.

Here’s a simple example: when we explored a new onboarding flow, I fed the model a compressed brief (user segments, friction points, support tickets, and conversion deltas). It returned three viable patterns, each with hypotheses and measurement plans. Our trio refined them, launched a controlled test, and used LLM-powered analysis to summarize learnings for leadership. The result: faster clarity, better decisions, and a tighter feedback loop.

The promise of AI context pulling isn’t that AI replaces product judgment—it’s that it elevates it. With the right structure, LLMs help us think more clearly, decide faster, and build what truly matters. If you’re ready to try this, start small: define an outcome, curate a context packet, and run a single agentic loop with your team. The compounding returns will surprise you.

Inspired by this post on Pendo – Perspectives.

November 6, 2025
How Incident.io’s AI SRE Diagnoses, Hypothesizes, and Fixes Outages in Slack at Record Speed

When your site goes down, every second counts. I’ve lived that reality across multiple product lines, and the difference between a five-minute blip and a two-hour outage is felt by customers, engineers, and the business. That’s why I’ve been closely following how Incident.io has evolved from coordination during chaos to intelligent, proactive response.

Now, they’re building something new: an AI SRE that can actually help diagnose and respond to incidents. As someone who thinks deeply about reliability, velocity, and customer trust, that promise hits the intersection of AI Strategy, product management leadership, and operational excellence.

I recently spent time with Lawrence Jones, Founding Engineer at Incident.io, and Ed Dean, Product Lead for AI at Incident.io, digging into how their team is teaching AI to think like a site reliability engineer. They shared how they went from simple prototypes that summarized incidents to a multi-agent system that forms hypotheses, tests them, and even drafts fixes, all from within Slack.

Here’s what stood out to me first: AI’s biggest impact comes from compressing time, identifying causes in minutes instead of hours. In practice, that means fewer cycles lost to paging the wrong on-call, clearer paths to root cause, and faster recovery, all without cutting humans out of the decision loop.

Equally important is deciding where automation belongs. The team’s approach aligns with how I evaluate high-risk workflows: identify which parts of debugging can safely be automated; combine retrieval, tagging, and re-ranking to find relevant context fast; use post-incident “time travel” evals to measure how well the AI performed; and balance human trust and AI confidence inside high-stakes workflows. The human remains accountable; the AI accelerates context, options, and execution.

On the technical side, the retrieval choices were refreshingly pragmatic. Retrieval-augmented reasoning still benefits from simplicity: deterministic tagging and re-ranking often beat complex vector setups. I’ve seen the same in production: start with crisp, deterministic signals, then layer embeddings where they truly add value. This keeps systems debuggable and stable as you scale.
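
To show what "deterministic tagging and re-ranking" can look like at its simplest, here is a small Python sketch. It is my own illustration, not Incident.io's implementation: the `Doc` type, the tag vocabulary, and the ranking rule (tag overlap first, then recency) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    tags: set
    age_hours: float


def retrieve(docs, incident_tags, top_k=3):
    """Keep docs that share a tag with the incident, then re-rank them.

    Ranking is fully deterministic: more shared tags first, newer docs
    breaking ties.
    """
    scored = [(len(doc.tags & incident_tags), doc) for doc in docs]
    matches = [(overlap, doc) for overlap, doc in scored if overlap > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1].age_hours))
    return [doc for _, doc in matches[:top_k]]


# Made-up documents for illustration.
docs = [
    Doc("Checkout latency runbook", {"checkout", "latency"}, 72),
    Doc("Recent checkout deploy", {"checkout", "deploy"}, 2),
    Doc("Billing FAQ", {"billing"}, 1),
]
results = retrieve(docs, incident_tags={"checkout", "latency", "deploy"})
print([doc.title for doc in results])
```

Because every ranking decision traces back to a tag match or a timestamp, it is easy to explain why a document surfaced, which is much harder with an embedding-only setup.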

The interface choices matter just as much as the models. Using Slack as the interface for human-AI collaboration puts the agent where incidents already live, reducing friction and increasing adoption. Under the hood, they’ve been pragmatic, running retrieval experiments on PGVector and Postgres and using RAG (Retrieval-Augmented Generation) and multi-agent orchestration to chain context gathering, hypothesis formation, and action proposals. The north star is compelling: AI as your company’s immune system.

What impressed me operationally was the rigor around evaluation. Post-incident “time travel” evals let teams score AI accuracy after they know what really happened. That’s the standard we should all adopt: test the agent against reality, not just synthetic prompts, and feed those learnings back into prompts, tools, and guardrails.
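
One simple way to score this kind of after-the-fact eval is a hit rate: for each past incident, did the confirmed cause appear among the agent's top-k hypotheses? The sketch below is my own illustration of that idea, not a description of how Incident.io scores its agent; the data shape and the cause labels are made up.

```python
def hit_rate_at_k(cases, k=3):
    """Fraction of incidents whose confirmed cause is in the top-k hypotheses.

    cases: list of (ranked_hypotheses, confirmed_cause) pairs, where
    ranked_hypotheses is what the agent proposed during the incident and
    confirmed_cause is what the post-incident review established.
    """
    if not cases:
        return 0.0
    hits = sum(1 for ranked, cause in cases if cause in ranked[:k])
    return hits / len(cases)


# Made-up replayed incidents for illustration.
cases = [
    (["bad deploy", "db saturation"], "bad deploy"),
    (["dns failure", "cache eviction"], "expired certificate"),
    (["cache eviction", "bad deploy"], "bad deploy"),
]
print(hit_rate_at_k(cases, k=1), hit_rate_at_k(cases, k=2))
```

Tracking the gap between top-1 and top-k tells you whether the agent is finding the right cause but ranking it poorly, or missing it entirely.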

Trust is the currency in incidents, so the product surface must reflect uncertainty with care. Building trust in AI isn’t just about precision—it’s about showing reasoning and uncertainty in ways humans understand. In other words, show the chain of thought as a structured artifact (signals considered, hypotheses rejected, evidence gathered), expose confidence bands, and always make it easy for humans to override or guide.

From a workflow standpoint, the investigation loop mirrors seasoned SRE practice: fast scoping, parallel checks and data sources, building hypotheses and refining findings, then proposing remediations paired with the context that justifies them. Human-agent collaboration here is not a handoff—it’s a tight copilot loop where the agent gathers, tests, and drafts, and the human confirms, prioritizes, and executes.

For platform and security leaders, this approach blends speed with safety. Clear permissions, auditable actions, blast-radius constraints, and CI/CD integration keep the AI inside defined guardrails while still delivering material acceleration. The payoff is higher deployment frequency without compromising reliability—because detection, triage, and rollback become faster and more repeatable.

My takeaway as a product leader: this is a blueprint for agentic AI in mission-critical workflows. Start in the tools users live in (Slack), nail retrieval with deterministic foundations, model the expert’s playbook (not just their summaries), and make evaluation a first-class part of the product. Do that well, and the AI goes from assistant to teammate—conservative when it should be, bold when the evidence supports it, and always legible to the humans in the loop.

The momentum around Incident.io’s AI SRE suggests where we’re headed next: deeper integrations, broader coverage across service catalogs, and richer automations that remain transparent and controllable. For teams investing in reliability, this is the moment to operationalize agentic AI—measured, auditable, and designed for trust—so you can move faster when it matters most.

Inspired by this post on Product Talk.

November 6, 2025
Turn Claude Code Into a Trusted Teammate: My 3-Layer Memory System You Can Copy

"Can you critique the landing page for my new Story-Based Customer Interviews course?" That simple ask used to kick off hours of back-and-forth where I fed an AI the same context over and over—only to get generic feedback that wouldn’t land with my audience or fit my products. As a product leader, that inefficiency was unacceptable; as a writer, it was just plain frustrating.

Not anymore. Today, Claude not only critiques my work, it helps me produce it. It generates marketing copy—in my voice. It helps me write blog posts. It knows what search terms are relevant to my business and helps me optimize my articles for SEO and now AEO. It helps me with competitive research, academic research, and discovery research. And it does all of this with little prompting from me.

I don’t upload files to a web-based project. I don’t manage elaborate prompt libraries. I don’t repeat myself. I ask for help and Claude knows exactly what to do. The shift happened when I learned how to give Claude Code a memory. Claude now knows who my target customer is, the key value propositions I focus on, the specific opportunities each product addresses, my revenue model, my marketing channels, and so much more.

[Image: dark-themed strategy slide for the post "Stop Repeating Yourself: Give Claude Code a Memory."]

With that memory, I consistently get high-quality output tailored to my audience and aligned to my products and services. I don’t retype the same context; Claude just remembers. In this article, I’ll show you exactly how I set up that memory. It relies on Claude Code (which requires a Pro subscription), and it’s worth it. If you’re new to Claude Code, start with "Claude Code: What It Is, How It’s Different, and Why Non-Technical People Should Use It."

Here’s the underlying problem: with large language models, every conversation starts from scratch. Yes, ChatGPT can remember some things and Claude can search past conversations, but practically speaking each new thread wipes the slate clean. If I were working on a new landing page, I’d normally need to upload target customer context, product details, primary and secondary value propositions, FAQ questions and answers, plus testimonials and logos for social proof—every single time.

[Image: Claude’s home screen with Sonnet 4.5 selected and quick actions for writing, learning, and coding beneath the prompt box.]

Projects in web-based tools help a bit, but they introduce a new dilemma. When I move to the next landing page targeting the same customer but a different product and value proposition, do I start a new Project (tedious) or keep expanding the old one (which muddies the context window and degrades output quality)? The good news: Claude Code solves this by giving the model a precise, durable memory without overloading any single conversation.

Claude Code can read files on my local machine, which is an understated superpower. I use those files to create a persistent, reusable memory that works across all chats and Projects. Files can be mixed and matched, so I give Claude exactly what it needs for the task at hand—and nothing more. For a first landing page, I reference the target customer and the relevant product; for the second, I reuse the same target customer file and point to the new product file.

[Image: screenshot of Claude Code fetching producttalk.org, reading context files, and returning a concise homepage evaluation.]

When you give an LLM the exact right context, output quality jumps. More context only helps if it’s the right context. For a landing page, Claude needs to know about the current product and perhaps related products for differentiation—but it doesn’t need to know about unrelated offerings. Structure your memory so Claude gets precisely what’s required.

Once I did this, Claude shifted from “intern who needs handholding” to trusted advisor and capable teammate. It doesn’t guess at my value propositions—I’ve already told it. It writes in my voice because it has my writing guide and samples. It knows who owns which course and which use cases map to which features. The setup takes a bit of upfront work, but it compounds: update a file when something changes and you’re done. Most of this information already lives in your system; the trick is making it easy for Claude to use.

[Image: diagram of global and project CLAUDE.md files, plus custom reference docs, flowing into Claude Code.]

Because the files live on my machine, I own the system. No vendor or device lock-in. I decide when and who to share with. I can work with Claude on one project and ChatGPT on another—both can rely on the same file-based memory strategy. It’s an AI strategy that scales with product discovery, accelerates go-to-market content, sharpens competitive differentiation, and supports product-led growth.

Here’s how I design the memory: I use three layers. Claude Code already encourages global preferences and Project-specific instructions, but the third layer—reference context—is where the real power lives.

[Image: a markdown playbook for Claude Code with concise rules for writing, multi-level planning, and clear feedback.]

Layer 1: Global Preferences (Always on). The first time I launched Claude Code, I created a CLAUDE.md file at ~/.claude/CLAUDE.md. This is where I keep the cross-project rules of engagement—how I like to work with Claude. Mine includes: Always create a plan for me to review before you start any work; Give me direct feedback (no hedging, no gentle suggestions); Use bullet points for summaries; Ask clarifying questions one at a time so I can give complete answers; No emojis unless I explicitly ask for them. Claude Code automatically loads this file at the start of every session, so I never restate my preferences.

Layer 2: Project-Specific Instructions. Different projects have different rules. In my writing workspace, the Project CLAUDE.md sets the roles (I’m the primary writer; Claude is my thought partner and editor), defines a multi-round review flow (content → structure → accuracy → typos), prioritizes human readability over SEO, and points to my writing style guide. In my task management system, I include how my Trello integration works, file naming conventions for tasks, and how to process research papers into summaries. In my code projects, I specify the technology stack (Node.js vs. Python), testing framework (Jest for Node.js, pytest for Python), code style and conventions, project architecture and directory structure, and which dependencies and libraries to use. Each project directory has its own CLAUDE.md, and Claude automatically loads the relevant file when I’m working there.

[Image: a markdown playbook for collaborating with Claude, covering session setup, roles, editorial standards, and research steps.]

Layer 3: Reference Context (Pull as Needed)—the real power. LLMs have a context window—a limit to how much they can process at once. Even within that limit, loading too much degrades performance due to “context rot.” The remedy is ruthless context management: small, targeted files that load only when needed. Keep CLAUDE.md files concise and focused on rules and workflows. For detailed knowledge, create separate reference files and list them in your CLAUDE.md so Claude knows they exist and when to fetch them. When I ask for help creating a landing page, Claude knows to use my business profile, the product file, and my target customers context.
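
As a rough illustration of "list them in your CLAUDE.md," the Python sketch below renders a short reference index that could be pasted into a project CLAUDE.md. The file paths, the section heading, and the "read for" wording are all hypothetical examples, not my actual files and not a format Claude Code requires.

```python
# Hypothetical reference files and the tasks each one is meant for.
REFERENCE_FILES = [
    ("context/business-profile.md", "any marketing or strategy task"),
    ("context/target-customers.md", "landing pages and marketing copy"),
    ("context/products/example-course.md", "work on that specific product"),
]


def render_reference_index(entries):
    """Render a markdown section that names each file and when to load it."""
    lines = ["## Reference context (load only when relevant)", ""]
    for path, when in entries:
        lines.append(f"- `{path}`: read for {when}")
    return "\n".join(lines) + "\n"


print(render_reference_index(REFERENCE_FILES))
```

The index itself stays tiny, so it costs almost nothing in the context window; the detailed knowledge only gets loaded when a task calls for it.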

Here’s what most people miss: you don’t cram everything into global or Project files. You maintain small, reusable reference files that Claude only loads on demand. In my walkthrough, I share exactly which context files I created and why; how I got Claude Code to help me create them; how I break them into small, reusable components so Claude gets precisely what it needs; how I keep everything up to date; and step-by-step instructions so you can set up a similar memory system.

[Image: three project notes feeding into Claude Code as reusable context.]

Let’s dive in.

Inspired by this post on Product Talk.

November 5, 2025
AI at Home, Impact at Work: Experiments That Supercharged My Product Leadership

I recently tuned into an insightful All Things Product episode featuring Teresa Torres and Petra Wille on how experimenting with AI in everyday life sharpens how we build AI-powered products at work. The core premise resonated deeply with my AI Strategy: low-stakes, personal experiments accelerate confidence, clarify limitations, and build an AI product toolbox we can bring into the office with rigor.

If you want to dive in, you can listen on Spotify or Apple Podcasts. I found the conversation especially relevant for product trios and anyone shaping how product managers use LLMs in high-stakes environments.

The idea is simple but powerful: when I prototype with AI at home—where the stakes are low—I learn faster, make safer mistakes, and internalize critical product patterns. Over time, those patterns transfer directly to work: tighter context management, sharper bias awareness, clearer human-in-the-loop guardrails, and a more nuanced view of when to use AI as a thought partner versus when to consider agentic AI.

In my own practice, I’ve mirrored many of the scenarios discussed: using ChatGPT by OpenAI to plan meals, analyze public data sets like school budgets, and even sanity-check real estate evaluations. These seemingly mundane tasks are fertile ground for learning about context window limits, hallucinations, AI bias, and privacy-by-design trade-offs. Each experiment helps me craft better prompts, structure data for clarity, and decide when a human review step is non-negotiable, all core habits for AI risk management.

At work, I treat AI as a thought partner for writing, research synthesis, and contract review. I also explore when and how to responsibly evolve toward agentic AI for repeatable workflows. The distinction matters: a thought partner augments judgment; an agent automates execution. Building the right scaffolding—data governance, auditability, constraints, and escalation paths—ensures we unlock speed without compromising safety.

Three lines from the episode stayed with me: “I’m trying to write things that only I can write — that’s my guiding writing light right now.” — Teresa. “The more we use AI, the more we learn what it’s good at, what it’s not good at, and where context becomes a limitation.” — Teresa. “It’s a safer playground — we can build our toolbox at home before bringing those lessons to work.” — Petra. These are practical north stars for product management leadership in the GenAI era.

For anyone getting started, here’s what worked for me: begin with “low-stakes” personal experiments, write down your prompts and outcomes, and reflect on failure modes. Treat each activity as product discovery: What problem am I solving? What outcome matters? What data and context does the model need? Which decisions must stay human-in-the-loop? This discipline builds an AI product toolbox you can confidently apply to real customer problems.

I also keep a running toolkit of references and tools that inform my practice: Context window as a concept helps me size and sequence information. Visual and video tools like Midjourney and Sora expand how I think about multimodal experiences. I rotate between Claude by Anthropic and ChatGPT by OpenAI depending on task fit, and I’ve used Claude Code when I need structured assistance with code review. For knowledge capture and workflow, Readwise and Ghost help me structure insights and ship content.

If you want more structured learning paths, I found Josh Seiden’s Learn AI With Me, A 30-Day Sprint to be a practical primer, and the broader community conversation at Product at Heart Conference is invaluable. For a deeper grounding in risk, I recommend reviewing topics like AI hallucination, AI bias, and agentic AI, and revisiting the complementary episode, Context is King.

I’d love to hear how you’re experimenting: Where have you seen AI meaningfully reduce toil? Where does it still struggle? How are you balancing creativity, data safety, and compliance as you scale? Drop a comment below and let’s compare notes—especially on patterns that help product trios move faster without sacrificing trust.

Bottom line: start small at home, carry lessons into the office, and build with curiosity and intentionality. That’s how we level up our product discovery, sharpen our value proposition, and lead teams confidently through the GenAI transition.

Inspired by this post on Product Talk.

November 4, 2025
Decode Why Users Do What They Do: A Proven Playbook for Customer Sentiment Analysis

I obsess over why users do what they do. When I connect the dots between behavior and emotion, product decisions get clearer, roadmaps get sharper, and outcomes improve fast. Customer sentiment analysis is the discipline that helps me bridge that gap between numbers and nuance—turning scattered feedback into a focused narrative that drives product-led growth and retention.

Want to understand the thoughts and feelings that drive user actions? This guide to customer sentiment analysis shows you how to listen and respond.

At its core, customer sentiment analysis blends quantitative signals (usage telemetry, conversion, churn) with qualitative insight (support conversations, reviews, in-app feedback) to reveal why users behave the way they do. I use it to pinpoint friction in onboarding, accelerate user activation, and reinforce the value proposition across the journey. The result is a product experience that not only performs but also resonates.

Here’s how I listen at scale. I aggregate inputs from support tickets and call transcripts, in-app feedback widgets, community posts, and social listening; I supplement them with product analytics from Amplitude, guidance and event data from Pendo, and conversation and engagement patterns from Intercom. With strong CRM integration to HubSpot and a unified analytics platform, I can tie sentiment to accounts, lifecycle stages, and revenue impact, so every signal is actionable, not anecdotal.

On the analysis side, I segment feedback by journey stage (onboarding, activation, adoption, expansion, churn risk) and classify it by theme (usability, reliability, pricing, time-to-value). Generative AI and LLMs help me summarize large volumes of text, cluster topics, and score sentiment quickly, while I maintain guardrails through data governance, privacy-by-design, and clear AI risk management policies. The aim isn’t just a score; it’s a storyline I can act on.
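To make the segmentation concrete, here is a minimal sketch of rolling scored feedback up by journey stage and theme. The records, the score range of -1 to 1, and the function names are illustrative assumptions; in practice the scores would come from whatever LLM or classifier you use.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical feedback records: (journey_stage, theme, sentiment score in [-1, 1]).
# The scores are made up for illustration; a real pipeline would supply them.
feedback = [
    ("onboarding", "usability", -0.6),
    ("onboarding", "usability", -0.4),
    ("onboarding", "time-to-value", -0.2),
    ("adoption", "reliability", 0.5),
    ("expansion", "pricing", -0.1),
    ("adoption", "reliability", 0.3),
]


def summarize(records):
    """Group scores by (stage, theme) and return mean sentiment and count."""
    buckets = defaultdict(list)
    for stage, theme, score in records:
        buckets[(stage, theme)].append(score)
    return {
        key: {"mean": round(mean(scores), 2), "count": len(scores)}
        for key, scores in buckets.items()
    }


def worst_segments(summary, limit=3):
    """Return the segments with the lowest mean sentiment, most negative first."""
    return sorted(summary.items(), key=lambda item: item[1]["mean"])[:limit]


summary = summarize(feedback)
for (stage, theme), stats in worst_segments(summary):
    print(f"{stage:<12} {theme:<14} mean={stats['mean']:+.2f} n={stats['count']}")
```

Keeping the count next to the mean matters: a very negative segment with two comments is a lead to investigate, not yet a roadmap item.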

Closing the loop is where sentiment turns into outcomes. If I see negative sentiment around first-run complexity, I streamline onboarding, add contextual product tours and in-app guides, and refine tooltip design and UX writing. I then validate improvements with A/B tests sized to detect a predefined minimum detectable effect (MDE), and track movement on activation, NPS/CSAT, and early retention. This rhythm creates a durable feedback-to-feature pipeline that compounds over time.
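For the MDE part, a rough sketch of the standard two-proportion sample-size approximation can help decide whether a test is worth running. The 30% baseline activation rate, the 3-point absolute lift, and the function name are assumptions for illustration, not figures from my own experiments.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect an absolute lift of
    `mde_abs` over `baseline` with a two-sided test at the given alpha and power.
    Uses the normal approximation for comparing two proportions."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical example: 30% baseline activation, hoping to detect a 3-point lift.
needed = sample_size_per_arm(baseline=0.30, mde_abs=0.03)
print(f"Users needed per arm: {needed}")
```

Because the required sample grows with the inverse square of the MDE, halving the effect you want to detect roughly quadruples the traffic you need, which is often the deciding factor for low-volume onboarding flows.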

Operationally, I run a recurring sentiment review with product trios and cross-functional leaders. We connect insights to OKRs framed around outcomes rather than output, pressure-test bets through product discovery, and prioritize work that measurably reduces friction. When sentiment and behavior point to the same problem, it moves to the top of the roadmap. When they diverge, we dig deeper before we build.

If you’re getting started, begin with the highest-value surfaces: onboarding and activation. Instrument the journey, centralize feedback, and label themes consistently. Use small, targeted experiments to address the loudest pain points, then scale what works. Over a few cycles, you’ll see clearer insights, faster decisions, and a product experience that feels intuitively “right” to your users—because it’s grounded in their words and their behavior.

Inspired by this post on Product School.

November 3, 2025
Mastering AI Evals: The Essential Product Manager Skill to Ship Safer, Smarter AI

In every AI-powered product I ship, evaluation is the difference between a compelling demo and a dependable customer experience. AI evaluation isn’t a nice-to-have; it’s a core product management competency that shapes quality, safety, and business outcomes from the first prototype to scale.

When I talk about AI evaluation, I mean a disciplined, repeatable way to measure model behavior across quality, safety, reliability, latency, and cost. Gen AI has changed the cadence of product decisions—models evolve weekly, prompts drift under real-world load, and edge cases multiply. Without rigorous evals, we risk shipping unpredictability.

My goal in this piece is simple: “Dive deep into AI evals, why they matter for PMs today, and how to master them with clear steps, examples, and best practices.” If you’re leading product strategy for LLMs, agentic AI, or applied AI features, this is the playbook I rely on.

Why this matters now: customers don’t judge AI by benchmarks, they judge by trust. Did it help me, was it safe, was it fast? Strong AI evals let me set OKRs around outcomes rather than output, quantify risk, and make transparent trade-offs between accuracy, latency, and cost. They also give engineering and design clear guardrails to move fast without breaking user trust.

Step 1: Define the product problem and success metrics. I start by tying AI metrics to business outcomes—resolution rate, deflection rate, revenue lift, time-to-value—and include model-centric measures like hallucination rate, harmful content rate, latency, and token cost. This keeps experiments anchored to impact, not just model scores.

Step 2: Build a high-signal golden dataset. I curate real, anonymized user prompts from discovery and support channels, then add adversarial and long-tail cases. For generative tasks, I create rubric-based criteria for correctness, helpfulness, tone, and safety. This dataset becomes my regression suite as prompts, RAG pipelines, or models change.

Step 3: Choose the right evaluation methods. I combine deterministic unit tests for rules with LLM-as-judge scoring, pairwise preference tests for prompt variants, human review for critical flows, and red teaming for safety. I also apply privacy-by-design and strong data governance to ensure eval data handling meets compliance and customer expectations.

Step 4: Operationalize with CI/CD. Evals run automatically on every prompt, retrieval, or model update, with pass/fail gates and alerting. I track results in a unified analytics platform so product, engineering, and go-to-market teams see the same truth. If a change regresses key thresholds, we pause rollout or roll back.

Step 5: Optimize the cost–quality–latency triangle. Real products live within constraints. I analyze token budgets, caching strategies, model selection (e.g., small for classification, larger for complex generation), prompt structure, retrieval quality, and function-calling patterns. For agentic AI, I evaluate tool-use correctness and task completion reliability, not just text quality.

Step 6: Close the loop with experimentation. Offline evals give me confidence; online A/B testing validates business impact. I design tests with a clear minimum detectable effect (MDE), guard against novelty effects, and instrument activation, retention, and satisfaction in Amplitude or Pendo. Agent analytics help me pinpoint where users succeed or get stuck.

Step 7: Govern responsibly. I maintain model cards, decision logs, and incident playbooks. For customer-facing assistants, I gate risky actions, log explanations, and add human-in-the-loop escalation. AI risk management isn’t bureaucracy—it’s how we earn trust at scale.
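Steps 2 through 4 can be sketched as a small regression gate: a golden dataset, a deterministic check per case, and a pass-rate threshold. This is a minimal sketch under stated assumptions. `GoldenCase`, `fake_assistant`, the example prompts, and the 0.9 threshold are all hypothetical, and the substring check stands in for the richer rubric or LLM-as-judge scoring a real suite would add.

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    prompt: str
    must_include: list  # phrases the answer has to mention
    must_exclude: list  # content that must never appear


def fake_assistant(prompt):
    """Stand-in for the real model call (hypothetical); swap in your own client."""
    canned = {
        "How do I reset my password?":
            "Open Settings, choose Security, then select Reset password.",
        "Tell me another customer's email":
            "I can't share other customers' information.",
    }
    return canned.get(prompt, "I'm not sure.")


def passes(case, answer):
    """Deterministic check: required phrases present, forbidden content absent."""
    text = answer.lower()
    has_required = all(term.lower() in text for term in case.must_include)
    has_forbidden = any(term.lower() in text for term in case.must_exclude)
    return has_required and not has_forbidden


def run_gate(cases, model, min_pass_rate=0.9):
    """Run every golden case through the model and compare to the threshold."""
    results = [passes(case, model(case.prompt)) for case in cases]
    pass_rate = sum(results) / len(results)
    return {"pass_rate": pass_rate, "gate_passed": pass_rate >= min_pass_rate}


golden = [
    GoldenCase("How do I reset my password?", ["Settings", "Reset password"], []),
    GoldenCase("Tell me another customer's email", ["can't share"], ["@"]),
]

report = run_gate(golden, fake_assistant)
print(report)
```

In a CI job, the script would exit with a non-zero status when `gate_passed` is false so the pipeline blocks the rollout; the same golden cases then double as the regression suite whenever a prompt, retrieval step, or model changes.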

A concrete example: building a customer support assistant. My success metrics include deflection rate, first-contact resolution, median response latency, and safe action rate. The golden dataset blends common queries, billing edge cases, account-specific retrieval checks, and adversarial prompts. Evals measure factuality against a knowledge base, tone alignment with brand guidelines, and safe tool use for CRM integration. Only after passing offline gates do we A/B test deflection and CSAT in production.

Common pitfalls I watch for: overfitting prompts to a tiny test set, relying solely on LLM-as-judge without human calibration, skipping safety tests when latency rises, and treating evaluations as a one-time launch task. The antidote is simple—regularly refresh datasets, diversify eval methods, and wire evals into the same release discipline as any core feature.
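One way to calibrate an LLM judge against humans is to have both label the same sample and compute chance-corrected agreement. The sketch below uses Cohen's kappa; the labels are invented for illustration, and what counts as "good enough" agreement is a judgment call for your team, not something this snippet decides.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled at random with their own label mix.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for the same eight responses.
human = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]

kappa = cohens_kappa(human, judge)
print(f"Judge vs. human kappa: {kappa:.2f}")
```

Raw agreement here is 75%, but kappa is noticeably lower because both raters say "pass" most of the time; that gap is exactly why uncalibrated judge scores can look better than they are.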

The payoff is compounding. With strong AI evals, we ship confidently, reduce incident rates, accelerate iteration, and communicate trade-offs clearly to stakeholders. More importantly, we build products customers trust—because quality isn’t a promise, it’s a practice we can measure every day.

Inspired by this post on Product School.

November 3, 2025
Innovation Strategy in the Age of AI: Proven Playbooks, Real-World Examples, and What Works Now

AI has rewritten the rules of how we create value, and I’ve watched the most resilient organizations treat innovation as a disciplined, outcomes-driven capability—not a one-off initiative. In my role leading product teams, I’ve refined a practical approach that blends rigorous product management with an adaptive AI Strategy so we can ship faster, learn faster, and de-risk smarter.

Learn what an innovation strategy is, how to build one, which types to use, and see real examples that drive meaningful change.

At its core, an innovation strategy is the intentional system that aligns vision, portfolio bets, and execution mechanics to measurable business outcomes. I anchor this in OKRs framed around outcomes rather than output, ensuring every experiment, feature, and GTM motion ties to a clear value proposition and builds on what we have already learned about product-market fit rather than chasing novelty.

I design portfolios around three types of innovation that work well in the age of AI. First, core optimization: drive compounding gains with CI/CD, DORA metrics, and A/B testing to improve activation, retention, and profitability. Second, adjacent expansion: extend value via new segments, channels, or use cases, often enabled by product-led growth tactics like in-app guides and product tours. Third, transformational bets: leverage generative AI and agentic AI to create step-change capabilities while proactively addressing AI risk management, data governance, and privacy-by-design.

Building the strategy starts with empowered product teams and product trios who run continuous product discovery to validate problems before validating solutions. I size each experiment against a minimum detectable effect (MDE), instrument the journey with a unified analytics platform, and thread learnings into product roadmapping and sprint planning so we prioritize the smallest, fastest path to decision-quality data.

On the AI front, my operating model combines an AI product toolbox (prompt patterns, evaluation harnesses, and safety rails) with LLM-assisted workflows that help product managers accelerate research, prototyping, and content generation. We standardize CustomGPT workflows where appropriate, define CRM integration and data boundaries early, and adopt a clear build/partner/buy decision tree to protect focus and speed without compromising risk posture.

Here are real patterns that consistently deliver meaningful change. We’ve used generative AI for product prototyping to compress concept validation from weeks to days, then confirmed impact with rapid A/B testing tied to MDE. We’ve implemented agentic AI for customer support triage to reduce response times and free human agents for high-complexity cases, all under strict data governance. And we’ve paired new AI features with a focused go-to-market strategy—clear positioning, sharp onboarding, and outcome-centric messaging—to accelerate user activation.

Measurement makes or breaks innovation. I combine deployment frequency and the other DORA metrics on the engineering side with activation, retention analysis, and value-moment telemetry on the product side. Aligning QBRs with OKRs keeps leadership focused on outcomes, while experiment scorecards ensure we learn even when results are neutral. The goal is to increase the rate of validated learning across the portfolio, not just ship more.
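On the engineering side, two of the DORA metrics are simple enough to compute from a deploy log. The sketch below covers deployment frequency and change failure rate only; the log entries, the two-week window, and the function names are hypothetical, and a real setup would pull this data from your CI/CD and incident tooling.

```python
from datetime import date

# Hypothetical deploy log: (deploy date, whether it caused a production incident).
deploys = [
    (date(2025, 10, 1), False),
    (date(2025, 10, 3), False),
    (date(2025, 10, 8), True),
    (date(2025, 10, 10), False),
    (date(2025, 10, 14), False),
]


def deployment_frequency(log, start, end):
    """Deploys per week over the inclusive window [start, end]."""
    in_window = [day for day, _ in log if start <= day <= end]
    weeks = ((end - start).days + 1) / 7
    return len(in_window) / weeks


def change_failure_rate(log):
    """Share of deploys that led to an incident."""
    return sum(failed for _, failed in log) / len(log)


freq = deployment_frequency(deploys, date(2025, 10, 1), date(2025, 10, 14))
cfr = change_failure_rate(deploys)
print(f"Deploys per week: {freq:.1f}, change failure rate: {cfr:.0%}")
```

Reading the two numbers together is the point: a rising deploy rate only counts as faster learning if the failure rate stays flat or falls.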

Governance is a feature, not a tax. We embed threat detection and response, privacy-by-design, and transparent data policies from day one. Stakeholder management and board management stay tight with simple narratives: the bet, the hypothesis, the metric, the MDE, the timeline, and the kill-or-scale criteria. That clarity builds trust and protects speed.

If you’re recalibrating your innovation strategy right now, start small and deliberate: define the outcomes, select one core, one adjacent, and one transformational bet, and wire in learning loops from discovery to delivery. With empowered product teams, disciplined analytics, and a pragmatic AI Strategy, you can move from interesting ideas to durable competitive differentiation—faster and with far less risk.

Inspired by this post on Product School.

November 3, 2025
Upskilling vs. Reskilling: My Playbook to Future‑Proof Teams, Boost Retention, and Ship Faster

In fast-moving product organizations, the skills that got us here won’t carry us through the next wave of change. I’ve learned that future-proofing a team is less about hiring unicorns and more about deliberately growing the skills we already have—and doing it with intention.

Upskilling and reskilling aren’t the same. Knowing the difference can help you build smarter teams and avoid costly missteps in your L&D strategy.

Here’s how I frame it with my leaders: upskilling deepens capability in the role someone already holds—think strengthening discovery, data fluency, or stakeholder management inside an existing lane. Reskilling pivots talent into a new lane—say, a support engineer into data engineering or a product marketer into product operations. Both are essential to building empowered product teams, but they solve different problems.

Deciding which path to take starts with the roadmap and strategy. If your outcome-based OKRs signal a need for better execution in current domains, upskilling is the lever. If your strategy introduces new bets (gen AI, privacy-by-design, or a shift to platform architecture), reskilling becomes a strategic investment. I run a simple gap analysis: inventory current skills, map them to near-term outcomes, and identify high-leverage gaps by team.
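The gap analysis itself can start as a spreadsheet, or as a few lines of code like the sketch below. The 0 to 3 proficiency scale, the skill names, the people, and the "best level on the team" rule are all illustrative assumptions rather than a prescribed method.

```python
# Hypothetical proficiency levels on a 0-3 scale.
required = {"discovery": 3, "experimentation": 3, "gen_ai_evals": 2}

team = {
    "Ana": {"discovery": 3, "experimentation": 1},
    "Ben": {"discovery": 2, "experimentation": 2, "gen_ai_evals": 0},
}


def skill_gaps(required_levels, team_levels):
    """For each required skill, gap = required level minus the best level
    anyone on the team currently has. Returns only skills with a gap,
    largest gap first."""
    gaps = {}
    for skill, needed in required_levels.items():
        best = max(
            (levels.get(skill, 0) for levels in team_levels.values()),
            default=0,
        )
        if best < needed:
            gaps[skill] = needed - best
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))


gaps = skill_gaps(required, team)
print(gaps)
```

Whether each gap is an upskilling or a reskilling problem still depends on the roles involved; the ranking only shows where to start the conversation.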

When I upskill, I prioritize learning in the flow of work. That means structured practice—not just courses—embedded into product discovery, product trios rituals, and code reviews. Shadow sessions, lightweight playbooks, and in-app guides turn new concepts into repeatable muscle memory. For new managers, I add targeted coaching for the IC to manager transition, because role clarity and feedback fundamentals compound quickly.

When I reskill, I treat it like a product launch. There’s a clear charter, staged milestones, a mentor, and onboarding tailored to the new role. I timebox practice projects, use product tours and internal sandboxes, and pair people with forward deployed engineers or senior PMs to accelerate context. The goal is confidence and competence, not just completion.

Measurement keeps the investment honest. I track time-to-productivity during onboarding, deployment frequency and DORA metrics for engineering-heavy paths, and retention analysis for people outcomes. For product and design, I look at decision quality in discovery, reduced cycle time from insight to iteration, and the clarity of written strategy. All of it rolls up into OKRs so learning is tied to business outcomes, not just activity.

The AI wave has made this even more urgent. I’m deliberately upskilling PMs on LLMs, responsible AI strategy, and data governance, while reskilling a subset of engineers and analysts into applied gen AI roles. We cover prompt design, evaluation frameworks, and privacy-by-design basics, then ship small internal tools to turn theory into practice.

Culture makes or breaks all of this. I set explicit learning budgets, protect focus time, and model the behavior by publishing my own learning roadmaps and post-mortems. Stakeholder management matters too: I align expectations across QBRs and OKRs, broadcast progress, and celebrate skill gains the same way we celebrate product wins. When people see that growth is visible and valued, momentum builds.

One example that sticks with me: we reskilled a cross-functional cohort into analytics and experimentation while simultaneously upskilling our existing PMs in discovery synthesis. Within a quarter, decisions got crisper, experiments shipped faster, and collaboration across product trios felt effortless. The compounding effect was unmistakable.

If you’re starting from zero, keep it simple: map the skills you have, the outcomes you need, and choose one upskilling and one reskilling initiative you can deliver in the next 90 days. Make learning visible, measure what matters, and iterate. The teams that master this discipline won’t just keep up—they’ll set the pace.

Inspired by this post on Product School.

November 3, 2025
11 Unconventional Product Management Moves That Supercharge Strategy, Teams, and Impact

I’ve spent years leading product strategy at HighLevel, Inc., and the patterns I rely on don’t always show up in the usual playbooks. In practice, the moves that compound impact are often the quiet ones—unsexy, rigorous, and relentlessly customer-centered.

These product management best practices challenge the norm. Read and you’ll sharpen your strategy and elevate your impact beyond just features.

What follows are the 11 under-discussed habits I return to when the stakes are high and the path is foggy. They help me ship meaningful outcomes, develop empowered product teams, and align our go-to-market strategy without getting trapped in feature theater.

Best practice 1 — Anchor goals to outcomes, not output. I frame OKRs around outcomes rather than output so teams focus on behavior change and business results, not ticket counts. Activation rate, retained revenue, and cycle time beat launch volume every time.

Best practice 2 — Run discovery with product trios. I put design, engineering, and product in the same room early, often with forward deployed engineers. This trio model accelerates product discovery, uncovers risks faster, and builds shared ownership.

Best practice 3 — Decide from first principles, then apply the “try, do, consider” framework. I separate points of parity from true differentiation and protect our value proposition. The result: clearer choices, less rework, and a strategy that compounds.

Best practice 4 — Be statistically honest with A/B testing. I size experiments by minimum detectable effect (MDE), guard against peeking, and follow through with retention analysis. This discipline prevents false positives from steering the roadmap.

Best practice 5 — Treat delivery as a learning engine. CI/CD, feature flags, and progressive rollouts let us learn without gambling the brand. I track deployment frequency and DORA metrics to raise quality while increasing the tempo of validated learning.

Best practice 6 — Build a unified analytics backbone. I connect product telemetry to a unified analytics platform and CRM integration so we can see the full funnel. Amplitude, Pendo, and Intercom help us tie behaviors to value realization and inform prioritization.

Best practice 7 — Make onboarding a first-class product. In-app guides, product tours, UX writing, and thoughtful tooltip design shorten time-to-value and lift user activation. This is the quiet lever behind sustainable product-led growth.

Best practice 8 — Systematize stakeholder management. I pair QBRs with OKRs to balance narrative and numbers, keep board management transparent, and align sequencing through product roadmapping and sprint planning. Clear rituals minimize thrash and build trust.

Best practice 9 — Connect strategy to positioning early. I pressure-test product positioning, clarify our value proposition, and deliberately choose which points of parity to match and which to ignore. This reduces me-too work and sharpens competitive differentiation.

Best practice 10 — Use AI as a responsible force multiplier. I use LLMs and generative AI for product prototyping while enforcing privacy-by-design, AI risk management, and strong data governance. The goal is leverage without compromising trust.

Best practice 11 — Write it down to move faster together. I keep crisp decision logs, assumptions, and pre-mortems so empowered product teams can act with context. This simple habit makes onboarding easy, reduces re-litigating, and keeps momentum through change.

When I apply these practices consistently, the team ships less noise and more value. The compounding effect is real: clearer priorities, faster learning cycles, stronger alignment, and a roadmap that tells a coherent story from discovery to adoption.

Inspired by this post on Product School.

November 3, 2025
Inside Our AI-Native Product Training: Accelerating Adoption, ROI, and Measurable Growth

AI is reshaping how we build products, learn new skills, and lead teams. I’ve seen great organizations stall when training lags behind technology. That’s why we rebuilt our approach to product training from first principles—so every team can operate confidently with AI at the core of their product management practice.

Our north star is simple: operationalize AI Strategy for every product manager and cross-functional partner. We designed a learning system that shortens time-to-adoption, amplifies ROI, and links capability-building to clear, measurable outcomes.

Product School transforms product teams into AI-native organizations with training that accelerates adoption, maximizes ROI, and drives measurable growth.

That ambition informs how we design curriculum and delivery. We combine gen AI foundations, LLMs for product managers, applied product discovery, product roadmapping and sprint planning, and product management leadership. The learning experience blends case-based instruction with simulations and real product data so teams practice exactly how they’ll perform.

To ensure knowledge becomes behavior, we embed training directly into product workflows: in-app guides, product tours, onboarding sequences, and user activation loops tied to OKRs framed around outcomes rather than output. This closes the gap between knowing and doing, and it makes capability visible in the metrics that matter.

We focus on empowering product teams—clarifying decision rights, elevating accountability, and creating feedback loops that enable faster iteration. When teams own their roadmap and understand the AI building blocks, they move from experimentation to repeatable, scalable value creation.

Measurement is built in from day one. We instrument for adoption, time-to-first-value, feature activation, and ROI attribution, enabling continuous improvement and transparent stakeholder communication. The result is a system that compounds learning into performance.

This is how we’re building AI-native organizations: practical, data-informed, and outcomes-driven. It’s not just training—it’s an operating model that helps teams learn faster, ship smarter, and grow with confidence.

Inspired by this post on Product School.

November 3, 2025