Tag: AI Strategy

Stop Falling for Hollywood Demos: The Unfiltered Truth of Live AI Voice for Support

I’ve sat through countless AI demos, and I’ve learned there are really two kinds: the “Hollywood demo,” which is polished to perfection, and the “real-world demo,” which shows the product raw—imperfections and all. The former dazzles, but the latter is where you discover what’s actually ready for prime time.

Hollywood demos look great, but sometimes need a closer look to make sure what you see is what you’ll get. When I’m evaluating an AI Agent for customer service, I always look past the polish. I’m assessing how well it will handle real-world scenarios—the messy, complex conversations your team deals with every day. That’s especially true on voice, the toughest channel to get right.

Voice is one of the toughest tests of any AI system. It’s not just “chat with speech.” An AI Agent needs to be able to listen, respond, and adapt in real time. Timing, tone, and turn-taking are all part of the product, they shape the experience as much as accuracy or reasoning.

An edited video might sound seamless, but it can’t show how a system behaves in a real support environment—like when a conversation takes an unexpected turn or when it pauses briefly to reason or retrieve data. Those small moments—latency, clarifications, interruptions—are when you see what the AI Agent is really capable of. A real-world demo lets you see and hear how the system actually behaves under real conditions, not in a controlled environment that’s been smoothed out with editing.

That’s why the live Fin Voice demo at Pioneer stood out. The team called Fin live on stage to show the real thing (with real latency and interruptions) so people could understand the product they’d be deploying to their own customers. As a product leader, I appreciate that level of transparency because it mirrors how customers will experience the system in production.

When Paul Adams, Chief Product Officer, demoed Fin Voice at Pioneer, the goal was to show the product exactly as customers experience it. In 90 seconds, Fin verified his identity, retrieved account data, managed an interruption, offered options, completed the workflow, and sent a follow-up email. That’s the kind of end-to-end outcome I look for—fast verification, accurate retrieval, natural pacing, and a closed loop.

Latency. You could hear brief pauses while Fin fetched subscription details and checked backend systems. That wasn’t lag—it was work happening in real time. In voice AI, thoughtful latency that signals reasoning is far better than synthetic speed that collapses under real load.

Natural conversation flow. Fin detected when Paul finished speaking, handled interruptions gracefully, and replied in short, human-like turns. That turn-taking behavior is essential for trust and comprehension in voice customer support.

Awareness and tone. Subtle changes in pacing when Paul laughed or hesitated showed sensitivity to context. Tone control is not a “nice to have” in voice—it’s a core UX capability.

Unscripted conversation design. No rigid IVR menus or fixed paths. Paul spoke naturally, and Fin adapted to resolve his query. That adaptability is what differentiates a true AI Agent from a glorified decision tree.

Those details are the real test. A voice AI Agent that performs well in a live demo is one that will perform well for you and your customers too.

Voice has been one of the most demanding, and rewarding, areas of development for Fin. Since launch, we’ve been expanding what it can do so support leaders can customize how Fin sounds, behaves, and aligns with their brand.

Voice and tone customization: Choose from multiple natural voices, set greetings, and fine-tune how Fin communicates with customers.

Escalation and conversational guidance: Teach Fin to use your terminology, ask clarifying follow-ups, and escalate when needed.

Deployment controls: Manage rollouts, test safely in internal environments, and fine-tune before going live.

Flexible integrations: Connect to any telephony system via call forwarding, and link Fin Voice to backend systems or APIs to take action.

Multilingual capability: Fin Voice now supports 28 languages natively.

Alongside these features, we’ve made big improvements to Fin’s answer quality—the foundation of a great voice experience. When people call, they’re looking for accurate, immediate answers they can trust.

So we’ve focused on three key areas: low latency, which is down roughly 30–40% since launch; clarification flow, so Fin asks smart follow-up questions to reduce back and forth and improve resolution rates; and voice-specific answer structure, so Fin delivers information in shorter sentences with pacing designed for listening.

Together, these improvements mean customers get the highest-quality answers as quickly as possible, resulting in more resolutions and better experiences.

Running a live demo always carries risk because things can go wrong. But that’s also why it matters—because that’s how customers experience it too. Support leaders stake their reputation on the systems they choose, so the only way to understand what you’re putting in front of your customers is to see it under real conditions.

When you see Fin in a demo, you’re seeing the same system that runs in production. Real-world demos take more effort and don’t always go perfectly, but they show what’s real—and that’s exactly what you need to evaluate before you deploy voice AI at scale.

Inspired by this post on The Intercom Blog.

November 11, 2025
From Sketch to Clickable Demo: My AI Prototyping Playbook to Build Apps in Hours

I’ve spent much of my career compressing the distance between a napkin sketch and something real customers can touch. At HighLevel, my product teams use generative AI to validate ideas faster, reduce risk earlier, and win stakeholder trust with evidence instead of slides. The goal isn’t to be flashy—it’s to be precise, testable, and repeatable.

Today, you can build it before you pitch it. AI prototyping can turn ideas into clickable demos in hours. Here are some tools to try and steps to follow.

I start every AI prototyping sprint by sharpening the problem statement and the outcome we care about. That means being explicit about the target user, jobs-to-be-done, and the riskiest assumptions. I define a minimum detectable effect (MDE) and tie it to outcomes vs output OKRs so everyone aligns on what “good” looks like before we touch a tool.

From there, I move from sketch to interface. I capture a rough flow (whiteboard, tablet, or even paper) and generate UI variations with my AI product toolbox—tools that translate structure into components and screens. I’ll iterate on information hierarchy and copy until the narrative supports the core job, borrowing techniques from UX writing. For product managers leaning into LLMs for product managers, this phase is about speed to feedback, not perfection.

Next, I wire data and logic. I connect a lightweight backend or spreadsheet, stitch in a CRM integration if needed, and add LLM calls through a ChatGPT connector or Claude Code. If the concept benefits from multi-step autonomy, I introduce agentic AI to orchestrate tasks across APIs. CustomGPT workflows help me encapsulate business rules so the demo behaves consistently in user paths we care about.

Governance is not optional at this stage. I apply privacy-by-design defaults, document data governance decisions, and run a quick AI risk management pass: input validation, prompt safety, rate limits, and fallback responses. This keeps the prototype credible and prevents false positives from polluting stakeholder perception.

With a click-through in hand, I instrument the experience so learning compounds. I drop in Amplitude analytics to track activation, task completion, and drop-off, and set up simple A/B testing when there’s a meaningful design or copy choice. This makes the prototype a learning vehicle, not just a demo.

Then I get it in front of users—fast. Five targeted conversations will beat fifty internal opinions. I run structured product discovery interviews, observe time-to-value, and capture objections. This is where empowered product teams shine: we make changes in real time, re-run the flow, and document what moves the needle for product-led growth.

When speed matters, I use a four-hour cadence: Hour 1 for problem framing and MDE; Hour 2 for sketch-to-UI generation; Hour 3 for data wiring and AI logic; Hour 4 for instrumentation and user walkthroughs. By the end, we have a clickable demo, preliminary analytics, and a clear decision on whether to advance, pivot, or park.

Finally, I translate insights into a concise artifact: the hypothesis we tested, the signal we observed, the trade-offs we made, and the next sprint plan for product roadmapping and sprint planning. The point is not to be right on the first try; it’s to learn precisely, cheaply, and quickly enough to invest with conviction.

If you adopt this approach, you’ll find that stakeholder management becomes easier, team energy rises, and your roadmap earns credibility. Build it before you pitch it, and let real interactions—not wishful thinking—do the heavy lifting.

Inspired by this post on Product School.

November 10, 2025
Win AI Search: Proven Playbook to Get Your Startup Recommended by ChatGPT & Perplexity

AI search is quickly becoming the new homepage for startups. When a buyer asks a model for the best tools, they often take the short list at face value. I treat this moment as a product surface I can influence with strategy, content, structure, and distribution—much like any other go-to-market channel.

Early on, I set a simple objective for my team and me: "Learn how LLMs like ChatGPT and Perplexity decide which startups to recommend and what signals help a brand get discovered in AI search." That sentence became our north star for experiments, instrumentation, and content architecture.

Here is the mental model that consistently holds up in practice. Large language models synthesize answers from a knowledge graph built from crawled content, citations, and high-signal sources. They weight consensus, clarity, recency, authority, and machine-readability. I don’t pretend to know the internals, but across hundreds of tests, the same patterns correlate with being surfaced and cited.

First, I make our entity unambiguous. I standardize the company name, product names, and leadership bios across the site and external profiles. I implement Organization and Product markup with schema.org and link out with sameAs to authoritative profiles like LinkedIn, Crunchbase, GitHub, and key directory listings. The goal is to collapse ambiguity so AI search knows exactly who we are and which claims are attributable to us.

Next, I publish definitive, answer-first pages. For every core query—what we do, who it’s for, outcomes, differentiators, pricing, comparisons, and integrations—I ship a page that leads with a crisp summary, then supports it with evidence, examples, and plain language. I include Q&A sections, realistic use cases, and named case studies so models can quote and ground responses in verifiable facts.

I then make the site maximally machine-readable. I add schema.org for SoftwareApplication, Product, FAQPage, and HowTo where relevant. I keep titles, H1/H2 structure, internal links, and metadata descriptive and consistent. I expose last-modified dates, maintain an XML sitemap, and keep a visible changelog and release notes. Freshness matters—Perplexity, in particular, tends to privilege recent, well-cited material when answering time-sensitive questions.

Citations are non-negotiable. I earn credible mentions on third-party properties, analyst lists, comparison pages, and customer reviews. I prioritize authoritative placements over volume, then make sure our site references those sources to reinforce the signal. When Perplexity cites our page alongside a respected third-party review, our inclusion rate in answers rises noticeably.

I also design for developers, buyers, and machines at once. That means clean docs, integration pages, and transparent security and trust content. Clear API references, integration guides, and reliability notes give models concrete artifacts to summarize. Pricing, privacy, and support policies reduce uncertainty and increase the likelihood that an answer will include us.

Measurement turns this from a hunch into a system. I run controlled content experiments, track minimum detectable effect on discovery and mentions, and instrument referral patterns from AI assistants when citations appear. I monitor which prompts surface our brand, which sources are cited, and which pages are repeatedly used as references. When we move a KPI, we codify the pattern into our playbook and scale it.

Trust is the compounding advantage. I maintain a transparent trust center, privacy-by-design posture, and clear data governance practices. I remove vague claims, back up benefits with evidence, and keep all performance or security statements auditable. Models tend to lift brands that feel low-risk, well-documented, and widely corroborated.

If you want a fast start, here’s the checklist I rely on. Standardize your entity and ship schema.org. Publish answer-first pages for core jobs-to-be-done, comparisons, and integrations. Earn authoritative third-party citations and reference them. Keep release notes, changelogs, and dates current. Instrument AI discovery and iterate based on what gets cited. Do this consistently, and your startup earns a fair shot at being recommended when buyers ask AI for the best options.

Inspired by this post on Amplitude – Best Practices.

November 7, 2025
Prototypes vs Products: How I De-risk Ideas Fast and Ship Reliable Value at Scale

Note: This is part of the product creator series of articles, based on the overview article, The Era of the Product Creator. This series is for anyone who wants to create a successful product—whether or not you’ve had formal training or experience in product management, product design, or engineering. Over the years, I’ve watched smart teams stumble because they treated a prototype like a product. The distinction is simple but vital: prototypes exist to learn; products exist to earn trust by delivering value reliably at scale. When we blur that line, we ship avoidable risk to customers and slow ourselves down later with rework. When I build a prototype, I’m testing assumptions as quickly and cheaply as possible. It might be a clickable Figma mock, a Wizard‑of‑Oz demo, or a quick script stitching together a ChatGPT connector with a CustomGPT workflow. It’s intentionally disposable. I expect missing edge cases, fake data, hand‑waving on latency, and limited attention to security or privacy. The only goal is to answer the riskiest questions fast. A product is a promise. It’s hardened for reliability, performance, security, and privacy‑by‑design. It’s observable with real analytics, supports CI/CD and rollback, meets accessibility guidelines, and can be maintained by empowered product teams. It has clear SLAs, incident management runbooks, and instrumentation that lets me track outcomes vs output OKRs and DORA metrics. Keeping prototypes and products separate makes us faster and safer. Prototypes accelerate discovery; products operationalize value. If I catch myself “polishing” a prototype, I pause and either discard it or define the path to production with the right engineering rigor, data governance, and stakeholder management. Here’s how I decide. In prototype mode, I timebox learning to days, not weeks, and focus on a single risky assumption—value, usability, or feasibility. I validate through qualitative research and usability tests, not vanity metrics. To graduate to product work, I require a crisp problem statement, evidence of problem‑solution fit, a technical plan for scale and observability, a privacy and threat modeling review, and a measurement plan (including minimum detectable effect) for upcoming A/B testing. AI adds new wrinkles. For gen AI and agentic AI, I evaluate model behavior offline before exposing anything to customers. That includes prompt design, context window management, guardrails to minimize hallucinations, and clear fallback strategies. I define red‑team scenarios, logging for auditability, and policies for data retention and encryption as part of AI risk management. A recent example: we prototyped an agent workflow in a day that felt magical in demos. We resisted the urge to ship. Instead, we added authentication, rate limiting, PII redaction, human‑in‑the‑loop review, observability, and in‑app guides and product tours for onboarding. Only then did we move to a limited release with a well‑defined go‑to‑market strategy and support readiness. One more trap to avoid: calling a prototype an MVP. An MVP is still a product—minimal in scope but complete enough to deliver value, gather trustworthy data, and support customers. If you wouldn’t put your name on it or support it in production, it’s a prototype, not an MVP. If you’re a product creator, align your product trios around this discipline. Use prototypes to learn quickly in discovery, and use products to deliver outcomes in delivery. That mindset protects customer trust, speeds iteration, and moves you toward product‑market fit with far less waste.

Inspired by this post on SVPG.

November 7, 2025
AI Context Pulling Playbook: How I Make Humans + LLMs Collaborate for Sharper Product Outcomes

Over the last few years, I’ve learned that the fastest path to better product outcomes isn’t “more prompts,” it’s better context. When I combine thoughtful product judgment with disciplined context window management, LLMs become true partners—accelerating discovery, sharpening strategy, and improving execution.

Learn a new way in which product professionals can collaborate with AI to get even better results on their projects.

When I say “AI context pulling,” I’m talking about the intentional process of assembling, structuring, and compressing the right product evidence—customer insights, metrics, constraints, and goals—so an LLM can reason effectively. For LLMs for product managers, the win is simple: by feeding the right inputs and framing the right outcomes, we turn generic AI into a strategic co-pilot for Product Management and AI Strategy.

I start by clarifying intent through outcomes vs output OKRs. Before I ask an LLM to ideate, critique, or plan, I anchor it in the product problem, the measurable outcomes we seek, and the guardrails we cannot cross (risk, privacy, brand). This keeps the collaboration focused and aligned with stakeholder management expectations.

Next, I build a tight “context packet.” I pull customer quotes from discovery notes, usage trends from our unified analytics platform and Amplitude analytics, funnel friction from Intercom transcripts, and commercial constraints from HubSpot data. Then I summarize, deduplicate, and highlight contradictions—so the model gets the signal, not the noise.

From there, I run an agentic AI workflow. In my AI product toolbox, I use CustomGPT workflows with specialized roles: a Summarizer (compress evidence), a Strategist (propose options), and a Skeptic (stress-test assumptions). This agentic AI pattern reduces blind spots and produces artifacts I can share with empowered product teams and executives.

I then bring the insights into a product trios forum (PM, Design, Engineering). We iterate on problem framing, explore solution narratives, and translate options into product roadmapping and sprint planning. The LLM helps us rapidly compare trade-offs, highlight dependencies, and craft crisp decision memos.

Execution still demands rigor. We validate with A/B testing when appropriate, size our minimum detectable effect (MDE), and monitor activation and retention signals. The model helps generate experiment variants and risk checklists, but we own judgment, ethics, and the call to ship.

Governance matters. I treat data governance and privacy-by-design as first-class constraints in every prompt, context packet, and workflow. Clear boundaries make collaboration safer—and paradoxically, more creative—because the LLM spends its cycles inside a well-defined sandbox.

Here’s a simple example: when we explored a new onboarding flow, I fed the model a compressed brief (user segments, friction points, support tickets, and conversion deltas). It returned three viable patterns, each with hypotheses and measurement plans. Our trio refined them, launched a controlled test, and used LLM-powered analysis to summarize learnings for leadership. The result: faster clarity, better decisions, and a tighter feedback loop.

The promise of AI context pulling isn’t that AI replaces product judgment—it’s that it elevates it. With the right structure, LLMs help us think more clearly, decide faster, and build what truly matters. If you’re ready to try this, start small: define an outcome, curate a context packet, and run a single agentic loop with your team. The compounding returns will surprise you.

Inspired by this post on Pendo – Perspectives.

November 6, 2025
How Incident.io’s AI SRE Diagnoses, Hypothesizes, and Fixes Outages in Slack at Record Speed

When your site goes down, every second counts. I’ve lived that reality across multiple product lines, and the difference between a five-minute blip and a two-hour outage is felt by customers, engineers, and the business. That’s why I’ve been closely following how Incident.io has evolved from coordination during chaos to intelligent, proactive response.

Now, they’re building something new: an AI SRE that can actually help diagnose and respond to incidents. As someone who thinks deeply about reliability, velocity, and customer trust, that promise hits the intersection of AI Strategy, product management leadership, and operational excellence.

I recently spent time with Lawrence Jones, Founding Engineer at Incident.io and Ed Dean Product Lead for AI at Incident.io, digging into how their team is teaching AI to think like a site reliability engineer. They shared how they went from simple prototypes that summarized incidents to a multi-agent system that forms hypotheses, tests them, and even drafts fixes—all from within Slack.

Here’s what stood out to me first: AI’s biggest impact comes from compressing time—identifying causes minutes instead of hours. In practice, that means fewer cycles lost to paging the wrong on-call, clearer paths to root cause, and faster recovery—without cutting humans out of the decision loop.

Equally important is deciding where automation belongs. The team’s approach aligns with how I evaluate high-risk workflows: Identify which parts of debugging can safely be automated. Combine retrieval, tagging, and re-ranking to find relevant context fast. Use post-incident “time travel” evals to measure how well their AI performed. Balance human trust and AI confidence inside high-stakes workflows. The human remains accountable; the AI accelerates context, options, and execution.

On the technical side, the retrieval choices were refreshingly pragmatic. Retrieval-augmented reasoning still benefits from simplicity: deterministic tagging and re-ranking often beat complex vector setups. I’ve seen the same in production: start with crisp, deterministic signals, then layer embeddings where they truly add value. This keeps systems debuggable and stable as you scale.

The interface choices matter just as much as the models. “Slack as the interface for human-AI collaboration” puts the agent where incidents already live, reducing friction and increasing adoption. Under the hood, they’ve been pragmatic with “PGVector and Postgres for retrieval experiments”, using “RAG (Retrieval-Augmented Generation)” and “Multi-agent orchestration” to chain context gathering, hypothesis formation, and action proposals. The north star is compelling: “AI as your company’s immune system”.

What impressed me operationally was the rigor around evaluation. Post-incident “time travel” evals let teams score AI accuracy after they know what really happened. That’s the standard we should all adopt: test the agent against reality, not just synthetic prompts, and feed those learnings back into prompts, tools, and guardrails.

Trust is the currency in incidents, so the product surface must reflect uncertainty with care. Building trust in AI isn’t just about precision—it’s about showing reasoning and uncertainty in ways humans understand. In other words, show the chain of thought as a structured artifact (signals considered, hypotheses rejected, evidence gathered), expose confidence bands, and always make it easy for humans to override or guide.

From a workflow standpoint, the investigation loop mirrors seasoned SRE practice: fast scoping, parallel checks and data sources, building hypotheses and refining findings, then proposing remediations paired with the context that justifies them. Human-agent collaboration here is not a handoff—it’s a tight copilot loop where the agent gathers, tests, and drafts, and the human confirms, prioritizes, and executes.

For platform and security leaders, this approach blends speed with safety. Clear permissions, auditable actions, blast-radius constraints, and CI/CD integration keep the AI inside defined guardrails while still delivering material acceleration. The payoff is higher deployment frequency without compromising reliability—because detection, triage, and rollback become faster and more repeatable.

My takeaway as a product leader: this is a blueprint for agentic AI in mission-critical workflows. Start in the tools users live in (Slack), nail retrieval with deterministic foundations, model the expert’s playbook (not just their summaries), and make evaluation a first-class part of the product. Do that well, and the AI goes from assistant to teammate—conservative when it should be, bold when the evidence supports it, and always legible to the humans in the loop.

The momentum around Incident.io’s AI SRE suggests where we’re headed next: deeper integrations, broader coverage across service catalogs, and richer automations that remain transparent and controllable. For teams investing in reliability, this is the moment to operationalize agentic AI—measured, auditable, and designed for trust—so you can move faster when it matters most.

Inspired by this post on Product Talk.

November 6, 2025
Turn Claude Code Into a Trusted Teammate: My 3-Layer Memory System You Can Copy

"Can you critique the landing page for my new Story-Based Customer Interviews course?" That simple ask used to kick off hours of back-and-forth where I fed an AI the same context over and over—only to get generic feedback that wouldn’t land with my audience or fit my products. As a product leader, that inefficiency was unacceptable; as a writer, it was just plain frustrating.

Not anymore. Today, Claude not only critiques my work, it helps me produce it. It generates marketing copy—in my voice. It helps me write blog posts. It knows what search terms are relevant to my business and helps me optimize my articles for SEO and now AEO. It helps me with competitive research, academic research, and discovery research. And it does all of this with little prompting from me.

I don’t upload files to a web-based project. I don’t manage elaborate prompt libraries. I don’t repeat myself. I ask for help and Claude knows exactly what to do. The shift happened when I learned how to give Claude Code a memory. Claude now knows who my target customer is, the key value propositions I focus on, the specific opportunities each product addresses, my revenue model, my marketing channels, and so much more.

A dark-themed strategy slide for the post Stop Repeating Yourself: Give Claude Code a Memory, showing how to lead with a CLAUDE.md glossary page, write clearly for nontechnical readers, and link glossary and article to boost discovery and engagement.

With that memory, I consistently get high-quality output tailored to my audience and aligned to my products and services. I don’t retype the same context; Claude just remembers. In this article, I’ll show you exactly how I set up that memory. It relies on Claude Code (which requires a Pro subscription), and it’s worth it. If you’re new to Claude Code, start with "Claude Code: What It Is, How It’s Different, and Why Non-Technical People Should Use It."

Here’s the underlying problem: with large language models, every conversation starts from scratch. Yes, ChatGPT can remember some things and Claude can search past conversations, but practically speaking each new thread wipes the slate clean. If I were working on a new landing page, I’d normally need to upload target customer context, product details, primary and secondary value propositions, FAQ questions and answers, plus testimonials and logos for social proof—every single time.

Start fast with Claude’s home screen: Sonnet 4.5 is ready, and quick actions for writing, learning, and coding sit beneath a clean prompt box—ideal for showing how memory cuts repetition and streamlines daily development.

Projects in web-based tools help a bit, but they introduce a new dilemma. When I move to the next landing page targeting the same customer but a different product and value proposition, do I start a new Project (tedious) or keep expanding the old one (which muddies the context window and degrades output quality)? The good news: Claude Code solves this by giving the model a precise, durable memory without overloading any single conversation.

Claude Code can read files on my local machine, which is an understated superpower. I use those files to create a persistent, reusable memory that works across all chats and Projects. Files can be mixed and matched, so I give Claude exactly what it needs for the task at hand—and nothing more. For a first landing page, I reference the target customer and the relevant product; for the second, I reuse the same target customer file and point to the new product file.

Dark-mode Notes screenshot captures Claude Code in action: it fetches producttalk.org, reads context files, and delivers a concise homepage evaluation—showing how memory streamlines repeated analysis tasks.

When you give an LLM the exact right context, output quality jumps. More context only helps if it’s the right context. For a landing page, Claude needs to know about the current product and perhaps related products for differentiation—but it doesn’t need to know about unrelated offerings. Structure your memory so Claude gets precisely what’s required.

Once I did this, Claude shifted from “intern who needs handholding” to trusted advisor and capable teammate. It doesn’t guess at my value propositions—I’ve already told it. It writes in my voice because it has my writing guide and samples. It knows who owns which course and which use cases map to which features. The setup takes a bit of upfront work, but it compounds: update a file when something changes and you’re done. Most of this information already lives in your system; the trick is making it easy for Claude to use.

See how Claude Code stops repetition: global and project CLAUDE.md files, plus custom reference docs, flow into the editor so the assistant remembers your preferences and context while you code and run commands.

Because the files live on my machine, I own the system. No vendor or device lock-in. I decide when and who to share with. I can work with Claude on one project and ChatGPT on another—both can rely on the same file-based memory strategy. It’s an AI strategy that scales with product discovery, accelerates go-to-market content, sharpens competitive differentiation, and supports product-led growth.

Here’s how I design the memory: I use three layers. Claude Code already encourages global preferences and Project-specific instructions, but the third layer—reference context—is where the real power lives.

Peek inside a markdown playbook for Claude Code: concise rules for writing, multi-level planning, and clear feedback that turn repeated reminders into reusable memory and smoother, faster coding sessions.

Layer 1: Global Preferences (Always on). The first time I launched Claude Code, I created a CLAUDE.md file at ~/.claude/CLAUDE.md. This is where I keep the cross-project rules of engagement—how I like to work with Claude. Mine includes: Always create a plan for me to review before you start any work; Give me direct feedback (no hedging, no gentle suggestions); Use bullet points for summaries; Ask clarifying questions one at a time so I can give complete answers; No emojis unless I explicitly ask for them. Claude Code automatically loads this file at the start of every session, so I never restate my preferences.

Layer 2: Project-Specific Instructions. Different projects have different rules. In my writing workspace, the Project CLAUDE.md sets the roles (I’m the primary writer; Claude is my thought partner and editor), defines a multi-round review flow (content → structure → accuracy → typos), prioritizes human readability over SEO, and points to my writing style guide. In my task management system, I include how my Trello integration works, file naming conventions for tasks, and how to process research papers into summaries. In my code projects, I specify the technology stack (Node.js vs. Python), testing framework (Jest for Node.js, pytest for Python), code style and conventions, project architecture and directory structure, and which dependencies and libraries to use. Each project directory has its own CLAUDE.md, and Claude automatically loads the relevant file when I’m working there.

Peek inside a markdown playbook for collaborating with Claude—covering session setup, roles, editorial standards, and research steps—to show how saved instructions create consistent results without repeating yourself.

Layer 3: Reference Context (Pull as Needed)—the real power. LLMs have a context window—a limit to how much they can process at once. Even within that limit, loading too much degrades performance due to “context rot.” The remedy is ruthless context management: small, targeted files that load only when needed. Keep CLAUDE.md files concise and focused on rules and workflows. For detailed knowledge, create separate reference files and list them in your CLAUDE.md so Claude knows they exist and when to fetch them. When I ask for help creating a landing page, Claude knows to use my business profile, the product file, and my target customers context.

Here’s what most people miss: you don’t cram everything into global or Project files. You maintain small, reusable reference files that Claude only loads on demand. In my walkthrough, I share exactly which context files I created and why; how I got Claude Code to help me create them; how I break them into small, reusable components so Claude gets precisely what it needs; how I keep everything up to date; and step-by-step instructions so you can set up a similar memory system.

Three project notes funnel into Claude Code, turning reusable context into working output. This visual shows how saving key docs as memory lets the AI pick up where you left off and skip repetitive prompting across tasks.

Let’s dive in.

Inspired by this post on Product Talk.

November 5, 2025
AI at Home, Impact at Work: Experiments That Supercharged My Product Leadership

I recently tuned into an insightful All Things Product episode featuring Teresa Torres and Petra Wille on how experimenting with AI in everyday life sharpens how we build AI-powered products at work. The core premise resonated deeply with my AI Strategy: low-stakes, personal experiments accelerate confidence, clarify limitations, and build an AI product toolbox we can bring into the office with rigor.

If you want to dive in, you can listen on Spotify or Apple Podcasts. I found the conversation especially relevant for product trios and anyone shaping LLMs for product managers in high-stakes environments.

The idea is simple but powerful: when I prototype with AI at home—where the stakes are low—I learn faster, make safer mistakes, and internalize critical product patterns. Over time, those patterns transfer directly to work: tighter context management, sharper bias awareness, clearer human-in-the-loop guardrails, and a more nuanced view of when to use AI as a thought partner versus when to consider agentic AI.

In my own practice, I’ve mirrored many of the scenarios discussed: using ChatGPT by OpenAI to plan meals, analyze public data sets like school budgets, and even sanity-check real estate evaluations. These seemingly mundane tasks are fertile ground for learning about context window limits, hallucination (artificial intelligence), AI bias, and privacy-by-design trade-offs. Each experiment helps me craft better prompts, structure data for clarity, and decide when a human review step is non-negotiable—core habits for AI risk management.

At work, I treat AI as a thought partner for writing, research synthesis, and contract review. I also explore when and how to responsibly evolve toward agentic AI for repeatable workflows. The distinction matters: a thought partner augments judgment; an agent automates execution. Building the right scaffolding—data governance, auditability, constraints, and escalation paths—ensures we unlock speed without compromising safety.

Three lines from the episode stayed with me: “I’m trying to write things that only I can write — that’s my guiding writing light right now.” — Teresa. “The more we use AI, the more we learn what it’s good at, what it’s not good at, and where context becomes a limitation.” — Teresa. “It’s a safer playground — we can build our toolbox at home before bringing those lessons to work.” — Petra. These are practical north stars for product management leadership in the GenAI era.

For anyone getting started, here’s what worked for me: begin with “low-stakes” personal experiments, write down your prompts and outcomes, and reflect on failure modes. Treat each activity as product discovery: What problem am I solving? What outcome matters? What data and context does the model need? Which decisions must stay human-in-the-loop? This discipline builds an AI product toolbox you can confidently apply to real customer problems.

I also keep a running toolkit of references and tools that inform my practice: Context window as a concept helps me size and sequence information. Visual and video tools like Midjourney and Sora expand how I think about multimodal experiences. I rotate between Claude by Anthropic and ChatGPT by OpenAI depending on task fit, and I’ve used Claude Code when I need structured assistance with code review. For knowledge capture and workflow, Readwise and Ghost help me structure insights and ship content.

If you want more structured learning paths, I found Josh Seiden’s Learn AI With Me, A 30-Day Sprint to be a practical primer, and the broader community conversation at Product at Heart Conference is invaluable. For a deeper grounding in risk, I recommend reviewing topics like Hallucination (artificial intelligence), AI bias, and Agentic AI—and revisiting the complementary episode, Context is King.

I’d love to hear how you’re experimenting: Where have you seen AI meaningfully reduce toil? Where does it still struggle? How are you balancing creativity, data safety, and compliance as you scale? Drop a comment below and let’s compare notes—especially on patterns that help product trios move faster without sacrificing trust.

Bottom line: start small at home, carry lessons into the office, and build with curiosity and intentionality. That’s how we level up our product discovery, sharpen our value proposition, and lead teams confidently through the GenAI transition.

Inspired by this post on Product Talk.

November 4, 2025
Mastering AI Evals: The Essential Product Manager Skill to Ship Safer, Smarter AI

In every AI-powered product I ship, evaluation is the difference between a compelling demo and a dependable customer experience. AI evaluation isn’t a nice-to-have; it’s a core product management competency that shapes quality, safety, and business outcomes from the first prototype to scale.

When I talk about AI evaluation, I mean a disciplined, repeatable way to measure model behavior across quality, safety, reliability, latency, and cost. Gen AI has changed the cadence of product decisions—models evolve weekly, prompts drift under real-world load, and edge cases multiply. Without rigorous evals, we risk shipping unpredictability.

My goal in this piece is simple: “Dive deep into AI evals, why they matter for PMs today, and how to master them with clear steps, examples, and best practices.” If you’re leading product strategy for LLMs, agentic AI, or applied AI features, this is the playbook I rely on.

Why this matters now: customers don’t judge AI by benchmarks, they judge by trust—did it help me, was it safe, was it fast? Strong AI evals let me set outcomes vs output OKRs, quantify risk, and make transparent trade-offs between accuracy, latency, and cost. They also give engineering and design clear guardrails to move fast without breaking user trust.

Step 1: Define the product problem and success metrics. I start by tying AI metrics to business outcomes—resolution rate, deflection rate, revenue lift, time-to-value—and include model-centric measures like hallucination rate, harmful content rate, latency, and token cost. This keeps experiments anchored to impact, not just model scores.

Step 2: Build a high-signal golden dataset. I curate real, anonymized user prompts from discovery and support channels, then add adversarial and long-tail cases. For generative tasks, I create rubric-based criteria for correctness, helpfulness, tone, and safety. This dataset becomes my regression suite as prompts, RAG pipelines, or models change.

Step 3: Choose the right evaluation methods. I combine deterministic unit tests for rules with LLM-as-judge scoring, pairwise preference tests for prompt variants, human review for critical flows, and red teaming for safety. I also apply privacy-by-design and strong data governance to ensure eval data handling meets compliance and customer expectations.

Step 4: Operationalize with CI/CD. Evals run automatically on every prompt, retrieval, or model update, with pass/fail gates and alerting. I track results in a unified analytics platform so product, engineering, and go-to-market teams see the same truth. If a change regresses key thresholds, we pause rollout or roll back.

Step 5: Optimize the cost–quality–latency triangle. Real products live within constraints. I analyze token budgets, caching strategies, model selection (e.g., small for classification, larger for complex generation), prompt structure, retrieval quality, and function-calling patterns. For agentic AI, I evaluate tool-use correctness and task completion reliability, not just text quality.

Step 6: Close the loop with experimentation. Offline evals get me confidence; online A/B testing validates business impact. I design tests with a clear minimum detectable effect (MDE), guard for novelty bias, and instrument activation, retention, and satisfaction in Amplitude or Pendo. Agent analytics help me pinpoint where users succeed or get stuck.

Step 7: Govern responsibly. I maintain model cards, decision logs, and incident playbooks. For customer-facing assistants, I gate risky actions, log explanations, and add human-in-the-loop escalation. AI risk management isn’t bureaucracy—it’s how we earn trust at scale.

A concrete example: building a customer support assistant. My success metrics include deflection rate, first-contact resolution, median response latency, and safe action rate. The golden dataset blends common queries, billing edge cases, account-specific retrieval checks, and adversarial prompts. Evals measure factuality against a knowledge base, tone alignment with brand guidelines, and safe tool use for CRM integration. Only after passing offline gates do we A/B test deflection and CSAT in production.

Common pitfalls I watch for: overfitting prompts to a tiny test set, relying solely on LLM-as-judge without human calibration, skipping safety tests when latency rises, and treating evaluations as a one-time launch task. The antidote is simple—regularly refresh datasets, diversify eval methods, and wire evals into the same release discipline as any core feature.

The payoff is compounding. With strong AI evals, we ship confidently, reduce incident rates, accelerate iteration, and communicate trade-offs clearly to stakeholders. More importantly, we build products customers trust—because quality isn’t a promise, it’s a practice we can measure every day.

Inspired by this post on Product School.

November 3, 2025
Innovation Strategy in the Age of AI: Proven Playbooks, Real-World Examples, and What Works Now

AI has rewritten the rules of how we create value, and I’ve watched the most resilient organizations treat innovation as a disciplined, outcomes-driven capability—not a one-off initiative. In my role leading product teams, I’ve refined a practical approach that blends rigorous product management with an adaptive AI Strategy so we can ship faster, learn faster, and de-risk smarter.

Learn what an innovation strategy is, how to build one, which types to use, and see real examples that drive meaningful change.

At its core, an innovation strategy is the intentional system that aligns vision, portfolio bets, and execution mechanics to measurable business outcomes. I anchor this in outcomes vs output OKRs, ensuring every experiment, feature, and GTM motion ties to a clear value proposition and reinforces hard-won product-market fit lessons rather than chasing novelty.

I design portfolios around three types of innovation that work well in the age of AI. First, core optimization: drive compounding gains with CI/CD, DORA metrics, and A/B testing to improve activation, retention, and profitability. Second, adjacent expansion: extend value via new segments, channels, or use cases—often enabled by product-led growth tactics like in-app guides and product tours. Third, transformational bets: leverage gen ai and agentic AI to create step-change capabilities while proactively addressing AI risk management, data governance, and privacy-by-design.

Building the strategy starts with empowered product teams and product trios who run continuous product discovery to validate problems before validating solutions. I keep discovery tight with a minimum detectable effect (MDE), instrument the journey with a unified analytics platform, and thread learnings into product roadmapping and sprint planning so we prioritize the smallest, fastest path to decision-quality data.

On the AI front, my operating model combines an AI product toolbox (prompt patterns, evaluation harnesses, and safety rails) with LLMs for product managers to accelerate research, prototyping, and content generation. We standardize CustomGPT workflows where appropriate, define CRM integration and data boundaries early, and adopt a clear build/partner/buy decision tree to protect focus and speed without compromising risk posture.

Here are real patterns that consistently deliver meaningful change. We’ve used generative AI for product prototyping to compress concept validation from weeks to days, then confirmed impact with rapid A/B testing tied to MDE. We’ve implemented agentic AI for customer support triage to reduce response times and free human agents for high-complexity cases, all under strict data governance. And we’ve paired new AI features with a focused go-to-market strategy—clear positioning, sharp onboarding, and outcome-centric messaging—to accelerate user activation.

Measurement makes or breaks innovation. I combine deployment frequency and DORA metrics on the engineering side with activation, retention analysis, and value-moment telemetry on the product side. QBRs vs OKRs alignment keeps leadership focused on outcomes, while experiment scorecards ensure we learn even when results are neutral. The goal is to increase the rate of validated learning across the portfolio, not just ship more.

Governance is a feature, not a tax. We embed threat detection and response, privacy-by-design, and transparent data policies from day one. Stakeholder management and board management stay tight with simple narratives: the bet, the hypothesis, the metric, the MDE, the timeline, and the kill-or-scale criteria. That clarity builds trust and protects speed.

If you’re recalibrating your innovation strategy right now, start small and deliberate: define the outcomes, select one core, one adjacent, and one transformational bet, and wire in learning loops from discovery to delivery. With empowered product teams, disciplined analytics, and a pragmatic AI Strategy, you can move from interesting ideas to durable competitive differentiation—faster and with far less risk.

Inspired by this post on Product School.

November 3, 2025
Upskilling vs. Reskilling: My Playbook to Future‑Proof Teams, Boost Retention, and Ship Faster

In fast-moving product organizations, the skills that got us here won’t carry us through the next wave of change. I’ve learned that future-proofing a team is less about hiring unicorns and more about deliberately growing the skills we already have—and doing it with intention.

Upskilling and reskilling aren’t the same. Knowing the difference can help you build smarter teams and avoid costly missteps in your L&D strategy.

Here’s how I frame it with my leaders: upskilling deepens capability in the role someone already holds—think strengthening discovery, data fluency, or stakeholder management inside an existing lane. Reskilling pivots talent into a new lane—say, a support engineer into data engineering or a product marketer into product operations. Both are essential to building empowered product teams, but they solve different problems.

Deciding which path to take starts with the roadmap and strategy. If your outcomes vs output OKRs signal a need for better execution in current domains, upskilling is the lever. If your strategy introduces new bets—gen AI, privacy-by-design, or a shift to platform architecture—reskilling becomes a strategic investment. I run a simple gap analysis: inventory current skills, map them to near-term outcomes, and identify high-leverage gaps by team.

When I upskill, I prioritize learning in the flow of work. That means structured practice—not just courses—embedded into product discovery, product trios rituals, and code reviews. Shadow sessions, lightweight playbooks, and in-app guides turn new concepts into repeatable muscle memory. For new managers, I add targeted coaching for the IC to manager transition, because role clarity and feedback fundamentals compound quickly.

When I reskill, I treat it like a product launch. There’s a clear charter, staged milestones, a mentor, and onboarding tailored to the new role. I timebox practice projects, use product tours and internal sandboxes, and pair people with forward deployed engineers or senior PMs to accelerate context. The goal is confidence and competence, not just completion.

Measurement keeps the investment honest. I track time-to-productivity during onboarding, deployment frequency and DORA metrics for engineering-heavy paths, and retention analysis for people outcomes. For product and design, I look at decision quality in discovery, reduced cycle time from insight to iteration, and the clarity of written strategy. All of it rolls up into OKRs so learning is tied to business outcomes, not just activity.

The AI wave has made this even more urgent. I’m deliberately upskilling PMs on LLMs for product managers, responsible AI Strategy, and data governance, while reskilling a subset of engineers and analysts into applied gen AI roles. We cover prompt design, evaluation frameworks, and privacy-by-design basics, then ship small internal tools to turn theory into practice.

Culture makes or breaks all of this. I set explicit learning budgets, protect focus time, and model the behavior—publishing my own learning roadmaps and post-mortems. Stakeholder management matters too: I align expectations in QBRs vs OKRs, broadcast progress, and celebrate skill gains the same way we celebrate product wins. When people see that growth is visible and valued, momentum builds.

One example that sticks with me: we reskilled a cross-functional cohort into analytics and experimentation while simultaneously upskilling our existing PMs in discovery synthesis. Within a quarter, decisions got crisper, experiments shipped faster, and collaboration across product trios felt effortless. The compounding effect was unmistakable.

If you’re starting from zero, keep it simple: map the skills you have, the outcomes you need, and choose one upskilling and one reskilling initiative you can deliver in the next 90 days. Make learning visible, measure what matters, and iterate. The teams that master this discipline won’t just keep up—they’ll set the pace.

Inspired by this post on Product School.

November 3, 2025
Inside Our AI-Native Product Training: Accelerating Adoption, ROI, and Measurable Growth

AI is reshaping how we build products, learn new skills, and lead teams. I’ve seen great organizations stall when training lags behind technology. That’s why we rebuilt our approach to product training from first principles—so every team can operate confidently with AI at the core of their product management practice.

Our north star is simple: operationalize AI Strategy for every product manager and cross-functional partner. We designed a learning system that shortens time-to-adoption, amplifies ROI, and links capability-building to clear, measurable outcomes.

Product School transforms product teams into AI-native organizations with training that accelerates adoption, maximizes ROI, and drives measurable growth.

That ambition informs how we design curriculum and delivery. We combine gen AI foundations, LLMs for product managers, applied product discovery, product roadmapping and sprint planning, and product management leadership. The learning experience blends case-based instruction with simulations and real product data so teams practice exactly how they’ll perform.

To ensure knowledge becomes behavior, we embed training directly into product workflows: in-app guides, product tours, onboarding sequences, and user activation loops tied to outcomes vs output OKRs. This closes the gap between knowing and doing, and it makes capability visible in the metrics that matter.

We focus on empowering product teams—clarifying decision rights, elevating accountability, and creating feedback loops that enable faster iteration. When teams own their roadmap and understand the AI building blocks, they move from experimentation to repeatable, scalable value creation.

Measurement is built in from day one. We instrument for adoption, time-to-first-value, feature activation, and ROI attribution, enabling continuous improvement and transparent stakeholder communication. The result is a system that compounds learning into performance.

This is how we’re building AI-native organizations: practical, data-informed, and outcomes-driven. It’s not just training—it’s an operating model that helps teams learn faster, ship smarter, and grow with confidence.

Inspired by this post on Product School.

November 3, 2025