Tag: product discovery

Outcomes Are Hard: How I Lead Teams Beyond Output with OKRs, Discovery, and Focus

Outcomes are hard. I’ve felt that tension in every product organization I’ve worked with, especially when the pressure to ship is loud and the signal from customers is faint. Moving a team from measuring success by output to measuring success by outcomes requires clarity, patience, and a willingness to rethink how we plan, prioritize, and learn.

Here’s how I frame the distinction. Output is what we build and launch. Outcome is the meaningful change for customers and the business—adoption, retention, reduced time-to-value, improved conversion, lower cost-to-serve. When we focus on outcomes, we stop celebrating activity and start optimizing for impact.

Why is this shift so difficult? Outcomes depend on human behavior, not just code. They emerge from messy, interconnected systems: customer jobs-to-be-done, go-to-market motions, pricing, onboarding, and even customer support. That complexity makes outcomes slower to observe, harder to attribute, and easy to dismiss when a deadline looms. It takes leadership, consistent product discovery, and strong instrumentation to stay the course.

OKRs are the most practical tool I use to make outcomes concrete. The Objective expresses a meaningful change we seek. The Key Results quantify that change in customer and business terms. Great KRs describe effects, not activities: increase weekly active usage of the new workflow by X%, reduce onboarding time to first value to Y minutes, lift self-serve conversion by Z%, cut support tickets per account for feature A by N%.
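
To make the arithmetic behind a Key Result concrete, here is a minimal sketch in Python. The `KeyResult` class, its field names, and the sample numbers are hypothetical and not taken from any OKR tool. It simply expresses progress as the share of the baseline-to-target gap that has been closed, which works whether the metric is meant to go up or down.

```python
from dataclasses import dataclass


@dataclass
class KeyResult:
    """A hypothetical Key Result: a metric moving from a baseline toward a target."""
    name: str
    baseline: float
    target: float
    current: float

    def progress(self) -> float:
        """Fraction of the baseline-to-target gap closed, clamped to [0, 1]."""
        gap = self.target - self.baseline
        if gap == 0:
            return 1.0  # target equals baseline, so there is nothing to move
        closed = (self.current - self.baseline) / gap
        return max(0.0, min(1.0, closed))


# Illustrative numbers only: onboarding time should fall from 30 to 10 minutes.
onboarding = KeyResult("Time to first value (minutes)", baseline=30, target=10, current=18)
print(f"{onboarding.name}: {onboarding.progress():.0%} of the way to target")
```

Because the KR is stated as an effect with a baseline and a target, "ship X" cannot be expressed in this shape at all, which is a quick test for whether a KR has slipped back into output.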

The common pitfalls are predictable. If your KRs read like a roadmap (“ship X,” “integrate Y”), you’re back to output. If they’re vanity metrics (“page views” with no linkage to value), you won’t learn. If they’re sandbagged, you’ll get a false sense of progress. And if time horizons are mismatched (quarterly KRs for outcomes that need six months to show up), you’ll churn without insight.

This is where product discovery earns its keep. I connect outcomes to discovery by starting with the problem, not the feature. I map assumptions, prioritize the riskiest ones, and test with the lightest-weight experiments—prototypes, concierge tests, or data slices. The goal is to find the smallest bet that can move the needle on the Key Results, then iterate. When discovery is continuous, the roadmap becomes a living hypothesis tied to outcomes rather than a fixed list of outputs.

Instrumentation is non-negotiable. If we can’t measure customer behavior reliably, we can’t manage outcomes. I invest early in event schemas, product analytics, and clear operational definitions for metrics. I also bring forward deployed engineers and designers into customer conversations to compress feedback loops. Being close to the user is a force multiplier for outcome thinking.
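
As an illustration of what an event schema plus an operational metric definition can look like, here is a small Python sketch. The event names (`signed_up`, `first_workflow_published`), the `ProductEvent` fields, and the definition of time-to-first-value are all assumptions made for this example; your own schema and definitions will differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ProductEvent:
    """A hypothetical, minimal event schema: who did what, and when."""
    account_id: str
    name: str
    occurred_at: datetime


def minutes_to_first_value(
    events: List[ProductEvent],
    account_id: str,
    value_event: str = "first_workflow_published",
) -> Optional[float]:
    """Assumed definition: minutes from sign-up to the first value event.

    Returns None when the account has not signed up or has not reached value yet.
    """
    mine = sorted(
        (e for e in events if e.account_id == account_id),
        key=lambda e: e.occurred_at,
    )
    signup = next((e for e in mine if e.name == "signed_up"), None)
    if signup is None:
        return None
    first_value = next(
        (e for e in mine if e.name == value_event and e.occurred_at >= signup.occurred_at),
        None,
    )
    if first_value is None:
        return None
    return (first_value.occurred_at - signup.occurred_at).total_seconds() / 60


# Made-up sample data.
events = [
    ProductEvent("acct_1", "signed_up", datetime(2025, 1, 6, 9, 0)),
    ProductEvent("acct_1", "first_workflow_published", datetime(2025, 1, 6, 9, 42)),
    ProductEvent("acct_2", "signed_up", datetime(2025, 1, 6, 10, 0)),
]
ttfv = minutes_to_first_value(events, "acct_1")
print(ttfv)
```

Writing the definition down as code forces the decisions that usually stay implicit: which event counts as "value", and what to report for accounts that never get there.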

Cadence matters. I prefer weekly reviews on leading indicators and monthly deep dives on lagging metrics, with quarterly OKR retrospectives to distill lessons. We celebrate learning—insights that invalidate a bet are just as valuable as wins—because that culture keeps the team curious, honest, and resilient.

As product management leaders, our job is to create the conditions where outcome focus can thrive. That means setting a crisp product operating model, aligning stakeholders on a North Star metric, empowering teams to choose solutions, and protecting time for discovery. It also means having hard conversations about stopping work that doesn’t move the numbers, even if it’s already in flight.

How do you know you’re making progress? You’ll see fewer features, clearer narratives, faster cycles, and better results: improved activation, deeper engagement, and stronger product-market fit signals. Your roadmap will read like a portfolio of bets tied to Key Results, not a calendar of releases.

If outcomes feel hard, that’s normal—and it’s a sign you’re working on what matters. Start small: pick one team, define one consequential outcome, and run one disciplined discovery cycle. Measure honestly, learn in public, and repeat. Over time, you’ll build the muscle memory to move beyond output and deliver durable customer and business impact.

Inspired by this post on SVPG.

October 20, 2025
From Vision to Value: How Generative AI Elevates Product Design and Product Management

Product, design, and AI now converge at the center of how we build value. In my role leading product teams at HighLevel, Inc., I’ve experienced firsthand how generative AI amplifies the craft of product management and product design when we keep the fundamentals tight: clear problems, measurable outcomes, and deep collaboration across disciplines.

The mission hasn’t changed—deliver useful, usable, and trustworthy experiences—yet the means have. Generative AI expands our exploration space, speeds up iteration, and helps us reason over messy, real-world data. When we marry rigorous product discovery with thoughtful design and responsible AI strategy, we move from novelty to durable impact.

In discovery, I use AI to frame hypotheses, generate research questions, cluster customer feedback, and synthesize interview notes—without replacing direct conversations with customers. The goal is sharper insight, faster. I define outcomes in customer language, pressure-test assumptions, and trace every proposed AI capability to a clear job to be done. These habits keep us anchored to product-market fit lessons rather than shiny demos.

For prototyping, I pair designers with forward deployed engineers to build realistic vertical slices quickly. We use gen AI in product prototyping by wiring prompts, system instructions, constrained outputs, and lightweight evaluators into clickable flows so we can test usefulness early. This reduces risk and helps the team learn which interaction pattern (chat, form, or guided workflow) fits the problem best, especially in product creator experiences.

Designing AI-powered UX means embracing uncertainty without eroding trust. I favor patterns like transparent confidence cues, citations or references where possible, editable outputs, easy undo/redo, and clear pathways from draft to commit. Good empty states, contextual examples, and progressive disclosure teach users how to get high-quality results while keeping them in control.

Quality requires a measurement backbone, not vibes. I define target tasks and build golden datasets, then run offline evaluations before online experiments. The core metrics stay consistent: task success rate, user confidence, time-to-first-value, latency budgets, and cost per resolution. We harden experiences with guardrails, hallucination checks, safe fallbacks, and escalation paths to humans when the model is uncertain.
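
A rough sketch of the offline step, in Python: run a candidate over a golden dataset and compute a task success rate before anything reaches users. The dataset, the `must_contain` check, and `stub_model` are all hypothetical stand-ins; a real evaluation would call the actual model and use richer graders than a substring match.

```python
from typing import Callable, Dict, List

# Hypothetical golden dataset: each case pairs an input with a simple pass check.
GOLDEN_CASES: List[Dict[str, str]] = [
    {"input": "Refund request, order 123", "must_contain": "refund"},
    {"input": "Password reset help", "must_contain": "reset"},
    {"input": "Cancel my subscription", "must_contain": "cancel"},
]


def task_success_rate(generate: Callable[[str], str], cases: List[Dict[str, str]]) -> float:
    """Share of golden cases whose output passes its check."""
    passed = sum(
        1 for case in cases if case["must_contain"] in generate(case["input"]).lower()
    )
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; it only recognises two intents."""
    text = prompt.lower()
    if "refund" in text:
        return "I can start a refund for that order."
    if "password" in text:
        return "Here is how to reset your password."
    return "Sorry, I am not sure how to help."


rate = task_success_rate(stub_model, GOLDEN_CASES)
print(f"Offline task success rate: {rate:.0%}")
```

The useful part is the gate it enables: pick a success-rate threshold for the target task ahead of time and only move to an online experiment once the offline run clears it.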

Responsible AI is a product requirement, not a checkbox. I design for privacy-by-default, PII minimization, and secure data handling; I track prompt and model versions; and I test for bias and accessibility from the outset. Human-in-the-loop review, auditability, and transparent change logs protect users and the business as features evolve.

Go-to-market is part of the product. Clear onboarding, explainers, and in-product education reduce time to value. I align our customer support AI strategy with telemetry so support teams can triage AI-specific issues, capture edge cases, and channel learning back into prompt libraries, data pipelines, and design improvements.

From a leadership standpoint, I set strategic guardrails and empower autonomous teams. Product management leadership owns outcomes and decision quality; design leads shape multimodal experiences; engineering owns reliability and performance; and our AI platform team standardizes evaluation, safety, and cost controls. This clarity accelerates learning and throughput.

Recently, we shipped an AI-assisted creation flow that reduced manual steps, improved time-to-first-value, and drove adoption among new users. The win wasn’t a clever prompt; it was disciplined product discovery, fast iteration with realistic data, and a crisp definition of success before we scaled.

If you’re just starting, pick one high-value, low-risk use case, define success in customer terms, and build a thin vertical slice with evaluations and guardrails. Put it in front of real users, instrument everything, and iterate until the experience feels fast, predictable, and genuinely helpful.

The intersection of product, design, and AI will keep evolving, but the bar remains the same: ship outcomes customers care about. When we combine the leverage of generative AI with sound product discovery and strong product design, we turn vision into value—reliably and repeatably.

Inspired by this post on SVPG.

October 20, 2025
Unlocking Team Autonomy in the Age of Generative AI: Practical Guardrails and Wins

I’ve been working on the longer-term implications of generative AI on product teams, and especially since “A Vision for Product Teams” made the rounds, I’ve had many meaningful conversations with leaders and practitioners about the consequences and second-order effects of generative AI. Through these discussions, one thing I’ve learned is that when it comes to product teams, there’s no one-size-fits-all playbook—autonomy only works when it’s matched with clarity of strategy, measurable outcomes, and explicit guardrails.

In practice, that means generative AI doesn’t replace product judgment; it accelerates learning loops. When teams can quickly prototype ideas, summarize research, and simulate user flows, they gain speed. But speed without direction amplifies noise. The teams that benefit most from AI pair autonomy with a crisp product strategy, a clear definition of success, and strong alignment on customer value.

Team autonomy in the AI era means owning problems, not features. Cross-functional squads should be accountable to outcomes, with the freedom to choose tactics—human-centered design, data-informed decisions, and responsible AI practices. Autonomy thrives when teams understand the company narrative, the strategic constraints, and the ethical boundaries that protect customers and the business.

The most underestimated shifts are the second-order effects. As AI reduces the cost of ideation and validation, teams can move faster with smaller surfaces—but the risk of local optimization increases. Without a unifying product strategy, shared data foundations, and platform standards, autonomy fragments the user experience. The solution is not to centralize decisions, but to centralize intent: common objectives, consistent metrics, and reusable capabilities that teams can compose.

Discovery also evolves. Generative AI can help synthesize qualitative feedback at scale, draft experiment variants, and stress-test hypotheses. I encourage teams to treat AI as an assistant for product discovery—use it to explore breadth, then validate depth with customers. Rapid prototyping is more powerful when tied to clear hypotheses, structured experiments, and tight feedback loops.

The role of product management expands from roadmap stewardship to system design. I focus my teams on framing problems, defining outcomes, and setting the rules of engagement: data access policies, model selection criteria, human-in-the-loop checkpoints, and standards for explainability. When we make these guardrails explicit, engineers and designers can move faster with confidence, and leaders can trust the results.

Operationally, I’ve found a few practices to be especially effective: outcome-based roadmaps instead of feature lists; a shared experimentation platform; golden datasets with clear provenance; evaluation rubrics for model quality; and policies for privacy, security, and bias mitigation. These enable autonomy at the edges while maintaining coherence at the core.

Adoption should be staged. Start with internal workflows and low-risk use cases, instrument everything, and expand as confidence grows. Celebrate wins that compound—shorter discovery cycles, better customer insights, and higher-quality decisions—not just raw automation. The goal is augmented teams, not automated teams.

Day to day, I ask teams to make their thinking legible. Treat prompts, hypotheses, and decision logs as living artifacts. When assumptions, constraints, and outcomes are explicit, autonomy scales. And when AI helps us reason faster and see farther, we can reserve human judgment for the choices that truly matter.

My takeaway: generative AI is a force multiplier for autonomous product teams that align on strategy, instrument outcomes, and operate with clear guardrails. Give teams ownership of problems, equip them with responsible AI practices, and hold them accountable to customer and business impact. That’s how we turn speed into sustainable progress.

Inspired by this post on SVPG.

October 20, 2025
Why INSPIRED Still Matters in the Generative AI Era: Access, Insights, and Practical Playbooks

In the Generative AI era, I keep returning to the enduring playbooks that shape great product teams. INSPIRED remains a cornerstone for how I coach on product discovery, product operating models, and product management leadership. I’ve used its principles to align cross-functional squads, empower product creators, and accelerate product-market fit lessons across both startups and scaled organizations.

The book INSPIRED is available in hardcover, digital, and audio versions, but until now, the audio version was only available in an exclusive arrangement with Amazon, on audible.com. The audio versions of our other books have been available from all major audio book providers. The exclusive contract with Amazon has now expired, and…

Why this matters: when knowledge moves beyond a single platform, more of our teams can absorb it in the flow of work. Distributed PMs, designers, data scientists, and forward deployed engineers can learn on their preferred apps during commutes or deep work breaks. That accessibility compounds learning velocity—especially when we’re iterating weekly on discovery insights, opportunity assessments, and bet selection.

What’s changed in our craft is the tooling: gen AI now augments how we validate assumptions, run product discovery, and prototype. Pairing the timeless practices in INSPIRED with gen AI prototyping helps my teams get to evidence faster, turning ambiguous narratives into testable artifacts, instrumented experiments, and real customer signals. It also sharpens our product operating model by making continuous discovery the default behavior across the product team.

Here’s how I operationalize this shift: I anchor a short “learning sprint” around one chapter at a time, then immediately translate insights into a concrete discovery activity (problem framing, assumption mapping, or opportunity sizing). We run a gen AI prototyping spike to visualize flows, draft UX copy, and simulate edge cases, followed by quick customer sessions to validate usefulness and usability. We capture outcomes in a working taxonomy of product-market fit lessons and update our decision logs so learning compounds sprint over sprint.

This is also a practical boost for enablement: new hires, customer support leaders crafting a customer support AI strategy, and forward deployed engineers can now engage with the same source material on their own schedules. When the whole team shares a common vocabulary, shaped by proven practices and accelerated by gen AI, the quality of debate improves, discovery cycles compress, and execution becomes more predictable.

If you’ve been meaning to revisit INSPIRED, this is an ideal moment. With access broadening, pick the format that fits your routine and turn insights into action the same day. Use it to pressure-test your product operating model, refine your discovery cadence, and elevate product management leadership across the organization. The combination of timeless principles and modern gen AI tools is exactly what our product teams need right now.

Inspired by this post on SVPG.

October 20, 2025
What I Learned Building AI Teacher Assistants: RAG, Evals, and Designs Teachers Love

How do you build an AI-powered assistant that teachers will actually use?

As a VP of Product Management who ships AI features to real users, I’ve learned that the answer starts with deep empathy and ends with disciplined engineering. I recently dug into a compelling case study of K–5 edtech, where a team with more than a decade of experience building adaptive learning tools launched an AI-powered Teacher Assistant to help educators align supplemental lessons with district-mandated core curricula. The result is a practical blueprint for product leaders navigating gen AI in high-stakes environments.

In this episode of Just Now Possible, Teresa Torres talks with Thom van der Doef (Principal Product Designer), Mary Gurley (Director of Learning Design & Product Manager), and Ray Lyons (VP of Product & Engineering) from eSpark. Listening through a product lens, I focused on what translated from vision to value in busy classrooms—and why some early instincts (like a chatbot-first UI) didn’t survive contact with reality.

Listen to this episode on: Spotify | Apple Podcasts

Here’s what stood out to me. Post-COVID shifts in education created new pressures for teachers and administrators, amplifying the gap between top-down mandates and classroom realities. The team’s first instinct—a chatbot interface—failed in testing, and what ultimately worked was a more structured workflow that mapped to how teachers actually plan, select, and assign lessons. That’s a timeless product discovery lesson: meet users where they are, especially when their cognitive load is already maxed.

On the technical side, their first RAG system surfaced all the usual suspects—and all the usual surprises. The team had to learn to wrangle embeddings, debug semantic search vs. keyword search, and tune retrieval to the nuance of curricula, standards, and lesson objectives. As someone who has shipped RAG-backed features, I appreciate how much of the work happens in the unglamorous middle: data quality, ontology decisions, metadata hygiene, and evaluation strategy.
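
To show why keyword and semantic search can disagree, here is a toy Python comparison. Everything in it is made up for illustration: the lesson titles, the query, and especially the three-dimensional "embeddings", which are hand-written numbers rather than output from an embedding model. It is not a description of eSpark's system.

```python
import math
from typing import Dict, List

# Hypothetical lesson catalogue with hand-made 3-d vectors (NOT real embeddings).
LESSONS: Dict[str, List[float]] = {
    "Adding fractions with unlike denominators": [0.9, 0.1, 0.0],
    "Comparing parts of a whole with pizza slices": [0.8, 0.2, 0.1],
    "Identifying the main idea of a story": [0.0, 0.1, 0.9],
}


def keyword_score(query: str, title: str) -> int:
    """Number of query words that appear verbatim in the title."""
    return len(set(query.lower().split()) & set(title.lower().split()))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = "fractions practice"
query_vector = [0.85, 0.15, 0.05]  # made-up vector standing in for the embedded query

keyword_hits = [title for title in LESSONS if keyword_score(query, title) > 0]
semantic_ranked = sorted(
    LESSONS, key=lambda title: cosine(query_vector, LESSONS[title]), reverse=True
)

print("Keyword hits:", keyword_hits)
print("Semantic ranking:", semantic_ranked)
```

In this toy setup the keyword search finds only the lesson that literally says "fractions", while the vector ranking also places the pizza-slices lesson near the top. That gap is the kind of thing that only becomes visible once you debug the two side by side on your own domain language.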

Speaking of evaluation, their background in education shaped a surprisingly rigorous eval process, long before “evals” became a buzzword. They leaned on rubrics, Braintrust, and a human-in-the-loop approach to ensure the assistant’s recommendations were accurate, aligned, and classroom-ready. It’s a reminder that in domains like education and healthcare, model observability and structured evaluation are non-negotiable for product-market fit.
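
A minimal sketch of rubric-based scoring with a human-in-the-loop gate might look like the following. The criteria, weights, and review threshold are assumptions chosen for the example, and the code deliberately does not use the Braintrust API or reproduce the team's actual rubric.

```python
from typing import Dict

# Hypothetical rubric: criterion -> weight (weights sum to 1.0).
RUBRIC: Dict[str, float] = {
    "accurate": 0.5,
    "standards_aligned": 0.3,
    "classroom_ready": 0.2,
}
REVIEW_THRESHOLD = 0.8  # hypothetical cut-off below which a human reviews the output


def rubric_score(ratings: Dict[str, float]) -> float:
    """Weighted score, where each rating is a grader's 0..1 judgment per criterion."""
    return sum(weight * ratings[criterion] for criterion, weight in RUBRIC.items())


def needs_human_review(ratings: Dict[str, float]) -> bool:
    """Route low-scoring recommendations to a person instead of shipping them."""
    return rubric_score(ratings) < REVIEW_THRESHOLD


strong = {"accurate": 1.0, "standards_aligned": 0.5, "classroom_ready": 1.0}
weak = {"accurate": 0.5, "standards_aligned": 1.0, "classroom_ready": 1.0}

for label, ratings in (("strong", strong), ("weak", weak)):
    print(label, round(rubric_score(ratings), 2), needs_human_review(ratings))
```

Weighting accuracy most heavily is a design choice for this sketch: a recommendation that is well aligned but wrong still lands in the review queue.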

The most energizing signal for me: they’ve learned from thousands of teachers using the product this school year—and they’re already translating that learning into roadmap bets. What’s next for Teacher Assistant: more contextual recommendations using student data. Done well, that shift moves the product from “helpful” to “indispensable,” grounding gen AI in student outcomes rather than generic assistance.

Show notes for context: Guests include Thom van der Doef, Principal Product Designer at eSpark; Mary Gurley, Director of Learning Design & Product Manager at eSpark; and Ray Lyons, VP of Product & Engineering at eSpark. Topics covered span the origin story of Teacher Assistant (connecting administrator mandates with teacher needs), why the team abandoned a chatbot interface in favor of a more structured workflow, how retrieval augmented generation (RAG) and embeddings shaped the product architecture, lessons learned from debugging semantic search vs. keyword search, building evals with rubrics, Braintrust, and a human-in-the-loop approach, and what’s next for Teacher Assistant: more contextual recommendations using student data.

If you like to follow along chronologically, the chapter flow is tight and practical: 02:05 Overview of eSpark's Adaptive Learning Program; 07:19 Challenges and Insights from COVID-19; 17:06 Developing the Teacher Assistant Feature; 24:55 User Experience and Interface Evolution; 34:29 Chat GPT-5's New Features; 35:16 Balancing Engagement and Efficiency; 35:40 Seasonal Business and Real Traffic; 36:29 Technical Decisions and RAG Implementation; 38:28 Challenges with Embeddings and Metadata; 41:24 Improving Recommendations and Data Enrichment; 55:18 Evaluating the Teaching Assistant; 01:05:51 Future Plans and User Feedback; 01:07:57 Conclusion and Final Thoughts.

Useful links if you want to go deeper: eSpark Learning; Braintrust.dev (evals and observability for LLM applications); AI Evals Maven Course by Hamel Husain and Shreya Shankar.

My product takeaways for anyone building AI in complex, regulated, or mission-driven domains: First, resist the chatbot reflex; many users need structured, high-signal workflows. Second, treat retrieval as a product surface—data modeling, metadata, and domain language matter as much as model choice. Third, invest early in evals with rubric-based scoring and human-in-the-loop reviews to protect trust. Finally, plan for seasonality and “real traffic” patterns; the strongest eval is usage in production with tight feedback loops from your most demanding users.

Gen AI is only as valuable as the outcomes it enables. In classrooms, that means saving teachers time, raising instructional alignment, and ultimately improving student learning. This case study shows that when we combine empathetic product discovery with disciplined RAG architecture and rigorous evals, AI stops being a demo—and starts being a difference-maker.

Inspired by this post on Product Talk.

October 20, 2025
Turn AI into a Strategic Thought Partner: Real Workflows, UX Shifts, and Personal Agents

I’ve been leaning hard into AI as a strategic thought partner, not a shortcut—and this episode captured exactly why. Listening to Teresa Torres and Petra Wille explore how AI sharpens writing, coding, and product decision-making felt like a mirror of what I’m seeing on real teams: when we treat AI as a collaborator, we unlock quality, speed, and clearer thinking without sacrificing our voice or product judgment.

If you want to dive in, listen on Spotify or Apple Podcasts. There’s also a YouTube version of the episode.

Two themes stood out immediately. First, Petra’s voice-first workflow, and how she uses AI to mine her own archive for consistency, is a brilliant approach to preserving authorial intent while scaling content creation. Second, Teresa’s account of how Claude Code in the terminal completely changed her workflow, from planning mode for coding projects to reviewer “sub-agents” when drafting blog posts, maps closely to how I’ve reshaped my own product and engineering cadence.

On Petra’s side, the combination of voice input and bilingual transcription isn’t just a convenience—it’s a cognitive unlock. By capturing high-fidelity thinking in real time and surfacing relevant prior material, AI becomes a continuity engine for product discovery and leadership communications. I’ve applied a similar pattern for product briefings and executive updates: record voice notes, let AI surface connected fragments from prior docs, and then reconcile differences to maintain a single, coherent narrative over time. Tools like WisprFlow make this feel natural rather than mechanical.

Teresa’s setup with Claude Code resonated as well: planning mode, context from local files, and project planning before writing code is exactly how I prefer to work with engineers and forward deployed engineers. Bringing in local context—sometimes via RAG (retrieval-augmented generation) or MCP (Model Context Protocol)—keeps the assistant grounded in the reality of our repositories and docs. In my experience, that pre-work pays off with cleaner interfaces, tighter tests, and faster reviews when we shift from ideation to implementation.

The framing that matters most to me: using AI as an editor and reviewer rather than as a ghostwriter. I still write every word myself, but I rely on structured critique to reduce blind spots. Creating sub-agents (copy editor, skeptic, devil’s advocate) to critique drafts mirrors how strong product teams stress-test PRDs, strategy docs, and UX copy. When I need a deeper critique, I’ll even spin up dedicated sub-agents to review assumptions, risk, and edge cases.

One practical takeaway you can apply immediately: pair models for complementary strengths. The way ChatGPT and Claude differ in strengths (structure vs. tone) is a pattern I see daily when prototyping with gen AI. I often draft structured scaffolds or test plans in ChatGPT, then refine tone, clarity, and nuance in Claude. For “vibe coding” experiments in Python or Node.js, I’ll start in planning mode with Claude Code, anchor on tests and interfaces, and only then move into implementation.

The UX implications are profound. The shift toward personal agents as the interface for products accelerates a world where English becomes the interface for everything we do. That means our information architecture must increasingly be legible to agents, not just humans. It also means onboarding, accessibility, and error recovery will be mediated through conversational patterns, not just screens. For product management leadership, this demands new standards for observability, prompt governance, and cross-model evaluation—core ingredients for trustworthy AI strategy.

If you’re mapping this to your roadmap, here’s how I’d operationalize it: treat AI as a strategic thought partner in product discovery; define explicit roles for sub-agents in reviews; codify planning mode as a precondition to writing code; and document model choices (structure vs. tone) so your team knows when to use what. This is how we turn gen AI into durable product-market fit lessons rather than sporadic wins.

Resources and links mentioned or relevant to the workflows discussed: ChatGPT, Claude & Claude Code (Anthropic), WisprFlow, Vibe coding, Python, Node.js, RAG (retrieval-augmented generation), MCP (Model Context Protocol), agents and workflows, and Subagents.

I’d love to hear how you’re deploying AI in your own stack. What’s working in your editor-and-reviewer setup? Which combinations of models are giving you leverage? Drop your thoughts below—let’s compare notes and sharpen our collective practice as product creators.

Inspired by this post on Product Talk.

October 20, 2025
Deliberate Practice for Product Teams: How AI and On‑Demand Learning Unlock Mastery

I recently tuned into a powerful conversation where Petra Wille sits down with Teresa Torres to unpack a major shift in product learning: moving from purely instructor-led cohort courses to offering on-demand options. As someone leading product management at HighLevel, I’ve wrestled with the same trade-offs—how to scale product discovery skills without compromising depth, community, or outcomes—and this discussion hit home.

What stood out immediately is how Teresa shares why she resisted on-demand for so long, how deliberate practice has always been at the heart of her teaching, and what finally changed her mind. That framing matters. In my experience, deliberate practice is the backbone of real capability building: clear goals, targeted reps, tight feedback loops, and sustained reflection. It’s how we turn continuous discovery from a concept into a craft product teams can reliably execute.

They also dug into the trade-offs between cohort-based and on-demand learning. Cohorts bring structure, accountability, and shared language, all critical for team-based behavior change. On-demand learning offers flexibility, reach, and just-in-time reinforcement, which matter for busy product managers, designers, and engineers balancing roadmaps and research. The challenge is not choosing one over the other, but architecting a blended learning system that preserves the rigor of cohorts while using on-demand to extend practice, sustain momentum, and meet learners where they are.

That’s where technology becomes a force multiplier. From AI-powered interview coaches to microlearning formats, they explored how AI can support behavior change and skill building without losing the human element. I’ve seen the same in my teams: when AI provides structured, rubric-based feedback on interviews, assumptions, or opportunity framing, people get expert-quality guidance at scale. Used well, this shortens the feedback cycle and increases the number of high-quality reps without displacing peer critique or expert coaching.

Microlearning and problem sets deserve special attention. Short, focused practice—think “Duolingo” for product discovery—helps teams internalize patterns like crafting unbiased interview prompts, distinguishing signals from stories, or iterating on interview flow. Combined with spaced repetition, these formats build muscle memory for critical skills, so discovery doesn’t stall the moment the cohort ends. In other words, on-demand isn’t a downgrade; with the right scaffolding, it can be a durability upgrade.

Equally important is why AI should augment, not replace, human connection in discovery. No model can substitute for the trust you build with customers, the judgment you develop through messy real-world conversations, or the creative tension of team debate. My takeaway: use AI to accelerate preparation, evaluation, and deliberate practice; rely on humans for empathy, ethics, sense-making, and decision quality.

If you’ve ever wondered how to balance flexibility, structure, and deliberate practice in product learning—or you’re just curious how AI might reshape how we build skills—this conversation is for you.

Listen to this episode on: Spotify | Apple Podcasts

Explore the resources and links mentioned: Follow Teresa Torres: https://ProductTalk.org; Follow Petra Wille: https://Petra-Wille.com; Product Talk Academy; Continuous Interviewing course by Teresa Torres; Story-Based Customer Interviews On Demand course by Teresa; Customer Recruiting for Continuous Discovery On Demand course by Teresa; Duolingo; Teresa’s Interview Coach; AI as a Strategic Thought Partner with UX Implications podcast episode; Teresa’s socials: X, LinkedIn, YouTube, Product Talk Blog.

I’d love to hear your perspective. How are you blending cohort-based learning, on-demand practice, and AI coaching on your product teams? Drop your thoughts in the comments—let’s compare notes on what’s working.

Inspired by this post on Product Talk.

October 20, 2025
AI and Customer Interview Synthesis: Speed Up Insights Without Losing Empathy

Continuous customer interviews can overwhelm even seasoned product teams. I see it all the time: we commit to weekly conversations, transcripts pile up, and synthesis slips down the priority stack.

When I interview every week, the data builds quickly. Hours of transcripts accumulate, and if I don’t synthesize as I go, I fall behind. I’ve heard countless teams say, "We need to stop interviewing so we can catch up on what we’ve already learned." That’s a red flag—many teams pause and never restart.

I get why this happens. Interview synthesis is cognitively demanding and time-consuming. That’s why so many teams reach for generative AI to help. I use AI a lot—but I’m also careful about where it helps and where it hurts.

Before I explain how I use AI in practice, I want to ground us in the goal of continuous interviewing, why story-based interviews matter, and what good synthesis looks like. With that foundation, it’s much easier to see how AI can accelerate the right work without undermining our judgment, empathy, or product discovery skills.

The goal of continuous interviewing is to develop a deep and rich understanding of who our customers are, what their goals are, the context in which they pursue those goals, and what opportunities (needs, pain points, and desires) arise along the way. I get there by asking for specific stories about past behavior. Goals, context, and opportunities emerge from those stories.

My flow is simple and disciplined: I synthesize each story using an interview snapshot, then I synthesize across stories by mapping the opportunity space in an opportunity solution tree. Habits vary, but in my experience this approach consistently yields actionable insights and keeps the team anchored in real customer context.

There’s a prerequisite almost nobody talks about: you need deep, rich stories. Many teams haven’t invested in interviewing skill. They ask hypothetical questions about the future (e.g., "Would you use this?") or spend precious interview time seeking solution feedback instead of learning about the customer’s world.

Even when teams ask about goals, context, and needs, they often ask in the abstract (e.g., "How do you decide what to watch?") or ask customers to generalize (e.g., "What do you typically watch?"). That leads to unreliable feedback. When teams do ask for past stories, they often collect shallow narratives because they haven’t honed their craft to probe for detail and meaning.

Here’s the crux: if you aren’t good at collecting a rich story about past behavior, no amount of AI synthesis can help you. AI can’t add missing context. It can’t infer missing goals and motivations. It can’t create actionable opportunities from shallow stories. This is also why humans struggle to synthesize weak interviews. Better interviewing unlocks better synthesis—human or AI.

Once I have strong stories, I synthesize in two steps. First, I synthesize what I learned from each interview. Second, I synthesize what I’m learning across interviews. That separation matters.

For single-interview synthesis, I use interview snapshots. Each snapshot includes quick facts to contextualize the story, a memorable quote, an experience map of key moments, and—most importantly—a list of opportunities expressed in the customer’s own context. This keeps insights actionable and traceable.
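
To make the snapshot concrete, here is a minimal Python sketch of how one might be represented as data. The class and field names (`InterviewSnapshot`, `Opportunity`, `key_moment`) are my own illustrative assumptions, not a Product Talk template, and the sample participant and quote are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Opportunity:
    """A need, pain point, or desire, kept with the moment it came from."""
    description: str
    key_moment: str  # the step in the experience map where it surfaced


@dataclass
class InterviewSnapshot:
    """One snapshot per interview. All field names are hypothetical."""
    participant: str
    quick_facts: dict[str, str]
    memorable_quote: str
    experience_map: list[str]  # ordered key moments from the story
    opportunities: list[Opportunity] = field(default_factory=list)

    def opportunities_at(self, key_moment: str) -> list[str]:
        """Return the opportunities tied to one moment in the story."""
        return [o.description for o in self.opportunities
                if o.key_moment == key_moment]


# Invented sample data, for illustration only.
snapshot = InterviewSnapshot(
    participant="P-07",
    quick_facts={"role": "Operations lead", "team_size": "12"},
    memorable_quote="I gave up and exported everything to a spreadsheet.",
    experience_map=[
        "Receives weekly report",
        "Tries to filter by region",
        "Exports to spreadsheet",
    ],
    opportunities=[
        Opportunity("I can't narrow the report to my region",
                    "Tries to filter by region"),
        Opportunity("I need to share a subset with my manager",
                    "Exports to spreadsheet"),
    ],
)
```

Keeping each opportunity attached to a key moment and a participant is the point of the design: when I later compare snapshots, I can still trace every opportunity back to a real person and a real story.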

When I synthesize across interviews, I review multiple interview snapshots to ask: what are we learning that can help us reach our desired outcome? Key moments give structure to the opportunity space, and the specific unmet needs, pain points, and desires help me see where our product can meaningfully help. With this foundation, I can reason clearly about where AI helps—and where it hurts.

Where do teams go wrong with AI synthesis? The biggest mistake I see is combining the two synthesis steps into one. Teams dump all their transcripts into an AI workspace or NotebookLM and ask the model to “tell us what we learned.” The second error is using low-quality prompts: asking for a summary, themes, or common pain points. Those outputs are easy to read but rarely actionable.

Product Talk’s Customer Interview Snapshot shows a concise way to document research—capturing a quote, quick facts, insights, opportunities, and an experience map—handy when comparing AI and human analysis.

If I learn the three most common pain points across interviews but don’t know who experienced them or in what context, I can’t design effective solutions. The interview snapshot is designed to avoid that trap by preserving opportunities within the customer’s story and context, tied to a real person. That link is critical for validation and iteration.

Summaries, in particular, are problematic. If you condense a 20–30 minute interview into a paragraph or two, you’ll lose the context, nuance, and detail that make that customer story unique. One study found that large language models "frequently generated summaries that oversimplified or omitted critical details." Another study found that models struggle to "adequately represent the deep meaning" when summarizing text.

Bias is another concern. Pre-training data can shape outputs in ways that distort meaning. The first study found that pre-training data "introduced biases that affected summarization outputs" and that models often "defaulted to generalizations or inaccuracies." Hallucinations also show up in summaries, theme extraction, and even fabricated direct quotes. I’ve seen this first-hand: when I tested ChatGPT on real interviews, a surprising share of “quotes” were inaccurate or invented.
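
Because invented quotes are easy to miss, a small script can flag any AI-reported quote that does not appear verbatim in the transcript. The sketch below is one possible approach, not the method I used in my own tests: the function names and the sample transcript are hypothetical, and it only catches quotes that fail an exact match after normalizing case, whitespace, and curly quotes. A quote that passes still deserves a read in context.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(quotes: list[str], transcript: str) -> list[str]:
    """Return the quotes that do not appear verbatim in the transcript."""
    haystack = normalize(transcript)
    return [q for q in quotes if normalize(q) not in haystack]


# Invented sample data, for illustration only.
transcript = (
    "So on Tuesday I opened the app, and honestly I couldn't find\n"
    "the export button anywhere."
)
ai_quotes = [
    "I couldn't find the export button anywhere.",
    "The export feature is completely broken.",
]

# Quotes in this list need a manual check against the source.
flagged = unverified_quotes(ai_quotes, transcript)
```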

AI can speed up interview analysis, but this visual shows where it falters: weak prompts, shallow or inaccurate summaries, lost nuance, bias, and missing tone and body language. Use it to check your synthesis process.

There’s also a context gap. Synthesizing an interview well often requires business, product, and customer context to correctly interpret what was said. Unless we provide that context deliberately, the AI doesn’t have what it needs. Finally, most AI synthesis works off text only, missing tone of voice and body language—both of which can materially change meaning.

Despite those risks, I don’t avoid AI—far from it. I use it deliberately in three ways that consistently add value without eroding empathy or skill.

First, AI as a notetaker. AI transcription is excellent and essentially free. I often add structured metadata—date, participant, role, company, topics—so my transcripts are easy to search. Tools like Granola can organize notes, but I always verify those notes against the transcript to avoid subtle misreads.

AI can accelerate customer interview analysis, but this slide highlights the trade‑offs: diminished empathy, weaker pattern recognition, eroding synthesis skills, loss of quality feedback loops, and insights that are harder to act on.

Second, AI as a fresh perspective. In my product trio, each of us synthesizes the interview separately before we discuss. I then add AI as an additional perspective by running the same material through carefully configured Claude and ChatGPT spaces that have the right research context and synthesis instructions. Because I’ve already done my own synthesis, I can evaluate the model’s output, borrow useful frames, and catch anything I might have missed—without outsourcing my judgment.

Here’s the workflow I rely on for using AI as a fresh perspective: I set up a dedicated, persistent space (a Project or equivalent) for synthesis. I define the right context up front—ideal customer profile, current outcome, target opportunity, research questions, and short instructions on how to do each step. I keep single-interview synthesis and cross-interview synthesis separate, using new conversations to prevent context rot. I treat AI as another teammate—not the source of truth. And when AI surfaces opportunities I didn’t capture, I go back to the source to verify.

In practice, Claude (via Projects or Claude Code) is excellent for collaborative synthesis and handling long transcripts. ChatGPT (via custom GPTs or Projects) offers a complementary perspective, and I’ll often run the same material through both. Granola helps with note organization, provided I review its outputs before using them. One caution: many “interview analysis” tools skip single-interview synthesis and jump straight to patterning. Don’t let that happen; do step one before step two.

Three practical roles for AI in customer interview analysis: note-taker, fresh perspective, and synthesis teacher.

Third, AI as a customer synthesis teacher. Synthesis is a skill. AI can suggest alternative opportunity framings, propose interpretations of the same passage, and flag when an “opportunity” is really a solution in disguise. I’ve had strong results using AI as a thought partner and coach, especially when I’m deliberate about what good looks like and I verify everything against source material.

There’s an important human dimension to all of this. When I do the deep work of synthesis, I develop empathy for customers. Creating an interview snapshot forces me to ask: What happened? What did we really hear? How can we help? That cognitive effort is what unlocks both empathy and pattern recognition. If I outsource that work to AI, I lose both the learning and the mental connections that fuel better decisions.

There’s another risk: skill atrophy. If I let AI do the synthesis, my ability to synthesize degrades, and that makes me worse at evaluating AI’s output. Two recent studies found that experts are much better than novices at catching the subtle mistakes LLMs tend to make. So if we don’t keep our edge, we not only lose skill; we also get less value from AI.

A final benefit of doing the work yourself: when I revisit transcripts, I see my own interviewing gaps. I spot missed follow-ups, leading questions, or places I misread what was said. Synthesis becomes a feedback loop that improves my interviewing craft. If I outsource synthesis, I sever that loop.

Can AI help humans do better synthesis, faster? The research is encouraging. Those same two studies found that AI can raise the performance of novices. But they also show that experts working with AI perform best. In my experience, the sweet spot is expert human synthesis aided by AI—fast enough to keep pace, rigorous enough to build empathy and insight.

Practically, there are three approaches to interview synthesis. Human-only is the deepest path to empathy and pattern recognition, with no hallucination risk and maximum skill-building—but it’s time-intensive and can be overwhelming at scale. Outsourcing to AI is the fastest and handles volume well, but you risk losing context and empathy, your skills can atrophy, and outputs are often less actionable. AI as collaborator sits in the middle: it catches missed opportunities, adds fresh perspectives, speeds up work without replacing it, and strengthens your synthesis muscles—provided you do your own synthesis first and verify AI’s contributions.

My recommendation is simple. Start with human-only synthesis until you can recognize what good looks like. Then bring in AI as a collaborator once you can evaluate the quality of its output. Only outsource to AI if you’re genuinely blocked and need a temporary bridge—and if you do, plan to build your synthesis muscle alongside it.

So what does AI as a collaborator look like day to day? It looks like tight loops: rigorous single-interview synthesis by each member of the product trio, a second pass with AI configured to your outcome, ICP, and target opportunities, careful verification back to the transcript, and only then cross-interview synthesis that maps the opportunity space. That cadence preserves empathy, sharpens judgment, and gives you the speed benefits of AI without sacrificing what makes product discovery work.

Bottom line: use AI to accelerate clarity, not to replace the human judgment that drives product management leadership. When we protect empathy, preserve context, and practice disciplined synthesis, generative AI becomes a powerful amplifier for product discovery—not a shortcut that dulls our edge.

Inspired by this post on Product Talk.

October 20, 2025
Inside Alyx: Dogfooding, Evals, and Observability That Power an Agentic AI Future

I’ve been deep in the work of building practical, agentic capabilities into AI products, so this story about Alyx immediately resonated with me. It’s a rare, clear-eyed look at what it actually takes to ship a useful AI agent inside an AI platform—while using that same platform to build, test, and continuously improve the agent.

What does it really take to build an AI agent inside an AI platform—especially when you’re using that same platform to build the agent?

Listening to SallyAnn DeLucia (Director of Product at Arize) and Jack Zhou (Staff Engineer at Arize) unpack Alyx—the AI agent that helps teams debug, optimize, and evaluate AI applications—I recognized playbooks I trust: start scrappy, dogfood relentlessly, build intuition with real users, and systematize improvement with thoughtful evals.

Their early phase looked exactly like the messy reality many of us try to hide: Jupyter notebooks, hacked-together web apps, and weekly dogfooding sessions with their customer success team. That’s where patterns emerged, confidence was built, and the highest-leverage skills for the agent were prioritized. It’s a reminder that “vibe checks” matter at first—but you must quickly graduate to measurable, repeatable learning loops.

In my experience, the foundation of GenAI product quality is threefold: tracing, observability, and evals. They reached the same conclusion—defining traces across tool calls and sessions, creating observability into model behavior, and layering evals to compare both micro-decisions and system-level outcomes. That discipline converts hunches into evidence and makes agent behavior improvable, not mysterious.
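
As a rough illustration of what layering evals can mean, here is a minimal Python sketch with one eval at the tool-call level and one at the system level. It does not use or represent the Arize API; the classes, the tool names, and the `user_goal_met` label are hypothetical stand-ins for whatever your tracing setup records.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One traced tool invocation inside a session (hypothetical shape)."""
    name: str
    succeeded: bool


@dataclass
class Session:
    """A traced user session made up of tool calls (hypothetical shape)."""
    session_id: str
    tool_calls: list[ToolCall] = field(default_factory=list)
    user_goal_met: bool = False  # e.g. a human label or a separate judge


def tool_call_success_rate(session: Session) -> float:
    """Micro-level eval: share of tool calls in a session that succeeded."""
    if not session.tool_calls:
        return 0.0
    return sum(c.succeeded for c in session.tool_calls) / len(session.tool_calls)


def goal_completion_rate(sessions: list[Session]) -> float:
    """System-level eval: share of sessions where the user's goal was met."""
    if not sessions:
        return 0.0
    return sum(s.user_goal_met for s in sessions) / len(sessions)


# Invented sample traces, for illustration only.
sessions = [
    Session("s1", [ToolCall("search_traces", True), ToolCall("run_eval", True)],
            user_goal_met=True),
    Session("s2", [ToolCall("search_traces", True), ToolCall("run_eval", False)],
            user_goal_met=False),
]
```

Comparing the two numbers side by side is where the learning happens: a session can have clean tool calls and still miss the user's goal, which points at planning or tool design rather than tool reliability.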

What stood out was how cross-functional, boundary-spanning teams made the difference. Customer success engineers surfaced repeatable workflows. Product framed early skills. Engineering wrapped prototype tools into something coherent. Using their own platform to build Alyx accelerated intuition and de-risked launch. That’s the product loop I aim to cultivate: close to customers, close to data, and fast to learn.

As Alyx matures, the next step is moving from “on rails” workflows to more autonomous, agentic planning loops. That evolution requires stronger tool design, richer feedback signals, and evals that reflect end-to-end user value. It’s exactly the shift I expect across GenAI: from scripted assistants to adaptive systems that reason, plan, and act with guardrails.

Listen to this episode on: Spotify | Apple Podcasts

Guests:

SallyAnn DeLucia, Director of Product, Arize

Jack Zhou, Staff Engineer, Arize

In this episode, we cover:

What tracing, observability, and evals really mean in GenAI applications

How Arize used its own platform to build Alyx, its AI agent

The role of customer success engineers in surfacing repeatable workflows

Why early prototyping looked like messy notebooks and hacked-together local apps

How dogfooding shaped Alyx’s evolution and built confidence for launch

Why evals start messy, and how Arize layered evals across tool calls, sessions, and system-level decisions

The importance of cross-functional, boundary-spanning teams in building AI products

What’s next for Alyx: moving from “on rails” workflows to more autonomous, agentic planning loops

My takeaways for product teams building GenAI agents are simple and hard: design tools with observability in mind; operationalize evals early even if they’re imperfect; embed customer-facing engineers in the loop to capture real workflows; and keep the first skills narrow, high-impact, and testable. If your team can move from demos to disciplined measurement quickly, you’ll accelerate product-market fit.

Resources & Links

Arize AI: Sign up for a free account and try Alyx

Arize Blog: Lessons learned from building AI products

Maven AI Evals Course: The course Teresa took to learn about evals (Get 35% off with Teresa’s affiliate link)

Cursor: The AI-powered code editor used by the Arize engineering team

Datadog: For understanding application traces

OpenAI GPT Models: GPT-3.5, GPT-4, and newer models used in early and current versions of Alyx

Jupyter Notebooks: A tool for combining code, data, and notes, used in Arize’s prototyping

Axial Coding Method by Hamel Husain: A framework for analyzing data and designing evals

Chapters

00:00 Introduction to SallyAnn and Jack

01:08 Overview of Arize.ai and Its Core Components

01:44 Deep Dive into Tracing, Observability, and Evals

03:56 Introduction to Alyx: Arize's AI Agent

04:15 The Genesis and Evolution of Alyx

08:51 Challenges and Solutions in Building Alyx

24:33 Prototyping and Early Development of Alyx

26:22 Exploring the Power of Coding Notebooks

26:51 Early Experiments with Alyx

27:59 Challenges with Real Data

29:20 Internal Testing and Dogfooding

31:55 The Importance of Evals

35:16 Developing Custom Evals

43:09 Future Plans for Alyx

47:59 How to Get Started with Alyx

Full Transcript

Podcast transcripts are only available to paid subscribers.

If you’re building in GenAI right now, this conversation offers a pragmatic blueprint. Start with high-signal workflows, turn qualitative insights into quantitative evals, and use tracing plus observability to make agents debuggable. That’s how scrappy prototypes become reliable systems. And if you want a tangible example, “47:59 How to Get Started with Alyx” is a helpful on-ramp.

Inspired by this post on Product Talk.

October 20, 2025
How to Design Your Product Leadership Legacy: Impact, Craft, and Values That Endure

I recently spent time with an episode of All Things Product that hit especially hard as we head into year-end: Petra Wille and Teresa Torres ask, “What do you want to be known for in your work?” As someone leading product management and building high-performing teams, I regularly bring this question into my Q4 conversations. It’s a powerful lens for product management leadership, career transitions, and how we show up for our customers and colleagues.

Listen to this episode on: Spotify | Apple Podcasts

In this conversation, I appreciated how clearly they unpack the nuances of impact, craft, personal brand, and values—and how those ideas shape the footprints we leave in teams, organizations, and the broader product community. Their stories and lessons learned are equal parts relatable and practical, which is exactly what we need when we’re balancing execution with reflection.

Let’s talk about “legacy.” The word can feel loaded—big, vague, and distant. I reframe it with my teams into a question we can act on now: What meaningful change did we enable for customers and our organization this quarter, and what do we want colleagues to remember about how we did it? That framing keeps us grounded in outcomes and behaviors, not just lofty aspirations.

The distinction between impact and craft is central. Impact is the difference our work makes—what changes because of our decisions. Craft is what we hone for intrinsic reward—our product discovery techniques, decision-making frameworks, and communication muscles. Early in my career, I over-indexed on impact metrics and under-invested in craft. I shipped value, but I wasn’t building the repeatable habits that elevate a product creator for the long haul. Over time, I learned that craft compounds—and it pays dividends in both product-market fit lessons and leadership credibility.

Personal brand and values also matter more than many of us admit. When the pressure is on, people remember how we decide, how we communicate trade-offs, and how consistently we anchor on customer value. I want to be known for rigorous product discovery, clarity under uncertainty, and the integrity to say “no” when it protects long-term outcomes. Those cues travel fast across an organization and quietly define our leadership legacy.

Feedback gaps can reveal blind spots—and we all have them. I proactively create multiple feedback loops: structured 1:1s, skip-levels, stakeholder debriefs after key product decisions, and customer touchpoints. I specifically ask for disconfirming evidence—what am I missing, where did my decision-making create friction, and how might I simplify? Weekly customer learning is non-negotiable for me; it keeps the team grounded and accelerates product discovery. If you need a starting point, Teresa’s work on weekly customer interviews is a solid playbook: Customer Interviews: How to Recruit, What to Ask, and How to Synthesize What You Learn.

Here are the prompts I’m using with my team for Q4 reflection:

Why “legacy” can feel loaded, and better ways to frame the question

The difference between impact (what changes because of your work) and craft (what you hone for intrinsic reward)

How personal brand and values influence what colleagues remember about you

Why feedback gaps can reveal blind spots, and how to proactively seek better input

Reflection prompts to carry into your Q4 (and beyond)

I encourage folks to journal on these, then bring two concrete actions into our next planning cycle.

If you’re thinking about your own growth, preparing for career transitions, or simply curious how others reflect on their product practice, this episode offers both inspiration and pragmatic takeaways. I’m weaving these themes into our planning and calibrations because reflection is a force multiplier—it sharpens strategy, strengthens culture, and ultimately improves customer outcomes.

Follow Teresa Torres: https://ProductTalk.org

Follow Petra Wille: https://Petra-Wille.com

Mentioned in the episode: Petra’s Thought-Provoking Questions to Prompt Your End-of-Year Reflection

Mentioned in the episode: Xing

Mentioned in the episode: Teresa’s work on weekly customer interviews: Customer Interviews: How to Recruit, What to Ask, and How to Synthesize What You Learn

Mentioned in the episode: Petra’s guide: The Product Leader’s Guide to Giving Feedback

Join the conversation with me: What do you want to be known for in your product work this coming year? Share your thoughts below and let’s learn from one another.

Full Transcript

Full transcripts are only available for paid subscribers.

Inspired by this post on Product Talk.

October 20, 2025
From Closet Cold Calls to Category Leader: Gusto’s Playbook for Urgent Product-Market Fit

I gravitate to origin stories where product strategy meets real human pain. Tomer London is the co-founder and Chief Product Officer at Gusto, the payroll and people platform used by over 400,000 businesses. He grew up helping run his dad’s clothing store in Israel, an experience that sparked his mission to build better tools for small business owners. After moving to the US for a PhD at Stanford, he met his co-founders and started Gusto. That founding context matters: it rooted the company in empathy for SMBs and created the “burning problem” lens that still defines their roadmap.

What stands out most to me is the insistence on “emotional urgency.” In product discovery, polite feedback is noise; urgency is signal. I use a simple heuristic, the tug-of-war test for product-market fit: are customers fighting to pull the product into their workflow today, or gently praising it while doing nothing?

The companion lesson is that founders should actively seek rejection. Rejection exposes the edges of the problem, clarifies the job-to-be-done, and forces focus. When prospects say no with conviction, they’re actually giving you a prioritized backlog.

Gusto’s scrappy customer research, cold calling from a walk-in closet, is the type of hustle I expect from great product teams. It’s a reminder that qualitative discovery doesn’t require a lab, just proximity to customers. I’ve seen forward-deployed conversations beat large-scale surveys every time, especially in SMB markets where workflows are idiosyncratic and switching costs are emotional as much as economic. Those early calls transformed abstract hypotheses into concrete user journeys, error states, and trust moments.

Reinventing payroll without any prior experience can be an advantage when you pair first-principles thinking with domain humility. The discipline is to ship with credibility from day one. “It’s not an MVP, it’s something that wows people” captures this perfectly. For regulated, high-stakes workflows like payroll and taxes, a minimum lovable product must feel complete at the edges that matter (accuracy, compliance, and support) while still being opinionated and simple. Competing with incumbents like ADP, Intuit, and Paychex required that Gusto’s default experience be both safer and easier.

Hiring for humility, not just talent, is another keystone. In complex categories, humility accelerates learning loops, reduces coordination drag, and keeps teams close to customers. I’ve applied a similar bar in co-founder and executive selection: values alignment over resume prestige. The weekly co-founder ritual that built trust is a practice I recommend: structured, recurring time to surface concerns, decide faster, and avoid silent drift. Teams that maintain this cadence sustain velocity even as they scale.

Betting on SMBs, and ignoring investor advice, is a familiar crossroads. Serving SMBs vs. startups demands different GTM mechanics, pricing psychology, and onboarding pathways. Gusto’s “start small” GTM playbook (narrow ICP, land with a high-urgency job, then earn the right to expand) de-risks complexity while proving unit economics. The shift from payroll to a multi-product platform only works when the initial wedge earns trust. That’s how switching costs became Gusto’s moat: not through lock-in tactics, but by becoming the source of truth for money-in, money-out, and people ops.

I also appreciate the candor around the two lucky breaks that gave Gusto an edge. Timing, regulatory tailwinds, or partner enablement often look like luck from the outside and like relentless preparation on the inside. Programs like Y Combinator can sharpen that preparation, but the compounding advantage still comes from daily execution: shipping, learning, and iterating. Along the way, names like Wells Fargo matter because financial infrastructure choices affect reliability and trust, which in turn affect retention.

A throughline here is craftsmanship anchored in real-world retail empathy. What Tomer learned about customers from his dad’s clothing store mirrors what I’ve seen across SMB product-market fit lessons: respect the owner’s time, remove ambiguity, and solve the whole problem, not just the shiny part. Building products customers actually love is the result of pairing opinionated design with verifiable outcomes: on-time payroll, fewer errors, less admin stress, and clearer cash flow.

If you’re a product creator tackling a workflow as critical as payroll, take these as your operating principles: measure emotional urgency, welcome rejection, over-index on discovery, hire for humility, and aim for wow, not just MVP. Whether you’re up against ADP, Intuit, Paychex, or building a new wedge entirely, this is a repeatable path from wedge to platform. For inspiration that shaped many builders in our field, revisit Steve Jobs’ “Secrets to Life” clip and Steve Jobs’ Stanford Commencement Speech, both reminders to question defaults and start from first principles.

Finally, a note on leadership. Product management leadership isn’t about grand roadmaps; it’s about creating the conditions for truth to surface quickly through customer conversations, team rituals, and clear success metrics. Do that well, and like Gusto, you’ll earn the right to expand your product surface area without losing the trust that made customers choose you in the first place.

October 20, 2025
How Braintrust Nailed Product-Market Fit: Paranoia, Patience, and High-Bar Quality

Product-market fit in the GenAI era is elusive because both the technology surface area and user expectations change weekly. That’s why Braintrust caught my eye: they set a relentless quality bar, delayed go-to-market on purpose, and used real-world evaluation pain to shape an end-to-end platform for building AI apps. In my work leading product management teams, I recognize this pattern as the difference between shipping demos and shipping durable value.
Context matters. Ankur Goyal’s journey runs through MemSQL (now SingleStore), Impira, and Figma. Working with high-bar users at MemSQL forged a bias toward precision, performance, and reliability—traits that translate directly to AI infrastructure where flaky evals and brittle prompts can quietly erode trust. When you build for exacting users early, the feedback loop is unforgiving—and that’s a gift.
The throughline is quality. Great software often comes from a place of “paranoia”: the productive kind that compels us to stress-test assumptions, harden edge cases, and verify outcomes under load. In AI product development, that paranoia shows up as rigorous evals, clear data contracts, reproducibility, and measured rollouts. It’s not glamorous, but it’s how you earn compounding trust with builders and operators.
Recruiting is strategy. The trick to recruiting well is selecting for taste, curiosity, and ownership—people who elevate the craft and sweat the engineering details. In AI-heavy products, I’ve had the most success with forward deployed engineers who live with users long enough to discover the non-obvious constraints that should drive the roadmap. Taste plus proximity beats velocity without context.
Impulse control creates leverage. Braintrust delayed go-to-market, which is counterintuitive when the market is hot. But in a new category, premature scaling yields fake signals. The better move is to tighten the loop: instrument the “prompt playground,” pressure-test evals, validate the inner loop of building AI apps, and only then broaden access. When the core interaction is right, growth compounds; when it’s off, every feature feels like a workaround.
Figma-era frustrations with evals became the opportunity. Anyone who has tried to standardize AI evaluations across prompts, models, and datasets knows how quickly the surface area explodes. Converting that frustration into Braintrust’s product thesis—reliable, end-to-end workflows for AI app development—speaks to a classic product discovery principle: go deep on a painful, persistent job-to-be-done before you go broad.
How to recognize a real market opportunity: look for high-frequency workflows with measurable outcomes, teams who already duct-tape solutions, and buyers who have the budget and urgency to pull the product in. When you see repeatable pull from discerning users—and you can demonstrate quality with transparent evals—you’re approaching true PMF rather than narrative fit.
Inside the first six months, the right posture is deliberate focus. For a platform like Braintrust, that means obsessing over the developer inner loop: data in, prompt iteration, eval rigor, versioning, approvals, and productionization. The “prompt playground” must evolve from experimentation to governance, so teams can move from clever demos to reliable deployments with confidence.
AI continues to reshape the platform’s future. As model ecosystems shift (OpenAI and beyond) and the data plane sprawls (Databricks, Snowflake), developers want a unified surface to build, evaluate, and ship. Integrations with familiar tools like Airtable, Coda, Zapier, and Figma lower adoption friction by meeting teams where they already work, while enterprise-grade controls unlock buyers at the scale of Goldman Sachs.
The cultural choices matter as much as the code. Make big bets with extreme clarity, or don’t make them at all. Stay mission-driven when novelty tempts distraction. Write down the customer promise and keep it tight. Hiring mistakes—especially around quality, curiosity, and ownership—compound quickly in AI product teams, so reset the bar early and protect it.
What PMF really looks like here: customers self-discover core value, usage deepens without hand-holding, and cross-functional teams (engineering, data science, and operations) align around shared definitions of quality. Support volume becomes more about how-to than break-fix. Roadmap prioritization becomes easier because the next best feature reveals itself in the workflow data.
My playbook takeaways for product management leadership in GenAI: prioritize eval rigor before growth, use forward deployed engineers for product discovery, specialize the prompt playground into a governed inner loop, and delay go-to-market until high-bar users pull you in. These are the same principles I apply to GenAI for product prototyping and customer support AI strategy, because durable PMF in AI still comes down to quality, focus, and earned trust.
Referenced:
• Airtable: https://www.airtable.com/
• Adam Prout: https://www.linkedin.com/in/adam-prout-0b347630/
• Braintrust: https://braintrust.dev
• Bryan Helmig: https://www.linkedin.com/in/bryanhelmig/
• Coda: https://coda.io/
• Databricks: https://www.databricks.com/
• David Kossnick: https://www.linkedin.com/in/davidkossnick/
• Figma: https://www.figma.com/
• Goldman Sachs: https://www.goldmansachs.com/
• Kris Rasmussen: https://www.linkedin.com/in/kristopherrasmussen/
• Manu Goyal: https://www.linkedin.com/in/mngyl/
• MemSQL: https://www.singlestore.com/ (now SingleStore)
• Nikita Shamgunov: https://www.linkedin.com/in/nikitashamgunov/
• OpenAI: https://openai.com/
• Snowflake: https://www.snowflake.com/
• Zapier: https://zapier.com/

October 20, 2025