Author: Shivam Tiwari

From Prototype to the Pentagon: My Playbook for Winning DoD Customers and Mission Fit

I’ve spent years building dual-use products and partnering with teams navigating the Department of Defense. In this piece, I share how I move from prototype to program of record by aligning product strategy to real mission outcomes, building trust with end users, and translating commercial product rigor into the national security context.

Commercial and military market strategies rest on fundamentally different assumptions. In the private sector, we obsess over product-market fit and velocity; in defense, we obsess over “mission solution fit” and survivability in procurement. The buyer is a complex web of operators, program managers, contracting officers, and Program Executive Offices (PEOs), and each needs a clear value story tied to mission impact, not just features or ARR.

When I validate ideas for defense products, I start with deep discovery at the edge: talking with operators, understanding tactics, techniques, and procedures (TTPs), and quantifying what “better” looks like in their environment. The “Mission Model Canvas” helps me capture stakeholders, beneficiaries, and constraints that don’t exist in a typical SaaS motion. “Hacking for Defense” has been invaluable for structuring this discovery and ensuring we test assumptions against mission reality, not just market appetite.

A practical guide to military sales and procurement starts by mapping decision pathways. I identify the PEOs, the program managers, and the acquisition timelines that govern transition. I treat each step like an enterprise sale with additional layers: requirements, testing, accreditation, and budgeting. I align demonstrations to mission milestones, ensure my roadmap accounts for integration and accreditation lead times, and keep decision-makers looped in with concise, evidence-based updates.

Rethinking go-to-market strategy for defense means planning for longer cycles and multi-level consensus. Instead of a simple funnel, I build a coalition: an operator champion for pilots, a program sponsor for funding continuity, and a contracting route that matches how the customer buys. The goal is to de-risk adoption across technical, operational, and procurement dimensions in parallel.

Building a network in national security is a full-contact sport. I invest time in the field, put forward-deployed engineers next to users, and show up at the training ranges and labs where problems are real. Trust accumulates when teams see you adapt quickly, respect constraints, and demonstrate an understanding of mission risk. That trust turns into access and, ultimately, into pull from the organization.

The dual-use debate isn’t binary; it’s a portfolio decision. I’ve seen teams succeed by leading with a defense wedge when the problem is uniquely military, and others start commercially to prove traction and then tailor for defense. The key is to avoid whiplash: design your architecture and compliance posture so you can serve both without fragmenting your roadmap.

Behind the rise of a new generation of “defense founders” is a shift in ambition and capability. Teams are mission-driven, technically sophisticated, and comfortable operating in complex stakeholder environments. They’re building for hard problems and measuring success in operational outcomes, not just revenue milestones.

“Mission solution fit” is my north star. I define it as measurable mission improvement with acceptable changes to TTPs, training, and integration. I seek evidence that units can and will use the solution under realistic constraints, that it interoperates with existing systems, and that program leadership can fund it at scale. When those signals align, transition becomes possible.

Breaking new ground in military tech often means navigating institutional friction. “The Frozen Middle” is real: layers that resist change even when leadership and operators are aligned. I plan for this by prototyping where adoption barriers are lowest, securing a senior sponsor, and demonstrating cost, schedule, and performance wins that the middle cannot ignore.

The hidden challenges most startups miss tend to be non-technical. Security and accreditation aren’t documentation exercises; they’re product constraints that should shape architecture early. Interoperability isn’t a feature; it’s table stakes. And your ability to explain “why now” in the language of budget cycles can matter as much as a benchmark.

Essential resources for any defense founder include “Hacking for Defense,” “The Hacking for Defense Manual,” the directory in “How to find your customer in the Dept of Defense,” the “Mission Model Canvas,” and lessons from “The lean launchpad at Stanford” and “The Secret History of Silicon Valley.” I also draw on the work of Alexander Osterwalder and Eric Ries to bridge discovery, iteration, and disciplined scale.

What’s missing from Silicon Valley in this domain is patience paired with rigor. The best teams combine world-class product discovery with respect for acquisition realities. They instrument outcomes in the field, align roadmaps to funding gates, and bring in forward-deployed engineers to close the gap between prototype and operational capability.

From prototype to the Pentagon is a repeatable path when we hold ourselves to mission outcomes, build coalitions across the acquisition chain, and design for constraints from day one. If you’re committed to national security, build with empathy for the operator, clarity for the buyer, and a roadmap that survives contact with procurement.

October 20, 2025
Inside Figma’s Product Playbook: Taste, Simplicity, and Storytelling for Extraordinary PMs

I’ve long believed the best products come from a careful blend of taste, simplicity, and storytelling. Studying how Figma operationalizes these principles has sharpened my own playbook for building, launching, and scaling products. In this piece, I distill the patterns I use and teach: how to approach new products, how to prioritize without losing the plot, and how to use narrative as a force multiplier for teams and customers.

At a high level, here’s the arc I focus on: approaching new products with a strong point of view, shaping product culture that balances craft with outcomes, understanding when to change course, tying business goals to product expansion, going multi‑product deliberately, recognizing the differences between “0 to 1” and “1 to 10” talent, and elevating storytelling from launch polish to a core build-time practice. Along the way, I’ll highlight why taste and simplicity aren’t luxuries—they’re strategy.

When I explore how to build from zero, I start with a crisp customer promise and a single, testable magic moment. The early days demand ruthless focus: one job-to-be-done, one path to value, one reason to share. As teams expand scope, the risk is layering utility without coherence. The countermeasure is systematic simplicity—every addition must make the core value faster, clearer, or more extensible. If it doesn’t, it’s noise.

Product culture is the scaffolding that makes this discipline stick. Speed and operational excellence drive the right kind of urgency; experimentation at scale validates hypotheses without cargo-culting metrics; and rigor in reviews ensures we’re prioritizing outcomes over output. The best cultures pair evidence with taste—data guides, but the bar for quality, narrative, and craft is set by humans with conviction.

Knowing when to change things is both an art and a system. I look for signal in stubborn user friction, plateauing activation, a long tail of workarounds, and moments when a new platform or workflow unlocks 10x value. The framework I use: if a change can simplify the path to the promise, or unlock a whole new class of users without diluting the core, it deserves energy. Change the defaults before changing the philosophy.

Business goals should sharpen, not overshadow, product expansion. Before adding surfaces or SKUs, I insist on clarity around the ICP, the premium moment worthy of pricing, the extensibility story for developers, and the narrative that unifies everything. Multi‑product strategy works best when each product is a chapter in the same book, not a pile of features. That’s why I appreciate how the ecosystem comes together across FigJam: https://www.figma.com/figjam/, Figma: https://www.figma.com/, Figma Dev Mode: https://www.figma.com/dev-mode/, and Figma Slides: https://www.figma.com/slides/. Together they offer distinct entry points, a shared language, and compounding value.

For “0 to 1” product work, I hire for curiosity, taste, and velocity. I want builders who can reduce ambiguity quickly, prototype with whatever tools are at hand, and tell a clear story about why their version of the problem matters. My favorite interview signal is a non-obvious customer insight that changed their roadmap. Entrepreneurial talent shows up in the questions they ask about distribution, pricing, and adoption—not just the feature.

I’m often asked why there aren’t more designer founders. My take: the gap is less about capability and more about exposure to distribution, pricing, and finance. Practical fixes help—give design leaders P&L ownership, put them on customer calls that include procurement, and pair them with GTM partners early. When designers are fluent in business mechanics, their advantage in taste and narrative becomes a superpower.

New product launches work best when the story is built in from day one. I like to “slow-cook” with tight, cross-functional squads, private betas with power users, and an explicit before/after narrative that connects the dots across product, docs, community, and developer ecosystem. As teams scale, I match talent to stage: “0 to 1” thrives in uncertainty; “1 to 10” excels at repeatability, quality, and operational excellence. Both are essential; mixing them at the wrong time creates drag.

Storytelling is not veneer—it’s how we align teams, earn stakeholder trust, and help users see themselves in the product. I anchor roadmaps to a one-sentence promise, show the painful “before,” demonstrate the “after,” and name the magic mechanic that makes it possible. Then I translate that story into prioritization. I stack-rank by value, confidence, and cost, and I’m explicit about what we won’t do. Strategy is as much the boundary as the plan.
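To make the stack-ranking step concrete, here is a minimal sketch of how value, confidence, and cost could be combined into one score and split into an explicit "will do" and "won't do" list. The scoring formula (value times confidence divided by cost), the scales, and the names `Candidate` and `stack_rank` are my own illustrative assumptions, not a standard method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    value: float       # expected customer/business value, e.g. 1-10 (assumed scale)
    confidence: float  # 0.0-1.0, how much evidence backs the value estimate
    cost: float        # rough effort, e.g. person-weeks (must be > 0)

    @property
    def score(self) -> float:
        # Confidence-weighted value per unit of cost; higher is better.
        return self.value * self.confidence / self.cost


def stack_rank(candidates, capacity):
    """Rank by score, then fill the available capacity greedily.

    Returns (will_do, wont_do) so the boundary is as explicit as the plan.
    """
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    will_do, wont_do, spent = [], [], 0.0
    for candidate in ranked:
        if spent + candidate.cost <= capacity:
            will_do.append(candidate)
            spent += candidate.cost
        else:
            wont_do.append(candidate)
    return will_do, wont_do
```

The point of returning both lists is that the "won't do" list becomes an artifact you can share alongside the roadmap.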

If you’re refining your product storytelling, a quick checklist helps: articulate the promise in plain language, show rather than tell with a demo that lands the magic moment in 30 seconds, connect to measurable outcomes, and make the first-run experience feel like the narrative come to life. Don’t bury the lede. If a user can’t explain your product to a teammate after one minute, the story isn’t ready.

The difference between “good” and “extraordinary” product managers is simple to say and hard to do. Good PMs coordinate and ship on time. Extraordinary PMs set a higher bar for taste, simplify relentlessly, and move teams from consensus to conviction. They connect craft to outcomes, use narrative to create momentum, and make decisions that age well because the logic is legible.

Simplicity is a growth strategy. It shortens time-to-value, reduces error surface, and raises retention by making products feel learnable and trustworthy. Tactics I lean on: one hard thing at a time, remove to improve, defaults are design, and compress choices until the right path is the easy path. Simplicity isn’t less—it’s the right less.

Taste, in product and design, is not innate; it’s a practiced sensitivity to what feels inevitable. I cultivate it by collecting exemplars, writing and revisiting product principles, insisting on weekly critiques, and sweating the narrative as much as the pixel. The best teams hold two truths: quality you can feel and outcomes you can measure.

If you want to explore the ecosystem I referenced, here are direct links: FigJam: https://www.figma.com/figjam/, Figma: https://www.figma.com/, Figma Dev Mode: https://www.figma.com/dev-mode/, Figma Slides: https://www.figma.com/slides/.

Whether you’re building your first product or scaling a platform, the throughline remains: lead with taste, ship with simplicity, and align everyone with a story worth rallying around. That combination turns good teams into extraordinary ones—and products into movements.

October 20, 2025
Inside dbt Labs’ $4.2B ascent: category creation, open source, and monetization playbook

As a VP of Product Management, I’m fascinated by the rare mix of strategy, timing, and execution that turns a great idea into a durable category. The arc of dbt Labs is one of those definitive product stories: a cloud-based data management platform that has raised over $400M to date and was last valued at $4.2B in 2022.

What stands out to me first is the scale and velocity. dbt Labs has grown from just three companies using its free tool in 2016 to an ecosystem of 30,000+ enterprise users. That journey captures the essence of category creation done right: lead with an opinionated product, cultivate a community around clear practices, and sequence monetization only after adoption becomes self-sustaining.

When I look at dbt’s explosive growth, I see a masterclass in product management leadership. The team focused on a precise, under-served problem in modern data workflows and built a tooling philosophy that aligned with how analysts and engineers actually work. That alignment turned a utility into a movement.

The strategic pivot from consulting to a software company is a decision I’ve navigated myself, and it’s often misunderstood. Consulting’s hidden scalability and consultancy superpowers aren’t about headcount; they’re about tight customer feedback loops, paid discovery, and rapid learning cycles that directly shape product decisions. In this case, consulting engagements shaped the roadmap and helped validate the eventual product thesis with a clarity that pure software bets rarely achieve.

Category creation is rarely a straight line. The team deployed unexpected strategies for building a tech category from scratch, most notably the anti-demo strategy. Rather than an overproduced wow moment, they optimized for real-life proof and repeatable value in the hands of practitioners. That put credibility ahead of theatrics.

Community was the flywheel. The Slack group that changed everything wasn’t just a channel; it was a living spec for the product and the practices around it. Pair that with the open source philosophy and you have a compounding effect: trust, transparency, and contribution. When growth went exponential, it was because the community could see, shape, and advocate for the standard. Finding dbt Labs’ first customers mattered less than building a motion they could evangelize. The way consulting engagements shaped the roadmap is a reminder that early revenue can be a learning instrument. Done well, it tightens product discovery and derisks foundational bets.

Funding is another decision point I pay close attention to. dbt Labs sought venture funding only after the system’s constraints were obvious. Fundraising only when “things started to break” signals operational discipline: capital as a force multiplier, not a crutch.

On the commercial side, the sequencing was thoughtful. Driving commercial adoption after open-sourcing is all about value layering: permissions, governance, collaboration, and scale are capabilities that enterprises will happily pay for. That dovetails into the key monetization strategies and the eventual pivot from consulting to software, a move that codifies services learnings into scalable product value.

There are also powerful founder operating principles here. Becoming an “accidental founder” resonates with many of us who start by solving a concrete problem and wake up running a company. The idea that “begrudging” CEOs can be successful underscores that obsession with the customer often beats a desire to be a CEO. The advice for finding PMF, “It’s not a playbook,” reflects the truth I’ve seen across teams: seek signals, not templates. “Lowering your standards is a hack” is a counterintuitive push toward shipping, learning, and iterating. Navigating emotional overwhelm and the principle that every CEO needs a coach are signals of mature leadership: build inner capacity as deliberately as product capacity. Two things every founder CEO should do: set the cadence and protect the standards.

If you want a quick guide to the narrative arc and key lessons, here’s how I map it to the journey:

(00:00) Introduction
(02:56) The critical oversight in data analysis
(05:41) Becoming an “accidental founder”
(07:04) Inside the unique decision to start a consultancy
(08:17) The game-changing principle behind dbt Labs’ rapid growth
(11:20) Finding dbt Labs’ first customers
(15:52) Consulting’s hidden scalability
(17:25) How dbt Labs created a new category
(21:03) The anti-demo strategy
(23:59) Community hacking: the Slack group that changed everything
(26:00) The open source philosophy
(27:39) When growth went exponential
(28:49) How consulting engagements shaped the roadmap
(30:02) Fundraising only when “things started to break”
(32:40) Consultancy superpowers: the hidden advantages
(34:04) Pivoting from consulting to software
(40:00) Key monetization strategies
(48:56) Why “begrudging” CEOs can be successful
(51:02) Advice for finding PMF: “It’s not a playbook”
(51:59) Lowering your standards is a hack
(53:30) Navigating emotional overwhelm
(54:25) Every CEO needs a coach

Referenced:

Amazon Redshift: https://aws.amazon.com/redshift/
Bob Moore: https://www.linkedin.com/in/robertjmoore/
Crossbeam: https://www.crossbeam.com/
dbt Labs: https://www.getdbt.com/
Drew Banin: https://www.linkedin.com/in/drewbanin/
Jerry Colonna: https://www.reboot.io/team/jerry-colonna/
RJMetrics: https://en.wikipedia.org/wiki/RJMetrics
SeatGeek: https://seatgeek.com/
Steve Ritter: https://www.linkedin.com/in/steve-ritter-69495210/
Squarespace: https://www.squarespace.com/

Where to find Tristan:

LinkedIn: https://www.linkedin.com/in/tristanhandy/
Twitter/X: https://x.com/jthandy

October 20, 2025
Inside Clay’s $1.25B Playbook: Unconventional GTM, Pricing Strategy, and Enterprise Wins

Clay’s path to a $1.25B valuation isn’t conventional, and that’s exactly why it’s instructive. Through the lens of product management and go-to-market strategy, I break down how unconventional tactics, rigorous pricing decisions, and a long game on brand combined to create real upmarket momentum. If you lead product, growth, or revenue, there’s a repeatable playbook here for blending product-led growth with enterprise sales without losing speed or signal.

Varun Anand is the co-founder and Head of Operations at Clay, a GTM development environment that combines data and AI to help over 5,000 companies power everything from CRM enrichment to highly targeted outreach campaigns. Clay recently announced its Series B expansion, raising $40M at a $1.25B valuation. Before Clay, Varun was the Director of Operations at Newfront and the Head of Expansion at Candid. Varun also spent four years working on Hillary Clinton’s presidential campaign.

Clay turned traditional GTM on its head. Its earliest traction didn’t come from glossy campaigns; it came from scrappy sales tactics: “WhatsApp groups, Reddit threads, and reverse demos.” I’ve seen this play repeatedly outperform paid channels early because it compounds social proof in the exact communities where power users congregate. When your ICP hangs out in niche threads, customer acquisition is a function of credibility, not CPM.

On pricing, “credit-based pricing” was a pivotal decision. Equally important, the team “rejected the usage-based model.” For PLG plus enterprise, this matters: credits make value legible to buyers, reduce billing anxiety for ops and finance teams, and align with predictable, budgeted workflows. In my experience, credit models also create clearer upgrade paths when your product spans multiple use cases.

Clay built a robust self-serve engine and then layered “enterprise customers on top of PLG.” This sequencing avoids the trap of hiring an enterprise team before the product is self-serve-proven. It also creates cleaner handoffs: self-serve for discovery and activation, sales for proof, procurement, and expansion.

Content and brand weren’t afterthoughts. Clay made a “big bet on content” and “invested in brand from day-one.” That’s a contrarian move many teams delay, but content accelerates learning loops, reduces sales cycle time, and scales enablement far beyond headcount. In enterprise sales, a trusted brand is an asset class.

Winning big accounts required creative proofs of value. “Reverse demos” flipped the script: show the customer’s data, in their workflow, with their outcomes. It’s one of the fastest routes to de-risking adoption and building trust with enterprise buyers. From there, they applied a pragmatic “land and expand model” that aligns with how large organizations actually buy.

Clay highlights “3 changes that unlocked Clay’s upmarket motion.” While every company’s inflection points are unique, the meta-lesson is consistent: clarify the ICP, operationalize proof (reverse demos, ROI), and meet enterprise expectations on reliability, governance, and support without sacrificing the PLG engine.

Team construction was equally intentional. Hiring people who are “technical enough” and using a “hands-on interviewing process” raised the talent bar and reduced execution drag. I’ve found this mirrors the strength of forward-deployed mindsets: product, ops, and GTM talent who can prototype, troubleshoot, and translate customer complexity into scalable systems.

Finally, Clay’s contrarian take on compensation signals a willingness to design incentives for the business they want to build, not the one the market expects. Compensation philosophies quietly shape culture, velocity, and who opts in.

Referenced:

Anthropic: https://www.anthropic.com/
Clay: https://www.clay.com/
Clay’s Series B expansion: https://www.clay.com/blog/series-b-expansion
Eric Nowoslawski: https://www.linkedin.com/in/outboundphd/
Figma: https://www.figma.com/
Jesse Ouellette: https://www.linkedin.com/in/jesseoue/
Kareem Amin: https://www.linkedin.com/in/kareemamin/
Nick Merrill: https://www.linkedin.com/in/nick-merrill-64562310/
Notion: https://www.notion.com/
Oyster: https://www.oysterhr.com/
Pave: https://www.pave.com/
Rippling: https://www.rippling.com/
Snowflake: https://www.snowflake.com/
Verkada: https://www.verkada.com/
Webflow: https://webflow.com/
Yash Tekriwal: https://www.linkedin.com/in/yashtekriwal/

My takeaway: this is a modern GTM blueprint. Prove value in the wild, price for clarity, build self-serve first, then industrialize trust for enterprise. Do that, and you can scale without losing the product signals that got you traction in the first place.

October 20, 2025
Just Now Possible Preview: How Real Teams Ship AI—Workflows, RAG, Agents, Evaluation

I’m excited to share a preview of Just Now Possible, a show where I sit down with the builders who are shipping meaningful AI features in the real world. My goal is simple: pull back the curtain on how AI products actually get made—messy problems, rapid prototyping, and the leadership decisions that move teams from concept to customer value.

Watch the preview on YouTube: https://www.youtube.com/embed/Kb2HbuPbfR8?feature=oembed. Prefer audio? Listen on Spotify: https://open.spotify.com/episode/5xM0pDnqR0JpKmW6aZ0pj6?ref=producttalk.org or Apple Podcasts: https://podcasts.apple.com/us/podcast/podcast-preview/id1838832993?i=1000725807029&ref=producttalk.org.

How AI products come to life—straight from the builders themselves. In each episode, we dive deep into how teams spotted a customer problem, experimented with AI, prototyped solutions, and shipped real features. We dig into everything from workflows and agents to RAG and evaluation strategies, and explore how their products keep evolving. If you’re building with AI, these are the stories for you.

From my own experience leading product teams, I’ve seen that the real unlocks come from disciplined product discovery, OKRs framed around outcomes rather than output, and smart use of gen AI for product prototyping. We’ll talk about the tradeoffs between speed and safety, when to bring in forward-deployed engineers, and how to validate product-market fit before scaling. Along the way, we’ll unpack practical patterns, like when to use RAG vs fine-tuning, how to evaluate agents in production workflows, and what great product management leadership looks like in AI-first environments.

The first full episode drops on Thursday, September 18th. Don't miss it!

Full transcripts are available to paid subscribers.

Inspired by this post on Product Talk.

October 20, 2025
Building AI Products That Work: My Playbook for LLM Strategy, Evals, and Orchestration

AI features don’t succeed on clever prompts alone; they demand thoughtful product strategy, rigorous evaluation, and tight cross-functional collaboration. As a VP of Product Management and someone deeply immersed in building with large language model (LLM) technology, I’m constantly refining how we turn generative capabilities into real customer value. This episode of All Things Product zeroes in on that challenge, and it captures many of the principles I rely on when shipping AI to production.

The central question resonates with every product leader I know: How do product teams learn to build AI-powered products “beyond just dabbling with ChatGPT”? I appreciate how the conversation moves past novelty and into the disciplines that make AI reliable, safe, and outcome-oriented.

One metaphor that always lands for me: building AI features is less like writing a single “killer prompt” and more like orchestrating a team of “interns.” You define roles, break down work, set guardrails, and continuously review outputs. That orchestration mindset, coupled with strong observability, evals, and ongoing maintenance practices, is what separates flashy demos from repeatable product value.

Here’s how I frame the work. First, there’s a difference between an AI-powered product manager and an AI product manager. Many of us are becoming AI-powered—using tools to accelerate discovery, ideation, or execution. But when you own AI features end-to-end, you inherit new responsibilities: modeling risks, defining evaluation strategies for non-deterministic systems, and treating prompts and data pipelines as core product surfaces.

Prompt engineering for a product is fundamentally different from prompting ChatGPT for personal use. In production, I rely on prompt decomposition and orchestration—explicitly breaking a task into steps, assigning each step to the right capability, and enforcing consistent formats. This reduces variance, improves debuggability, and enables targeted evals that catch regressions before customers do.
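Here is a minimal sketch of what decomposition with enforced formats can look like. It assumes a hypothetical `ModelFn` callable that takes a prompt string and returns the model's raw text; the support-ticket task, the step prompts, and every function name are illustrative, not any vendor's API.

```python
import json
from typing import Callable

# Hypothetical model interface: prompt in, raw text out. Swap in a real client.
ModelFn = Callable[[str], str]


def run_step(model: ModelFn, prompt: str, required_keys: set) -> dict:
    """Run one narrowly scoped step and enforce its JSON output contract."""
    raw = model(prompt)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"step returned non-JSON output: {raw!r}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"step returned non-object JSON: {raw!r}")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"step output is missing keys: {sorted(missing)}")
    return data


def handle_ticket(model: ModelFn, ticket_text: str) -> dict:
    """Two small steps instead of one large prompt: classify, then draft."""
    intent = run_step(
        model,
        "Classify the support ticket. Reply as JSON with key 'intent'.\n\n"
        + ticket_text,
        {"intent"},
    )
    draft = run_step(
        model,
        f"Draft a reply for a '{intent['intent']}' ticket. "
        "Reply as JSON with key 'reply'.\n\n" + ticket_text,
        {"reply"},
    )
    return {"intent": intent["intent"], "reply": draft["reply"]}


def fake_model(prompt: str) -> str:
    """Stand-in model so the sketch runs without network access."""
    if prompt.startswith("Classify"):
        return json.dumps({"intent": "billing"})
    return json.dumps({"reply": "Thanks, we are reviewing your invoice."})
```

Because each step has its own contract, a failure points at one step and one prompt, which is what makes targeted evals and debugging practical.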

System design and risk mitigation become front and center. I align early with engineering, legal, security, and support on failure modes, privacy expectations (including the handling of personally identifiable information, or PII), and rollout plans. We log traces for every critical path, treat prompts as versioned assets, and use observability to connect inputs, intermediate states, and outputs. When something drifts, we need to see it fast, explain it, and fix it.
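As one way to picture "prompts as versioned assets" plus trace logging, here is a small sketch using only the Python standard library. The content-hash versioning scheme and the `PromptVersion` and `Trace` names are my assumptions for illustration; a real system would more likely sit on an existing tracing or logging stack.

```python
import hashlib
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    name: str
    template: str

    @property
    def version(self) -> str:
        # Content hash: any edit to the template yields a new version id.
        return hashlib.sha256(self.template.encode("utf-8")).hexdigest()[:12]


@dataclass
class Trace:
    prompt_name: str
    prompt_version: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list = field(default_factory=list)

    def log(self, stage: str, payload: dict) -> None:
        # Payloads should be redacted upstream; do not log raw PII here.
        self.events.append({"ts": time.time(), "stage": stage, "payload": payload})

    def to_json(self) -> str:
        # One JSON document per request, ready for whatever log sink you use.
        return json.dumps(asdict(self))
```

A request would create one `Trace`, call `log("input", ...)`, `log("retrieval", ...)`, and `log("output", ...)` along the critical path, then emit `to_json()`. Since every trace carries the prompt version, a drift in behavior can be tied back to the exact template that produced it.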

Evaluating non-deterministic AI features is its own craft. “Thumbs up/thumbs down” isn’t enough. I design layered evals: unit-level checks for correctness and formatting, scenario-level evals for edge cases and risk behaviors, and longitudinal evals to monitor model and data drift over time. Clear acceptance thresholds and shadow deployments help us balance velocity with reliability.
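A rough sketch of the unit-level layer and an acceptance gate follows. The output contract (a JSON object with a non-empty `answer` string), the layer names, and the threshold values are all assumptions chosen for the example; scenario-level and longitudinal evals would feed their own pass rates into the same gate.

```python
import json


def check_format(output: str) -> bool:
    """Unit-level check: a JSON object with a non-empty string 'answer'."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("answer"), str)
        and bool(data["answer"].strip())
    )


def pass_rate(outputs, check) -> float:
    """Fraction of sampled outputs that pass a check."""
    if not outputs:
        raise ValueError("no outputs to evaluate")
    return sum(1 for output in outputs if check(output)) / len(outputs)


def failing_layers(results: dict, thresholds: dict) -> list:
    """Return the eval layers whose pass rate is below its acceptance threshold.

    A layer with no result counts as failing, so a skipped eval blocks release.
    """
    return [
        name
        for name, minimum in thresholds.items()
        if results.get(name, 0.0) < minimum
    ]
```

Because the system is non-deterministic, the gate works on pass rates over many sampled outputs rather than on a single run, and a release proceeds only when `failing_layers` comes back empty.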

Deciding when AI is the right solution starts with the customer problem, not the model. I ask: Is the task ambiguous enough to benefit from generation? Can we bound the failure modes? Do we have affordable latency and cost envelopes? And what’s the graceful fallback if the model underperforms? If a deterministic algorithm or simple rules solve it better, we choose that—no heroics.

The hidden cost of AI is maintenance. Prompts rot as upstream models change. New data skews behavior. Guardrails that worked yesterday might not hold tomorrow. That’s why ongoing evals, robust logging, and a change-management plan (for prompts, schemas, and policies) are non-negotiable. Treat AI features as living systems, not one-off launches.

If you’re exploring gen AI for product prototyping, start small. Pick a narrow, high-value workflow, instrument everything, and ship with clear success metrics. Use your first release to build your team’s muscles around observability, evals, and cross-functional collaboration. The goal is not a perfect model; it’s a reliable product outcome.

Want to go deeper? The full conversation is available on Spotify and Apple Podcasts. Prefer video? Watch “Building AI Products” on YouTube.

What you’ll learn in this episode:

– The difference between an AI-powered product manager and an AI product manager

– Why prompt engineering for a product is different from prompting ChatGPT for personal use

– The role of prompt decomposition and orchestration in building robust AI features

– How to think about system design, risk mitigation, and cross-functional collaboration

– Why observability and logging traces are critical for LLM products

– The challenge of evaluating non-deterministic AI features (and why “thumbs up/thumbs down” isn’t enough)

– How to decide when AI is the right solution for a customer problem

– The hidden cost of ongoing maintenance for AI features

Join the conversation: What practices have helped you ship reliable AI features? Drop your thoughts and questions in the comments—I’d love to learn from your experiences.

Inspired by this post on Product Talk.

October 20, 2025
From Disruption to Breakthrough: How Stack Overflow’s AI Pivot Became a Product Playbook

Generative AI doesn’t knock politely—it kicks the door open and forces product teams to re-think the fundamentals. I’ve lived through my share of market shifts, and the story of Stack Overflow’s AI journey hits every note of what it takes to respond with clarity, speed, and rigor.

When ChatGPT launched, Stack Overflow faced a cataclysmic shift: developer behavior was changing overnight. That single sentence captures the urgency I felt as I studied this case: habits, traffic patterns, and value perceptions transformed almost instantly.

Consider the timing: Ellen Brandenburger stepped into Stack Overflow just two weeks before ChatGPT launched. In her shoes, I would have immediately asked the same questions she did: What new developer workflows are becoming “just now possible”? How quickly can we prototype without compromising quality or trust? And how do we avoid overcorrecting in a moment of uncertainty?

In response, the team created Overflow AI, a concentrated effort to explore “what’s just now possible” for developers. I love this framing—it anchors exploration to near-term feasibility while keeping sight of evolving user needs. It’s the kind of focused discovery effort I encourage when a platform-defining shift hits.

They moved through four disciplined iterations of conversational search, each an experiment with clear hypotheses and guardrails:

V1: a chat UI on top of keyword search

V2: semantic search to handle natural questions

V3: fallback to GPT-4 for gaps in Stack Overflow’s corpus

V4: adding RAG for attribution and transparency

Two principles stood out as non-negotiable: attribution and transparency. For developers, trust depends on knowing where an answer came from, why it’s relevant, and whether it reflects source truth. I’ve found the same in my own teams—without provenance and clarity, even great answers feel shaky.
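To illustrate why retrieval helps with attribution, here is a toy sketch of the pattern: retrieve passages, keep their sources attached, and ask the model to cite them. This is not Stack Overflow's implementation. The keyword-overlap scoring stands in for real semantic search, and the `Doc` type, source IDs, and prompt wording are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    source: str  # hypothetical identifier, e.g. a question URL or ID
    text: str


def tokenize(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Rank docs by word overlap with the query; drop docs with no overlap."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc.text)), doc) for doc in corpus]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(query: str, docs: list) -> str:
    """Number each passage and carry its source so the answer can cite it."""
    context = "\n".join(
        f"[{i}] {doc.text} (source: {doc.source})"
        for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer using only the numbered sources and cite them like [1].\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
```

The design choice that matters is that sources travel with the text all the way into the prompt, so the interface can show users where an answer came from, and an empty retrieval result can trigger a fallback instead of an unattributed guess.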

The team’s evaluation approach was refreshingly pragmatic: simple spreadsheets and subject-matter experts assessing accuracy, relevance, and completeness. In my org, we’ve adopted similar lightweight scorecards before scaling LLM investments; it keeps us honest about quality before we fall in love with a demo.
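
A lightweight scorecard like that can be sketched in a few lines. This is only an illustration under my own assumptions: the 1 to 5 scale, the 3.5 pass threshold, the reviewer names, and the ratings are all invented, not taken from the team’s spreadsheets.

```python
from statistics import mean

CRITERIA = ("accuracy", "relevance", "completeness")

# Made-up SME ratings on an assumed 1-5 scale, one row per reviewer per response.
ratings = [
    {"response_id": "r1", "reviewer": "sme_a", "accuracy": 5, "relevance": 4, "completeness": 4},
    {"response_id": "r1", "reviewer": "sme_b", "accuracy": 4, "relevance": 4, "completeness": 4},
    {"response_id": "r2", "reviewer": "sme_a", "accuracy": 2, "relevance": 5, "completeness": 4},
    {"response_id": "r2", "reviewer": "sme_b", "accuracy": 2, "relevance": 4, "completeness": 4},
]


def summarize(rows: list[dict], threshold: float = 3.5) -> dict:
    """Average each criterion per response; a response passes only if every average clears the threshold."""
    by_response: dict[str, list[dict]] = {}
    for row in rows:
        by_response.setdefault(row["response_id"], []).append(row)

    summary = {}
    for response_id, group in by_response.items():
        scores = {c: mean(r[c] for r in group) for c in CRITERIA}
        scores["passes"] = all(scores[c] >= threshold for c in CRITERIA)
        summary[response_id] = scores
    return summary


summary = summarize(ratings)
for response_id, scores in summary.items():
    print(response_id, scores)
```

Requiring every criterion to clear the bar, rather than averaging across criteria, keeps a relevant but inaccurate answer from slipping through.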

Here’s the moment that demonstrates real product management leadership: despite the investment, Stack Overflow decided to sunset conversational search when it couldn’t meet developer standards. That discipline, choosing not to ship what isn’t good enough, preserves brand trust and creates space for a better bet.

And that better bet was a strategic pivot: the team leaned into data licensing, leveraging its 14M+ Q&A corpus to power LLM training and benchmarks. Instead of treating AI as a threat, they turned their differentiated asset into a durable business line.

They went further, building industry benchmarks with subject-matter experts to prove that Stack Overflow’s data improved LLM accuracy and relevance. This is exactly how I think about outcomes vs. output: quantify lift against real tasks, validate with domain experts, and package value in a way decision-makers can trust.

Key lessons I’m taking forward:

Take one bite of the apple at a time—prototype, learn, iterate.

Product in the AI era means managing probabilities, not certainties.

For context, Ellen Brandenburger is a product leader and coach, and the former head of product at Chegg Skills and Stack Overflow’s data licensing team. Her arc through this transformation underscores what matters most right now: tight feedback loops, transparent evaluation, and the courage to pivot from feature bets to business model bets when the evidence demands it.

If you’re leading gen AI initiatives, treat this as a playbook: form a focused “just now possible” team, instrument quality with SMEs early, obsess over attribution and transparency, and be willing to sunset—even after heavy investment—when the work doesn’t clear your user’s bar. Then, zoom out: your unique data and workflows may be the moat. Build for that.

Inspired by this post on Product Talk.

October 20, 2025
Mastering AI Evals: Real-World Discovery Tactics to Ship Quality, Safe, Reliable AI

I’ve been shipping GenAI features long enough to know that clever prompts and orchestration aren’t enough. What actually matters is evidence: Does the system work, for whom, and under what conditions? That’s where rigorous AI evals come in—the backbone of building reliable, safe, and continuously improving AI products.

In a recent conversation focused entirely on evaluation, I dug into what “evals” mean in the AI/ML world, why they’re more than just quality assurance, and how to operationalize them end to end. If you want to explore the discussion, listen on Spotify: https://open.spotify.com/episode/7mSiEGSYNO4sXeGAVTJO4V or Apple Podcasts: https://podcasts.apple.com/kh/podcast/ai-evals-discovery/id1794203808?i=1000727980774. There’s also a video version on YouTube: https://www.youtube.com/watch?v=pfSIQMrWhQE.

Here’s how I frame evals with my teams. First, define the behavior you want to see in terms real users care about. Then codify that intent as tests that run consistently. I distinguish between golden datasets, synthetic data, and real-world traces. Golden datasets capture canonical examples that represent “ground truth.” Synthetic data fills important gaps quickly and safely. Real-world traces keep you honest and reflect evolving usage.

The most durable loop I’ve found is simple: identify error modes, turn them into evals, and automate. This is where error analysis pays off. Some checks should be purely deterministic—code-based checks that evaluate structured outputs, schemas, or policies. Others benefit from LLM-as-judge when human-like judgment matters, as long as you calibrate and continuously verify those judges with spot checks and inter-rater agreement.
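
Both kinds of checks can be sketched briefly. The first function is a deterministic, code-based check on structured output; the second computes Cohen’s kappa, one common inter-rater agreement statistic, between human labels and an LLM judge’s labels. The function names, the required keys, and the label data are my own illustrative assumptions.

```python
import json
from collections import Counter


def check_structured_output(raw: str, required_keys: set[str]) -> bool:
    """Deterministic check: output must be a JSON object containing every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data.keys())


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement between two raters, corrected for the agreement expected by chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up spot-check sample: human labels vs. LLM-judge labels on the same eight outputs.
human = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "pass", "pass", "fail", "pass", "fail"]

print(check_structured_output('{"answer": "ok", "sources": []}', {"answer", "sources"}))
print(round(cohens_kappa(human, judge), 3))
```

Raw agreement here is 75%, but kappa comes out well below that because two raters who mostly say "pass" will agree often by chance. That gap is why I calibrate judges on agreement statistics rather than on percent match alone.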

Discovery practices should inform every evaluation step. If you’re doing “Story-Based Customer Interviews,” you can derive realistic scenarios, acceptance criteria, and edge cases directly from user narratives. That context sharpens the evals and prevents you from overfitting to toy problems or proxy metrics that don’t reflect user value.

Evals require ongoing care and feeding. Criteria drift is real—what counted as “good” six weeks ago may not satisfy users after you ship a new capability or your audience evolves. I treat the eval suite like living product infrastructure: versioned, reviewed, and owned. When we change prompts, models, or retrieval strategies, the evals run first, then we examine deltas, regressions, and surprises before anything reaches production.
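
As a rough sketch of the "examine deltas" step, the comparison below diffs a candidate run against a baseline and blocks release on regressions or missing evals. The eval names, scores, and 0.02 tolerance are assumptions for illustration, not values from a real suite.

```python
def compare_runs(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, list]:
    """Classify each baseline eval as regressed, improved, or missing from the candidate run."""
    report: dict[str, list] = {"regressions": [], "improvements": [], "missing": []}
    for name, base_score in baseline.items():
        if name not in candidate:
            report["missing"].append(name)
            continue
        delta = round(candidate[name] - base_score, 4)
        if delta < -tolerance:
            report["regressions"].append((name, delta))
        elif delta > tolerance:
            report["improvements"].append((name, delta))
    return report


# Made-up pass rates per eval, before and after a prompt change.
baseline = {"schema_valid": 0.99, "helpfulness": 0.81, "tone": 0.90, "citation_present": 0.95}
candidate = {"schema_valid": 0.99, "helpfulness": 0.86, "tone": 0.84}

report = compare_runs(baseline, candidate)
safe_to_ship = not report["regressions"] and not report["missing"]
print(report)
print("safe to ship:", safe_to_ship)
```

Treating a missing eval as a blocker is deliberate: an eval that silently stops running looks identical to an eval that passes unless the gate checks for it.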

Guardrails and human oversight work hand-in-hand with evals. Guardrails enforce non-negotiables (safety, privacy, compliance), while evals measure progress against nuanced goals (relevance, helpfulness, tone). In high-stakes workflows, I combine pre-deployment evals, runtime guardrails, and spot human review. The goal isn’t to eliminate humans; it’s to focus their attention where judgment and context matter most.

Practically, I start with a minimal eval harness that standardizes inputs and outputs—often in JSON (JavaScript Object Notation)—and writes repeatable tests. I maintain a small golden dataset, add targeted synthetic data for coverage, and stream real-world traces into the suite once we have consent and redaction in place. For subjective criteria (e.g., tone, helpfulness), I layer in LLM-as-judge with calibration. For objective checks (e.g., schema validation, policy compliance), code-based checks are my default.
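
A minimal harness along those lines might look like the sketch below. Everything specific in it is hypothetical: the golden cases, the `must_include` field, the `fake_system` stub standing in for the real model call, and the rule that every answer must cite at least one source.

```python
import json

# Hypothetical golden dataset, one JSON object per line.
GOLDEN_JSONL = """\
{"id": "g1", "input": "reset password", "must_include": "reset link"}
{"id": "g2", "input": "refund policy", "must_include": "30 days"}
"""


def load_cases(jsonl: str) -> list[dict]:
    return [json.loads(line) for line in jsonl.splitlines() if line.strip()]


def fake_system(prompt: str) -> str:
    """Stub for the system under test; a real harness would call the model here."""
    canned = {
        "reset password": {"answer": "We emailed you a reset link.", "sources": ["kb-12"]},
        "refund policy": {"answer": "Refunds are available.", "sources": []},
    }
    return json.dumps(canned[prompt])


def evaluate(case: dict, raw_output: str) -> list[str]:
    """Code-based checks only; returns a list of failure messages (empty means pass)."""
    try:
        output = json.loads(raw_output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(output, dict) or not isinstance(output.get("answer"), str):
        return ["output lacks a string 'answer' field"]

    failures = []
    if case["must_include"].lower() not in output["answer"].lower():
        failures.append(f"answer does not mention '{case['must_include']}'")
    if not output.get("sources"):
        failures.append("no sources cited")
    return failures


def run_suite(system, cases: list[dict]) -> dict[str, list[str]]:
    return {case["id"]: evaluate(case, system(case["input"])) for case in cases}


results = run_suite(fake_system, load_cases(GOLDEN_JSONL))
for case_id, failures in results.items():
    print(case_id, "PASS" if not failures else failures)
```

Because inputs and outputs are both plain JSON, the same `run_suite` call can later be pointed at synthetic cases or redacted traces without changing the checks.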

Tooling evolves quickly, but the principles hold. Whether you’re working with Anthropic or experimenting with V0 or Lovable in your prototyping stack, the eval loop stays the same: define success, test it the same way every time, and close the loop with learning. If you’re a product creator or leading forward-deployed engineers, this discipline accelerates gen AI product prototyping without sacrificing safety or quality.

I also tie evals to outcomes vs. output OKRs. Instead of “ship three prompts,” we commit to measurable outcomes like resolution rate, time-to-answer, or a target “helpfulness” score. In our customer support AI strategy, we monitor real-world traces, CSAT, and handoff quality to ensure the AI augments agents rather than creating silent failure modes. That’s how evals produce product-market fit lessons instead of just dashboards.
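
To show what those outcome metrics look like when computed from traces, here is a small sketch. The trace fields and values are invented, and the "silent failure" definition (unresolved by the AI and never handed to an agent) is my own working assumption.

```python
from statistics import median

# Made-up support traces; field names are illustrative assumptions.
traces = [
    {"ticket": "t1", "resolved_by_ai": True, "seconds_to_answer": 40, "handed_off": False},
    {"ticket": "t2", "resolved_by_ai": False, "seconds_to_answer": 95, "handed_off": True},
    {"ticket": "t3", "resolved_by_ai": True, "seconds_to_answer": 30, "handed_off": False},
    {"ticket": "t4", "resolved_by_ai": False, "seconds_to_answer": 120, "handed_off": False},
]


def outcome_metrics(rows: list[dict]) -> dict[str, float]:
    """Compute outcome-level metrics from a batch of traces."""
    if not rows:
        raise ValueError("need at least one trace")
    n = len(rows)
    # A silent failure: the AI did not resolve the ticket and no agent picked it up.
    silent = [r for r in rows if not r["resolved_by_ai"] and not r["handed_off"]]
    return {
        "resolution_rate": sum(r["resolved_by_ai"] for r in rows) / n,
        "median_seconds_to_answer": median(r["seconds_to_answer"] for r in rows),
        "silent_failure_rate": len(silent) / n,
    }


metrics = outcome_metrics(traces)
print(metrics)
```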

If you want to go deeper, explore these foundational concepts and tools:
ML (Machine learning)
LLM (Large language model)
“AI Evals for Engineers and PMs”: https://maven.com/parlance-labs/evals
“The Product Leadership Wheel – A Framework for Defining and Growing Product Leadership at Scale”: https://www.petra-wille.com/plwheel
“How I Designed & Implemented Evals for Product Talk’s Interview Coach”: https://www.producttalk.org/2025/09/interview-coach-evals/
“Behind the Scenes: Building the Product Talk Interview Coach”: https://www.producttalk.org/2025/08/customer-interview-coach/
V0: https://vercel.com/docs/v0
JSON (JavaScript Object Notation): https://en.wikipedia.org/wiki/JSON
Anthropic: https://www.anthropic.com/
Lovable: https://lovable.dev/
“Story-Based Customer Interviews”: https://learn.producttalk.org/course/story-based-customer-interviews

If this resonates, I’ll be sharing weekly lessons learned from building and evaluating AI features in the wild, plus conversations with cross-functional teams about real-world AI development. Have thoughts or a tactic that’s worked for you? Drop a comment and let’s compare notes.

Inspired by this post on Product Talk.

October 20, 2025
Inside Braze’s Blitz to $500M CARR: Bold PM Lessons on Going Global and Outsmarting Rivals

I’ve long believed the best product breakthroughs happen at the intersection of market timing, technical first principles, and relentless customer discovery. Braze’s trajectory is a compelling proof point. Bill Magnuson is the co-founder and CEO of Braze; Kevin Wang joined as employee #8 and serves as CPO. The two MIT graduates have built Braze into a publicly listed customer engagement platform with a $4.4B market cap. In 2023, Braze surpassed $500M in CARR (committed annual recurring revenue), and it serves over 2,200 customers worldwide. Before Braze, Bill spent time at Bridgewater Associates. Kevin’s academic background is in brain & cognitive sciences, and prior to joining Braze he worked at Accenture and Brewgene.
What strikes me most is how early conviction catalyzed execution. The Braze founders’ early insights into the mobile revolution weren’t abstract theses; they translated into concrete product choices that aligned with the emerging realities of push notifications, in-app messaging, and event-driven personalization. That early bet on mobile-first customer engagement created strategic leverage that compounding growth later amplified.
Origin stories matter because they encode the decision-making DNA. How a TechCrunch Hackathon sparked Braze’s creation is a reminder that speed to learning often beats speed to launch. Meeting co-founders at an NYC Hackathon stacked the deck for chemistry and complementary skills — a pattern I’ve seen repeatedly when teams form around real problems and prototype under time pressure.
Finding “terminal value” product market fit is more than PMF — it’s about enduring utility that scales with customer complexity. I appreciated how they framed the search as “fishing in every pond,” testing use cases and segments broadly while retaining a coherent platform strategy. That duality — breadth of exploration with depth of conviction — is precisely how I guide teams through product discovery when the surface area of opportunity is vast.
The early journey from 1,000 beta signups to 2,200+ paying customers underscores a disciplined funnel from interest to value to revenue. Braze’s scrappy scaling and early product development show that sometimes you must resist playbook dogma. Breaking the rules of a lean startup doesn’t mean ignoring hypotheses; it means investing ahead of the curve when platform primitives (data, messaging, orchestration) are the real unlock for long-term differentiation.
Navigating early fundraising challenges often forces sharper articulation of strategy and sequencing. I’ve found that the “why now” and “why this architecture” narratives become decisive — especially when your thesis runs counter to conventional wisdom. In Braze’s case, riding the mobile wave to success was inseparable from building the right infrastructure for real-time engagement and global scale.
Competition is inevitable; how you posture is a choice. Approaching competition strategically like a boxer resonated with me — pick your angles, conserve energy, and control the fight tempo. Translate that into product terms: choose the battles that exploit your architectural strengths, avoid the feature-by-feature brawl, and make category-defining bets where your feedback loops are fastest and most defensible.
Globalization rewards systems thinking. Building a global customer base requires architectural foresight (latency, compliance, localization), go-to-market nuance, and a repeatable model for entering new regions. When scale helps or hurts is an under-discussed reality — some processes must centralize; others need to decentralize to stay close to the customer signal. The never-ending quest for PMF is real; every new segment, channel, and geography is a fresh PMF search with its own “viable path to value.”
If I had to distill the practitioner takeaways, I’d start with this: prioritize platform primitives over shiny features; measure learning velocity, not just shipping velocity; and align resourcing to “terminal value” outcomes, not activity. That’s how you out-execute better-funded rivals and convert timing advantages into durable moats.
Referenced:
Accenture: https://www.accenture.com/
Appboy: https://www.braze.com/resources/articles/appboy-social-network-for-mobile-apps
Bipul Sinha: https://www.linkedin.com/in/bipulsinha/
Braze: https://www.braze.com/
Bridgewater Associates: https://www.bridgewater.com/
Jon Hyman: https://www.linkedin.com/in/jon-hyman/
Mark Ghermezian: https://x.com/markgher
MIT: https://www.mit.edu/
Rubrik: https://www.rubrik.com/
WeWork: https://www.wework.com/

October 20, 2025
How Guideline Rewired 401(k)s: First‑Principles Strategy, Gusto Edge, and Product Wins

“I don’t believe in stealth mode” is a product mantra I’ve long embraced, and it immediately came to mind as I dug into how Guideline modernized 401(k)s for small and medium-sized businesses. In a space dominated by incumbents and legacy processes, transparency and execution in public view can be a superpower. That ethos, paired with disciplined product discovery, comes through clearly in Guideline’s story.
Kevin Busque, the co-founder and CEO of Guideline, saw the problem up close while building Taskrabbit: traditional 401(k) plans suffered from complexity, low participation, and “confusing fee structures.” As a product leader, I’ve watched similar frictions stall adoption in other regulated categories—when fees are opaque and onboarding is arduous, engagement dies before it starts. The insight was simple but profound: remove confusion, automate compliance, and make default participation the norm.
After launching Guideline to address those problems head-on, the company rapidly validated market pull, hitting $120 million in ARR by June 2024. That milestone reflects more than growth—it’s evidence that a first-principles approach to retirement plans can outcompete legacy playbooks. It also highlights the compounding impact of product decisions that prioritize clarity, automation, and aligned incentives.
What impressed me most was the “Do the hard thing first” mindset. In practice, that meant investing early in infrastructure others avoided, like deeply integrated payroll workflows and robust compliance automation, rather than deferring them as future tech debt. It’s the opposite of chasing shiny objects: master the unglamorous backbone and everything else compounds.
On market entry, Guideline focused on nailing product-market fit by aligning with payroll ecosystems where SMBs already live. The Gusto partnership was a pivotal move—“Kevin’s insights from the Gusto integration” underscore how strategic distribution, combined with a clean UX and transparent pricing, became a durable edge. Compared to heavyweights like ADP, Fidelity, Paychex, and Intuit, Guideline reframed the buyer journey around simplicity and trust.
Pricing matters in retirement more than most founders realize. “How Guideline set their fees up” and “Lucky 8: Kevin’s unexpected pricing strategy” show how precise pricing architecture can both demystify costs and drive adoption. Clarity isn’t just a marketing claim—it’s a feature that reduces cognitive load and increases participation rates.
I also appreciated how early traction came from a surprisingly broad customer mix—“The surprising range of Guideline’s early customers” points to a product that generalized well across verticals without losing focus. “Working with Plaid as Guideline’s first customer” exemplifies how partnering with trusted fintech brands accelerates credibility and creates feedback loops that sharpen the product.
Defaults drive outcomes. “Guideline’s auto-enrollment feature” is a great example of using behavioral design to improve financial health at scale. When the right default exists and the friction is removed, participation becomes the baseline, not the exception. It’s a masterclass in aligning product and policy to deliver real retirement outcomes, not just feature checklists.
From a roadmap perspective, I was struck by the discipline in resisting premature expansion—“Will Guideline ever go multi-product?” is a nuanced question for any scaling company. “Kevin’s take on product-market fit” and “Guideline’s compounding advantage” reinforce a principle I live by: compound depth before breadth. Every integration, every compliance workflow, every support touchpoint can either compound or fragment your advantage.
Finally, leadership matters as much as strategy. “The challenges faced by introverted leaders” resonate deeply with how I build teams: create space for deep work, institutionalize written decision-making, and use clear operating principles so the product vision scales beyond the founder. It’s the quiet, consistent habits—not the loud slogans—that hold complex products together.
For product leaders working on regulated, high-stakes categories like retirement plans, healthcare, or financial services, the lessons are clear: conduct rigorous product discovery before you ship, pursue distribution advantages through strategic partnerships, architect pricing as an experience, and let default-driven features (like auto-enrollment) do the heavy lifting. That’s how you rewire entrenched markets—by doing the hard thing first, and doing it in the open.

October 20, 2025
What Makes or Breaks Executive Hires: My Lessons on Fit, Red Flags, and Measuring Success

Executive hiring is one of those rare decisions that can bend a company’s trajectory. In my role leading product management at a high-growth SaaS company, I’ve seen the difference between a leader who compounds value and one who quietly drains momentum. That’s why I was eager to examine what actually makes (or breaks) these bets, and to share a practical lens you can use to improve executive hiring outcomes.
I sat down with Eeke de Milliano for a focused conversation on the realities of executive hiring, leadership transitions, and measuring success. We dug into the “buy or build a leader” decision, how to avoid common red flags, and what it takes to set executives up to thrive in hyper-growth environments.
Eeke de Milliano is the Head of Global Product at Stripe, helping drive innovation and success in the company’s product line. Before this role, she was Head of Product at Retool and co-founded Constellate. Eeke previously spent 6 years as Product Lead at Stripe, working with the company during their hyper-growth era.
In the conversation, we discuss how to rigorously assess executive hiring fit, including the challenges companies face when hiring new executives and the most common red flags and pitfalls I see teams miss under time pressure. We also explore practical advice for measuring success, especially when outcomes vs. output get muddled in the first 90–180 days.
A recurring theme for me is that learning your own strengths is an underrated piece of the process. If you don’t understand the leadership leverage you already have on the team, you’ll over-hire for breadth or under-hire for depth. Great executive hiring clarifies the complementary edge you need—then measures it.
On the buy vs build decision: early signals matter. If you’re “buying” an external leader, pre-align on scope, authority, and what great looks like before day one. If you’re “building” from within, design a clear on-ramp and operating cadence so the leader can scale without drowning. In both cases, my mental model is to instrument leading indicators (team health, decision velocity, stakeholder trust) well before lagging business metrics fully show up.
Two red flags I always watch for: first, leaders who default to playbooks without interrogating context; second, leaders who cannot articulate how they measure success beyond activity and output. In hyper-growth, pattern-matching is useful—but uncalibrated pattern-matching is dangerous.
The human dynamics matter just as much as the strategy. What creates dysfunctional exec relationships is often misaligned interfaces: unclear decision rights, overlapping charters, or incentives that reward local maxima. High-functioning executive teams are like parents—a united front in public, with candid debate in private, anchored to shared principles and measurable outcomes.
Referenced:
ASML: https://www.asml.com/en
Claire Hughes Johnson: https://www.linkedin.com/in/claire-hughes-johnson-7058/
Constellate: https://constellate.team/
John Collison: https://www.linkedin.com/in/johnbcollison/
Mike Maples Jr.: https://www.linkedin.com/in/maples/
Patrick Collison: https://www.linkedin.com/in/patrickcollison/
Retool: https://retool.com/
Stripe: https://stripe.com/
Will Gaybrik: https://www.linkedin.com/in/william-gaybrick-5730347/
Where to find Eeke:
LinkedIn: https://www.linkedin.com/in/eeke-de-milliano-3b05a629/
Timestamps:
(00:00) Should you ‘buy or build’ a leader
(03:45) Why do executive hires fail so often?
(09:35) Why the stakes are so high for leadership hires
(12:26) The hardest document Eeke ever wrote
(14:06) Two red flags in a new hire
(17:27) An example of an outstanding leader
(21:40) What creates dysfunctional exec relationships
(22:38) The three steps towards hiring successful leaders
(30:30) What you should know about outside hires
(33:12) Eeke’s advice for easing leadership transitions
(42:06) How to notice success patterns
(47:21) Why high-functioning executive teams are like parents
(52:02) The most surprising lesson from Eeke’s first stint at Stripe
(55:11) The leadership data Eeke wishes we had

October 20, 2025
Scrappy Outbound to ‘Hyperbolic’ PMF: How a COVID Pivot Fueled Owner’s Explosive Growth

I’m drawn to origin stories that turn constraints into catalysts, and this one is a masterclass. Adam Guild is the co-founder and CEO at Owner, an online food ordering system for independent restaurants. Within a year, Owner went from being about to run out of money to having hundreds of customers. Last year, they raised a $33M Series B. Those numbers only make sense when you see the scrappy tactics and the decisive COVID-era pivot that unlocked genuine product-market fit.

Adam’s entrepreneurial journey began as a teenager when he built a successful Minecraft server, which led him to drop out of high school to become a founder. His passion for helping small businesses was sparked by his mom’s struggles running a dog grooming shop, which led him to launch the early iteration of Owner. As a product leader, I recognize that kind of founder-market fit instantly—the best ideas often surface at the intersection of lived pain and hands-on tinkering.

What struck me most was how working with a small business kickstarted Owner. Rather than “build it and they will come,” Adam embedded with real operators, learned their workflows, and shipped fast iterations that directly moved revenue and saved time. I’ve seen this forward-deployed product approach outpace traditional discovery for SMB tools—when you sit in the kitchen, the point-of-sale line, or the back office, your prioritization gets brutally clear and your product discovery becomes grounded in outcomes, not outputs.

Adam’s unusual outbound strategy was a reminder that early-stage go-to-market is a craft. Cold outreach, hands-on onboarding, and relentlessly personalized pitches carried them through the zero-to-one phase. When your ICP is time-starved and margin-conscious, “unscalable” tactics are often the most scalable path to signal: you earn trust, collect high-fidelity feedback, and create case studies that compound.

Then came the COVID pivot. The pandemic accelerated Owner’s success because it reshaped demand overnight: independent restaurants needed a direct, online ordering system to survive. The teams that won were those that eliminated adoption friction, connected the dots between channel, product, and operations, and carried the emotional weight of their customers’ reality. This is where Owner’s speed and empathy turned into a durable advantage.

The quest to find product-market fit crystallized around clear signals: urgent pull from operators, fast time-to-value, and repeatable outcomes. How Owner’s pivot led to “hyperbolic” product-market fit is the throughline—usage intensity, referrals, and condensed sales cycles all pointed to a solution that was now indispensable. Inside Owner’s explosive growth, I see a tight loop: ship, sit with customers, quantify impact, then scale only what works.

What actually worked to get new customers? Channel–product fit over channel proliferation. High-intent outreach, proof via live results, and visible social proof from recognizable restaurants created momentum. I also appreciated the pragmatic lens on partnerships and content—operators trust peers and practical playbooks more than generic marketing. Mentions ranging from Guisados to P.F. Chang’s highlight how credibility compounds when you deliver consistently.

How Owner secured its crucial first round of funding reinforced a familiar truth: narrative quality rises with clarity of problem, velocity of learning, and evidence of market pull. The constellation of names referenced—Alex Bard, Dean Bloembergen, Jack Altman, Kimbal Musk, Naval Ravikant, Neil Patel, Peter Thiel, Sean Rad—underscores how operational rigor plus a resonant mission attracts heavyweight believers. Communities like Thiel Fellowship and Y Combinator also surfaced as formative ecosystems that sharpen founders and widen networks.

The bet on going multi-product is a pivotal inflection in any SMB platform’s life. Expansion only works when each new capability deepens core value for the same buyer, not when it dilutes focus. The winning pattern: solve one painful job thoroughly, earn the right to add adjacent workflows, and measure expansion by net retention and attach rate—not by a feature checklist. This is where outcomes vs output OKRs prevent drift.
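
For readers who want the arithmetic behind those two measures, here is a small sketch using the commonly used definitions of net revenue retention and attach rate. All figures are invented for illustration and are not Owner’s numbers.

```python
def net_revenue_retention(
    starting_mrr: float, expansion: float, contraction: float, churned: float
) -> float:
    """Revenue kept from an existing customer cohort over a period, including upsell."""
    if starting_mrr <= 0:
        raise ValueError("starting_mrr must be positive")
    return (starting_mrr + expansion - contraction - churned) / starting_mrr


def attach_rate(customers_with_addon: int, total_customers: int) -> float:
    """Share of the customer base that has adopted the additional product."""
    if total_customers <= 0:
        raise ValueError("total_customers must be positive")
    return customers_with_addon / total_customers


# Hypothetical cohort: $100k starting MRR, $12k expansion, $3k downgrades, $5k churn.
nrr = net_revenue_retention(100_000, 12_000, 3_000, 5_000)
# Hypothetical adoption: 45 of 300 customers use the second product.
attach = attach_rate(45, 300)
print(f"NRR: {nrr:.0%}, attach rate: {attach:.0%}")
```

An NRR above 100% alongside a rising attach rate suggests the new product deepens value for the same buyer; a rising attach rate with flat NRR suggests it is being bundled or discounted rather than valued.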

I also took note of the hiring philosophy. The two qualities Adam looks for in new hires map to what I’ve seen drive early-stage slope: people who love the problem and run toward responsibility. Narrow hiring bars, clear scorecards, and hands-on working sessions outperform generic interviews—especially when your customers are small businesses who need speed, reliability, and care.

Sales-led vs. product-led growth is often framed as a binary, but in SMB, the blend matters. Early on, a sales-led motion validates willingness to pay and compresses feedback cycles; as fit tightens, product-led loops amplify reach and reduce CAC. The art is knowing when to transition emphasis, which KPI to optimize at each phase, and how to keep experience quality high as you scale onboarding and support.

For additional context and inspiration, the conversation touched on operators, thinkers, and platforms such as HubSpot, Modern Restaurant Management, and communities like Y Combinator and the Thiel Fellowship, alongside individuals including Alex Bard, Dean Bloembergen, Jack Altman, Kimbal Musk, Naval Ravikant, Neil Patel, Peter Thiel, and Sean Rad. The range of perspectives mirrors the range of skills modern product leaders need to wield—customer empathy, scrappy GTM, and disciplined execution.

My takeaway is simple: scrappiness fuels discovery, clarity fuels scale. Owner’s journey—from near-zero runway to hundreds of customers and a $33M Series B—shows how a decisive, customer-obsessed pivot can transform a fragile idea into an enduring company. If you build for independent restaurants or any SMB segment, the blueprint holds: earn trust through results, tighten feedback loops, and let product-market fit pull you forward.

October 20, 2025