Tag: gen ai

  • DevTools at Scale: Hard-Won Lessons on PMF, AI, and Culture from Apple, AWS, Microsoft

    Building and scaling DevTools has taught me that world-class culture and relentless product focus are non-negotiable. Drawing on experiences across Amazon, Apple, and Microsoft—and hard-won lessons from startups like Unblocked and Buddybuild—I’m sharing the principles I rely on to ship great developer products at scale.

    Why building for developers is different: developers are discerning, allergic to friction, and quick to churn if the DX isn’t exceptional. That means fast setup, clear docs, ergonomic APIs, sane defaults, and deep integrations with GitHub, GitLab, Bitbucket, Confluence, AWS, and Microsoft Azure.

    I benchmark teams against gold-standard platforms like Stripe, Twilio, and Looker—tools that reward mastery, never bury the lede, and make success observable in minutes, not days.

    From the early days of Buddybuild, the signal was unmistakable: remove toil from CI/CD, shorten feedback loops, and teams will expand usage without a sales nudge. The pattern holds across DevTools: when time-to-value approaches zero, the product sells itself.

    Early signs of product-market fit: organic team-to-team adoption, repeatable setup success, contributions from power users, and inbound demand you cannot keep up with. When these show up, “Why great product is everything” stops sounding like a platitude and starts reading like a P&L.

    Monetizing product-market fit is straightforward if you align value and pricing units. Seat-based pricing maps to collaboration; usage-based pricing maps to compute, API calls, or storage; hybrid models reduce edge-case friction. Keep the packaging simple and double down on “The power of positioning.”
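    To make the three pricing units concrete, here is a minimal sketch of how the same team would be billed under each model. The prices, the included allowance, and the function names are illustrative assumptions, not figures from any real product; amounts are in integer cents to avoid rounding surprises.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    seats: int      # active collaborators in the billing period
    api_calls: int  # metered units consumed in the billing period


def seat_based(usage: Usage, seat_cents: int) -> int:
    """Bill scales with the number of collaborators."""
    return usage.seats * seat_cents


def usage_based(usage: Usage, cents_per_1k_calls: int) -> int:
    """Bill scales with metered consumption."""
    return usage.api_calls * cents_per_1k_calls // 1000


def hybrid(usage: Usage, seat_cents: int, included_calls_per_seat: int,
           cents_per_1k_calls: int) -> int:
    """Each seat carries an allowance; only consumption above it is metered."""
    included = usage.seats * included_calls_per_seat
    overage = max(0, usage.api_calls - included)
    return usage.seats * seat_cents + overage * cents_per_1k_calls // 1000


# Hypothetical team: 10 seats, 150,000 API calls in the month.
team = Usage(seats=10, api_calls=150_000)
print(seat_based(team, seat_cents=2000))
print(usage_based(team, cents_per_1k_calls=200))
print(hybrid(team, seat_cents=2000, included_calls_per_seat=10_000,
             cents_per_1k_calls=200))
```

    The hybrid function shows why that model reduces edge-case friction: a light user pays only for seats, while a heavy user pays for the overage rather than being forced onto a different plan.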

    AI is complicating product-market fit. Gen AI accelerates product prototyping, but it also introduces instability: model drift, hallucinations, and evaluation blind spots. Before scaling, I build an evaluation harness, add human-in-the-loop review for risky flows, and define a clear customer support AI strategy.
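    As a rough illustration of what an evaluation harness with a human-in-the-loop path can look like, here is a minimal sketch. Everything in it is an assumption made for the example: the `EvalCase` fields, the substring check standing in for a real grader, the 0.9 pass-rate threshold, and the `fake_model` stub that replaces an actual model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    must_contain: str    # substring check standing in for a real grader
    risky: bool = False  # risky flows always go to a human reviewer


def run_eval(model: Callable[[str], str], cases: list[EvalCase],
             min_pass_rate: float = 0.9) -> dict:
    """Run every case, compute the pass rate, and collect prompts for review."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = 0
    needs_review = []
    for case in cases:
        output = model(case.prompt)
        ok = case.must_contain.lower() in output.lower()
        passed += ok
        # Failures and risky flows are both routed to a human.
        if case.risky or not ok:
            needs_review.append(case.prompt)
    pass_rate = passed / len(cases)
    return {
        "pass_rate": pass_rate,
        "release_ok": pass_rate >= min_pass_rate,
        "needs_review": needs_review,
    }


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    canned = {
        "How do I reset my password?": "Use the reset link on the sign-in page.",
        "Can I get a refund?": "Refunds are handled by our billing team.",
    }
    return canned.get(prompt, "I am not sure.")


cases = [
    EvalCase("How do I reset my password?", "reset link"),
    EvalCase("Can I get a refund?", "billing team", risky=True),
    EvalCase("Delete my account", "confirm", risky=True),
]
report = run_eval(fake_model, cases)
print(report)
```

    With these three cases the stub answers two correctly, so the pass rate falls below the threshold and the gate blocks the release; both risky prompts land in the review queue whether or not they passed.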

    Being customer-obsessed is the moat. I embed forward deployed engineers with key customers to translate real workflows into product decisions, close the empathy gap, and validate behavior in production environments.

    On decision-making, I blend product discovery with crisp documents and measurable bets: PRFAQs or design docs to clarify intent, guardrails in analytics, and outcomes vs output OKRs to keep teams aligned to impact.

    Unblocked, a developer tool that lets you talk to your codebase, points toward a future where code search, context, and refactoring converge into conversational workflows. I’m bullish on the pattern, but I stay sober about failure modes and cost-to-serve.

    Here’s my cautious take on AI: latency, privacy, and provenance matter as much as model quality. The best teams treat prompts as product, training data as liability, and evaluation as a first-class release gate.

    Hiring is where many teams stumble. Don’t over-index on competency when hiring. I optimize for learning velocity, ownership, and kindness under pressure. Competency scales output; character scales organizations.

    As a second-time founder and operator, I treat mental health like uptime. I schedule recovery, define non-negotiables, and surround myself with peers who normalize the hard days. Burnout is a systems failure, not an individual weakness.

    I don’t do demos. I prefer self-serve trials with instrumented onboarding, sample projects, and guardrails that let the product do the talking. If a prospect can’t succeed in 15 minutes, we fix the product, not the deck.
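    One way to check the 15-minute bar from instrumented onboarding data is sketched below. The event names (`signup`, `first_build_succeeded`), the tuple layout, and the sample users are hypothetical; a real pipeline would read these from whatever analytics store the product already uses.

```python
from datetime import datetime, timedelta

TARGET = timedelta(minutes=15)


def time_to_first_success(events):
    """Map each signed-up user to the time between signup and first success.

    events: iterable of (user_id, event_name, timestamp) tuples.
    Users who never succeeded map to None.
    """
    signup, success = {}, {}
    for user, name, ts in sorted(events, key=lambda e: e[2]):
        if name == "signup":
            signup.setdefault(user, ts)
        elif name == "first_build_succeeded" and user in signup:
            success.setdefault(user, ts)
    return {u: (success[u] - signup[u]) if u in success else None
            for u in signup}


def missed_target(durations, target=TARGET):
    """Users who never succeeded or took longer than the target."""
    return sorted(u for u, d in durations.items() if d is None or d > target)


t0 = datetime(2024, 1, 1, 9, 0)
events = [
    ("ana", "signup", t0),
    ("ana", "first_build_succeeded", t0 + timedelta(minutes=9)),
    ("raj", "signup", t0),
    ("raj", "first_build_succeeded", t0 + timedelta(minutes=40)),
    ("lee", "signup", t0),
]
durations = time_to_first_success(events)
print(missed_target(durations))
```

    The list of users who missed the target is the work queue: each entry points at an onboarding step to fix in the product rather than something to explain in a deck.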

    On customer feedback, I separate noise from signal with cohorts and context. I prioritize requests that reduce time-to-value, unblock integrations, or meaningfully expand the surface area of successful use cases. That’s how to deal with customer feedback without losing strategic focus.

    To build and scale DevTools, keep the bar high and the loop tight: ship small, watch usage, learn fast. Invest in platform reliability, rock-solid SDKs and CLIs, and a developer experience that earns trust release after release.

    Resources and touchstones I revisit often:

    Apple’s acquisition of Buddybuild: https://www.cnbc.com/2018/01/02/apple-agrees-to-buy-buddybuild.html

    AWS: https://aws.amazon.com

    Bitbucket: https://bitbucket.org

    Confluence: https://www.atlassian.com/software/confluence

    GitHub: https://github.com

    GitLab: https://gitlab.com

    Looker: https://looker.com

    Microsoft Azure: https://azure.microsoft.com

    Stewart Butterfield: https://www.linkedin.com/in/butterfield/

    Stripe: https://stripe.com

    Twilio: https://twilio.com

    Unblocked: https://getunblocked.com/

    If you’re building for developers, stay ruthless about simplicity, respectful of their time, and obsessed with proof in production. That’s how durable product-market fit is earned—and monetized.


  • A Masterclass in Founder Conviction: Gong’s $100m ARR, PMF Breakthroughs, and AI Sales

    I’ve long believed that true product leadership is measured by conviction you can defend with data. That’s why the story of Gong resonates so deeply with me. Eilon Reshef is the co-founder and CPO at Gong, an AI-powered platform that tracks, records, and analyzes sales calls to drive revenue growth. In 2021, Gong raised $250M at a $7.25B valuation. Gong was one of the fastest SaaS companies to hit $100m ARR, and now has over 4000 customers. Before Gong, Eilon sold his previous e-commerce startup, Webcollage.

    Why does this matter to product creators like us? Because betting on recording sales calls wasn’t a popular opinion at the time; it was a bold thesis about conversation data as the primary system of record for revenue. The insight was simple and powerful: conversations are the most unstructured and under-utilized signal in B2B sales. Capture them end-to-end, analyze them with AI, and you unlock repeatable sales execution at scale.

    I was bullish on this category early for the same reason: recording sales calls converts ephemeral “tribal knowledge” into searchable, coachable truth. That enables better product discovery, sharper positioning, and tighter feedback loops between go-to-market and product, even more so as gen ai capabilities matured.

    Early product-market fit signals were unmistakable: persistent usage by frontline reps, managers organically building coaching rituals around insights, and executives tying outcomes to pipeline velocity and win rates. The emergence of “raving fans” wasn’t a vanity metric; it was the leading indicator that the product was changing behavior and embedding into daily workflows.

    Keeping the beta lean was crucial. Instead of building a feature buffet, the focus stayed on a few, high-utility workflows that consistently delivered value in the wild. In my own teams, we mirror this with forward deployed engineers and a tight set of design partners who are willing to co-develop, tolerate rough edges, and trade early access for tangible impact.

    Design partners, when chosen well, become your reality check and your accelerant. Their hardest problems guide prioritization; their workflows reveal where friction truly lives. This is where outcomes vs output OKRs matter: measuring behavior change and revenue outcomes, not just shipped features.

    The initial demo reactions often sounded like a referendum on change management: legal concerns about recording, rep discomfort, or doubts about AI accuracy. Strong founder conviction met these with data and empathy: clear consent frameworks, rapid improvements in transcription and modeling, and, most importantly, undeniable win stories that reframed risk as opportunity.

    Monetization followed the value. Pricing and packaging worked best when buyers could connect usage directly to measurable outcomes: faster ramp, better forecast accuracy, higher conversion rates, and more consistent deal execution. With a land-and-expand motion, teams saw success at the manager pod level before scaling across the org.

    I appreciated the disciplined approach to the roadmap. A unique product roadmap framework anchored on durable customer outcomes created internal clarity: which insights change coaching, which recommendations change behavior, and which automations remove repetitive work. This is classic product management leadership: create alignment with narrative, evidence, and a few high-conviction bets.

    The journey to multi-product was a natural extension of product-market fit. Start with conversation intelligence; expand to adjacent revenue workflows where the same data asset offers compounding value: forecasting, deal risk, enablement, and coaching. The throughline: one trusted data layer, many value surfaces.

    Having built AI products since 2015, I’ve learned to prioritize data quality, model reliability, and tight human-in-the-loop design. The best gen ai experiences pair high-recall analysis with opinionated UX that guides managers and reps to take the next best action. That’s how you turn insights into habits.

    Looking ahead, the future of AI in B2B sales efficiency is practical autonomy: assistants that summarize calls, draft follow-ups, update CRM fields, flag risks, and trigger playbooks, without adding workflow friction. The winners will combine precision models, secure data handling, and workflow-native delivery.

    Measuring success goes beyond dashboard vanity. What matters: adoption depth across roles, coaching frequency, deal cycle time, conversion lift, forecast accuracy, and the creation of “raving fans” who advocate internally and externally. When the product becomes the backbone of pipeline conversations, you’ve crossed the line from tool to system.

    I also see enduring relevance in foundational thinking like Crossing the Chasm. It explains why design partner fit precedes market fit, why early majority buyers demand social proof, and why operational excellence matters as much as product insight during hypergrowth.

    If you want to explore the broader ecosystem and resources mentioned, here are the references exactly as noted:

    Act-On Software: https://act-on.com/

    Amit Bendov: https://www.linkedin.com/in/amitbendov/

    BlueJeans: https://www.bluejeans.com/

    Crossing the Chasm: https://www.amazon.com/Crossing-Chasm-3rd-Geoffrey-Moore/dp/0062292986

    Gong: https://www.gong.io/

    Mistral: https://mistral.ai/

    OpenAI: https://openai.com/

    Salesforce: https://salesforce.com/

    Webcollage: https://www.crunchbase.com/organization/webcollage

    Webex: https://www.webex.com/

    Zoom: https://zoom.us/

    Where to find Eilon Reshef:

    LinkedIn: https://www.linkedin.com/in/eilonreshef/

    For product leaders, the takeaways are clear: anchor on customer outcomes, cultivate design partners who become co-authors of your roadmap, use gen AI for product prototyping to accelerate discovery, and measure conviction not by opinions but by repeatable revenue impact. Those are durable product-market fit lessons you can operationalize today.
  • Developing Technical Taste: My Playbook for Next‑Gen Engineers, AI Strategy, and 2024 Scaling

    When I think about the next generation of engineers and product creators, one capability consistently separates the great from the good: technical taste. It’s the intuition to choose the simplest viable path, the humility to iterate, and the courage to ask “what if” before everyone else. In this piece, I share how I frame technical taste, what it means for AI strategy, and how to scale software teams in 2024 without losing speed or product-market fit.

    Sam Schillace is the CVP and Deputy CTO at Microsoft. Before Microsoft, Sam held prominent engineering roles at Google and Box. He has also founded six startups, including Writely, which was acquired by Google and became Google Docs.

    In this deep dive, I explore themes like “Sam’s advice for future engineers,” “What’s next for AI,” “How to develop technical taste,” “The importance of asking ‘what if’ questions,” “Lessons on market timing,” and “Scaling a software company in 2024.” My lens is product management leadership at scale, with a bias toward clear decision-making, rapid learning, and compounding leverage.

    On market timing, my experience echoes the principle that momentum compounds only after you align product insight with the market’s inflection point. “The Innovator’s Dilemma” reminds us that the very systems designed to protect current value can block new value. The smartest move I’ve seen is to treat disruptive bets like venture portfolios inside the company—small, time-boxed, outcome-driven, and shielded from legacy KPIs. That’s how you preserve execution excellence while creating space for the next S-curve.

    Technical taste is developable. I look for three signals: first, engineers who reduce a problem to its essence and deliver a working slice quickly; second, product creators who anchor on outcomes vs output OKRs; third, teams who habitually ask “what if” questions to surface non-obvious constraints and new leverage. When this mindset meets strong product discovery, you get faster cycles, fewer dead ends, and clearer product-market fit lessons.

    “Building Google Docs” is a case study in choosing the web as the platform before it was fashionable—an act of taste under uncertainty. It’s also a reminder that what looks inevitable in hindsight was controversial in real time. Discussions about “The decline of Google apps” are less about any one company and more about the drift that occurs when focus fragments; taste is how you steer back to the core job-to-be-done.

    On “The Innovator’s Dilemma facing Microsoft” and “The differences between Google and Microsoft,” I’ve seen how culture shapes product motion. One optimizes for experimentation at massive consumer scale; the other, for enterprise trust and durability. The playbook to reconcile both: define two operating modes—explore and exploit—and make the seams explicit. Use forward deployed engineers to learn with customers, while platform teams industrialize the wins.

    “How to build a winning product” in 2024 is straightforward to say and hard to do: shorten the distance between insight and impact. I prioritize gen AI for product prototyping to test feasibility early, pair it with real-user loops from day one, and instrument everything to learn faster than competitors. Ruthlessly prune scope to ship a lovable slice, then iterate. That’s how you scale software in 2024 without bloating teams or code.

    On “Becoming an optimist,” I’ve learned optimism is a practice: assume better solutions exist, then run experiments to find them. “What makes a great engineer” and “One thing the best engineers do” often collapse into the same behavior—holding high standards while moving fast. The best engineers I know ask precise “what if” questions, surface edge cases early, and translate ambiguity into a plan the team can execute.

    “Sam’s prediction about AI,” “Capturing the value of AI,” and “How you should think about AI” all converge on a few product truths. Co-pilots and agents will become table stakes; differentiation will come from domain-specific data, workflow depth, and trust. Value accrues where AI is closest to the decision or outcome—embedded in the flow of work, not bolted on. For customer support AI strategy, the win isn’t a clever bot; it’s reducing time-to-resolution with explainability, guardrails, and continuous learning from real tickets.

    “Microsoft’s new leverage,” “Scaling software in 2024,” and “The future of AI across several sectors” point to a broader shift: platforms that combine distribution, identity, and compliance will set the rules of engagement. But even in that world, local excellence matters—tight loops with customers, forward deployed engineers, and outcome-centric roadmaps will out-execute feature factories. The teams that treat gen AI as a capability—not a feature—will capture durable advantage.

    Referenced:

    Amazon: https://amazon.com

    Box: https://www.box.com/

    Elon Musk: https://twitter.com/elonmusk

    Google Docs: https://docs.google.com

    Itzhak Perlman: https://itzhakperlman.com/

    Microsoft: https://www.microsoft.com

    Netflix: https://www.netflix.com

    Tesla: https://www.tesla.com/

    The Innovator’s Dilemma: https://www.amazon.com.au/Innovators-Dilemma-Clayton-M-Christensen/dp/0062060244

    TurboTax: https://turbotax.intuit.com/

    Uber: https://www.uber.com/

    Walmart: https://www.walmart.com/

    Workday: https://www.workday.com/

    Writely: https://techcrunch.com/2005/08/31/writely-process-words-with-your-browser/

    Where to find Sam Schillace:

    LinkedIn: https://www.linkedin.com/in/schillace/

    Newsletter: https://sundaylettersfromsam.substack.com/

    Twitter/X: https://twitter.com/sschillace

    Timestamps:

    (00:00) Introduction

    (02:54) Lessons on market timing

    (07:30) Developing technical taste

    (09:51) Asking “what if” questions

    (14:03) Building Google Docs

    (19:32) The decline of Google apps

    (20:57) The Innovator’s Dilemma facing Microsoft

    (22:53) The differences between Google and Microsoft

    (24:42) How to build a winning product

    (27:46) Becoming an optimist

    (29:12) Why engineering teams aren’t smaller

    (32:00) Sam’s prediction about AI

    (34:11) Capturing the value of AI

    (37:43) How you should think about AI

    (45:33) Advice for future engineers

    (48:18) What makes a great engineer

    (49:45) One thing the best engineers do

    (51:37) Microsoft’s new leverage

    (56:01) Scaling software in 2024

    (59:50) The future of AI across several sectors

    (64:28) What Sam and a violinist have in common


  • Inside Stripe, OpenAI, Retool: Hard‑Won Marketing Lessons on Brand, GTM, and Scale

    I spend a lot of time studying how the best product-led companies translate world-class product thinking into durable marketing systems. When I zoom out on OpenAI, Stripe, and Retool, I see a repeatable pattern: deep customer empathy, a narrative grounded in real product value, and an operational cadence that scales taste without diluting quality. In this piece, I share what’s worked for me as a product leader, and how I apply these lessons to build brand, accelerate go-to-market, and make smart resource allocation decisions.

    Here’s the roadmap for this deep dive: marketing lessons from OpenAI, Stripe, and Retool; the 3 pillars of Stripe’s approach to brand; how to manage resource allocation as a marketer; adapting marketing strategy to different business models; and advice for early marketing hires. Along the way, I’ll add my own practical commentary on how I use these ideas day to day.

    The 3 pillars of Stripe’s approach to brand is a useful way to think about brand systems in any technical company. Even without enumerating those pillars here, the underlying method is what matters: codify the few non-negotiables (the taste bar, the voice, the promise), make them visible to everyone, and hold the line in reviews. In my teams, we operationalize this by creating a short brand playbook that fits on a single page, pairing it with exemplar assets, and requiring every new program to declare how it advances at least one pillar. Clarity beats cleverness when you’re scaling.

    How to manage resource allocation as a marketer is a perennial challenge as products and teams grow. I’ve had success with a 70/20/10 model: 70% on proven programs with measurable ROI, 20% on emerging bets with leading indicators (pipeline quality, engagement from priority personas), and 10% on frontier ideas that can reset the curve. We anchor work to outcomes vs output OKRs—pipeline, activation, time-to-value, product-qualified leads—so we’re funding results, not activity. As context changes (new ICP, pricing shifts, platform launches), we rebalance quarterly rather than set-and-forget.
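    The 70/20/10 split is simple arithmetic, and a small sketch makes the quarterly rebalance explicit. The bucket names, the example budget, and the rule of giving any rounding remainder to the largest bucket are assumptions for illustration only.

```python
def allocate(budget: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a budget by percentage weights; the weights must sum to 100."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    shares = {name: budget * pct // 100 for name, pct in weights.items()}
    # Give any rounding remainder to the largest bucket so the total is preserved.
    largest = max(weights, key=weights.get)
    shares[largest] += budget - sum(shares.values())
    return shares


# Hypothetical quarterly budget in whole dollars.
plan = allocate(250_000, {"proven": 70, "emerging": 20, "frontier": 10})
print(plan)

# A rebalance is the same call with new weights, for example after a pricing shift.
rebalanced = allocate(250_000, {"proven": 60, "emerging": 30, "frontier": 10})
print(rebalanced)
```

    Keeping the weights as data rather than hard-coding them means the quarterly review changes one dictionary, and the validation step catches a plan that quietly adds up to more than the budget.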

    As Stripe scaled taste, it demonstrated that high standards don’t have to mean bottlenecks. Rigorous reviews can empower teams when the criteria are explicit and teachable. Were Stripe reviews micromanaging? The lesson I apply: reviews should audit for narrative clarity, customer truth, and craft—not rewrite. We front-load narrative memos and storyboards, use pre-reads to keep live reviews crisp, and separate “taste feedback” from “blocking defects” to keep velocity high without compromising quality.

    Marketing under founders with strong marketing skills can be a superpower if you channel it. My playbook: align on the narrative spine early, invite dissent in draft form (not in launch week), and turn founder intuition into reusable artifacts—positioning docs, messaging matrices, and reference stories. The goal is to scale judgment across the org, not centralize it.

    Marketing at Retool vs Stripe and Marketing horizontal vs vertical products both highlight an important reality: default motions differ by product architecture and buyer psychology. For horizontal tools, the challenge is framing—teach the problem space, lead with canonical use cases, and invest in education (docs, templates, workshops) that unlock fast time-to-first-value. For vertical solutions, prioritize outcomes, credibility, and proof: ROI narratives, customer stories with industry-specific metrics, and targeted channel plays that map to where those buyers actually spend attention.

    Marketing to mid-market vs SMB vs enterprise requires instrumentation and patience tuned to each segment. For SMB, focus on self-serve journeys, clear pricing, and conversion velocity; for mid-market, emphasize solution fit, workflow integration, and multi-threaded nurture; for enterprise, lead with trust, compliance, partner ecosystem, and value engineering. I set segment-specific “north-star” outcomes (e.g., self-serve activation rate, opportunity-to-close rate, average deal cycle) and build program portfolios around those.

    Marketing programs that had an outsized impact often share a few traits: they’re product-adjacent, community-forward, and inherently educational. Two great examples from Stripe that I keep in my mental model are Stripe’s “Capture the Flag” campaign and Stripe Press—both programs build brand by creating genuine value for developers and builders instead of pushing features. They demonstrate how product-led marketing can compound over years.

    Lessons from OpenAI remind me that speed, clarity, and responsibility can coexist. The best teams tell a simple, credible story about how the tech helps people do meaningful work—then prove it in product. Inside OpenAI’s recent website relaunch, the big takeaways for me were reduction and focus: fewer pages, tighter flows, and a narrative that meets users where they are (from curious newcomers to advanced builders). That same discipline improves any product site: prioritize the jobs-to-be-done, reduce cognitive load, and surface the shortest path to value.

    How OpenAI’s marketers use OpenAI tooling is a model I bring into my teams daily. We use generative AI for content prototyping (outlines, angle exploration, voice calibration), for product discovery (summarizing interviews, clustering themes), and for campaign iteration (subject line tests, message variants, landing page microcopy). The bar is still human editorial judgment; AI accelerates the draft, we own the craft. Outside examples—like the Coca-Cola AI-generated wish card campaign—show how brand and AI can partner when creativity, data, and distribution align.

    Advice for early marketing hires is straightforward and hard-won. Be a product creator at heart: learn the product, sit with support, talk to customers weekly. Start with the shortest loops that drive real outcomes—docs that unlock activation, case studies that remove friction, templates that accelerate time-to-value. Build measurement into everything, but don’t let dashboards paralyze momentum. Above all, write clearly; strong writing is the highest-leverage GTM skill and a forcing function for clear thinking.

    When to start hiring marketers depends on signal. I look for repeatable demand patterns (consistent activation sources, emerging PQL signals), evidence of product-market fit lessons (clear ICP, pain–solution fit), and content debt (PMs and engineers over-producing GTM artifacts). For the first hire, I screen for full-stack utility, narrative instincts, and cross-functional leadership. How to screen early marketing hires: working sessions on positioning, a live critique of a landing page, and a writing exercise that reveals judgment under constraints.

    If you’re orchestrating a website relaunch, a segment shift, or a new product line, the throughline from these companies is simple: set a high taste bar, operationalize it with lightweight systems, and make the customer’s job-to-be-done the hero. Pair that with disciplined resource allocation, and you’ll earn brand, pipeline, and loyalty the hard way—by delivering real value.

    Referenced:

    Coca-Cola AI-generated wish card campaign: https://theprint.in/ani-press-releases/coca-cola-ignites-diwali-celebrations-with-unique-personalized-ai-generated-wish-cards/1840093/

    Cristina Cordova: https://www.linkedin.com/in/cristinajcordova/

    Gong: https://www.gong.io/

    Greg Brockman: https://www.linkedin.com/in/thegdb/

    Kenzo Fong: https://www.linkedin.com/in/kenzofong/

    Retool: https://retool.com/

    Stripe’s “Capture the Flag” campaign: https://techcrunch.com/2012/08/22/stripes-capture-the-flag-2-0-a-hands-on-contest-for-app-developers-to-test-their-security-know-how/

    Stripe Press: https://press.stripe.com/

    Stripe Sigma: https://stripe.com/us/sigma

    Tanya Khakbaz: https://www.linkedin.com/in/tanya-khakbaz-a725732/


  • Inside Intercom’s Bold Reboot: Lessons in AI Strategy, Ruthless Focus, and Culture

    I’ve been reflecting on a remarkable comeback story that offers sharp lessons for product leaders navigating AI disruption. Eoghan McCabe is the CEO and cofounder at Intercom, an AI customer service platform. Intercom has raised over $240M, and was last valued at $1.3B in 2018. After spending 9 years building the company, Eoghan left Intercom in 2020, but he’s since returned, reshaping Intercom and pioneering its pivot to an AI-first service. That arc—departure, return, and reinvention—captures a founder’s willingness to defy orthodoxy and act from first principles.

    What stood out to me most was the unapologetic embrace of intuition. In high-variance environments like AI and customer support, best practices lag reality. Founder intuition vs. standard practice isn’t a cliché here; it’s a capability. I’ve seen teams overfit to playbooks and underweight the signals that matter—customer truth, product discovery signals, and outcomes vs output OKRs that force clarity on what actually moves the needle.

    McCabe’s reflections since leaving Intercom highlight the value of distance. Stepping away often exposes where complexity crept in and where focus was lost. On return, the immediate moves were decisive: refocus the strategy, simplify priorities, and set a higher bar for cadence and quality. Those changes were anchored by first-principles thinking and a willingness to question everything, including sacred cows.

    The productivity step-change is telling. How Eoghan increased Intercom’s productivity by 41% wasn’t magic—it was management. In my experience, that kind of shift comes from ruthless prioritization, removing low-leverage work, and consolidating teams around fewer, outcome-aligned bets. Tactically, think tighter operating rhythms, clearer decision rights, and forward deployed engineers who sit closer to customers to collapse feedback loops—especially critical in gen ai and customer support AI strategy.

    Strategy-wise, the pivot to AI-first wasn’t about feature-chasing; it was about category leadership. AI and category disruption demand conviction. Why you can’t make small improvements in big categories is simple: customers reward step changes in outcomes, not incrementalism. In customer service, that means rethinking workflows end-to-end, not just sprinkling gen AI on top of legacy processes.

    Hiring was another area where the guidance was crisp. Tactical advice on hiring top talent included raising the bar on slope (rate of learning) and ownership, biasing for product creators who thrive in ambiguity, and building an executive team that can scale the operating model, not just the org chart. I’ve found this is where product management leadership shows up most clearly—pushing beyond conventional resumes to find people who can compound execution and insight.

    Culture carried equal weight. Crafting a culture of ruthless honesty and transparency isn’t about being abrasive; it’s about creating a system where truth travels fast. In practice, that looks like instrumented business reviews tied to outcomes, written decision memos that capture tradeoffs, and a shared language for escalation. It’s uncomfortable at first, then liberating—because it accelerates learning.

    Brand came in for a reality check, too. Why software branding is in crisis resonates in an era where many products sound the same, look the same, and promise the same. The antidote is clarity: a point-of-view that’s inseparable from the product experience. How Intercom thinks about brand appears to lean into differentiated behavior—speed, quality, outcomes—rather than slogans. In crowded categories, that’s what earns attention and trust.

    Under the hood, this story is a masterclass in product-market fit lessons. It reaffirms that PMF isn’t a one-time event; it’s a moving target, especially when technology paradigms shift. The companies that navigate the shift are those that re-baseline their bets, measure what matters, and ship faster with higher standards. That’s the compounding loop I try to build: focused strategy, outcome-centric execution, and continuous product discovery.

    If you’re steering an AI transformation, a few prompts I use: Are we solving for an outcome that customers will feel in minutes, not months? Where are we making bold, non-incremental bets? Which processes can we kill to regain tempo? And do our leaders model transparency in a way that accelerates truth-telling across the org?

    For further context and inspiration, here are some of the references mentioned: 37signals: https://37signals.com, Basecamp: https://basecamp.com, Brian Halligan (HubSpot): https://www.linkedin.com/in/brianhalligan, David Heinemeier Hansson (37signals, Basecamp): https://www.linkedin.com/in/david-heinemeier-hansson-374b18221, Intercom: https://www.intercom.com, Jason Fried (37signals, Basecamp): https://www.linkedin.com/in/jason-fried, Salesforce: https://www.salesforce.com, Marc Benioff (Salesforce): https://www.linkedin.com/in/marcbenioff, Zendesk: https://www.zendesk.com.

    If you want to follow Eoghan directly: LinkedIn: https://www.linkedin.com/in/eoghanmccabe/ and Twitter/X: https://x.com/eoghan. I find it valuable to track leaders who are actively rewriting the playbook in real time.


  • Just Now Possible Preview: How Real Teams Ship AI—Workflows, RAG, Agents, Evaluation

    Just Now Possible Preview: How Real Teams Ship AI—Workflows, RAG, Agents, Evaluation

    I’m excited to share a preview of Just Now Possible, a show where I sit down with the builders who are shipping meaningful AI features in the real world. My goal is simple: pull back the curtain on how AI products actually get made—messy problems, rapid prototyping, and the leadership decisions that move teams from concept to customer value.

    Watch the preview on YouTube: https://www.youtube.com/embed/Kb2HbuPbfR8?feature=oembed. Prefer audio? Listen on Spotify: https://open.spotify.com/episode/5xM0pDnqR0JpKmW6aZ0pj6?ref=producttalk.org or Apple Podcasts: https://podcasts.apple.com/us/podcast/podcast-preview/id1838832993?i=1000725807029&ref=producttalk.org.

    How AI products come to life—straight from the builders themselves. In each episode, we dive deep into how teams spotted a customer problem, experimented with AI, prototyped solutions, and shipped real features. We dig into everything from workflows and agents to RAG and evaluation strategies, and explore how their products keep evolving. If you’re building with AI, these are the stories for you.

    From my own experience leading product teams, I’ve seen that the real unlocks come from disciplined product discovery, OKRs framed around outcomes rather than output, and smart use of gen AI for product prototyping. We’ll talk about the tradeoffs between speed and safety, when to bring in forward deployed engineers, and how to validate product-market fit before scaling. Along the way, we’ll unpack practical patterns: when to use RAG vs fine-tuning, how to evaluate agents in production workflows, and what great product management leadership looks like in AI-first environments.

    The first full episode drops on Thursday, September 18th. Don't miss it!

    Full transcripts are available to paid subscribers.


    Inspired by this post on Product Talk.


  • Building AI Products That Work: My Playbook for LLM Strategy, Evals, and Orchestration

    Building AI Products That Work: My Playbook for LLM Strategy, Evals, and Orchestration

    AI features don’t succeed on clever prompts alone; they demand thoughtful product strategy, rigorous evaluation, and tight cross-functional collaboration. As a VP of Product Management and someone deeply immersed in building with large language model (LLM) technology, I’m constantly refining how we turn generative capabilities into real customer value. This episode of All Things Product zeroes in on that challenge, and it captures many of the principles I rely on when shipping AI to production.

    The central question resonates with every product leader I know: How do product teams learn to build AI-powered products “beyond just dabbling with ChatGPT”? I appreciate how the conversation moves past novelty and into the disciplines that make AI reliable, safe, and outcome-oriented.

    One metaphor that always lands for me: building AI features is less like writing a single “killer prompt” and more like orchestrating a team of “interns.” You define roles, break down work, set guardrails, and continuously review outputs. That orchestration mindset, coupled with strong observability, evals, and ongoing maintenance practices, is what separates flashy demos from repeatable product value.

    Here’s how I frame the work. First, there’s a difference between an AI-powered product manager and an AI product manager. Many of us are becoming AI-powered—using tools to accelerate discovery, ideation, or execution. But when you own AI features end-to-end, you inherit new responsibilities: modeling risks, defining evaluation strategies for non-deterministic systems, and treating prompts and data pipelines as core product surfaces.

    Prompt engineering for a product is fundamentally different from prompting ChatGPT for personal use. In production, I rely on prompt decomposition and orchestration—explicitly breaking a task into steps, assigning each step to the right capability, and enforcing consistent formats. This reduces variance, improves debuggability, and enables targeted evals that catch regressions before customers do.
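
    To make the decomposition idea concrete, here is a minimal sketch. It assumes a two-step support-ticket flow that I made up for illustration; `fake_model`, `run_step`, and `handle_ticket` are hypothetical names, and the model call is a stub rather than a real LLM API. The point is only that each step has one job and its output format is checked before the next step runs.

```python
import json
from typing import Callable

# Hypothetical stand-in for a model call; a real system would call an LLM here.
def fake_model(prompt: str) -> str:
    if prompt.startswith("EXTRACT"):
        return json.dumps({"topic": "billing", "urgency": "high"})
    return json.dumps({"reply": "We are looking into your billing issue."})

def run_step(name: str, prompt: str, required_keys: set,
             model: Callable[[str], str]) -> dict:
    """Run one step and enforce its output format before passing it on."""
    raw = model(prompt)
    data = json.loads(raw)  # raises ValueError if the step returned non-JSON
    if not isinstance(data, dict):
        raise ValueError(f"step {name!r} did not return a JSON object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"step {name!r} missing keys: {sorted(missing)}")
    return data

def handle_ticket(ticket: str, model: Callable[[str], str] = fake_model) -> dict:
    # Step 1: extract structured facts from the raw ticket.
    facts = run_step("extract", f"EXTRACT topic and urgency as JSON:\n{ticket}",
                     {"topic", "urgency"}, model)
    # Step 2: draft a reply using only the validated facts.
    draft = run_step("draft", f"DRAFT a reply as JSON for: {json.dumps(facts)}",
                     {"reply"}, model)
    return {"facts": facts, "reply": draft["reply"]}

result = handle_ticket("I was charged twice this month!")
```

    Because each step fails loudly on a malformed output, a regression can be traced to the step that caused it, and each step can get its own targeted evals.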

    System design and risk mitigation become front and center. I align early with engineering, legal, security, and support on failure modes, privacy expectations (including the handling of personally identifiable information (PII)), and rollout plans. We log traces for every critical path, treat prompts as versioned assets, and use observability to connect inputs, intermediate states, and outputs. When something drifts, we need to see it fast, explain it, and fix it.
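
    One way to picture versioned prompts plus traces is the sketch below. Everything in it is an assumption for illustration: the `PROMPTS` registry, the content-hash versioning scheme, and the trace fields are mine, not a specific tool’s format. A production version would also redact PII from the inputs before logging.

```python
import hashlib
import json
import time
import uuid

# Hypothetical prompt registry: prompt name -> template text.
PROMPTS = {
    "summarize": "Summarize the following ticket in one sentence:\n{ticket}",
}

def prompt_version(name):
    """Content hash, so any edit to a prompt yields a new, traceable version."""
    return hashlib.sha256(PROMPTS[name].encode("utf-8")).hexdigest()[:12]

def traced_call(name, model, **fields):
    """Call the model and return its output with a trace record for logging."""
    prompt = PROMPTS[name].format(**fields)
    started = time.time()
    output = model(prompt)
    trace = {
        "trace_id": str(uuid.uuid4()),
        "prompt_name": name,
        "prompt_version": prompt_version(name),
        "inputs": fields,  # assumed to be redacted of PII before this point
        "output": output,
        "latency_s": round(time.time() - started, 4),
    }
    return output, trace

# The lambda is a stub model used only so the sketch runs end to end.
output, trace = traced_call(
    "summarize",
    lambda prompt: "Customer was double charged.",
    ticket="I was charged twice this month!",
)
trace_line = json.dumps(trace)  # one JSON line per call, ready for a log sink
```

    With the prompt version stored on every trace, a drift in outputs can be lined up against the exact prompt change that preceded it.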

    Evaluating non-deterministic AI features is its own craft. “Thumbs up/thumbs down” isn’t enough. I design layered evals: unit-level checks for correctness and formatting, scenario-level evals for edge cases and risk behaviors, and longitudinal evals to monitor model and data drift over time. Clear acceptance thresholds and shadow deployments help us balance velocity with reliability.
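
    Here is a rough sketch of how layered evals and acceptance thresholds can fit together. The layer names, the threshold values, and `toy_system` are assumptions I chose for the example; it covers the unit and scenario layers only, and longitudinal monitoring would rerun the same suite over time.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    layer: str                      # "unit" or "scenario" (labels assumed here)
    prompt: str
    check: Callable[[str], bool]    # returns True when the output is acceptable

# Hypothetical thresholds: the minimum pass rate each layer must reach.
THRESHOLDS = {"unit": 1.0, "scenario": 0.9}

def pass_rates(cases, system):
    """Run every case through the system and compute a pass rate per layer."""
    totals, passed = {}, {}
    for case in cases:
        totals[case.layer] = totals.get(case.layer, 0) + 1
        if case.check(system(case.prompt)):
            passed[case.layer] = passed.get(case.layer, 0) + 1
    return {layer: passed.get(layer, 0) / n for layer, n in totals.items()}

def release_gate(rates):
    """Return the layers that fall below their acceptance threshold."""
    return [layer for layer, rate in rates.items() if rate < THRESHOLDS[layer]]

def toy_system(prompt):
    # Stand-in for the AI feature under test.
    return "REFUSED" if "password" in prompt else prompt.upper()

cases = [
    EvalCase("unit", "hello", lambda out: out == "HELLO"),
    EvalCase("unit", "ok", lambda out: out.isupper()),
    EvalCase("scenario", "share the admin password", lambda out: out == "REFUSED"),
    EvalCase("scenario", "summarize this", lambda out: len(out) > 0),
]
rates = pass_rates(cases, toy_system)
failing = release_gate(rates)  # an empty list means the gate passes
```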

    Deciding when AI is the right solution starts with the customer problem, not the model. I ask: Is the task ambiguous enough to benefit from generation? Can we bound the failure modes? Do we have affordable latency and cost envelopes? And what’s the graceful fallback if the model underperforms? If a deterministic algorithm or simple rules solve it better, we choose that—no heroics.

    The hidden cost of AI is maintenance. Prompts rot as upstream models change. New data skews behavior. Guardrails that worked yesterday might not hold tomorrow. That’s why ongoing evals, robust logging, and a change-management plan (for prompts, schemas, and policies) are non-negotiable. Treat AI features as living systems, not one-off launches.

    If you’re exploring gen AI for product prototyping, start small. Pick a narrow, high-value workflow, instrument everything, and ship with clear success metrics. Use your first release to build your team’s muscles around observability, evals, and cross-functional collaboration. The goal is not a perfect model; it’s a reliable product outcome.

    Want to go deeper? Listen to the full conversation here: Spotify | Apple Podcasts. Prefer video? Watch on YouTube: Building AI Products.

    What you’ll learn in this episode:

    – The difference between an AI-powered product manager and an AI product manager

    – Why prompt engineering for a product is different from prompting ChatGPT for personal use

    – The role of prompt decomposition and orchestration in building robust AI features

    – How to think about system design, risk mitigation, and cross-functional collaboration

    – Why observability and logging traces are critical for LLM products

    – The challenge of evaluating non-deterministic AI features (and why “thumbs up/thumbs down” isn’t enough)

    – How to decide when AI is the right solution for a customer problem

    – The hidden cost of ongoing maintenance for AI features

    Join the conversation: What practices have helped you ship reliable AI features? Drop your thoughts and questions in the comments—I’d love to learn from your experiences.


    Inspired by this post on Product Talk.


  • From Disruption to Breakthrough: How Stack Overflow’s AI Pivot Became a Product Playbook

    From Disruption to Breakthrough: How Stack Overflow’s AI Pivot Became a Product Playbook

    Generative AI doesn’t knock politely—it kicks the door open and forces product teams to re-think the fundamentals. I’ve lived through my share of market shifts, and the story of Stack Overflow’s AI journey hits every note of what it takes to respond with clarity, speed, and rigor.

    When ChatGPT launched, Stack Overflow faced a cataclysmic shift: developer behavior was changing overnight. That single sentence captures the urgency I felt as I studied this case: habits, traffic patterns, and value perceptions transformed almost instantly.

    Consider the timing: Ellen Brandenburger stepped into Stack Overflow just two weeks before ChatGPT launched. In her shoes, I would have immediately asked the same questions she did: What new developer workflows are becoming “just now possible”? How quickly can we prototype without compromising quality or trust? And how do we avoid overcorrecting in a moment of uncertainty?

    In response, the team created Overflow AI, a concentrated effort to explore “what’s just now possible” for developers. I love this framing—it anchors exploration to near-term feasibility while keeping sight of evolving user needs. It’s the kind of focused discovery effort I encourage when a platform-defining shift hits.

    They moved through four disciplined iterations of conversational search, each an experiment with clear hypotheses and guardrails:

    V1: a chat UI on top of keyword search

    V2: semantic search to handle natural questions

    V3: fallback to GPT-4 for gaps in Stack Overflow’s corpus

    V4: adding RAG for attribution and transparency
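
    The progression above can be illustrated with a toy sketch. This is not Stack Overflow’s implementation: the corpus, the keyword-overlap retrieval, and the names `retrieve` and `answer` are all assumptions of mine. It only shows the shape of the last two iterations, where an answer is either grounded in retrieved posts and cites them, or comes from a fallback model and is flagged as unsourced.

```python
from dataclasses import dataclass, field

@dataclass
class Answer:
    text: str
    sources: list = field(default_factory=list)  # IDs of corpus posts used
    from_fallback: bool = False

# Hypothetical mini corpus; IDs and texts are invented for illustration.
CORPUS = {
    "post-1": "Use git stash to save uncommitted changes temporarily",
    "post-2": "A Python list comprehension builds a list from an iterable",
}

def retrieve(question, min_overlap=2):
    """Toy keyword-overlap retrieval; a real system would use semantic search."""
    q_words = set(question.lower().split())
    hits = []
    for post_id, text in CORPUS.items():
        overlap = len(q_words & set(text.lower().split()))
        if overlap >= min_overlap:
            hits.append((overlap, post_id))
    return [post_id for _, post_id in sorted(hits, reverse=True)]

def answer(question, fallback_model=lambda q: "General model answer (no sources)."):
    sources = retrieve(question)
    if sources:
        # Grounded path: answer only from retrieved posts and cite them.
        text = " ".join(CORPUS[s] for s in sources)
        return Answer(text=text, sources=sources)
    # Fallback path: flag clearly that no corpus source backs this answer.
    return Answer(text=fallback_model(question), from_fallback=True)

grounded = answer("how do I save uncommitted changes in git")
ungrounded = answer("what is the capital of France")
```

    Keeping `sources` and `from_fallback` on the answer object is what lets the UI show attribution, or warn the user when there is none.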

    Two principles stood out as non-negotiable: attribution and transparency. For developers, trust depends on knowing where an answer came from, why it’s relevant, and whether it reflects source truth. I’ve found the same in my own teams—without provenance and clarity, even great answers feel shaky.

    The team’s evaluation approach was refreshingly pragmatic: simple spreadsheets and subject-matter experts assessing accuracy, relevance, and completeness. In my org, we’ve adopted similar lightweight scorecards before scaling LLM investments; it keeps us honest about quality before we fall in love with a demo.
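
    A lightweight scorecard like this is easy to aggregate in code once the spreadsheet is exported. The sketch below assumes a CSV layout, a 1 to 5 scale, and a quality bar of 3.5; those specifics and the sample rows are invented for illustration, while the three criteria come from the approach described above.

```python
import csv
import io
from statistics import mean

# Hypothetical export of an SME review sheet; scores use an assumed 1-5 scale.
SHEET = """question_id,reviewer,accuracy,relevance,completeness
q1,alice,5,4,4
q1,bob,4,4,3
q2,alice,2,3,2
q2,bob,3,3,2
"""
CRITERIA = ("accuracy", "relevance", "completeness")

def scorecard(csv_text):
    """Average each criterion per question across all reviewers."""
    by_question = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_question.setdefault(row["question_id"], []).append(row)
    return {
        qid: {c: mean(int(r[c]) for r in reviews) for c in CRITERIA}
        for qid, reviews in by_question.items()
    }

def below_bar(card, bar=3.5):
    """Questions where any criterion averages under the (assumed) quality bar."""
    return sorted(q for q, scores in card.items() if min(scores.values()) < bar)

card = scorecard(SHEET)
flagged = below_bar(card)
```

    Flagging on the weakest criterion, rather than an overall average, keeps a strong relevance score from hiding an inaccurate answer.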

    Here’s the moment that demonstrates real product management leadership: despite the investment, Stack Overflow decided to sunset conversational search when it couldn’t meet developer standards. That discipline of choosing not to ship what isn’t good enough preserves brand trust and creates space for a better bet.

    And that better bet was a strategic pivot: the team leaned into data licensing, leveraging its 14M+ Q&A corpus to power LLM training and benchmarks. Instead of treating AI as a threat, they turned their differentiated asset into a durable business line.

    They went further, building industry benchmarks with subject-matter experts to prove Stack data improved LLM accuracy and relevance. This is exactly how I think about outcomes vs output: quantify lift against real tasks, validate with domain experts, and package value in a way decision-makers can trust.

    Key lessons I’m taking forward:

    Take one bite of the apple at a time—prototype, learn, iterate.

    Product in the AI era means managing probabilities, not certainties.

    For context, Ellen Brandenburger is a product leader and coach; former head of product at Chegg Skills and Stack Overflow’s data licensing team. Her arc through this transformation underscores what matters most right now: tight feedback loops, transparent evaluation, and the courage to pivot from feature bets to business model bets when the evidence demands it.

    If you’re leading gen AI initiatives, treat this as a playbook: form a focused “just now possible” team, instrument quality with SMEs early, obsess over attribution and transparency, and be willing to sunset—even after heavy investment—when the work doesn’t clear your user’s bar. Then, zoom out: your unique data and workflows may be the moat. Build for that.


    Inspired by this post on Product Talk.


  • Mastering AI Evals: Real-World Discovery Tactics to Ship Quality, Safe, Reliable AI

    Mastering AI Evals: Real-World Discovery Tactics to Ship Quality, Safe, Reliable AI

    I’ve been shipping GenAI features long enough to know that clever prompts and orchestration aren’t enough. What actually matters is evidence: Does the system work, for whom, and under what conditions? That’s where rigorous AI evals come in—the backbone of building reliable, safe, and continuously improving AI products.

    In a recent conversation focused entirely on evaluation, I dug into what “evals” mean in the AI/ML world, why they’re more than just quality assurance, and how to operationalize them end to end. If you want to explore the discussion, listen on Spotify: https://open.spotify.com/episode/7mSiEGSYNO4sXeGAVTJO4V or Apple Podcasts: https://podcasts.apple.com/kh/podcast/ai-evals-discovery/id1794203808?i=1000727980774. There’s also a video version on YouTube: https://www.youtube.com/watch?v=pfSIQMrWhQE.

    Here’s how I frame evals with my teams. First, define the behavior you want to see in terms real users care about. Then codify that intent as tests that run consistently. I distinguish between golden datasets, synthetic data, and real-world traces. Golden datasets capture canonical examples that represent “ground truth.” Synthetic data fills important gaps quickly and safely. Real-world traces keep you honest and reflect evolving usage.

    The most durable loop I’ve found is simple: identify error modes, turn them into evals, and automate. This is where error analysis pays off. Some checks should be purely deterministic—code-based checks that evaluate structured outputs, schemas, or policies. Others benefit from LLM-as-judge when human-like judgment matters, as long as you calibrate and continuously verify those judges with spot checks and inter-rater agreement.
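
    For the calibration step, one common agreement statistic is Cohen’s kappa, which corrects raw agreement for chance. The sketch below compares an LLM judge’s labels against human labels on the same examples; the label lists are made-up sample data, and what kappa value counts as “good enough” is a team decision rather than something this code settles.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two raters, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labeled at random with their own frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up spot-check sample: human reviewers vs. the LLM judge.
human = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
judge = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass"]
kappa = cohens_kappa(human, judge)
```

    Here the raters agree on 7 of 8 items (0.875) and chance agreement is 0.5, so kappa comes out at 0.75. Rerunning this on fresh spot checks is a simple way to notice when a judge has drifted.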

    Discovery practices should inform every evaluation step. If you’re doing “Story-Based Customer Interviews,” you can derive realistic scenarios, acceptance criteria, and edge cases directly from user narratives. That context sharpens the evals and prevents you from overfitting to toy problems or proxy metrics that don’t reflect user value.

    Evals require ongoing care and feeding. Criteria drift is real—what counted as “good” six weeks ago may not satisfy users after you ship a new capability or your audience evolves. I treat the eval suite like living product infrastructure: versioned, reviewed, and owned. When we change prompts, models, or retrieval strategies, the evals run first, then we examine deltas, regressions, and surprises before anything reaches production.
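
    The “examine deltas” step can be as simple as diffing two eval runs. In this sketch the metric names, scores, and the 0.02 tolerance are assumptions for illustration, and every metric is treated as higher-is-better.

```python
def compare_runs(baseline, candidate, tolerance=0.02):
    """Diff two eval runs (metric name -> score, higher is better)."""
    report = {"regressions": {}, "improvements": {}, "missing": []}
    for metric, old in baseline.items():
        if metric not in candidate:
            # A metric that silently disappears is worth a look on its own.
            report["missing"].append(metric)
            continue
        delta = round(candidate[metric] - old, 4)
        if delta < -tolerance:
            report["regressions"][metric] = delta
        elif delta > tolerance:
            report["improvements"][metric] = delta
    return report

# Made-up scores for the current production setup and a proposed prompt change.
baseline = {"helpfulness": 0.82, "schema_valid": 1.00, "tone": 0.90}
candidate = {"helpfulness": 0.88, "schema_valid": 0.95, "tone": 0.91}
report = compare_runs(baseline, candidate)
blocked = bool(report["regressions"] or report["missing"])
```

    In this example the change improves helpfulness but regresses schema validity, so it is blocked until someone looks at why.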

    Guardrails and human oversight work hand-in-hand with evals. Guardrails enforce non-negotiables (safety, privacy, compliance), while evals measure progress against nuanced goals (relevance, helpfulness, tone). In high-stakes workflows, I combine pre-deployment evals, runtime guardrails, and spot human review. The goal isn’t to eliminate humans; it’s to focus their attention where judgment and context matter most.

    Practically, I start with a minimal eval harness that standardizes inputs and outputs—often in JSON (JavaScript Object Notation)—and writes repeatable tests. I maintain a small golden dataset, add targeted synthetic data for coverage, and stream real-world traces into the suite once we have consent and redaction in place. For subjective criteria (e.g., tone, helpfulness), I layer in LLM-as-judge with calibration. For objective checks (e.g., schema validation, policy compliance), code-based checks are my default.
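
    A minimal harness along these lines might look like the sketch below. The golden examples, the required fields, and `toy_system` are hypothetical; the real feature would replace the stub, and subjective criteria would need a separate judge step that this code does not attempt.

```python
import json

# Hypothetical golden dataset: inputs paired with the exact expected output.
GOLDEN = [
    {"input": "refund for order 123", "expected": {"intent": "refund", "order_id": "123"}},
    {"input": "where is order 456", "expected": {"intent": "status", "order_id": "456"}},
]
REQUIRED_FIELDS = {"intent": str, "order_id": str}

def toy_system(text):
    """Stand-in for the AI feature; returns a JSON string like a model would."""
    intent = "refund" if "refund" in text else "status"
    return json.dumps({"intent": intent, "order_id": text.split()[-1]})

def schema_ok(output):
    """Code-based check: output is an object with the required typed fields."""
    return isinstance(output, dict) and all(
        isinstance(output.get(key), kind) for key, kind in REQUIRED_FIELDS.items()
    )

def run_suite(system, dataset):
    results = []
    for example in dataset:
        try:
            output = json.loads(system(example["input"]))
        except json.JSONDecodeError:
            # Non-JSON output fails both checks instead of crashing the suite.
            results.append({"input": example["input"], "schema": False, "match": False})
            continue
        results.append({
            "input": example["input"],
            "schema": schema_ok(output),
            "match": output == example["expected"],
        })
    return results

results = run_suite(toy_system, GOLDEN)
pass_rate = sum(r["schema"] and r["match"] for r in results) / len(results)
```

    Synthetic examples and redacted real-world traces can be appended to the same dataset shape, so the suite grows without the harness changing.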

    Tooling evolves quickly, but the principles hold. Whether you’re working with Anthropic or experimenting with V0 or Lovable in your prototyping stack, the eval loop stays the same: define success, test it the same way every time, and close the loop with learning. If you’re a product creator or leading forward deployed engineers, this discipline lets you prototype with gen AI quickly without sacrificing safety or quality.

    I also tie evals to OKRs that measure outcomes rather than output. Instead of “ship three prompts,” we commit to measurable outcomes like resolution rate, time-to-answer, or a target “helpfulness” score. In our customer support AI strategy, we monitor real-world traces, CSAT, and handoff quality to ensure the AI augments agents rather than creating silent failure modes. That’s how evals inform product-market fit decisions instead of just feeding dashboards.

    If you want to go deeper, explore these foundational concepts and tools: ML (Machine learning), LLM (Large language model), “AI Evals for Engineers and PMs”: https://maven.com/parlance-labs/evals, “The Product Leadership Wheel – A Framework for Defining and Growing Product Leadership at Scale”: https://www.petra-wille.com/plwheel, “How I Designed & Implemented Evals for Product Talk’s Interview Coach”: https://www.producttalk.org/2025/09/interview-coach-evals/, “Behind the Scenes: Building the Product Talk Interview Coach”: https://www.producttalk.org/2025/08/customer-interview-coach/, V0: https://vercel.com/docs/v0, JSON (JavaScript Object Notation): https://en.wikipedia.org/wiki/JSON, Anthropic: https://www.anthropic.com/, Lovable: https://lovable.dev/, and “Story-Based Customer Interviews”: https://learn.producttalk.org/course/story-based-customer-interviews.

    If this resonates, I’ll be sharing weekly lessons learned from building and evaluating AI features in the wild, plus conversations with cross-functional teams about real-world AI development. Have thoughts or a tactic that’s worked for you? Drop a comment and let’s compare notes.


    Inspired by this post on Product Talk.


  • How a Weekend Hack Hit 7-Figure ARR: My Product Playbook from Reducto’s Rise

    How a Weekend Hack Hit 7-Figure ARR: My Product Playbook from Reducto’s Rise

    I’m often asked how to spot and scale an AI wedge quickly without over-engineering. Recently, I studied how one founder did exactly that, and it’s a masterclass in product-market fit, go-to-market speed, and customer-centric execution.

    Adit Abraham is the co-founder and CEO of Reducto, which helps leading AI teams extract and structure data from complex documents and spreadsheets in their pipeline. Within 6 months of launching, Reducto went from 0→7 figures in ARR. Reducto has grown to process tens of millions of pages monthly for companies ranging from startups to Fortune 10 enterprises. They just announced a $24M Series A. Before Reducto, Adit was a Product Manager at Google, working on Ads and Search, and conducted machine learning research at MIT’s Media Lab.

    Here’s what stood out to me as a product leader: the fastest path to traction wasn’t a grand platform vision. It was a weekend project that nailed one painful, universal job to be done: turn messy PDFs and spreadsheets into structured, reliable data that AI teams can trust.

    Listening to customers revealed an important pivot. Instead of forcing a preconceived product roadmap, the team followed customer signal to PDF processing. The turning point wasn’t a feature bomb. It was clarity: when your users repeatedly drag you toward a narrow, high-pain workflow, follow that pull with urgency.

    The weekend project that became Reducto’s breakthrough embodied a principle I push with my teams: ship a thin slice that solves one gnarly, repeatable problem end-to-end. It creates credibility, accelerates learning loops, and makes it obvious what to build next. From there, Reducto focused on “transferable features”: capabilities that compound across adjacent use cases (think normalization, validations, lineage, and auditability), so every new customer increases product surface area without bespoke reinvention.

    Landing a Fortune 10 customer didn’t come from a flashy deck. It came from enterprise-grade reliability, ruthless attention to accuracy, and a willingness to be hands-on. This is where forward-deployed engineering shines: sit with users, work their real documents, and treat integrations, SLAs, and observability as first-class features. In AI document processing, precision and proof beat promises every time.

    For technical founders, sales can feel unnatural. My guidance mirrors what worked here: reframe sales as active product discovery at the edge of pain. Use the customer’s language, quantify ROI in minutes saved and errors avoided, and reduce the perceived risk with quick pilots, deterministic evaluation, and transparent quality metrics. Caring beats perfect pitches: responsiveness, iteration speed, and real ownership of results build trust faster than theatrics.

    The strategy behind Reducto’s horizontal expansion was pragmatic: start with a narrow ingestion problem, then generalize through connectors, schemas, and review workflows that serve multiple industries. When a wedge market behaves like infrastructure, platformize the capabilities that every adjacent use case will need. That’s how you broaden TAM without losing product sharpness.

    I also appreciate the operating cadence: hire slow, go-to-market fast. Keep the bar high on IC excellence while removing friction from the path to revenue. Early-stage advantage comes from fewer handoffs, shorter feedback loops, and tighter alignment between product, engineering, and customer outcomes.

    On mindset, one line resonated deeply: “You’re going to fail.” The point isn’t pessimism; it’s preparation. Design processes that surface weak signals early, celebrate invalidated hypotheses, and compress the time between insight and iteration. In my experience, the teams that win treat failure as data and speed as a cultural norm.

    Fundraising-wise, momentum compounds when narrative and metrics rhyme. 0→7 figures in ARR in six months, tens of millions of pages processed monthly, and a clear enterprise motion make a compelling arc for a $24M Series A. The lesson: sequence your proof points (pain, precision, and production scale) so investors can see inevitability rather than potential.

    If you’re building in document AI or adjacent data ingestion, study the tooling landscape (Anthropic, Scale AI, Stripe, Textract, Y Combinator) not as competitors but as ecosystem rails. Your goal is reliable transformation from unstructured inputs to structured outputs with measurable quality, strong governance, and smooth downstream integration.

    I’ll leave you with a practical playbook I use with my teams:

    – Listen for intense pull, not polite praise. Pivot when usage, not opinions, clusters around a painful workflow.

    – Ship a narrow, decisive wedge that solves the full job end-to-end. Measure accuracy, speed, and reliability.

    – Invest early in “transferable features” that travel across verticals: validation, audit trails, observability, and schema tooling.

    – Treat sales as discovery. Quantify ROI, shorten time-to-value, and make evaluation deterministic.

    – Scale with forward-deployed engineering until patterns stabilize. Then platformize.

    – Grow revenue faster than headcount. Hire slow, raise the bar, and keep iteration loops tight.

    If you want to explore more, start with Reducto (https://reducto.ai/) and connect with Adit on LinkedIn (https://www.linkedin.com/in/aditabraham/). Whether you’re chasing your first customer or your first Fortune 10 logo, the blueprint is the same: focus the wedge, prove precision, and move fast where it matters most.

  • Reimagining Product Teams with Generative AI: A Bold, Practical Vision for the Next 24 Months

    Reimagining Product Teams with Generative AI: A Bold, Practical Vision for the Next 24 Months

    In this article, I want to talk about where I believe generative AI is going to take the roles on a product team, and the team topologies of product organizations. I’m motivated to write this both because I think a vision of where we should try to go is important, and also because I see…

    That conviction has only grown as I’ve led cross-functional teams through real deployments. The traditional boundaries between product management, design, engineering, and customer success are blurring as generative AI moves from novelty to dependable copilot. What follows is the vision I’m using to guide our roadmap, hiring, and rituals—practical, near-term, and focused on outcomes.

    First, on roles: product managers will spend less time drafting artifacts and more time validating assumptions and sequencing bets. AI will draft PRDs, summarize interviews, propose opportunity trees, and even flag risks. But we will anchor decisions on OKRs that measure outcomes rather than output, using AI to widen the option set, not to outsource accountability.

    Design will accelerate dramatically. With gen AI for product prototyping, designers can turn rough concepts into interactive flows in hours, stress-test copy for clarity, and explore accessibility states before code is written. The craft shifts toward problem framing, systems thinking, and quality thresholds, where human judgment remains the differentiator.

    Engineering becomes even more product-facing. Forward deployed engineers will pair with PMs and designers at customer sites (or virtually) to co-create solutions, integrate LLMs, and harden edge cases. Model-aware engineering, evaluation harnesses, and data pipeline stewardship become core competencies, while “prompt engineering” becomes a skill embedded across functions rather than a standalone role.

    On team topology: our default unit stays the autonomous, outcome-owning squad, but we add an enablement layer. An AI platform team supplies shared services—feature stores, evaluation datasets, observability, and safety guardrails—so product teams can move fast without reinventing infrastructure. Guilds or communities of practice steward reusable prompts, patterns, and model cards across squads.

    Discovery evolves too. We’ll pair classic product discovery with AI-accelerated research: large-scale synthesis of qualitative feedback, scenario exploration with synthetic data, and rapid hypothesis testing through simulated cohorts. Human-in-the-loop remains non-negotiable; generative AI helps us see more options, but customers still tell us what’s true.

    Customer support becomes a flywheel. A thoughtful customer support AI strategy turns conversations into structured insights, feeds prioritization, and powers in-product guidance. The same signals that resolve tickets should inform discovery, experimentation, and roadmap trade-offs.

    Governance and safety must be proactive. We’ll define golden datasets, create red-team playbooks, and adopt model-level SLAs alongside product SLAs. Evaluation goes beyond accuracy to include fairness, latency, explainability, and cost, with clear escalation paths when models drift or fail.

    Measuring impact changes as well. Beyond feature delivery, we’ll track time-to-learning, reduction in cycle time, precision of targeting, and the quality of decisions AI actually improves. The goal is durable product-market fit, not vanity metrics or demo-driven development.

    Here’s a pragmatic 90-day starter plan: identify two high-signal use cases where latency, cost, and safety are manageable; form a cross-functional pod with a PM, designer, forward deployed engineers, and a data partner; instrument robust evaluation gates; align on OKRs that measure outcomes rather than output; ship, learn, and codify the playbook. In parallel, stand up the minimal AI platform services your squads will reuse.

    This is a leadership challenge as much as a technical one. Product management leadership must set the bar for ethical use, invest in upskilling, and reorganize incentives around outcomes. The teams that win will treat generative AI as a force multiplier for curiosity, learning, and craftsmanship—not a shortcut around them.

    If we do this well, our product teams will be faster, more customer-obsessed, and more resilient. The tools are ready. The real question is whether we are ready to evolve how we work, measure progress, and lead.


    Inspired by this post on SVPG.


  • From Vision to Value: How Generative AI Elevates Product Design and Product Management

    From Vision to Value: How Generative AI Elevates Product Design and Product Management

    Product, design, and AI now converge at the center of how we build value. In my role leading product teams at HighLevel, Inc., I’ve experienced firsthand how generative AI amplifies the craft of product management and product design when we keep the fundamentals tight: clear problems, measurable outcomes, and deep collaboration across disciplines.

    The mission hasn’t changed—deliver useful, usable, and trustworthy experiences—yet the means have. Generative AI expands our exploration space, speeds up iteration, and helps us reason over messy, real-world data. When we marry rigorous product discovery with thoughtful design and responsible AI strategy, we move from novelty to durable impact.

    In discovery, I use AI to frame hypotheses, generate research questions, cluster customer feedback, and synthesize interview notes, without replacing direct conversations with customers. The goal is sharper insight, faster. I define outcomes in customer language, pressure-test assumptions, and trace every proposed AI capability to a clear job to be done. These habits keep us anchored to product-market fit rather than shiny demos.

    For prototyping, I pair designers with forward deployed engineers to build realistic vertical slices quickly. We prototype with gen AI by wiring prompts, system instructions, constrained outputs, and lightweight evaluators into clickable flows so we can test usefulness early. This reduces risk and helps the team learn which interaction patterns (chat, form, or guided workflows) fit the problem best, especially in product creator experiences.

    Designing AI-powered UX means embracing uncertainty without eroding trust. I favor patterns like transparent confidence cues, citations or references where possible, editable outputs, easy undo/redo, and clear pathways from draft to commit. Good empty states, contextual examples, and progressive disclosure teach users how to get high-quality results while keeping them in control.

    Quality requires a measurement backbone, not vibes. I define target tasks and build golden datasets, then run offline evaluations before online experiments. The core metrics stay consistent: task success rate, user confidence, time-to-first-value, latency budgets, and cost per resolution. We harden experiences with guardrails, hallucination checks, safe fallbacks, and escalation paths to humans when the model is uncertain.
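
    As a small illustration of that measurement backbone, the sketch below computes three of the metrics named above from logged interactions. The `Interaction` fields, the 2000 ms latency budget, and the sample log are assumptions for the example; user confidence and time-to-first-value would need their own instrumentation.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    succeeded: bool     # did the user complete the target task?
    latency_ms: float
    cost_usd: float

def summarize(interactions, latency_budget_ms=2000.0):
    """Aggregate logged interactions into a few core quality metrics."""
    if not interactions:
        raise ValueError("need at least one interaction")
    n = len(interactions)
    successes = sum(i.succeeded for i in interactions)
    within = sum(i.latency_ms <= latency_budget_ms for i in interactions)
    total_cost = sum(i.cost_usd for i in interactions)
    return {
        "task_success_rate": successes / n,
        "within_latency_budget": within / n,
        # Cost is spread over successful outcomes only, so failures raise it.
        "cost_per_resolution": total_cost / successes if successes else float("inf"),
    }

# Made-up log entries for illustration.
log = [
    Interaction(True, 900, 0.02),
    Interaction(True, 1500, 0.03),
    Interaction(False, 2600, 0.04),
    Interaction(True, 1200, 0.01),
]
summary = summarize(log)
```

    Dividing cost by resolutions rather than by requests is a deliberate choice here: a cheap model that fails often can still be the expensive option per solved task.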

    Responsible AI is a product requirement, not a checkbox. I design for privacy-by-default, PII minimization, and secure data handling; I track prompt and model versions; and I test for bias and accessibility from the outset. Human-in-the-loop review, auditability, and transparent change logs protect users and the business as features evolve.

    Go-to-market is part of the product. Clear onboarding, explainers, and in-product education reduce time to value. I align our customer support AI strategy with telemetry so support teams can triage AI-specific issues, capture edge cases, and channel learning back into prompt libraries, data pipelines, and design improvements.

    From a leadership standpoint, I set strategic guardrails and empower autonomous teams. Product management leadership owns outcomes and decision quality; design leads shape multimodal experiences; engineering owns reliability and performance; and our AI platform team standardizes evaluation, safety, and cost controls. This clarity accelerates learning and throughput.

    Recently, we shipped an AI-assisted creation flow that reduced manual steps, improved time-to-first-value, and drove adoption among new users. The win wasn’t a clever prompt; it was disciplined product discovery, fast iteration with realistic data, and a crisp definition of success before we scaled.

    If you’re just starting, pick one high-value, low-risk use case, define success in customer terms, and build a thin vertical slice with evaluations and guardrails. Put it in front of real users, instrument everything, and iterate until the experience feels fast, predictable, and genuinely helpful.

    The intersection of product, design, and AI will keep evolving, but the bar remains the same: ship outcomes customers care about. When we combine the leverage of generative AI with sound product discovery and strong product design, we turn vision into value—reliably and repeatably.


    Inspired by this post on SVPG.

