Tag: gen ai

Win AI Search: Proven Playbook to Get Your Startup Recommended by ChatGPT & Perplexity

AI search is quickly becoming the new homepage for startups. When a buyer asks a model for the best tools, they often take the short list at face value. I treat this moment as a product surface I can influence with strategy, content, structure, and distribution—much like any other go-to-market channel.

Early on, I set a simple objective for my team and me: "Learn how LLMs like ChatGPT and Perplexity decide which startups to recommend and what signals help a brand get discovered in AI search." That sentence became our north star for experiments, instrumentation, and content architecture.

Here is the mental model that consistently holds up in practice. Large language models synthesize answers from what they absorbed during training and, in search-enabled products, from content retrieved at query time: crawled pages, citations, and high-signal sources. They appear to weight consensus, clarity, recency, authority, and machine-readability. I don’t pretend to know the internals, but across hundreds of tests, the same patterns correlate with being surfaced and cited.

First, I make our entity unambiguous. I standardize the company name, product names, and leadership bios across the site and external profiles. I implement Organization and Product markup with schema.org and link out with sameAs to authoritative profiles like LinkedIn, Crunchbase, GitHub, and key directory listings. The goal is to collapse ambiguity so AI search knows exactly who we are and which claims are attributable to us.
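A minimal sketch of the Organization markup described above, generated with Python so the same entity data can be reused across pages. The company name, site URL, and profile links are placeholders I made up for illustration; swap in your own.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a schema.org Organization object serialized as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs ties this entity to its profiles on other sites.
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


# Hypothetical company and profile URLs, for illustration only.
markup = organization_jsonld(
    name="Example Startup",
    url="https://www.example.com",
    same_as=[
        "https://www.linkedin.com/company/example-startup",
        "https://github.com/example-startup",
    ],
)

# Embed the result in the page head inside a JSON-LD script tag.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Keeping the entity data in one function makes it harder for the name or profile links to drift between pages, which is the ambiguity this step is meant to remove.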

Next, I publish definitive, answer-first pages. For every core query—what we do, who it’s for, outcomes, differentiators, pricing, comparisons, and integrations—I ship a page that leads with a crisp summary, then supports it with evidence, examples, and plain language. I include Q&A sections, realistic use cases, and named case studies so models can quote and ground responses in verifiable facts.

I then make the site maximally machine-readable. I add schema.org for SoftwareApplication, Product, FAQPage, and HowTo where relevant. I keep titles, H1/H2 structure, internal links, and metadata descriptive and consistent. I expose last-modified dates, maintain an XML sitemap, and keep a visible changelog and release notes. Freshness matters—Perplexity, in particular, tends to privilege recent, well-cited material when answering time-sensitive questions.
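To make the sitemap and last-modified advice concrete, here is a small sketch that builds a sitemap with a `lastmod` entry per page using only the Python standard library. The page URLs and dates are invented; in a real site they would come from your CMS or build step.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: dict[str, date]) -> str:
    """Build sitemap XML that maps each URL to its last-modified date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in sorted(pages.items()):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod takes a W3C date; YYYY-MM-DD is the simplest valid form.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages and dates, for illustration only.
sitemap_xml = build_sitemap({
    "https://www.example.com/pricing": date(2025, 11, 1),
    "https://www.example.com/changelog": date(2025, 11, 6),
})
print(sitemap_xml)
```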

Citations are non-negotiable. I earn credible mentions on third-party properties, analyst lists, comparison pages, and customer reviews. I prioritize authoritative placements over volume, then make sure our site references those sources to reinforce the signal. When Perplexity cites our page alongside a respected third-party review, our inclusion rate in answers rises noticeably.

I also design for developers, buyers, and machines at once. That means clean docs, integration pages, and transparent security and trust content. Clear API references, integration guides, and reliability notes give models concrete artifacts to summarize. Pricing, privacy, and support policies reduce uncertainty and increase the likelihood that an answer will include us.

Measurement turns this from a hunch into a system. I run controlled content experiments, set a minimum detectable effect for discovery and mention rates before each one, and instrument referral traffic from AI assistants when citations appear. I monitor which prompts surface our brand, which sources are cited, and which pages are repeatedly used as references. When we move a KPI, we codify the pattern into our playbook and scale it.
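One way to instrument this, sketched under the assumption that you already log each test prompt, the assistant's answer, and the URLs it cited. The `AnswerLog` structure, the brand name, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class AnswerLog:
    prompt: str
    answer_text: str
    cited_urls: list[str] = field(default_factory=list)


def mention_rate(logs: list[AnswerLog], brand: str) -> float:
    """Share of logged answers that mention the brand (case-insensitive)."""
    if not logs:
        return 0.0
    hits = sum(brand.lower() in log.answer_text.lower() for log in logs)
    return hits / len(logs)


def cited_domains(logs: list[AnswerLog]) -> Counter:
    """Count how often each domain appears among the cited URLs."""
    return Counter(urlparse(u).netloc for log in logs for u in log.cited_urls)


# Invented sample logs, for illustration only.
logs = [
    AnswerLog("best crm for startups",
              "Top picks: Example Startup and others.",
              ["https://www.example.com/compare"]),
    AnswerLog("crm alternatives",
              "Consider OtherCo.",
              ["https://reviews.example.org/crm"]),
]
print(mention_rate(logs, "Example Startup"))
print(cited_domains(logs).most_common())
```

Running the same prompt set before and after a content change gives a simple before-and-after comparison of mention rate and cited sources.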

Trust is the compounding advantage. I maintain a transparent trust center, privacy-by-design posture, and clear data governance practices. I remove vague claims, back up benefits with evidence, and keep all performance or security statements auditable. Models tend to lift brands that feel low-risk, well-documented, and widely corroborated.

If you want a fast start, here’s the checklist I rely on. Standardize your entity and ship schema.org. Publish answer-first pages for core jobs-to-be-done, comparisons, and integrations. Earn authoritative third-party citations and reference them. Keep release notes, changelogs, and dates current. Instrument AI discovery and iterate based on what gets cited. Do this consistently, and your startup earns a fair shot at being recommended when buyers ask AI for the best options.

Inspired by this post on Amplitude – Best Practices.

November 7, 2025
Turn Claude Code Into a Trusted Teammate: My 3-Layer Memory System You Can Copy

"Can you critique the landing page for my new Story-Based Customer Interviews course?" That simple ask used to kick off hours of back-and-forth where I fed an AI the same context over and over—only to get generic feedback that wouldn’t land with my audience or fit my products. As a product leader, that inefficiency was unacceptable; as a writer, it was just plain frustrating.

Not anymore. Today, Claude not only critiques my work, it helps me produce it. It generates marketing copy—in my voice. It helps me write blog posts. It knows what search terms are relevant to my business and helps me optimize my articles for SEO and now AEO. It helps me with competitive research, academic research, and discovery research. And it does all of this with little prompting from me.

I don’t upload files to a web-based project. I don’t manage elaborate prompt libraries. I don’t repeat myself. I ask for help and Claude knows exactly what to do. The shift happened when I learned how to give Claude Code a memory. Claude now knows who my target customer is, the key value propositions I focus on, the specific opportunities each product addresses, my revenue model, my marketing channels, and so much more.

A dark-themed strategy slide for the post Stop Repeating Yourself: Give Claude Code a Memory, showing how to lead with a CLAUDE.md glossary page, write clearly for nontechnical readers, and link glossary and article to boost discovery and engagement.

With that memory, I consistently get high-quality output tailored to my audience and aligned to my products and services. I don’t retype the same context; Claude just remembers. In this article, I’ll show you exactly how I set up that memory. It relies on Claude Code (which requires a paid plan such as Pro), and it’s worth it. If you’re new to Claude Code, start with "Claude Code: What It Is, How It’s Different, and Why Non-Technical People Should Use It."

Here’s the underlying problem: with large language models, every conversation starts from scratch. Yes, ChatGPT can remember some things and Claude can search past conversations, but practically speaking each new thread wipes the slate clean. If I were working on a new landing page, I’d normally need to upload target customer context, product details, primary and secondary value propositions, FAQ questions and answers, plus testimonials and logos for social proof—every single time.

Start fast with Claude’s home screen: Sonnet 4.5 is ready, and quick actions for writing, learning, and coding sit beneath a clean prompt box—ideal for showing how memory cuts repetition and streamlines daily development.

Projects in web-based tools help a bit, but they introduce a new dilemma. When I move to the next landing page targeting the same customer but a different product and value proposition, do I start a new Project (tedious) or keep expanding the old one (which muddies the context window and degrades output quality)? The good news: Claude Code solves this by giving the model a precise, durable memory without overloading any single conversation.

Claude Code can read files on my local machine, which is an understated superpower. I use those files to create a persistent, reusable memory that works across all chats and Projects. Files can be mixed and matched, so I give Claude exactly what it needs for the task at hand—and nothing more. For a first landing page, I reference the target customer and the relevant product; for the second, I reuse the same target customer file and point to the new product file.

Dark-mode Notes screenshot captures Claude Code in action: it fetches producttalk.org, reads context files, and delivers a concise homepage evaluation—showing how memory streamlines repeated analysis tasks.

When you give an LLM the exact right context, output quality jumps. More context only helps if it’s the right context. For a landing page, Claude needs to know about the current product and perhaps related products for differentiation—but it doesn’t need to know about unrelated offerings. Structure your memory so Claude gets precisely what’s required.

Once I did this, Claude shifted from “intern who needs handholding” to trusted advisor and capable teammate. It doesn’t guess at my value propositions—I’ve already told it. It writes in my voice because it has my writing guide and samples. It knows who owns which course and which use cases map to which features. The setup takes a bit of upfront work, but it compounds: update a file when something changes and you’re done. Most of this information already lives in your system; the trick is making it easy for Claude to use.

See how Claude Code stops repetition: global and project CLAUDE.md files, plus custom reference docs, flow into the editor so the assistant remembers your preferences and context while you code and run commands.

Because the files live on my machine, I own the system. No vendor or device lock-in. I decide when to share and with whom. I can work with Claude on one project and ChatGPT on another; both can rely on the same file-based memory strategy. It’s an AI strategy that scales with product discovery, accelerates go-to-market content, sharpens competitive differentiation, and supports product-led growth.

Here’s how I design the memory: I use three layers. Claude Code already encourages global preferences and Project-specific instructions, but the third layer—reference context—is where the real power lives.

Peek inside a markdown playbook for Claude Code: concise rules for writing, multi-level planning, and clear feedback that turn repeated reminders into reusable memory and smoother, faster coding sessions.

Layer 1: Global Preferences (Always on). The first time I launched Claude Code, I created a CLAUDE.md file at ~/.claude/CLAUDE.md. This is where I keep the cross-project rules of engagement—how I like to work with Claude. Mine includes: Always create a plan for me to review before you start any work; Give me direct feedback (no hedging, no gentle suggestions); Use bullet points for summaries; Ask clarifying questions one at a time so I can give complete answers; No emojis unless I explicitly ask for them. Claude Code automatically loads this file at the start of every session, so I never restate my preferences.

Layer 2: Project-Specific Instructions. Different projects have different rules. In my writing workspace, the Project CLAUDE.md sets the roles (I’m the primary writer; Claude is my thought partner and editor), defines a multi-round review flow (content → structure → accuracy → typos), prioritizes human readability over SEO, and points to my writing style guide. In my task management system, I include how my Trello integration works, file naming conventions for tasks, and how to process research papers into summaries. In my code projects, I specify the technology stack (Node.js vs. Python), testing framework (Jest for Node.js, pytest for Python), code style and conventions, project architecture and directory structure, and which dependencies and libraries to use. Each project directory has its own CLAUDE.md, and Claude automatically loads the relevant file when I’m working there.

Peek inside a markdown playbook for collaborating with Claude—covering session setup, roles, editorial standards, and research steps—to show how saved instructions create consistent results without repeating yourself.

Layer 3: Reference Context (Pull as Needed), where the real power lives. LLMs have a context window, a limit on how much they can process at once. Even within that limit, loading too much degrades performance due to “context rot.” The remedy is ruthless context management: small, targeted files that load only when needed. Keep CLAUDE.md files concise and focused on rules and workflows. For detailed knowledge, create separate reference files and list them in your CLAUDE.md so Claude knows they exist and when to fetch them. When I ask for help creating a landing page, Claude knows to use my business profile, the product file, and my target customer context.
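As a rough illustration of listing reference files in a CLAUDE.md, the sketch below renders a short index section from a mapping of file paths to "when to use" notes. CLAUDE.md is free-form Markdown, so the heading, wording, and file names here are my assumptions, not a required format.

```python
def render_reference_index(references: dict[str, str]) -> str:
    """Render a Markdown section that tells the assistant which files exist and when to read them."""
    lines = [
        "## Reference files",
        "",
        "Read a file only when the task calls for it.",
        "",
    ]
    for path, when in references.items():
        lines.append(f"- `{path}`: {when}")
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only.
index = render_reference_index({
    "context/business-profile.md": "any marketing or positioning task",
    "context/target-customer.md": "landing pages, emails, and interview guides",
    "context/products/course-a.md": "work that concerns this specific product",
})
print(index)
```

Paste the output into the relevant CLAUDE.md and keep the detailed knowledge in the referenced files, so the always-loaded file stays short.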

Here’s what most people miss: you don’t cram everything into global or Project files. You maintain small, reusable reference files that Claude only loads on demand. In my walkthrough, I share exactly which context files I created and why; how I got Claude Code to help me create them; how I break them into small, reusable components so Claude gets precisely what it needs; how I keep everything up to date; and step-by-step instructions so you can set up a similar memory system.

Three project notes funnel into Claude Code, turning reusable context into working output. This visual shows how saving key docs as memory lets the AI pick up where you left off and skip repetitive prompting across tasks.

Let’s dive in.

Inspired by this post on Product Talk.

November 5, 2025
AI at Home, Impact at Work: Experiments That Supercharged My Product Leadership

I recently tuned into an insightful All Things Product episode featuring Teresa Torres and Petra Wille on how experimenting with AI in everyday life sharpens how we build AI-powered products at work. The core premise resonated deeply with my AI Strategy: low-stakes, personal experiments accelerate confidence, clarify limitations, and build an AI product toolbox we can bring into the office with rigor.

If you want to dive in, you can listen on Spotify or Apple Podcasts. I found the conversation especially relevant for product trios and anyone shaping LLMs for product managers in high-stakes environments.

The idea is simple but powerful: when I prototype with AI at home—where the stakes are low—I learn faster, make safer mistakes, and internalize critical product patterns. Over time, those patterns transfer directly to work: tighter context management, sharper bias awareness, clearer human-in-the-loop guardrails, and a more nuanced view of when to use AI as a thought partner versus when to consider agentic AI.

In my own practice, I’ve mirrored many of the scenarios discussed: using ChatGPT by OpenAI to plan meals, analyze public data sets like school budgets, and even sanity-check real estate evaluations. These seemingly mundane tasks are fertile ground for learning about context window limits, hallucination, AI bias, and privacy-by-design trade-offs. Each experiment helps me craft better prompts, structure data for clarity, and decide when a human review step is non-negotiable. These are core habits for AI risk management.

At work, I treat AI as a thought partner for writing, research synthesis, and contract review. I also explore when and how to responsibly evolve toward agentic AI for repeatable workflows. The distinction matters: a thought partner augments judgment; an agent automates execution. Building the right scaffolding—data governance, auditability, constraints, and escalation paths—ensures we unlock speed without compromising safety.

Three lines from the episode stayed with me: “I’m trying to write things that only I can write — that’s my guiding writing light right now.” — Teresa. “The more we use AI, the more we learn what it’s good at, what it’s not good at, and where context becomes a limitation.” — Teresa. “It’s a safer playground — we can build our toolbox at home before bringing those lessons to work.” — Petra. These are practical north stars for product management leadership in the GenAI era.

For anyone getting started, here’s what worked for me: begin with “low-stakes” personal experiments, write down your prompts and outcomes, and reflect on failure modes. Treat each activity as product discovery: What problem am I solving? What outcome matters? What data and context does the model need? Which decisions must stay human-in-the-loop? This discipline builds an AI product toolbox you can confidently apply to real customer problems.

I also keep a running toolkit of references and tools that inform my practice: Context window as a concept helps me size and sequence information. Visual and video tools like Midjourney and Sora expand how I think about multimodal experiences. I rotate between Claude by Anthropic and ChatGPT by OpenAI depending on task fit, and I’ve used Claude Code when I need structured assistance with code review. For knowledge capture and workflow, Readwise and Ghost help me structure insights and ship content.

If you want more structured learning paths, I found Josh Seiden’s Learn AI With Me, A 30-Day Sprint to be a practical primer, and the broader community conversation at Product at Heart Conference is invaluable. For a deeper grounding in risk, I recommend reviewing topics like Hallucination (artificial intelligence), AI bias, and Agentic AI—and revisiting the complementary episode, Context is King.

I’d love to hear how you’re experimenting: Where have you seen AI meaningfully reduce toil? Where does it still struggle? How are you balancing creativity, data safety, and compliance as you scale? Drop a comment below and let’s compare notes—especially on patterns that help product trios move faster without sacrificing trust.

Bottom line: start small at home, carry lessons into the office, and build with curiosity and intentionality. That’s how we level up our product discovery, sharpen our value proposition, and lead teams confidently through the GenAI transition.

Inspired by this post on Product Talk.

November 4, 2025
From Chaos to Consistency: How I Built a Scalable AI Content Design Agent with RAG

It’s Monday morning, and my Slack and email are already overflowing with content requests: “Can you review this flow?”; “Can you rewrite this screen?”; “Can you name this feature?” I’m not freshly back from holiday—this is just a regular work week kicking off. If you’ve ever been a solo content designer supporting multiple teams, you’ll recognize the pressure. The pipeline for content in product design is always full, and the demand for expertise never stops.

Fixing this isn’t just a matter of better time management or incremental process tweaks. To truly scale, I needed to extend my reach by bringing AI into the design process—without sacrificing judgment, standards, or quality. That Monday morning, I realized I had to scale my skills, my judgment, and our systems, not just my calendar.

Building AI is fundamentally about building systems. I wanted to use AI to scale myself without devaluing critical thinking or flooding the product with generic, verbose content. I also knew a useful AI tool must do more than spit out microcopy—it has to plug into a system we can continually shape. As a content designer, the system is always the starting point. Strong design systems create strong content standards; then AI agents can produce content that meets those standards at speed, freeing me from the bulk of standardized work. That’s not a threat—it’s an advantage. To instruct AI well, our systems must be well constructed.

I often think about this work like a bakery. You need a recipe before you can make a loaf of bread. Most interface content churns out the same loaf, day in and day out. It’s better for the master bakers to focus on the unique, custom bakes—and how the recipe needs to change. With that mindset, I set out to build an AI content design agent.

Inside the Content Design Agent workspace, a clean chat UI titled VERBI pairs a central prompt box with chips for writing, editing, and reviews, plus clear controls to view permissions and open the agent setup for product teams.

When I started this project back in May 2025, many LLMs still had frustrating limitations. Google Gemini let me build a custom Gem agent, but I couldn’t share it with other users. ChatGPT could be customized, but only with static files: I couldn’t point it to live, updatable URL sources. I settled on Glean for three simple reasons: everyone at the company had access; Glean could access all internal documentation and treat URLs as sources of truth; and its then-new Agents feature made AI search customizable. Configuring an agent in Glean is straightforward—you choose a trigger, a set of prompts, and a set of actions—but first I needed to get the inputs right.

AI agents need focus. We had a wealth of internal information at Intercom, but not all of it was current or reliable. I curated exactly what the agent could access and assembled a tightly governed knowledge collection in Glean. Only essential information made the cut: the Intercom style guide—our definitive house style, including regularly-broken rules like “always write in US English” and “use sentence case everywhere”; tone of voice guidance for how we show up across mediums; a product glossary with hundreds of feature names and writing conventions; a monetization glossary for prices, plans, and add-ons; product marketing messaging guides with positioning for every feature and launch; core research insights across the product; and fin.ai and intercom.com/suite as the official, most up-to-date messaging sources.

This is classic RAG (retrieval-augmented generation) in action, ensuring every answer is grounded in approved sources of truth. With the collection in place, I instructed the agent to prioritize these resources above anything else.
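To show the shape of retrieval-augmented generation, here is a toy sketch: rank a small curated collection by word overlap with the question, then build a prompt that contains only the retrieved passages. This is not how Glean retrieves content; the word-overlap ranker, the collection keys, and the tone-of-voice text are stand-ins chosen to keep the example self-contained.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, collection: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (source, passage) pairs ranked by word overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), name, text) for name, text in collection.items()]
    # Drop passages with no overlap, then sort by score (high first) and name.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [(name, text) for _, name, text in ranked[:k]]


def build_prompt(query: str, passages: list[tuple[str, str]]) -> str:
    """Build a prompt that restricts the model to the retrieved sources."""
    sources = "\n".join(f"[{name}] {text}" for name, text in passages)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


# Tiny stand-in collection; a real one would hold the full approved documents.
collection = {
    "style-guide": "Always write in US English. Use sentence case everywhere.",
    "tone-of-voice": "Placeholder tone guidance for UI copy.",
}
question = "Should button labels use sentence case?"
passages = retrieve(question, collection)
print(build_prompt(question, passages))
```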

Step into a clean, no-code builder that shows how to assemble a Content Design Agent: kick off with a chat-trigger, run a company search, then respond with expert guidance, all guided by a simple starter checklist.

Then came the fun part—building and branding the agent. “Content Design Assistant” felt bland, so I named it VERBI, a nod to its “verbal” design job. When people interact with VERBI, they usually begin with a question, but the intent varies widely. I defined a set of task prompts to guide expectations and outputs: “Can you write this?”; “Can you edit this?”; “Can you review this?”; “Can you name this?”; “Give me options”; “Give me guidance”; “Give me strategy”; “Give me research.” This mirrors the real breadth of content design, from creation to critique to discovery.

To manage responses, VERBI needed three things: start with a specific task prompt; understand how to draw on the right resources each time; and connect with other systems. With task prompts defined, I wrote a detailed system prompt covering the essentials. Role: you are a content designer, supporting product designers. Employer: Intercom (consisting of Fin AI Agent and our next-gen Helpdesk). Resources: content design collection, research collection, Storybook design system. Tone of voice: follow a specific tone for our UI, adjust the tone for everything else. Components: for UI, use the specific guidelines in our design system only. Use cases: writing, editing, critiquing, naming, researching, and more.

One connection mattered most: our design system, recently rebranded as “Surge.” Surge contains detailed content guidelines for every component in our product UI, from accordions and banners to tabs and tooltips. That granularity took months of human effort to codify, and it paid off. Designers no longer guess how to write for a toggle, a button, or a tooltip—and now VERBI understands and enforces those rules, too. A great content design assistant isn’t just a clever system prompt; it needs deep, component-level guidance to retrieve.

UI documentation showcases the Badge component’s content rules, teaching how to name statuses, define types, and apply color so labels read clearly. A handy visual for building a content design agent and ensuring consistent product messaging.

Accessing the design system wasn’t simple at first. It lives in Storybook, which Glean couldn’t access directly. I started by scraping guidance from Storybook into an HTML file with Cursor and uploading it to VERBI—a functional but clunky workaround that required re-scraping every few days. Then our IT team stepped in. They used the Glean Indexing API to turn Storybook into a live data source. Now VERBI connects to Storybook directly. Ask it something ultra-specific, like the correct date format for Japan, and it returns the right answer. That integration elevated the agent from helpful to indispensable—human-level precision, 24/7, at scale.

With prompts and resources in place, I launched VERBI and pressure-tested it. It was accurate and well-informed most of the time, but like any AI agent, it had quirks. I needed it to act as a gatekeeper, not a brainstorming partner that might bend rules or invent new ones. So I added a few explicit guardrails to the system prompt. Stopping sycophancy: “Inform, challenge, and assist. Never placate. Don’t agree by default. If something’s wrong, say so. Challenge assumptions.” Halting hallucinations: “If you don’t find the information required in our resources, say you don’t know the answer. Don’t guess and don’t give answers based on general knowledge.” Avoiding verbosity: “Keep answers short and to the point. Cut the fluff. Skip all niceties and social padding. Only give longer answers if the user asks you to.” These constraints keep responses crisp, correct, and consistent. Like any living system, the prompt needs occasional tune-ups, but the maintenance is minor compared to the upside.
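A sketch of how a system prompt like this could be kept as structured data and rendered on demand, so guardrails can be tuned one at a time. The field layout and rendering format are my assumptions, not Glean's configuration format or the actual VERBI prompt; the role, resources, and guardrail wording are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPrompt:
    role: str
    employer: str
    resources: tuple[str, ...]
    guardrails: dict[str, str]

    def render(self) -> str:
        """Render the fields as a plain-text system prompt."""
        lines = [
            f"Role: {self.role}",
            f"Employer: {self.employer}",
            "Resources: " + ", ".join(self.resources),
            "",
            "Guardrails:",
        ]
        lines += [f"- {name}: {rule}" for name, rule in self.guardrails.items()]
        return "\n".join(lines)


agent = AgentPrompt(
    role="You are a content designer, supporting product designers.",
    employer="Intercom",
    resources=("content design collection", "research collection", "Storybook design system"),
    guardrails={
        "Sycophancy": "Inform, challenge, and assist. Never placate.",
        "Hallucinations": "If you don't find the information required in our "
                          "resources, say you don't know the answer.",
        "Verbosity": "Keep answers short and to the point.",
    },
)
system_prompt = agent.render()
print(system_prompt)
```

Keeping each guardrail as a named entry makes the occasional tune-up a one-line change rather than an edit to a long block of prose.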

Where we are now: VERBI has been triggered 700+ times since launch. The benefits are tangible. For me, quality scales without constant policing; repetitive questions about naming, style, or punctuation have dropped significantly. I reclaim time because the agent drafts and checks V1 content across teams, enabling me to focus on higher-impact work. For the design team, iteration is faster, confidence is higher, and strategic clarity improves because shared language and grounded guidelines make decisions easier and more consistent.

I used to spend too much time mopping up basic content mistakes and untangling spaghetti-like UI copy prone to human error. VERBI removes those errors at the source. The real advantage is speed: we get from blank slate to a high-quality first draft quickly, which means we can spend our energy deciding whether the content is right, not just “good enough.” Design is the whole interface—words, visuals, interactions—so reviews now happen with real content, never “copy TBD.” Our principle to sweat the details applies equally whether work is human-made or AI-assisted.

Knee-jerk critiques of AI-driven content design often assume teams generate content from nothing and ship it. In reality, great AI is the outcome of great human decisions and strong systems. Its value is pulling us together faster—getting us to a complete, standards-compliant design we can review as a team before sharing it with the world. That’s how AI helps us win: by turning chaos into consistency, and consistency into velocity.

Inspired by this post on The Intercom Blog.

October 31, 2025
What I Learned from Trainline’s Agentic AI: Building a Trusted Travel Assistant at Scale

Over the past year, I’ve been shipping agentic AI into production and coaching product teams on what it really takes to make these systems trustworthy in the wild. One story that crystallizes the playbook comes from Trainline’s move to an agentic architecture for travel assistance—an approach that mirrors what I’ve seen work in high-stakes, real-time customer experiences.

Trainline—the world’s leading rail and coach platform—helps millions of travelers get from point A to point B. Now, they’re using AI to make every step of the journey smoother.

I studied how "David Eason (Principal Product Manager), Billie Bradley (Product Manager), and Matt Farrelly (Head of AI and Machine Learning)" approached the build of "Travel Assistant, an AI-powered travel companion that helps customers navigate disruptions, find real-time answers, and travel with confidence." Their work exemplifies the kind of end-to-end thinking required to move beyond demos into dependable, on-the-go assistance.

They share how they: Identified underserved traveler needs beyond ticketing; Built a fully agentic system from day one, combining orchestration, tools, and reasoning loops; Designed layered guardrails for safety, grounding, and human handoff; Expanded from 450 to 700,000 curated pages of information for retrieval; Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time; Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go.

I align strongly with their core takeaways: "AI assistants need both scalable reasoning and deep domain context to be useful." "Tool design and guardrails are as critical as prompt design in agent systems." "LLM-as-judge evals make it possible to measure open-ended systems without massive labeling costs." And perhaps most importantly, "Even legacy companies can move fast when they embrace experimentation and tight PM–engineering collaboration."

From an AI strategy perspective, starting "fully agentic" was the right call. When the problem space is dynamic—disruptions, route changes, fare conditions—reasoning loops and orchestration aren’t luxuries; they’re table stakes. Tool selection becomes product design: you need the right retrieval interfaces, constraint-aware planners, and API contracts that are resilient to partial failures. Layered guardrails for safety, grounding, and human handoff reduce hallucination risk while preserving responsiveness—critical when users are standing on a platform waiting for an answer.

The retrieval scale-up—"Expanded from 450 to 700,000 curated pages of information for retrieval"—is a classic inflection point. I’ve seen teams stall here when they treat content growth as a pure indexing problem. The winning move is curation and structure: normalize sources, encode policy-level constraints, and align retrieval chunks to decision boundaries the agent actually uses. That’s how you keep precision high while coverage explodes.

Evaluation is where most open-ended assistants fail quietly, which is why I was encouraged to see that the team "Developed LLM-as-judge evals and a custom user context simulator to measure quality in real-time." In practice, LLM-as-judge gives you scalable, scenario-based scoring without prohibitive labeling, while a user context simulator surfaces regressions tied to persona, itinerary state, and device constraints. The combination closes the loop between model behavior, tool layer changes, and UX outcomes.
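The evaluation loop can be sketched in a few lines. This is not Trainline's implementation: the `Scenario` structure, the pass mark, and both stub functions are hypothetical. In a real setup the judge would be a call to an LLM with a scoring rubric, and the assistant would be the system under test; here both are simple stand-ins so the harness runs on its own.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Scenario:
    question: str
    rubric: str  # what a good answer must contain, written for the judge


Assistant = Callable[[str], str]
Judge = Callable[[str, str, str], float]  # (question, answer, rubric) -> score in [0, 1]


def run_evals(assistant: Assistant, judge: Judge,
              scenarios: list[Scenario], pass_mark: float = 0.7) -> dict:
    """Score every scenario with the judge and report the mean and the failures."""
    scores = {}
    for s in scenarios:
        answer = assistant(s.question)
        scores[s.question] = judge(s.question, answer, s.rubric)
    failures = [q for q, score in scores.items() if score < pass_mark]
    return {"mean_score": mean(scores.values()), "failures": failures}


def stub_assistant(question: str) -> str:
    # Stand-in for the assistant under test.
    return "Your train is delayed. Please check the departure board."


def stub_judge(question: str, answer: str, rubric: str) -> float:
    # Stand-in for an LLM judge: pass if the rubric keyword appears in the answer.
    return 1.0 if rubric.lower() in answer.lower() else 0.0


report = run_evals(stub_assistant, stub_judge, [
    Scenario("Is my train delayed?", "delayed"),
    Scenario("Can I get a refund?", "refund"),
])
print(report)
```

Because the judge is passed in as a function, the same harness can run with a cheap stub in unit tests and with a model-backed judge in scheduled evaluation runs.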

On product delivery, the fact that they "Balanced latency, UX, and reliability to make AI assistance feel trustworthy on the go" shows mature prioritization. For travel, trust accrues in seconds: fast-enough responses, graceful degradation when upstream data lags, and explicit handoff when confidence dips. This is where guardrails meet UX writing: clear, bounded language signals competence even when the system defers.
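A minimal sketch of those three behaviors, assuming the answering function returns a confidence score alongside its text. The timeout, the threshold, the message wording, and the idea of a single numeric confidence are all my assumptions for illustration, not details from the Trainline build.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

AnswerFn = Callable[[str], tuple[str, float]]  # returns (answer text, confidence 0..1)


def respond(question: str, answer_fn: AnswerFn,
            timeout_s: float = 2.0, min_confidence: float = 0.6) -> tuple[str, str]:
    """Return (mode, message), where mode is 'answer', 'handoff', or 'degraded'."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(answer_fn, question)
    try:
        text, confidence = future.result(timeout=timeout_s)
    except FutureTimeout:
        # Upstream is too slow: degrade gracefully instead of leaving the user waiting.
        return ("degraded", "Live data is slow right now. Please check the station boards.")
    finally:
        pool.shutdown(wait=False)  # do not block the reply on the slow call
    if confidence < min_confidence:
        # Low confidence: defer explicitly rather than guess.
        return ("handoff", "I am not sure about this one. Connecting you to a human agent.")
    return ("answer", text)


# Hypothetical answer functions, for illustration only.
def confident(question: str) -> tuple[str, float]:
    return ("Your 09:15 is running on time.", 0.9)


def unsure(question: str) -> tuple[str, float]:
    return ("Maybe platform 4?", 0.3)


def slow(question: str) -> tuple[str, float]:
    time.sleep(0.3)
    return ("Too late to be useful.", 0.9)


print(respond("Is my train on time?", confident))
print(respond("Which platform?", unsure))
print(respond("Is my train on time?", slow, timeout_s=0.05))
```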

Finally, the organizational pattern matters. The teams that win in agentic AI are cross-functional, experimentation-driven, and ruthless about instrumentation. Tight PM–engineering collaboration, explicit safety thresholds, and an eval stack that mirrors real user journeys are what turn promising architectures into dependable products.

It’s a behind-the-scenes look at how an established company is embracing new AI architectures to serve customers at scale.

If you’re building agentic AI in production, borrow these moves: invest early in tool and guardrail design, scale retrieval with curation not just volume, adopt LLM-as-judge plus context simulation for continuous evaluation, and treat latency and reliability as core product requirements—not afterthoughts. That’s how you ship AI assistance that customers trust when it matters most.

Inspired by this post on Product Talk.

October 30, 2025
Why We’re Building Our Next AI R&D Hub in Berlin—and Hiring 100 to Power Fin’s Growth

I’m excited to share that we’re opening our next R&D hub in Berlin to support significant investment in our AI customer service platform, Intercom, and market-leading AI Agent, Fin. We intend to hire 100 people in Berlin over the year ahead across engineering, AI, data science, product, and design. This move reflects our AI Strategy, our commitment to product management leadership, and our focus on building enduring product-led growth.

We believe that in a short number of years, the vast majority of customer service will be done by AI. Fin is already the world’s best Customer Service Agent. At Pioneer, our recent summit for AI customer service leaders in NYC, we talked about how Fin will become a true end-to-end Customer Agent, extending far beyond service. We showcased how companies like WHOOP, Anthropic, and Lightspeed are already pushing Fin in ways that help them grow their business.

This market opportunity is massive and expanding at unprecedented pace. Our ambition is to earn our place as one of the most successful AI businesses during this wave of AI disruption, and we want more brilliant people on our team to pursue this as aggressively as possible. If you’re motivated by Generative AI, LLMs, and building real products that scale, you’ll find both challenge and impact here.

We are already on track to be one of the fastest-growing private software companies. Fin is the primary contributor to this, and is months away from passing $100m in ARR. So far, more than 7,000 businesses have transformed their customer service with Fin, including German companies like electricity provider Ostrom, smart home technology provider tado°, and grocery delivery company Flink, along with global leaders like Vanta, Clay, Lovable, and Miro.

Why Berlin? We’re drawn to the city’s rare blend of deep technical talent and rich creative culture—within a vibrant, globally connected ecosystem close to our R&D hubs in Dublin and London. It’s a place where top-tier engineers and designers thrive, and where ambitious builders from around the world want to relocate and create category-defining products.

Momentum is building: this month-by-month chart shows a consistent rise from the mid-20s to nearly 70% between May 2023 and Sep 2025—signaling strong progress as we expand engineering, AI, and automation at our new Berlin R&D hub.

We needed a new location that would sustain the high ambition and standards held by our world-class AI teams in Dublin and London. Berlin has emerged as one of Europe’s hottest centers for AI talent, with a high density of AI-focused startups, applied research labs, and practitioners who bring exceptional literacy, optimism, and ambition. It’s the right accelerator for our AI hiring and a place to bring in brilliant minds to shape the future of our product and business.

While Intercom’s reach is global with our headquarters in San Francisco, our R&D leadership remains anchored in Dublin, where half of the executive team sits—making Berlin both geographically and strategically an ideal next location for our growth.

This isn’t our first time expanding our footprint; we previously bet on London and are delighted with how that’s been working. When we shared our Berlin news internally, the energy was palpable, with many teammates volunteering to help spin up the hub successfully—including colleagues who helped make London a big success, like Danny. That level of ownership and momentum is exactly what we aim to cultivate in Berlin.

We’re looking for people who thrive in a high-intensity, high-ambition, high-standards environment and want to help build one of the world’s best AI companies. For builders like that, the opportunity for impact, growth, and career progression is extraordinary. As with London and Dublin before it, the early Berlin cohort will have a disproportionate influence on team norms, culture, and long-term outcomes. We are in the middle of a huge disruptive wave with AI, and Fin is one of the leading examples of commercially successful AI applications. Joining Intercom is an opportunity to be part of this disruptive wave, and help us build out our vision for Fin becoming the world’s best Customer Agent.

On a minimalist stage, four speakers share insights on AI research, automation, and engineering as part of a panel tied to Berlin expansion and the launch of a new European R&D hub.

There are plenty of AI companies to join, but our technology and culture set us apart. Any AI product is only as good as the AI layer powering it. Ours is industry-leading, built by a highly talented, ambitious, and technical team of over 40 machine learning scientists, engineers, and designers in Europe who continuously optimize Fin’s performance through cutting-edge research, experimentation, and innovation. Fin’s average resolution rate increases 1% every month. That kind of steady, compounding improvement is exactly what great customer support AI strategy looks like in practice.

We also build in public and share our progress and learnings with the AI community at large. Recently, our Chief AI Officer Fergal Reid and SVP of Engineering Jordan Neill joined leaders from Cognition, Harvey, and Perplexity in San Francisco to share real lessons, challenges, and breakthroughs from building frontier AI products. Our AI team regularly publishes their insights on the AI research blog, from optimizing inference speed and availability to building our own proprietary models that outperform general purpose models for CX.

Our AI group and the broader R&D org they operate within work at extraordinary scale and speed. We recognize that moving fast can’t be taken for granted—you must fight for it—and we’re doing just that, embracing the capabilities AI tooling brings us to achieve 2x the throughput. One example of this mindset in practice is us “Betting on the future of frontend at Intercom,” making a technology choice that optimizes for our teams’ ability to build high-quality product, fast.

Our design and product teams are world-class and forward-thinking; they’re embracing AI to evolve how they work, as shared in our 3-point framework for AI-driven design and recently presented by Emmet Connolly, our SVP of Design, at this year’s Hatch conference in Berlin. As a product leader, I’m grateful to work alongside brilliant product and design thinkers—it gives me confidence that we’re solving the right problems, solving them well, and driving real impact.

From live demos to hands-on coding, this snapshot captures the momentum we're bringing to our Berlin R&D hub – AI experiments, hand-tracking prototypes, and simulation tools powering our next wave of engineering.

We plan to open our Berlin office space in December or January. To get the office started, we’re hiring Senior Product Engineers, Machine Learning Scientists, Product Managers, Senior Product Designers, Engineering Managers, and Data Scientists immediately. If your craft sits at the intersection of LLMs, agentic AI, and empowered product teams, you’ll be right at home.

You can learn more about our open roles, company, culture, and locations on our careers site, or feel free to reach out to me, Jordan, Fergal, or Brian directly on LinkedIn if you have any questions.

Some of our engineering team will also be at LeadDev Berlin on November 3rd—come say hi if you’re attending.

I’m looking forward to continuing to build Intercom as one of our generation’s best AI companies—and I’m excited for our expansion into Berlin to be a major contribution to that success.

Inspired by this post on The Intercom Blog.

October 29, 2025
Context Is King: My Playbook to Prep Product Teams for High-Impact AI Collaboration

Context is king in AI-powered product work—and I felt that deeply while digging into “Context is King – All Things Product Podcast with Teresa Torres & Petra Wille.” The conversation affirmed a truth I see daily: AI becomes a powerful teammate only when we give it the right context, just as we do with empowered product teams. When we treat AI like a colleague joining mid-flight—without our company history, industry nuances, or strategy—we instantly unlock better outcomes.


Here’s what stood out and how I’m applying it. First, most AI outputs fail without proper context. That’s not a model problem; it’s a leadership problem. Thinking of AI like onboarding a new intern is the right mental model—start with the minimum viable context, then iterate. Practical first steps matter: decision logs, clear success metrics, and structured documentation. The art is balancing enough context to guide performance without overloading the system. The parallels are striking: the way we create strategic context for product trios and teams is the same way we’ll empower agentic AI systems.

In my teams, we prepare for AI collaboration by operationalizing context. We keep decision logs to capture the why behind choices, use outcome-based success metrics (not just output), and maintain machine-readable documentation that LLMs can parse reliably. We define guardrails up front: constraints, customer segments, privacy-by-design considerations, and the non-goals that often trip up generative AI. This foundation turns AI from a novelty into a force multiplier for product discovery, roadmapping, and sprint planning.

I use a simple “context pack” to onboard AI agents and teammates alike: 1) business goals and outcomes, 2) constraints and guardrails, 3) canonical artifacts (like PRDs, journey maps, interview notes), 4) domain vocabulary and definitions, and 5) operating procedures (how we make decisions, when to escalate, what good looks like). Start small, then refine as the AI demonstrates capability. This mirrors great onboarding—and it works just as well for agentic AI as it does for humans.

Not all context is helpful. More isn’t better; the minimum effective context is. I resist the urge to dump our entire Confluence on an AI system. Instead, I progressively reveal relevant details—just like I would with a new PM on a complex problem space. This keeps signals high, noise low, and performance measurable against clear success metrics.
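One way to make the minimum effective context concrete is to store the context pack as named sections and reveal only the ones a task needs, within a size budget. The sketch below is illustrative only: the section names, the sample text, the `build_context` helper, and the character budget are my assumptions, not something from the episode or from any tool's API.

```python
# Hypothetical context pack: five sections mirroring the list above.
CONTEXT_PACK = {
    "business_goals": "Increase activation of new workspaces.",
    "guardrails": "No pricing promises. Enterprise segment is out of scope.",
    "canonical_artifacts": "PRD summary, journey map notes, interview highlights.",
    "vocabulary": "Activation = first project shared within 7 days.",
    "operating_procedures": "Escalate to the PM when a decision affects pricing.",
}


def build_context(pack: dict[str, str], sections: list[str], max_chars: int) -> str:
    """Join the requested sections in order, stopping before the budget is exceeded."""
    parts: list[str] = []
    used = 0
    for name in sections:
        text = f"[{name}]\n{pack[name]}"
        cost = len(text) + (2 if parts else 0)  # 2 chars for the blank-line separator
        if used + cost > max_chars:
            break  # reveal less rather than cut a section mid-sentence
        parts.append(text)
        used += cost
    return "\n\n".join(parts)


# Start small: goals and guardrails first, more only when the task needs it.
starter = build_context(CONTEXT_PACK, ["business_goals", "guardrails"], max_chars=400)
```

Stopping at a section boundary instead of truncating keeps every revealed section complete, which makes it easier to tell which piece of context changed the result.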

If your org isn’t adopting AI yet, don’t wait. You can become AI-ready now by documenting strategic intent, decision rationale, and definitions in structured, searchable, machine-readable ways. Treat this as core AI Strategy work that strengthens empowered product teams—regardless of tooling—while building your AI product toolbox for tomorrow.

For those who want to explore further, these resources and mentions are a strong complement to the episode’s themes.

Follow Teresa Torres: https://ProductTalk.org

Follow Petra Wille: https://Petra-Wille.com

Agentic AI

Teresa’s new podcast, Just Now Possible, on YouTube, Apple Podcasts, and Spotify

Petra’s Coaching Packages

ChatGPT

Henrik Kniberg’s talk at Product at Heart on treating AI agents like interns

Teresa’s webinars on how she built the Product Talk Interview Coach: Behind the Scenes: Building the Product Talk Interview Coach and How I Designed & Implemented Evals for Product Talk’s Interview Coach

Josh Seiden’s blog series about AI

Teresa’s new blog posts: 15 Ways to Use AI at Home (and Fill Your AI Product Toolbox) and 21 Ways to Use AI at Work (And Build Your AI Product Toolbox)

Petra's new blog post: Why Context, Not Just Data, Will Define AI-Ready Product Teams

Have thoughts on this episode or how you’re preparing your teams to collaborate with AI? Leave a comment below—let’s compare playbooks and level up together.

Inspired by this post on Product Talk.

October 28, 2025
Inside Japan’s AI Marketing Shift: How 500 Teams Boost Efficiency, Results, and Careers

I just finished reviewing new findings on Japan’s marketing landscape, and the signal is clear: AI isn’t just a shiny tool—it’s a force multiplier for outcomes and careers. The headline that caught my attention, "Amplitude Releases New Research in Japan: Marketers are Unlocking Efficiency, Results, and Career Growth," aligns with what I’m seeing on the ground: teams that blend disciplined analytics with pragmatic AI adoption are pulling ahead.

Amplitude released a new survey of 500 Japanese marketers that reveals how teams are benefiting from AI.

Here’s how I interpret the shift. AI accelerates the cycle from insight to action when it’s grounded in a unified analytics platform. With Amplitude analytics stitched into campaign and product signals, marketers can move beyond vanity metrics to diagnose true drivers of activation, engagement, and retention. That’s where efficiency compounds: fewer blind spots, faster iteration, and clearer attribution of what actually drives results.

On the strategy side, I’m seeing two dominant patterns. First, generative AI is speeding up creative workflows (audience research, message testing, and content generation) without sacrificing brand rigor. Second, agentic AI is emerging in operational loops: routing leads, prioritizing segments, and suggesting next-best actions based on behavioral data. The common denominator is data governance; without clean event schemas and consent-aware pipelines, AI amplifies noise instead of signal.

For product-led growth motions, this research validates what empowered product teams have practiced for years: instrument the customer journey, frame OKRs around outcomes rather than outputs, and experiment in short, learnable cycles. When marketing, product, and data join forces as true product trios, teams can run in-app guides and product tours, tune onboarding, and perform rigorous retention analysis that ties growth to product value rather than spend.

My playbook in this environment is simple but disciplined. Start with first principles decision making: define the problem, the decision, and the evidence required. Use a unified analytics platform to connect lifecycle events across acquisition, activation, and expansion. Align go-to-market strategy with product roadmapping and sprint planning, so insights move directly into experiments—not slide decks. Then close the loop with clear outcome metrics and QBRs that reward learning velocity, not activity volume.

There’s also a career arc embedded in this shift. Marketers who cultivate analytical fluency and AI literacy are becoming indispensable partners to product management leadership. They can articulate a differentiated value proposition, shape product positioning with live behavioral data, and influence board-level narratives with credible, causal evidence. That combination—story plus signal—unlocks both performance and professional growth.

My commitment going forward is to operationalize these lessons: tighter event taxonomy, sharper outcomes framing, and more systematic experimentation across channels and in-product touchpoints. With the right data foundation and a pragmatic AI strategy, we can convert curiosity into capability—and capability into repeatable growth.

Inspired by this post on Amplitude – Perspectives.

October 24, 2025
How Luminance Builds Legal-Grade™ AI at Scale: My Product Lens on Trust and GTM

I’m fascinated by how the most credible legal-tech platforms operationalize AI in the enterprise, where risk tolerance is near zero and trust is the product. When I evaluate solutions in this space, I look for rigor in model design, governance, and go-to-market execution—not just raw model performance.

Discover how Luminance CEO Eleanor Lightbody builds Legal-Grade™ AI for enterprise. See how they build specialized, agentic AI models that lawyers trust at scale.

That framing resonates with me. “Legal-Grade™” isn’t a slogan; it’s a product requirement that implies auditable decisions, explainable outputs, robust data governance, and demonstrable accuracy under real-world legal workflows. “Agentic AI” adds another layer: autonomous orchestration of tasks with explicit guardrails, role definitions, and escalation paths to humans-in-the-loop.

From a product management perspective, I start with outcomes. For legal teams, the jobs-to-be-done are concrete: contract analysis and redlining, due diligence, compliance reviews, investigations, and eDiscovery. The success criteria are equally concrete: precision and recall on domain-specific clauses, latency under load, traceability of sources, and the ability to scale across matter types, jurisdictions, and languages without degrading trust.

Building that foundation requires deliberate AI strategy. I look for domain-specialized models, retrieval-augmented generation tuned to legal corpora, evaluation harnesses with gold-standard datasets, and continuous red-teaming. Just as important are deployment choices—on-prem or VPC isolation, encryption in transit and at rest, strict PII handling, and granular access controls—to satisfy the security posture of enterprise legal and compliance teams.

Governance is where “legal-grade” is won or lost. Robust audit trails, versioned prompts and policies, model cards, clear data lineage, and event logs that support defensibility are table stakes. Human review workflows, explainability tooling, and remediation paths ensure the system remains trustworthy when edge cases arise.

On product process, I favor empowered product teams and forward-deployed engineers partnering directly with attorneys and legal ops. Co-designing workflows with subject-matter experts surfaces the right constraints early: how redlines are presented, what confidence thresholds trigger review, and where to anchor the user experience in familiar legal tools and document structures.

Competitive differentiation and product positioning hinge on clarity: what specific legal outcomes are delivered faster, safer, or more accurately than alternatives? I prioritize transparent benchmarking against baselines, proof-of-value pilots that mirror production data conditions, and pricing that aligns to measurable outcomes (e.g., time-to-first-draft, review throughput, or risk reduction) rather than abstract usage metrics.

Go-to-market strategy in enterprise legal is a discipline in itself. Expect rigorous InfoSec reviews, stakeholder alignment across legal, IT, and procurement, and the need for customer references that demonstrate “trust at scale.” Clear messaging around value proposition, safety posture, and operational readiness shortens cycles and builds confidence among risk-averse buyers.

The big takeaway for product leaders: Legal-Grade™ AI isn’t about novel models; it’s about orchestrating specialization, safeguards, and enterprise-grade delivery into a coherent system that lawyers can rely on daily. When agentic AI is harnessed with the right guardrails and domain depth, it becomes a force multiplier for legal teams—accelerating work without compromising standards.

Inspired by this post on Amplitude – Perspectives.

October 24, 2025

Governed GenAI Delivery: A Practical Operating Model

Your team has a GenAI prototype that looks convincing in a demo. The launch meeting exposes a harder problem: nobody can say exactly which data it may use, which failures block release, who reviews an exception, or how to turn it off without breaking the workflow.

That is a delivery problem, not a policy-writing problem. Governed GenAI delivery gives every workflow an explicit risk boundary, evidence-based release gates, named decision owners, and a safe path back when the system behaves unexpectedly. Done well, it removes late-stage uncertainty without lowering the bar for trust.

Start with a delivery contract, not a policy library

A broad AI policy can describe good intentions and still leave a product team unable to make a release decision. Before a GenAI workflow enters the backlog, create a delivery contract on the same page as its value hypothesis. Use one contract per workflow because the customer, data, possible action, and cost of failure can change even when several features use the same model.

The contract should answer these questions in language that product, engineering, design, security, and business owners can all test:

User and moment: Who receives the output, and what are they trying to accomplish at that point in the journey?
Intended outcome: Which customer or business behavior should improve? Name the outcome rather than an output such as messages generated.
Allowed inputs: Which data classes may enter the prompt, retrieval layer, model service, logs, and evaluation environment?
Allowed outputs and actions: Is the system drafting, recommending, deciding, publishing, or changing an external system?
Failure boundary: Which errors are inconvenient, which require human review, and which must prevent release?
Decision rights: Who approves the use case, the data boundary, the evaluation results, and an exception?
Evidence and escape hatch: What must be true before launch, and what fallback or rollback will protect the user if it stops being true?
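The contract questions above can be captured as a small record that a team checks for gaps before the workflow enters the backlog. This is a minimal sketch under stated assumptions: the `DeliveryContract` fields, the `REQUIRED_DECISIONS` names, and the sample values are hypothetical labels for the questions in the list, not a standard schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DeliveryContract:
    """One contract per workflow; field names are illustrative."""
    user_and_moment: str
    intended_outcome: str
    allowed_inputs: tuple[str, ...]     # approved data classes
    allowed_actions: tuple[str, ...]    # e.g. "draft", "recommend"
    blocking_failures: tuple[str, ...]  # errors that must prevent release
    decision_owners: dict[str, str]     # decision -> named owner
    launch_evidence: tuple[str, ...]
    fallback: str


# Decisions that need a named approver (hypothetical keys).
REQUIRED_DECISIONS = ("use_case", "data_boundary", "evaluation", "exception")


def contract_gaps(contract: DeliveryContract) -> list[str]:
    """Return the unanswered parts of a contract; an empty list means complete."""
    gaps = [f.name for f in fields(contract) if not getattr(contract, f.name)]
    gaps += [
        f"decision_owners.{decision}"
        for decision in REQUIRED_DECISIONS
        if not contract.decision_owners.get(decision)
    ]
    return gaps


draft = DeliveryContract(
    user_and_moment="New admin during first-week onboarding",
    intended_outcome="More admins finish workspace setup",
    allowed_inputs=("product_docs",),
    allowed_actions=("draft",),
    blocking_failures=("sensitive_data_in_output",),
    decision_owners={"use_case": "product_lead", "data_boundary": "privacy_owner"},
    launch_evidence=("evaluation_pack_passed",),
    fallback="",
)
gaps = contract_gaps(draft)
```

Here the draft still lacks a fallback and two named approvers, so it is not ready for a release conversation.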

Route review by consequence, not by how impressive the technology appears. A familiar model can support a risky workflow, while a new model can be relatively low-risk when it only prepares an internal draft that a qualified person must inspect.

Workflow property	Default delivery treatment
Internal drafting or analysis that a trained employee reviews before use	Constrain the data, evaluate task quality, disclose the assistance where required, and preserve the employee’s ability to reject the output.
Bounded customer-facing output such as onboarding guidance, contextual help, or lifecycle messaging	Apply brand and policy checks, test representative journey scenarios, release to a controlled audience, and monitor both experience and product outcomes.
Pricing, security, compliance, incident communication, sensitive-data handling, or an action with material external consequences	Keep the final judgment human-led. Require the relevant domain owner to approve the boundary, evidence, release path, and exception process.

The last row is deliberately strict. In high-judgment moments, AI can assist with drafts and analysis while a person retains the final decision. If the workflow involves regulated activity, contractual exposure, or sensitive personal data, have qualified privacy, security, compliance, or legal owners define the applicable requirements. A product team should not interpret those obligations on its own.
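Routing by consequence can be written as a small function whose inputs describe the workflow, not the model. The sketch below is one possible encoding of the three table rows; the `Treatment` names, the domain labels, and the `default_treatment` signature are assumptions made for illustration.

```python
from enum import Enum


class Treatment(Enum):
    INTERNAL_REVIEWED = "constrain data, evaluate task quality, keep the employee veto"
    BOUNDED_CUSTOMER_FACING = "policy checks, scenario tests, controlled release"
    HUMAN_LED = "human final judgment, domain owner approves boundary and release"


# Hypothetical labels for the high-judgment row of the table.
HIGH_CONSEQUENCE_DOMAINS = {
    "pricing", "security", "compliance", "incident_communication", "sensitive_data",
}


def default_treatment(
    domains: set[str], customer_facing: bool, material_external_action: bool
) -> Treatment:
    """Pick the default treatment from workflow properties only.

    Model novelty is deliberately not a parameter: consequence drives the route.
    """
    if material_external_action or domains & HIGH_CONSEQUENCE_DOMAINS:
        return Treatment.HUMAN_LED
    if customer_facing:
        return Treatment.BOUNDED_CUSTOMER_FACING
    return Treatment.INTERNAL_REVIEWED


onboarding_help = default_treatment({"onboarding"}, True, False)
price_quote = default_treatment({"pricing"}, True, False)
```

The strictest rule is checked first, so a customer-facing workflow that touches pricing lands in the human-led tier and not the bounded one.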

Run product discovery and risk discovery in the same loop

Governance becomes slow when a team builds the experience first and asks for risk approval at the end. By then, data choices, vendor dependencies, prompts, and user expectations are embedded in the design. A late objection forces a rewrite because the risk work never influenced the product shape.

Keep the product trio accountable for customer value, then bring domain specialists into discovery when the workflow crosses their boundaries. PM, design, and engineering should shape the in-product experience together; security, privacy, data, compliance, support, and domain owners should contribute decisions rather than becoming a standing approval audience for every meeting.

Use a narrow slice to answer feasibility, usability, safety, and value questions in parallel. A two-week iteration cycle with explicit exit criteria can keep the investigation focused, but the calendar is not the goal. Each cycle must retire a named uncertainty.

Useful exit questions include:

Can the workflow complete the intended job on representative inputs, including ambiguous ones?
Can the user understand what the system did, correct it, and recover when it cannot complete the job?
Does every data flow stay inside the approved boundary?
Can the team observe the prompt, retrieval context, output, action, fallback, and policy decision without exposing prohibited data?
Does the workflow improve the intended behavior, or does it merely generate plausible-looking content?

Map the data path before connecting production information. Record where data originates, what is added through retrieval, which model or service receives it, what enters logs and traces, how long those records are retained under your policy, and which downstream system receives the output. A prototype is not permission to run a customer pilot with unapproved data. Use synthetic, de-identified, or explicitly approved information until the data owner authorizes the next stage.
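A data map becomes testable once it is written down as stages and data classes. The following sketch compares a prototype's recorded path with an approved boundary; the stage names, data classes, and the `boundary_violations` helper are hypothetical, and a real boundary would come from the data owner.

```python
# Hypothetical approved boundary: stage -> data classes the data owner authorized.
APPROVED_BOUNDARY = {
    "prompt": {"product_docs", "synthetic_accounts"},
    "retrieval": {"product_docs"},
    "model_service": {"product_docs", "synthetic_accounts"},
    "logs": {"product_docs"},
    "downstream": {"product_docs"},
}


def boundary_violations(
    data_path: dict[str, set[str]], approved: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Return (stage, data_class) pairs that fall outside the approved boundary.

    A stage missing from the boundary has nothing approved, so every data
    class recorded there counts as a violation.
    """
    violations: list[tuple[str, str]] = []
    for stage, classes in data_path.items():
        allowed = approved.get(stage, set())
        violations += [(stage, c) for c in sorted(set(classes) - allowed)]
    return violations


prototype_path = {
    "prompt": {"product_docs", "synthetic_accounts"},
    "logs": {"product_docs", "synthetic_accounts"},
    "analytics_export": {"product_docs"},
}
violations = boundary_violations(prototype_path, APPROVED_BOUNDARY)
```

In this example the prompt is fine, but the logs capture a class that was approved only for the prompt, and an unmapped export stage appears that nobody approved.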

Customer-facing language needs its own product specification. Convert voice and tone into examples of acceptable and unacceptable language for specific customer moments. Add the audience, channel, goal, length, reading level, regional spelling, accessibility constraints, and sensitive-topic rules to the prompt pattern and evaluation criteria. A generic instruction to sound like the brand is too subjective to test and too easy to reinterpret.

Version the system prompt, model configuration, retrieval sources, policy rules, and tool permissions. Without that record, a team cannot tell whether a changed result came from the product, the model, the context, or the controls.
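A lightweight way to keep that record is to fingerprint the full configuration and diff it component by component. This is a sketch using only the Python standard library; the configuration keys and the helper names `config_fingerprint` and `changed_components` are illustrative assumptions.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Short, stable hash of a JSON-serializable configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def changed_components(old: dict, new: dict) -> list[str]:
    """Names of top-level components that differ between two versions."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


released = {
    "system_prompt": "prompt-v3",
    "model_configuration": {"model": "example-model", "temperature": 0.2},
    "retrieval_sources": ["help_center"],
    "policy_rules": "policy-v1",
    "tool_permissions": ["read_docs"],
}
candidate = {**released, "system_prompt": "prompt-v4"}

diff = changed_components(released, candidate)
same_version = config_fingerprint(released) == config_fingerprint(candidate)
```

Sorting keys before hashing means two records with the same content always produce the same fingerprint, regardless of how the dictionary was built.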

Turn evaluations into release gates

A good demonstration proves that the workflow can succeed once. A release gate asks whether it succeeds often enough for its purpose, fails inside the agreed boundary, and gives the team enough evidence to intervene. If an evaluation has no acceptance rule and no decision owner, it is an observation rather than a gate.

Build the evaluation pack before tuning to it

Create the first evaluation pack from the delivery contract and customer journey before repeated prompt changes move the goalposts. It should contain:

Representative cases from the personas, lifecycle stages, and tasks named in the use case.
Ambiguous and incomplete inputs that reveal whether the system asks for clarification or invents missing context.
Prohibited and sensitive cases that test the explicit policy boundary.
Failure and recovery cases that verify fallback behavior, escalation, and user-facing explanations.
Brand and interaction cases for customer-facing language, including the moments where tone must change.
Previously observed failures, preserved as regression cases after the underlying issue is corrected.

Keep a stable release set so results remain comparable. Add new cases as the product learns, but do not silently remove difficult examples or rewrite old expected behavior to make a new version pass.
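The stability rule can be checked mechanically by diffing two versions of the release set. A minimal sketch, assuming each pack is a mapping from a case identifier to its expected behavior; the case names and the `pack_changes` helper are hypothetical.

```python
def pack_changes(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Classify differences between two evaluation-pack versions.

    Each pack maps a case id to its expected behavior. Added cases are
    normal growth; removed or rewritten cases need an explicit decision.
    """
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "rewritten": sorted(c for c in shared if previous[c] != current[c]),
    }


v1 = {
    "ambiguous-refund": "asks a clarifying question",
    "pii-in-input": "refuses and routes to a human",
}
v2 = {
    "ambiguous-refund": "answers directly",  # expectation quietly loosened
    "tone-shift-outage": "switches to incident tone",
}

changes = pack_changes(v1, v2)
needs_review = bool(changes["removed"] or changes["rewritten"])
```

Running a check like this when the pack changes turns a silent edit into something a named owner has to accept.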

Keep separate gates for separate kinds of evidence

Do not collapse every evaluation into one average score. A strong task result can hide an unacceptable data disclosure, and polished prose can hide a workflow that does not improve the customer outcome.

Gate	Question	Useful evidence
Task quality	Does the output complete the defined user job?	Labeled scenarios, a scoring rubric, reviewer agreement, and comparison with the current workflow.
Safety and data	Does the system remain inside prohibited-content, privacy, permission, and action boundaries?	Policy checks, adversarial cases, data-flow inspection, and review by the responsible domain owner.
User experience	Can the user understand, edit, reject, and recover from the result?	Usability scenarios, clarity criteria, accessibility checks, tone checks, and recovery-path inspection.
Operational readiness	Can the team detect a failure and safely contain it?	Logs and traces within the approved data boundary, alert ownership, fallback verification, rollback verification, and an incident path.
Product outcome	Does the workflow change the behavior named in the delivery contract?	An experiment plan, a baseline, outcome metrics, guardrail metrics, and segmented analysis.

Set acceptance thresholds from the use case’s consequence, current baseline, and organizational policy. There is no responsible universal pass score for every GenAI workflow. If policy prohibits a behavior, any observed instance of that behavior should fail the relevant gate until the owner accepts a documented exception or the issue is fixed.
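To see why separate gates matter, consider a release check that evaluates each gate on its own and treats any prohibited behavior as a failure unless an exception is recorded. The sketch below is illustrative: the `GateResult` fields, the thresholds, and the numbers are assumptions, not recommended pass scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    gate: str
    pass_rate: float                  # share of cases that met the rubric
    threshold: float                  # set per use case, not universal
    prohibited_hits: int = 0          # observed policy-prohibited behaviors
    exception_accepted: bool = False  # owner documented an exception

    @property
    def passed(self) -> bool:
        if self.prohibited_hits > 0 and not self.exception_accepted:
            return False
        return self.pass_rate >= self.threshold


def release_decision(results: list[GateResult]) -> tuple[bool, list[str]]:
    """Every gate must pass on its own; scores are never averaged."""
    failed = [r.gate for r in results if not r.passed]
    return (not failed, failed)


results = [
    GateResult("task_quality", pass_rate=0.97, threshold=0.90),
    GateResult("safety_and_data", pass_rate=0.99, threshold=0.95, prohibited_hits=1),
    GateResult("user_experience", pass_rate=0.92, threshold=0.85),
]
ready, failed_gates = release_decision(results)
```

Every pass rate in this example clears its threshold, yet the release is blocked because one prohibited behavior was observed in the safety and data gate.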

Human review also needs testable routing. Send novel narratives, ambiguous exceptions, sensitive cases, and high-consequence decisions to a person with the right domain knowledge. Routine outputs that have passed their gates can stay within the approved automated path. Human review for net-new narratives and automated checks for tone drift and sensitive topics provide a useful division of labor.

The reviewer must see enough context to make a real decision: the user’s approved input, relevant retrieved material, proposed output or action, applicable policy rule, and reason the case was routed. The interface should support rejection, correction, and escalation. Capture those decisions as evaluation data; otherwise the same edge cases will keep returning without improving the release process.
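Testable routing means the rule that sends a case to a person is explicit and returns its reasons, so the reviewer sees why the case arrived. A minimal sketch with hypothetical flag names; how each flag gets set (classifier, rule, or manual tag) is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCase:
    case_id: str
    novel_narrative: bool = False
    ambiguous_exception: bool = False
    sensitive: bool = False
    high_consequence: bool = False
    passed_gates: bool = True


# Flags that always require a person with the right domain knowledge.
ROUTING_FLAGS = ("novel_narrative", "ambiguous_exception", "sensitive", "high_consequence")


def route(case: ReviewCase) -> tuple[str, list[str]]:
    """Return the path for a case and the reasons it was routed there."""
    reasons = [flag for flag in ROUTING_FLAGS if getattr(case, flag)]
    if not case.passed_gates:
        reasons.append("failed_gates")
    return ("human_review" if reasons else "automated_path", reasons)


routine = route(ReviewCase("case-1"))
escalated = route(ReviewCase("case-2", sensitive=True, high_consequence=True))
```

Storing the returned reasons next to the reviewer's disposition gives the evaluation pack a ready source of new regression cases.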

Release progressively and define stop conditions first

Passing a pre-release evaluation does not justify an unrestricted launch. Real inputs, customer behavior, and downstream systems introduce conditions that an evaluation pack may not contain. Expand exposure only as evidence accumulates, and keep every stage reversible.

Exercise the complete workflow internally or offline with synthetic, de-identified, or otherwise approved data. Do not permit external actions during this stage.
Release behind a feature flag or equivalent control to an approved customer cohort. Keep the existing workflow available as a fallback.
Compare quality, safety, experience, operational, and product signals with the release gates. Segment the results by persona and lifecycle stage where the experience differs.
Expand only when the named owners accept the evidence. Preserve rollback until the replacement workflow has met the organization’s operational criteria.

Write stop conditions before launch, when nobody is under pressure to defend a rollout. Pause or roll back when:

Prohibited or sensitive data appears in a prompt, log, retrieval result, output, or downstream action.
A high-consequence output bypasses its required human decision.
A release regresses a gate that the delivery contract marks as mandatory.
The team cannot identify which prompt, model, retrieval set, policy rule, or tool permission produced the behavior.
The fallback or rollback path is unavailable.
An incident has no accountable responder or cannot be contained inside the approved workflow boundary.
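Writing the stop conditions as data lets the rollout tooling evaluate them the same way every time. The sketch below is one possible shape; the condition names are shorthand for the list above, and treating a missing signal as triggered is a design assumption on my part, chosen so that a blind spot pauses the rollout.

```python
# Shorthand names for the stop conditions listed above (illustrative).
STOP_CONDITIONS = (
    "prohibited_data_observed",
    "human_decision_bypassed",
    "mandatory_gate_regressed",
    "behavior_not_traceable",
    "fallback_unavailable",
    "incident_unowned_or_uncontained",
)


def stop_reasons(signals: dict[str, bool]) -> list[str]:
    """Return the triggered stop conditions, in the order they are defined.

    A condition with no reported signal counts as triggered (fail closed).
    """
    return [name for name in STOP_CONDITIONS if signals.get(name, True)]


healthy = dict.fromkeys(STOP_CONDITIONS, False)
incident = {**healthy, "fallback_unavailable": True}

should_pause = bool(stop_reasons(incident))
```

A team that prefers to fail open can change the default in `signals.get`, but that choice should be made before launch and recorded with the other stop conditions.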

Monitor four signal families together: experience, product outcome, quality, and operations. Clarity, reading time, click-through, activation, progress to the aha moment, support deflection, and retention can show whether customer-facing assistance is useful. Quality failures, overrides, escalations, fallback use, latency, and incidents show whether the system is producing that value sustainably.

Signal pattern	What to investigate before expanding
Evaluation quality improves, but the product outcome stays flat	The model may be solving the wrong task, appearing at the wrong journey moment, or adding effort without changing behavior.
The product metric improves, but a safety or data gate regresses	Do not scale the workflow. Short-term engagement does not override a mandatory risk boundary.
An aggregate result improves, but one persona or lifecycle stage declines	Inspect the affected segment and change the experience, routing, or eligibility rather than hiding the mismatch in an average.
Human edits and escalations cluster around the same scenario	Add that scenario to the evaluation pack and correct the prompt, context, policy, interaction, or workflow boundary.
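The third pattern in the table, an aggregate that improves while one segment declines, is easy to miss without a segmented comparison. A minimal sketch with made-up counts; the segment names and the `segment_report` helper are hypothetical, and a real analysis would also account for sample size and statistical noise.

```python
def rate(successes: int, total: int) -> float:
    return successes / total if total else 0.0


def segment_report(
    baseline: dict[str, tuple[int, int]], candidate: dict[str, tuple[int, int]]
) -> dict[str, object]:
    """Compare (successes, total) per segment and for the aggregate."""
    def aggregate(data: dict[str, tuple[int, int]]) -> float:
        return rate(sum(s for s, _ in data.values()), sum(t for _, t in data.values()))

    declined = sorted(
        seg for seg in baseline
        if seg in candidate and rate(*candidate[seg]) < rate(*baseline[seg])
    )
    return {
        "aggregate_improved": aggregate(candidate) > aggregate(baseline),
        "declined_segments": declined,
    }


# Illustrative activation counts per persona: (activated, exposed).
baseline = {"admin": (40, 100), "member": (30, 100)}
candidate = {"admin": (60, 100), "member": (25, 100)}

report = segment_report(baseline, candidate)
```

With these numbers the overall rate rises from 35% to 42.5% while members decline from 30% to 25%, which is exactly the mismatch an average would hide.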

Put these signals in a unified analytics view tied to real outcomes. Separate dashboards encourage separate stories: model quality may look healthy while the customer outcome is flat, or a conversion metric may rise while operational exceptions accumulate.

A/B tests are useful only after every variant clears the same safety, data, and experience gates. Test bounded variations, select the version that improves the intended outcome without violating guardrails, and codify the winning pattern back into the prompt library. That turns an experiment into a reusable delivery asset instead of a one-off launch result.

Give every decision one accountable owner

Governance stalls when everyone is consulted but nobody can make the decision. It also fails when one product owner is expected to approve risks outside their expertise. Assign ownership by decision, and record the evidence each owner must accept.

Owner	Decision they should own	Evidence they should maintain
Product lead	User, use case, intended outcome, eligibility, product guardrails, and expansion decision	Delivery contract, baseline, experiment design, segmented outcome analysis, and decision log
Design or conversation/content owner	Interaction pattern, user control, disclosure, clarity, voice, and recovery experience	Journey scenarios, language criteria, usability findings, and approved recovery patterns
Engineering owner	Architecture, permissions, observability, fallback, rollback, and operational containment	Version records, traces, control verification, runbook, and incident ownership
Data, security, privacy, or compliance owner	Requirements and exceptions within their professional domain	Data map, threat model, approved boundary, policy tests, and documented exceptions
Business or domain reviewer	Judgment for consequential outputs and ambiguous exceptions	Review rubric, disposition history, escalations, and new regression cases

One person may hold more than one role in a small organization. The important constraint is that each decision has a named owner who has the authority and expertise to make it.

Keep a lightweight decision log with the use-case hypothesis, risk treatment, evaluation-pack version, prompt and model version, retrieval and tool configuration, approvals, release scope, stop conditions, exceptions, and observed outcome. The log should answer why a version was released without reconstructing the decision from chat messages and meeting notes.
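
One way to keep the log lightweight is a single structured record per release. The sketch below mirrors the fields listed above. The class name, field names, and the rule about which fields may be empty at release time are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DecisionLogEntry:
    use_case_hypothesis: str
    risk_treatment: str
    evaluation_pack_version: str
    prompt_version: str
    model_version: str
    retrieval_and_tool_config: str
    release_scope: str
    approvals: list[str] = field(default_factory=list)
    stop_conditions: list[str] = field(default_factory=list)
    exceptions: list[str] = field(default_factory=list)
    observed_outcome: str = ""  # filled in after release


def release_blockers(entry: DecisionLogEntry) -> list[str]:
    """Names of required fields that are still empty (assumed rule:
    exceptions and observed_outcome may be empty at release time)."""
    optional = {"exceptions", "observed_outcome"}
    return [k for k, v in asdict(entry).items() if k not in optional and not v]


def to_json_line(entry: DecisionLogEntry) -> str:
    """Serialize one entry as a single JSON line for an append-only log file."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Appending one JSON line per release keeps the history easy to search and diff without a dedicated tool.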

Treat a change to the model, system prompt, retrieval corpus, tool permissions, data flow, or policy controls as a product change. Re-run the gates affected by that change before expanding exposure. The review can be proportional to the change, but it should never be implicit.
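
To keep the review explicit, a team can write down which gates each kind of change affects. The mapping below is an assumed example, not a recommendation: each team should define its own, and the gate names simply echo the five release gates used in this article.

```python
# Assumed mapping from a changed component to the release gates it affects.
AFFECTED_GATES = {
    "model": {"task_quality", "safety_and_data", "user_experience", "operations"},
    "system_prompt": {"task_quality", "safety_and_data", "user_experience"},
    "retrieval_corpus": {"task_quality", "safety_and_data"},
    "tool_permissions": {"safety_and_data", "operations"},
    "data_flow": {"safety_and_data", "operations"},
    "policy_controls": {"safety_and_data"},
}


def gates_to_rerun(changed: list[str]) -> list[str]:
    """Return the sorted union of gates affected by the changed components."""
    unknown = [c for c in changed if c not in AFFECTED_GATES]
    if unknown:
        # An unmapped change must be classified by a person, never skipped.
        raise ValueError(f"unclassified change types: {unknown}")
    gates: set[str] = set()
    for component in changed:
        gates |= AFFECTED_GATES[component]
    return sorted(gates)
```

Raising on an unknown change type is deliberate: a change nobody has classified should stop the release rather than pass without review.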

The operating rhythm is straightforward: classify the workflow during discovery, update evidence during each iteration, approve against explicit gates before release, and feed production failures and successful experiments back into the evaluation pack and prompt library. Governance then becomes part of delivery rather than a separate ceremony.

Key takeaways

Govern the workflow, not just the model. The same model can carry very different risks depending on its data, audience, and authority to act.
Write the data boundary, failure boundary, decision rights, release evidence, and rollback path before implementation hardens those choices.
Test feasibility, usability, safety, and value in the same discovery loop so risk findings can change the product design.
Use separate release gates for task quality, safety and data, user experience, operations, and product outcomes.
Route human review by novelty and consequence. Keep the final decision human-led for high-judgment workflows.
Release to controlled cohorts, predefine stop conditions, and turn production failures into regression cases.

For your next GenAI initiative, choose one workflow and complete its delivery contract before approving a pilot. If the team cannot name the mandatory evidence, accountable owners, stop conditions, and safe fallback, the workflow is not ready to reach customers. Once those answers are explicit, the team can move quickly without asking trust to depend on memory or optimism.

October 24, 2025

From $2M to $100M ARR: Inside fal’s Explosive Pivot and the Future of Generative Media

Generative media is no longer a curiosity on the edges of product roadmaps—it’s fast becoming a core capability. Watching one company sprint from uncertainty to undeniable traction reminded me how much a decisive pivot, a developer-first brand, and ruthless focus can bend a growth curve. This is a story about finding product-market fit in real time, scaling with intention, and staying lean while the category accelerates beneath your feet.
Gorkem Yurtseven is the co-founder and CEO of fal, the generative media platform powering the next wave of image, video, and audio applications. In less than two years, fal has scaled from $2M to over $100M in ARR, serving over 2 million developers and more than 300 enterprises, including Adobe, Canva, and Shopify. In this conversation, Gorkem shares the inside story of fal’s pivot into explosive growth, the technical and cultural philosophies driving its success, and his predictions for the future of AI-generated media.
What stood out to me first was the clarity of the pivot: “How fal pivoted from data infrastructure to generative inference.” The hardest decisions often feel like abandonment—of code, roadmap, and even identity—but the right pivot reframes everything around a higher-signal customer need. That decision, described as “The hardest decision that saved the company,” unlocked a new trajectory and set a crisp north star for the team.
Equally important was the market intuition. As they put it, “Why ‘generative media’ is a greenfield new market.” Greenfield means pattern-breaking strategy: prioritize outcomes over parity, embrace new workflows rather than retrofit old ones, and measure value in quality, latency, and unit economics—not just features. In my experience, this is where product teams win or lose: you either build the new default or get trapped perfecting the old one.
fal’s “explosive year” wasn’t luck; it was systems thinking applied to a developer platform. The team stayed small (“lean <50-person team” and “Staying nimble as a 45-person company”) and built a brand that feels genuinely for builders: “Building a brand that resonates with developers.” That shows up in everything from docs and SDKs to the cultural quirks that scale signal, like “Why fal has 500 Slack channels.” Velocity and clarity compound when communication is designed for ownership.
Early traction came from sharp use cases and fast feedback loops. I loved the transition arc from “The early adopters of the first fal product” to “The transition from toy to tool.” In a new category, the fastest path to durable usage is making something delightful and then relentlessly hardening it for production: uptime targets, deterministic APIs, transparent pricing, and repeatable performance. That’s how you move from demos to dependable workflows.
The timing call is bold and specific: “Why 2025 is the year of AI-generated video” and “Predicting AI-generated film in 2027.” If you build in gen AI, this matters. Video will force teams to optimize for cost per second, temporal coherence, and developer ergonomics across long-running jobs. The winners will combine model choice (OpenAI, Anthropic, Google DeepMind, Stability AI; “Stable Diffusion XL (SDXL)”, “Sora”, “DALL-E”, “LLaMA”) with world-class inference, smart caching, and autoscaling that feels invisible to the developer.
On the go-to-market side, I see a masterclass in founder-led GTM and developer evangelism. “Competing in a fast-moving, fragmented market” requires sharp messaging and distinctive ideas. The story behind “GPU Rich / GPU Poor” is a perfect example: a memorable narrative that encodes a real infrastructure advantage. Pair that with “fal’s greatest optimization wins” and you get a brand promise rooted in measurable performance, not just clever copy.
Culture and team design are the force multipliers. “How to build a world-class team” and “fal’s unique hiring philosophy” emphasize high-slope talent, ownership, and speed over headcount. The result is a product org that ships, learns, and iterates without bureaucratic drag. For technical founders, “Learning sales as a technical founder” is a reminder that the best sales motion often emerges from the same instincts as great product discovery: ask better questions, observe real workflows, and sell through outcomes.
Here’s how I translate these lessons into a practical playbook for product leaders working in gen AI and developer platforms: double down on developer experience (time-to-first-output, clear pricing, robust SDKs), make latency and reliability your product features, sequence the roadmap from delightful demos to dependable production tools, and stay lean enough to pivot as models and use cases evolve. Above all, treat “Why ‘generative media’ is a greenfield new market” as a call to invent the defaults others will copy.
Looking ahead, the path is clear: as AI-generated video normalizes in 2025 and professional-grade content follows by 2027, the products that win will combine inference excellence with a brand developers trust. If you’re building in this space, now is the moment to ship fast, optimize relentlessly, and meet creators and developers where they already work.

October 22, 2025

Supercharge Your Engineering Org: Alignment, AI, and Productivity from Adobe to Etsy

I obsess over building high-velocity engineering organizations that ship meaningful outcomes. When I evaluate what reliably moves the needle—across startups and scaled enterprises—it always comes back to alignment, disciplined management, and a modern view of engineering productivity. Recently, I revisited a set of insights that crystallize these themes and translate them into practical rituals any leader can adopt.

Kellan Elliott-McCrea is a Head of Engineering at Adobe, overseeing Frame.io, a newly acquired video review and collaboration platform. He is known for his experience and expertise as an engineering leader. He was previously a VPE at Dropbox, and CTO at Etsy where he built and led a team of 300 people, from tech and platform reboot through to IPO. Kellan also built and scaled teams at Flickr, and has a coaching and advising practice for companies looking to supercharge their engineering teams.

Here’s what we dig into when we talk about world-class engineering orgs: how software engineering has changed in the last 10-15 years; the future of software engineering, and the impact of AI; the importance of alignment and tactics for achieving it; how to think about and enable engineering productivity; lessons on culture from Adobe, Dropbox, and Flickr; concrete tips for being a better manager; and rituals for building business literacy throughout an org.

Let’s start with a reality I see in my own work: engineering teams are bigger than they were a decade ago, despite dramatically better tools and platforms. The reason isn’t inefficiency—it’s scope. Today’s products carry higher bars for reliability, privacy, security, compliance, and multi-surface experience. The coordination surface area has exploded. That’s why operating models must evolve: clear interfaces between teams, standardized decision-making, and reliable cross-functional rhythms are no longer nice-to-haves—they’re throughput constraints.

Alignment, then, is the ultimate speed multiplier. I’ve learned the hard way that slow teams are rarely under-skilled; they’re misaligned. “Slow teams are misaligned teams.” To counter this, I anchor on a few tactics: articulate a clear strategic narrative (why now, why us, why this), commit to OKRs that measure outcomes rather than output, and institutionalize decision logs so debates don’t reset every sprint. When teams know the customer problem, the business bet, and how their work ladders up, the flywheel starts turning.

On engineering productivity, I avoid vanity metrics and favor a portfolio: flow and focus (interruptions, WIP), system signals (lead time, deployment frequency, change fail rate), and outcome alignment (how progress maps to customer value and revenue impact). Tools matter—DX investment in CI/CD, observability, and paved roads—yet the largest gains usually come from simplifying priorities and reducing cross-team coupling. Fewer, better bets will beat “more tickets shipped” every time.
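
The system signals in that portfolio reduce to simple arithmetic over deployment records. Here is a minimal sketch, assuming each deployment carries a commit time, a deploy time, and a flag for whether it caused a failure. The `Deployment` type and `system_signals` function are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool  # e.g. triggered a rollback or an incident


def system_signals(deploys: list[Deployment], window_days: int) -> dict[str, float]:
    """Compute lead time, deployment frequency, and change fail rate for a window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600
                  for d in deploys]
    return {
        "median_lead_time_hours": median(lead_hours),
        "deploys_per_week": len(deploys) * 7 / window_days,
        "change_fail_rate": sum(d.caused_failure for d in deploys) / len(deploys),
    }
```

The median is used for lead time because one stuck change can dominate a mean. These numbers describe the delivery system, so they belong next to the flow and outcome signals rather than in place of them.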

The future of software engineering is inseparable from AI. In my practice, I treat gen AI, including gen AI for product prototyping, as a core accelerator: copilots for code and tests, scaffolding services that convert specs to boilerplate, and retrieval-augmented knowledge that collapses the gap between tribal lore and action. The key is to measure impact at the team level (cycle time, defect escape, and learning velocity) so AI augments engineering judgment rather than creating hidden complexity.

Culture is the compounding edge. Lessons on culture from Adobe, Dropbox, and Flickr converge on a few essentials: invest in psychological safety and clarity of purpose, operationalize blameless learning, and make information radically accessible. “How Complex Systems Fail, by Richard I. Cook, MD” is a touchstone here—complexity punishes organizations that rely on heroics and rewards those that build resilient systems and shared mental models.

For managers, I return to a short, durable list. Schedule real one-on-ones that prioritize coaching over status. Write more than you speak; clarity scales through documents. Run crisp, time-boxed decision forums with pre-reads and owners. Close the loop on feedback—especially in moments of disagreement—by documenting trade-offs and naming the decider. These concrete tips for being a better manager build trust, accelerate decisions, and enable autonomy.

Every high-performing engineering org I’ve led invests in business literacy as a first-class ritual. I recommend monthly “Finance 101” briefings, customer support ride-alongs, and deal reviews to connect engineers to revenue realities. Pair that with tactics and rituals for enabling effective teams—weekly written updates, demo-driven reviews, and pre-mortems—and you get sharper prioritization and far better cross-functional coordination.

Why so few companies successfully go multi-product? Most underinvest in platforms, shared services, and explicit funding models for internal APIs. The remedy: treat platforms as products with clear roadmaps, SLAs, and customer empathy; align incentives so teams don’t fork capabilities in the rush to ship; and adopt technical governance that favors standardization where it compounds and freedom where it differentiates.

For compensation and career architecture, I pressure-test common models by asking: does this design reward the behaviors we say we want? If we value outcomes, impact, and enabling others, the ladders should reflect it. When the incentives match the mission, the org learns faster and scales cleaner.

Referenced:

Adobe: https://www.adobe.com

Dropbox: https://www.dropbox.com/

Flickr: https://www.flickr.com/

Frame.io: https://www.frame.io/

How Complex Systems Fail, by Richard I. Cook, MD: https://how.complexsystems.fail/

How Etsy Grew their Number of Female Engineers by Almost 500% in One Year: https://review.firstround.com/How-Etsy-Grew-their-Number-of-Female-Engineers-by-500-in-One-Year

Where to find Kellan Elliott-McCrea:

Twitter: https://www.twitter.com/kellan

LinkedIn: https://www.linkedin.com/in/kellanem

Website: https://kellanem.com/

Personal blog: https://laughingmeme.org/

My bottom line: if you want to supercharge your engineering org, anchor on alignment, measure what matters, and leverage AI to elevate—not replace—engineering judgment. Do that, and you’ll turn coordination costs into compounding advantages that show up in customer value, velocity, and morale.

October 21, 2025