I’ve learned the hard way that sample size calculators can be both empowering and deceptive. They feel wonderfully precise, but they’re only as trustworthy as the assumptions you feed them. When I lead A/B testing at scale, I treat the calculator as a planning tool, not a verdict—then I systematically validate the assumptions behind it so our decisions stay rigorous and our roadmap stays credible.
At a minimum, most calculators assume you know your baseline rate, your “minimum detectable effect (MDE),” your desired statistical power, and your significance level. They also quietly assume independent observations, clean randomization, stable traffic quality, and a fixed test horizon with no peeking. If any of those break, the “right” sample size can be wildly wrong—and the test conclusions can nudge teams toward the wrong product or go-to-market bet.
Baseline and variance come first for me. I estimate the baseline conversion (and volatility) from recent behavior using behavioral analytics, sanity-check it across key segments, and look for seasonality. Tools like Amplitude analytics help me spot anomalies, bots, or instrumentation drift. If baseline is unstable or highly skewed, I either stabilize it with longer lookbacks or narrow the target segment to reduce noise.
Setting the “minimum detectable effect (MDE)” is where product strategy meets statistics. I work backward from an outcome that actually matters: the revenue, retention, or activation uplift that justifies the opportunity cost of building and running the experiment. If that effect size is implausible given historic lift and variance, I rethink the scope or stack changes into a sequenced set of learning experiments rather than overpromising a single moonshot.
For power and alpha, I default to 80–90% power and a 5% significance level unless the downside risk of a false positive is unusually high, in which case I tighten alpha. I choose one-tailed tests only when we would not act on a negative result and we’ve explicitly pre-registered that decision; otherwise, two-tailed is safer for real-world ambiguity.
Randomization and independence are where many tests quietly fail. I randomize at the user level (not session or pageview), guard against cross-device contamination, and ensure consistent exposure via feature flags. If there’s shared context—say, team-based usage or geographic clustering—I account for it via cluster randomization or acknowledge the inflated variance it can introduce.
Traffic allocation integrity is non-negotiable. I monitor for sample ratio mismatch by comparing observed group splits to the intended allocation and immediately pause if they drift. When SRM appears, the root cause is often instrumentation gaps, eligibility filters applied asymmetrically, or caching layers. Fixing that early preserves trust in every test that follows.
Fixed-horizon math assumes no peeking. If stakeholders need continuous reads, I use sequential testing methods with alpha spending or always-valid approaches designed for ongoing monitoring. If we commit to a fixed horizon, we stay disciplined: no early looks, no midstream metric swaps, no retrofitted hypotheses.
Multiple comparisons can quietly inflate false positives. I predeclare one primary metric to decide, define guardrail metrics to protect experience and revenue, and apply appropriate corrections (for example, controlling the false discovery rate) when testing many variants or slicing results by numerous segments.
Duration and seasonality matter more than most roadmaps admit. I run through full business cycles (at least one complete week for daily patterns, longer for B2B buying rhythms), plan for novelty effects, and watch for behavior settling after initial exposure. If the intervention changes long-run behavior, I extend the measurement window or add a post-test holdout to capture durable impact.
Not all metrics are binomial. For revenue, time-on-task, or heavy-tailed distributions, I confirm variance assumptions, use robust estimators or bootstrapping, and consider variance reduction methods like CUPED to improve power without overextending duration. The calculator’s simplicity should not mask the data’s complexity.
Finally, I connect experimentation to product outcomes. I map hypotheses to a driver tree, ensure each test ladders to activation, retention, or monetization, and document assumptions up front so we learn even when results are null. The result is a culture that respects the math and moves faster precisely because we trust our reads.
Here’s the practical checklist I use before pressing “Start”: validate baseline and variance from recent behavior; set an MDE tied to meaningful business impact; choose power and alpha explicitly; confirm user-level randomization and stable exposure; watch for sample ratio mismatch; align on fixed-horizon vs sequential testing; predeclare a single primary metric and guardrails; run long enough to cover seasonality; use robust methods for non-binomial metrics; and write a brief pre-read so the whole team commits to the plan.
When we honor these assumptions, sample size calculators become sharp instruments rather than blunt ones. You’ll ship fewer misleading wins, avoid costly false negatives, and build a repeatable experimentation engine that compounds learning—and results—over time.
Inspired by this post on Amplitude – Perspectives.
“Continuous Discovery Habits” turns five this year, and I’m celebrating by reading it with our community—together, in practice, not just in theory. Each month, I’m publishing an in-depth reading guide with the chapters we’ll cover, a preview of the most important concepts, short videos you can share with your teams, individual and team discussion questions, practical exercises to apply what you read, and additional resources to go deeper.
We’ll keep the conversation active in the comments each month and meet live once a quarter to compare notes, share what’s working, and troubleshoot what’s not. If you’re joining late, no problem—start with the current month or go back to January. You can also find all of the book club articles here.
If you want to participate, grab a copy of the book (or dust off your old one), share the “Spread the Love” videos with colleagues, block time for the team exercises, and register for the community sessions. Let’s dive in together.
This chapter grounds us in why interviewing on a regular cadence is critical to the success of any product trio; how cognitive biases affect what we learn from direct questions; the difference between research questions and interview questions; how to use story-based interviewing to uncover actual customer behavior (not ideal behavior); the interview snapshot, a one-page tool for synthesizing what you learned from a single interview; how to automate the recruiting process so interviewing becomes easier than not interviewing; and why product trios should interview customers together.
Need a copy? Grab the book.
Share the Love with Friends and Colleagues
We learn best in community. To help your team rally around these practices, share these concise primers and invite them to join the book club discussion with you.
What are customer interviews? – Build a competitive advantage that compounds over time.
What should we ask in customer interviews? – Mitigating cognitive biases.
Research questions vs. interview questions – And why the difference matters.
Getting reliable feedback from customer interviews – Ask the right questions.
Who should conduct customer interviews? – My answer might surprise you.
How do you find customers to interview? – Automate the recruiting process.
The Interview Snapshot – How to synthesize a single customer interview.
Reflect and Discuss What You Read
Reflection cements learning. This month, I’m challenging you—as I challenge my own teams—to build a weekly habit of interviewing customers and to shift from direct questions (which trigger bias) to collecting specific stories about past behavior. For many teams, this is a big mindset change: from infrequent “big research projects” to lightweight, continuous conversations that fuel daily decision-making.
Individual Reflection: Think about your last customer interview or conversation. Did you rely on direct questions, or did you excavate a specific story about what happened? How might the answers have changed if you had used the other approach?
Consider your own behavior—buying jeans, going to the gym, choosing what to watch on Netflix. Where do your ideal intentions differ from what you actually do? How might that same gap show up in your customers’ answers to direct questions?
Scan your calendar from the past month. How many customer interviews did you conduct? If it’s fewer than four, what got in the way? What needs to change to make weekly interviewing sustainable?
Team Discussion: As a team, discuss your current interview cadence. If you’re not interviewing at least weekly, name the biggest obstacle—recruiting, time, or synthesis—and commit to reducing one barrier this month.
Try this together: Ask a teammate, “How does a product idea go from concept to launch at our company?” Have them write it down. Then ask for the last specific feature or improvement that launched and capture the story. Compare the two. What’s different? What does this reveal about the gap between ideal process and actual process?
If you already interview regularly, ask: Who participates? Is it just one person (like the designer or product manager), or does the whole trio join? What value might you be missing by not having all three perspectives in the room?
Put It Into Practice
Understanding the “why” is easy; building the habit is the work. The following exercises are how my teams operationalize continuous interviewing week over week.
Exercise: Conduct a Story-Based Interview (Time: 20–30 minutes. Do this with your product trio.) Schedule a conversation with a current customer. Instead of drafting a long script, identify a handful of research questions (what you need to learn) and translate them into one story-based interview question (what you’ll ask).
For example, research questions might include: What challenges do customers face when onboarding? Where do they get stuck? What are we asking them to do that they don’t understand? How can we make it easier for them to get to the activation moment? The corresponding interview question could be: Tell me about the first time you used our product.
During the interview, excavate the story with temporal prompts like “What happened first?”, “What happened next?”, and “What happened before that?” If the participant drifts into generalities (“I usually…” or “In general…”), gently bring them back to the specific instance.
After the interview, debrief as a trio. What did each of you hear? Which opportunities surfaced? What surprised you? If you want personalized, detailed feedback on your technique, consider the Interview Coach available through the Story-Based Customer Interviews course.
Exercise: Create Your First Interview Snapshot (Time: 30 minutes. Do this with your product trio immediately after the interview.) Using the interview snapshot template, capture a photo of the participant (or a visual that represents their story), quick facts about their context, a memorable quote you’ll still recall months from now, the opportunities (needs, pain points, desires) you heard, notable insights that aren’t yet opportunities, and an experience map that illustrates the story. Over time, aim to complete each snapshot in 15–20 minutes.
Go Deeper: Additional Reading
If you prefer audio, I’ve included an audio summary for paid subscribers that covers this month’s chapter plus the resources below.
Related In-Depth Guides: Customer Interviews: How to Recruit, What to Ask, and How to Synthesize What You Learn.
The Value of Continuous Interviewing: Why Product Trios Should Interview Customers Together – How interviewing together ensures research is timely, actionable, and believable.
How to Find Customers to Talk To: Customer Recruiting: Get Easy Access to Customers Week Over Week – Practical strategies for automating your recruiting process. Ask Teresa: How Do You Select Customers for Customer Interviews? – Who to interview and how to recruit them. Tools of the Trade: Finding People to Interview Before You Have Customers – Recruiting strategies for early-stage products.
What to Ask in Your Interviews: Why You Are Asking the Wrong Customer Interview Questions – Understanding the gap between ideal behavior and actual behavior. Story-Based Customer Interviews Uncover Much-Needed Context – Why collecting specific stories is more reliable than asking direct questions. Ask Teresa: What Are the Best Customer Interview Questions? – Common questions and how to improve them. Ask About the Past Rather than the Future – Why memories about recent instances are more reliable than speculation.
How to Take Notes and Synthesize What You Are Learning: How to Take Notes During Customer Research Interviews – Practical tips for capturing what you hear. The Interview Snapshot: How to Synthesize and Share What You Learned from a Single Customer Interview – A comprehensive guide to creating and using interview snapshots. Customer Interview Analysis: How AI Helps and Hurts – Learn how to use AI effectively.
Videos: All Things Product Podcast: Customer Interview Analysis – Petra and I discuss using AI to analyze customer interviews, the risks and benefits, and why your interviewing skills matter more than any AI tool.
Other Resources from Around the Web: The Top 5 Mistakes Product Teams Make With Customer Interviews by Pragmatic Live. Continuous interviewing with Kristian Collin Berge (CEO & Co-founder at UX Signals) by Afonso Franco. How to Make Time for Customer Interviews & Validation by Rich Mironov. Brave UX: An interview with Teresa Torres by Brendan Jarvis.
Related Courses: Customer Recruiting for Continuous Discovery – Get easy access to customers week over week. Story-Based Customer Interviews – Collect reliable feedback from every customer conversation.
Our Live Discussion Schedule
Our live discussion sessions are for paid subscribers. Sessions are not recorded. Invitations will go out to members two weeks before each event—add these to your calendar now: Tuesday, June 16, 2026: 9am–10am PDT. Thursday, September 17, 2026: 9am–10am PDT. Wednesday, December 16, 2026: 9am–10am PST.
Audio Summary
This summary was produced by NotebookLM. The sources supplied were the book chapters as well as all of the additional reading.
This article is part of the CDH Book Club celebrating the five-year anniversary of Continuous Discovery Habits.
Disruption is the only sustainable strategy in product. When a platform meaningfully changes how we build and operate, I pay attention—not just as a product leader, but as someone accountable for turning AI Strategy into durable competitive differentiation. That’s why the launch of the Fin API platform stands out: it’s a concrete step toward agentic AI at enterprise scale.
Today, I’m diving into what this launch includes, why it matters for product strategy, and how I’d navigate the build vs buy decision in this new landscape. My goal is to translate the announcement into actionable guidance for product teams, CX leaders, and forward-deployed engineers who are building the next generation of customer support and product-led experiences.
Fin is a customer agent platform that at present resolves over 2M customer issues a week, growing at a rapid exponential pace. It’s relied on by the best brands, large and small, in every vertical you can imagine. From Atlassian and Riot Games, to smaller hot upstarts like Mercury and Polymarket. It runs on a family of models trained by its AI group. Last week, they announced Apex, which is the world’s first specialized customer service LLM. In production tests over the last 6 months, it beat every single frontier model, including those from Anthropic and OpenAI, on resolution rate, latency, hallucination rate, and cost.
With this launch, teams can access the platform’s core capabilities and underlying models directly via API, with contracts starting at $250k per year, and usage rates that are by far the cheapest in the industry for each of the model’s subcategories. For leaders evaluating total cost of ownership, this is a meaningful data point: it shifts the economics of scaled automation from experimental to operational.
Why now? Because builders want options. I hear from teams daily that want to design their own agents, tune prompts and policies, and integrate with bespoke CRMs, data lakes, and product surfaces. The Fin announcement meets that demand with three clear build-paths, each mapping to a different operating model and maturity stage.
First, for the vast majority of companies, the Fin Agent Platform is the pragmatic starting point. Fin reports ~8k companies on it today. It addresses 99% of customer needs out of the box—without exhausting consulting engagements—while delivering top-tier resolution rates. If your priority is time-to-value, governance, and platform scalability, this route de-risks implementation and accelerates outcomes.
Second, for teams that need custom surfaces or channels, the Fin Agent API lets you present Fin in unique contexts. You get the Fin platform’s orchestration and controls, but you’re free to bypass the default messenger, email, voice, or any prebuilt channel and embed the agent natively in your product. I see this as the sweet spot for product-led growth motions where conversation design and UX writing are strategic levers.
Third, for companies building hyper-specific agents—think service plus in-product actions—the new API access to Apex and the broader collection of models is the obvious move. Unlike generalized models, these are purpose-trained for customer service scenarios and operational policies. If you have strong in-house solutions engineering, a retrieval-first pipeline, and eval-driven development in place, this path maximizes control without reinventing the model layer.
This also opens the door for vertical specialists. Fin-like businesses focused on deep domains can emerge quickly—Fin for dentists? Why not? Fin for car dealerships? Sure. I expect startups and modern CX providers (including players like Decagon and Sierra) to carve out niches where domain data, workflows, and compliance are the real moats. That’s where differentiated AI beats generic capability.
There’s a defensive reason to pay attention here. The software landscape is shifting fast: the moat is no longer feature parity—it’s the quality of your agents and the data flywheels powering them. Building software is simply less hard now, and I’ve watched engineering teams more than double measurable productivity as they adopt AI-assisted development. The implication is clear: the interface-and-features era is giving way to an agents-and-outcomes era.
Serious software companies must evolve from being a features company to an agents company—and build those agents on differentiated AI. More value will accrue at the model and orchestration layers, where safety, latency, cost, and resolution quality are won. That puts a premium on prompt engineering discipline, policy routing, continuous discovery of edge cases, and rigorous offline/online evals to keep hallucination rates low while maintaining speed.
How would I choose among the three build-paths? If you’re early or resource-constrained, start with the Fin Agent Platform to validate outcomes and align stakeholders. If you need branded experiences and tighter product integration, use the Fin Agent API to control surfaces without owning the heavy lifting. If you have strong ML ops and a mature customer support ai strategy, go model-level with Apex and companions, layering in your own guardrails, context window management, and test harnesses. In each case, balance velocity, control, and risk—your build vs buy decision should be grounded in clear metrics and an explicit product strategy.
Where does this lead? We’ll see more companies expose specialized model families with clearer economics and stronger governance. For now, I’m excited to see what teams build with the Fin API platform—and how they turn agentic AI into measurable improvements in resolution rate, CSAT, cost-to-serve, and ultimately, customer loyalty.
I’m continually asked how machine learning can make product analytics more actionable. Drawing from Amplitude analytics in real-world settings, I’ve distilled what matters most for product teams that want faster, smarter decisions without sacrificing rigor.
When I design experiments, I start with minimum detectable effect (MDE) to size samples correctly and avoid costly, inconclusive tests. I pair that with disciplined A/B testing hygiene—clear hypotheses, thoughtful stop rules, and guardrails for key metrics—so results translate into credible product strategy choices instead of noisy dashboards.
For growth and retention, I map behavioral analytics to activation and long-term value. Driver trees help me connect feature adoption to revenue or retention, and anomaly detection keeps me from overreacting to outliers when seasonality or data quality shift.
I segment cohorts by user intent and lifecycle stage, measure user activation with crisp event definitions, and monitor leading indicators across a unified analytics platform. This keeps cross-functional conversations grounded, accelerates product-led growth, and reduces the risk of optimizing for vanity metrics.
Operationally, that means building self-serve views that flag MDE-ready experiments, surface retention analysis by cohort, and trigger anomaly detection alerts only when the signal outpaces noise. The payoff is fewer meetings debating data quality and more time shipping value.
If you’re leveling up your analytics stack, start by tightening experimentation basics, instrumenting activation and retention with behavioral analytics, and wiring in anomaly detection as a safety net. You won’t just move faster—you’ll learn faster, and with the confidence to bet big when the data earns your trust.
Inspired by this post on Amplitude – Perspectives.
“Is product management dead?” I hear this question at almost every conference hallway chat. After listening to the latest Product Builders – All Things Product Podcast with Teresa Torres & Petra Wille, I’m more convinced than ever: product management isn’t dead—it’s evolving fast, and the leaders will be those who embrace the shift.
Listen to this episode on: Spotify | Apple Podcasts
The core take resonated deeply with my day-to-day at HighLevel: product management isn’t dying—“the traditional product trio (PM, design, engineering) is collapsing into something new.” The center of gravity is shifting from swim lanes to outcomes, from rigid handoffs to fluid collaboration, and from role definitions to capabilities that actually ship value.
AI is raising the baseline across the board. That “80/20 shift: AI handles patterns, humans handle hard problems” is real on my teams. With LLMs like “GPT 5.2” and “Opus 4.5,” coding agents such as “Claude Code” and “Codex,” and tools like “Replit” and “Lovable,” we’re compressing cycle time on the repeatable 80%. The bottleneck is no longer typing code or drafting copy—it’s selecting the right problems, crafting sharp product strategy, and making confident trade-offs.
This is why the future belongs to “product builders” — people with a shared foundation across disciplines and deep expertise in one area. I look for teams that can shape, prototype, validate, and iterate in tight loops, blending continuous discovery with empowered product teams. The baseline expands, the craft deepens.
Functional expertise still matters—more than ever—because the hard parts are getting harder. We need leaders who can weigh platform scalability against time-to-value, protect privacy-by-design, apply AI risk management, and navigate data governance while sustaining product-market fit. When AI accelerates execution, judgment becomes the differentiator.
For leaders, this creates a clear mandate: “What product leaders must do to create safe AI infrastructure.” In practice, that means building guardrails early—security reviews tailored to AI workflows, QA harnesses that include eval-driven development, model performance observability, and human-in-the-loop review systems. You can’t bolt this on later without paying a tax in velocity and trust.
Hiring signals are already shifting. “How job descriptions and hiring expectations are already shifting” shows up in my reqs: we emphasize cross-functional range, fluency with AI workflows, prompt engineering literacy, and the ability to frame measurable outcomes. We still want craft depth—design systems, systems thinking in engineering, rigorous discovery—but we prize people who move seamlessly from discovery to delivery.
In the episode, I appreciated the crisp framing of why product management isn’t dying—but changing. The rise of the “product builder” foundation reframes team topology and unlocks smaller, more cross-functional squads. AI changes the baseline skill set across product teams, and ignoring it is a career risk. If you’re not learning AI tools, you’re falling behind.
My key takeaways were straightforward and actionable. Smaller, more cross-functional teams are likely. Deep expertise still matters—especially for complex trade-offs. Leaders need guardrails: security, QA, and review systems built for an AI-driven workflow. And if you work in product, design, or engineering, this episode is your signal to start upskilling now.
“The risk of ignoring AI in your craft” is not hypothetical. I encourage PMs to carve out weekly lab time for hands-on experiments with LLMs for product managers, build lightweight prototypes with Replit or Lovable, and pressure-test opportunity solution trees with data-informed discovery. Pair with your engineers on agentic AI use cases, and integrate model evals into your CI/CD pipelines.
“Mentioned in the episode” were several resources worth exploring: “Product at Heart” (June, Hamburg), “Replit,” “Lovable,” “Every,” “Petra’s Coaching Packages,” and “coding agents (Claude Code, Codex) and LLMs (GPT 5.2, Opus 4.5).” These are great jumping-off points for your own product builder toolkit.
My recommendation: queue up the episode on your commute, then pick one workflow to augment with AI before the week ends. Replace a handoff with a shared canvas. Automate a repetitive analysis. Ship a scrappy prototype. Momentum compounds.
Have thoughts on this episode? Leave a comment below. I’d love to hear how your teams are evolving your product trios, what AI workflows are sticking, and where governance has been most challenging.
I just watched one of the most significant leaps in customer service AI in years. Last week, a quiet but seismic release landed in CX: Fin introduced Apex, a vertical model purpose-built for support that raises the bar on speed, accuracy, and cost. As a product leader, this is exactly the kind of breakthrough that changes roadmaps, vendor strategies, and what customers can expect from modern service operations.
It’s a brand new model for Fin called Apex, and it’s objectively the highest performing, fastest, and cheapest model for customer service. It beats the very best models in the industry including GPT-5.4 and Opus 4.5.
In this analysis, I’ll unpack why the launch matters for the customer service agent category, what it signals for frontier labs and open‑weight ecosystems, and how leaders should rethink their AI Strategy, build vs buy decisions, and eval-driven development roadmaps.
Fin was already the highest performing and most sophisticated agent in the customer service space, consistently beating impressive competitors like Decagon and Sierra at an average win rate in the 70s. It operates at tremendous scale, now resolving almost 2M customer issues per week, a number that’s growing at an exponential clip. In its short life it’s grown to nearly $100M in recurring revenue.
As of last week, ~100% of all (English language, chat and email) customer conversations are now running on Apex. Since day 1, the Fin engine has comprised a system of models, and last year the team began replacing off‑the‑shelf models with custom ones trained on proprietary data. The core answering model had been a frontier labs offering—initially versions of GPT and more recently Sonnet 4.0. Now, that core answering model is Apex 1.0.
This model resolves customer issues at a materially higher rate than any other model available. One of their largest customers in the gaming space saw the resolution rate improve overnight from 68% to 75% (i.e. a reduction in unresolved conversations of 22%). The team notes they had never seen a jump this large from a single improvement since they started Fin.
Just as important, it’s dramatically faster, has fewer hallucinations, and is far cheaper than other available models—exactly the attributes operations leaders weigh most when deploying agents at scale. In practice, these are the levers that unlock higher CSAT, tighter SLAs, and better unit economics.
Achieving all three simultaneously is extraordinarily hard. Credit goes to foundational research from a 60‑person AI group run by Fergal Reid, and, crucially, to domain‑specific proprietary evals drawn from billions of human and agent interactions produced by the Fin resolution engine—already hand‑tuned to be the most effective in the category. That creates a flywheel: an eval‑driven development loop that trains models to keep improving at the edge of the system’s abilities. In other words, Apex 1.0 looks like the tip of the iceberg.
Zooming out, service is one of the few categories where generative AI has already delivered commercial impact at scale (alongside coding, and arguably the legal industry). With TAMs measured in the hundreds of billions, competition is intense and well capitalized. The pattern I’ve seen repeatedly is clear: winners in these spaces must become full‑stack AI companies. As features become ~free to build, durable competitive differentiation shifts under the hood—to proprietary data, post‑training, inference efficiency, and the quality of the eval loop.
Fin Apex raises the bar for finance-ready AI, highlighting a -65% cut in hallucinations and a quicker first token at 3.7s (0.6s faster), compared with Sonnet 4.6, Opus 4.5, and GPT-5.4 in side-by-side charts.
That’s why competitors will need to release their own models. Many appear to be just starting to hire the talent to do so, which likely gives Fin at least a year of head start. For product leaders, this is a strong signal to revisit build vs buy assumptions, and to quantify when owning your post‑training pipeline and evals becomes the rational move.
Honestly, 2–3 years ago I expected AI application differentiation to live mostly in what we built around third‑party models. The AI game humbles all of us; today it’s obvious that vertical models paired with proprietary evals create compounding moats.
In a podcast interview last week, Andrej Karpathy said:
"I do think we should expect more speciation in the intelligences. The animal kingdom is extremely [diverse] in the brains that exist. And there’s lots of different niches of nature… And I think we should be able to see more speciation. And you don’t need this oracle that knows everything. You kind of speciate it. And then you put it on a specific task. And we should be seeing some of that because you should be able to have much smaller models that still have the cognitive core."
The frontier labs still have the very best models, but open‑weight models aren’t far behind—making pre‑training look increasingly like a commodity. The frontier is moving to post‑training, which is precisely what we see with Apex (and Cursor’s Composer 2), and what we should expect to dominate going forward.
Labs now face a dual reality. On one hand, horizontal general‑purpose models can over‑serve specific verticals (e.g., customer service doesn’t need an oracle that knows everything). On the other, open‑weight models are good enough that high‑quality, domain‑specific post‑training can produce superior models for special‑purpose jobs—and in the ways that matter for those jobs. In service, soft factors like judgement, pleasantness, and attentiveness matter alongside hard factors like resolution effectiveness, speed, and cost.
I’m still bullish on the labs. Many organizations remain heavy customers of Anthropic—whether as part of multi‑model systems or through deep usage of Claude Code in engineering teams (see this example of Claude Code adoption). Yet classic disruption (à la the late, great Clay Christensen) is now at their door. The way out is to disrupt themselves by building cheaper specialized models too, which likely requires acquiring the evals—or the companies with the evals—needed for each task. Expect creative data partnerships, M&A consolidation, and a wave of hyper‑specific model providers that compete head‑to‑head with the labs.
In the meantime, Fin appears to be the only vendor in its space with a custom model that’s also objectively superior to everything else out there. I’m excited to see it deployed broadly for end customers, and I’m watching closely for the next announcement that will accelerate that rollout. For product leaders, the message is clear: the age of vertical models and agentic AI is here—bring your evals, or bring your checkbook.
Making the leap from engineer to CEO demands an almost entirely new skillset. I’ve felt that jolt firsthand: the tools that serve you as an IC or even a product leader—system design, crisp PRDs, elegant roadmaps—only get you about 20% of the way. The rest is learning to orchestrate go-to-market strategy, finance, hiring, culture, and product positioning with just enough depth to make sound, fast decisions while empowering true experts to execute.
My operating heuristic is the 80% rule. As CEO or GM, I don’t need to be the best marketer, seller, or finance leader; I need to understand 80% of each function well enough to set a compelling product strategy, ask the right questions, and catch the second-order effects. That breadth unlocks speed, quality of judgment, and the conviction to say no when the organization is tempted by what it can do rather than what it must do.
The clearest illustration comes from the journey that turned Apache Kafka—originally built at LinkedIn—into Confluent, a publicly traded enterprise software company. The technical insight was powerful, but the real lift came from translating that insight into a repeatable go-to-market engine. That required building new muscles: founder-led GTM, enterprise sales orchestration, and open source monetization without alienating the community that fueled adoption.
Early on, the product was “embarrassing” by enterprise standards—thin features, sharp edges, and a long tail of operational gaps. Shipping anyway was the point. A thin vertical slice into the market created learning loops with real customers, not hypotheticals. That uncomfortable speed became a superpower, especially when the company decided to push toward a cloud-first business in the face of widespread opposition.
The messaging challenge was just as hard as the technical one. Most marketing fails because it starts with what we built, not what customers must achieve. A simple product marketing pyramid—vision at the top, category framing and points of parity in the middle, crisp value props and proof at the base—helped explain Kafka to the world in customer language. When the narrative snaps into place, adoption accelerates. In Kafka’s case, one well-timed blog post clarified the “why now” and unlocked a step-change in community and enterprise pull.
There’s a pivotal distinction leaders underestimate: the gap between what a company can do and what it must do. I use a must-do filter before every planning cycle: What moves are non-discretionary for durable product-market fit? For Kafka and Confluent, that meant ruthless prioritization on managed cloud services, reliability, and platform scalability—even when it jeopardized short-term revenue or required retooling how engineering, sales, and support worked.
Fundraising strategy mirrored this clarity. Planning to raise before building the full product wasn’t about hype; it was about matching capital to the physics of the problem. If your category requires enterprise credibility, global infrastructure, and 24/7 SRE, you finance those table stakes early. That’s first principles decision making: instrument the constraints, then design the sequence that gets you to scale with the fewest irreversible mistakes.
In the early years, every product decision felt like a trade between polish and learning. The team essentially bludgeoned its way into a cloud-first posture—less because the initial product was ready, and more because the market’s must-do was obvious. That’s the essence of founder-led GTM: get into the field, close lighthouse customers, and use their arcs to shape the roadmap. It’s also where open source monetization matures from downloads into durable, enterprise value.
As the organization scales, excellence often erodes—the Chipotle problem. Process hardens; quality blurs; the magic decays. The antidotes are simple but hard: a few non-negotiable product quality bars, a short set of product-market fit metrics that everyone can recite, and empowered product teams who own outcomes over output. This is where organizational development matters as much as code: design clear interfaces between product, sales, and success, and you’ll keep velocity without losing standards.
Contrary to popular lore, founder optimism is overrated. Constructive realism wins. I try to model “probabilistic optimism”: assume we will win, but instrument the journey like an SRE runs an incident. Set leading indicators, rehearse failure modes, and make pre-commitments to the must-do path so you’re not swayed by the latest anecdote. It keeps the team out of a failure mindset while making room for rigorous course correction.
Giving up the right things at the right time is a CEO superpower. As complexity grows, I hand off decisions that benefit from specialization and keep only those tied to company narrative, must-do prioritization, and talent bar. CEO time management becomes a portfolio problem: ensure each week contains deep product time, frontline customer exposure, and one compounding systems fix (hiring loop, pricing rubric, or GTM enablement) that pays back for quarters.
If you’re moving from IC or PM into a GM/CEO role, here’s a practical playbook: build your product marketing pyramid; write the one-page must-do memo for the next six quarters; ship a narrow, managed cloud slice early; pick three product-market fit metrics (usage, time-to-value, retention) and publish them company-wide; and architect an enablement engine that turns field learnings into roadmap changes within one quarter. That’s how you transform technical advantage into a category-defining business.
The Kafka-to-Confluent arc reminds me that technology can open a door—but clarity of narrative, sequencing, and must-do focus determines whether you walk through it. When in doubt, bias toward shipping, talking to customers, and tightening the loop between what you learn and what you build. That’s the work of product management leadership at scale.
I’ve spent my career building and scaling product platforms, and I’ve seen firsthand how the right AI Strategy can unlock disproportionate impact. Foundational AI platforms are the engine room of modern analytics—when they’re done well, they compress time-to-insight, improve quality, and empower empowered product teams to deliver outcomes that matter.
Across leading analytics ecosystems, including Amplitude analytics, the winning pattern is consistent: invest in a unified analytics platform that abstracts complexity while enabling rapid iteration. By standardizing data governance and privacy-by-design, teams gain the freedom to experiment confidently without sacrificing compliance or security.
For me, “foundational AI platforms” means pragmatic building blocks that product and engineering can trust: evaluation harnesses for models, retrieval pipelines that surface the right context, feature stores that ensure consistency, and CI/CD with robust observability. When these AI workflows are in place, behavioral analytics, anomaly detection, and A/B testing stop being one-off projects and become repeatable capabilities.
The payoff isn’t just efficiency—it’s strategic differentiation. Internal innovation accelerates when teams can go from idea to live experiment in days, not quarters. That speed shapes the future of AI analytics: richer insights woven directly into product experiences, LLMs for product managers to prototype faster, and analytics that feel conversational, contextual, and deeply actionable.
Execution still makes or breaks the vision. I align product strategy around outcomes vs output OKRs, pair product trios with forward-deployed engineers, and use a clear build vs buy rubric for platform components. The goal is platform scalability without reinventing the wheel—own the parts that differentiate, integrate the rest, and keep your interfaces painfully simple.
If you’re leading this journey, start by mapping your critical use cases to platform capabilities, close gaps in data governance, and stand up an eval-driven development loop. Within one or two quarters, you should see a measurable lift in deployment frequency, a sharper signal on performance, and a culture that ships with confidence. That’s how foundational AI platforms empower internal innovation and help define the future of AI analytics.
Inspired by this post on Amplitude – Best Practices.
What if AI could help reduce the 10-plus years it takes to get a new drug to market? That question has shaped much of my own product strategy thinking, and it’s exactly why I was drawn to Medable’s bold move with Agent Studio. It’s a rare look inside an enterprise AI platform built for one of the most regulated industries in the world—and a team that’s still figuring it out in real time.
In this episode of Just Now Possible, Teresa Torres talks with four members of the Medable team: Luke Bates (Product Leader, Agent Studio), Jen Brown (Product Manager), Matt Schoolfield (Product Designer), and Fiachra Matthews (Principal Architect). Listening through a product management lens, I focused on how their choices reflect a modern agentic AI strategy that balances speed, safety, and scale.
Medable does something uniquely hard: enabling global clinical trials across 100+ languages and accelerating drug-to-market timelines. That scope demands more than clever prompts—it requires a durable platform approach. Their answer is Agent Studio, a no-code/low-code platform for configuring and deploying agents across the clinical trial lifecycle.
What impressed me most was how clearly the platform’s primitives map to repeatable value: models, skills, knowledge bases, MCP connectors, versioning, and trigger types. In my experience, platforms win when these building blocks are composable, governed, and observable—exactly the direction Medable is taking.
You’ll also hear about the two agents they’ve built on top of it: an ETMF agent that automates document classification across 80,000-plus documents per year, and a CRA agent that monitors patient safety and data quality across 13 different clinical systems. For a domain where errors carry real human consequences, this is the right mix of automation and oversight.
Under the hood, their architecture choices echo what I’ve seen work in other high-stakes environments. They walk through RAG approaches at scale: embeddings vs. markdown hierarchies vs. just-in-time MCP retrieval, and explain Why they built custom MCPs with an authentication and credentialing wrapper. They also detail Context window management with sub-agents and automatic tool filtering—critical to keep agents focused and reliable as complexity grows.
Data alignment is often the unsung hero of agent reliability. I appreciated how they described How they built a unified ontology layer to map terminology across 13 different clinical data systems. Equally important, they show their paper trail: How they document agent intent → specification → test evidence to satisfy regulatory bodies. In a GXP context, this kind of lineage isn’t “nice to have”—it’s the price of admission.
Discover how Medable's Agent Studio reimagines clinical operations, shrinking drug-to-market timelines from a decade to a year with no-code agents, automated eTMF document classification, unified data monitoring, and human-in-the-loop validation.
Strategically, I love that Medable chose a platform approach to agents instead of one-off builds. They outline Three deployment models: Medable-built products, services-led custom builds, and self-serve platform access. This mirrors a healthy platform business model: prove value with first-party solutions, extend via services for complex needs, and unlock scale with self-serve—while keeping governance centralized.
Reliability is a theme throughout. They describe Evaluation design in a GXP-regulated environment: golden datasets, production monitoring, and the challenge of human feedback as ground truth. We also get a concrete picture of what human-in-the-loop really looks like when clinical decisions are on the line—tight feedback cycles, auditable interventions, and clear escalation paths.
Looking forward, they don’t shy away from ambition. The "full self-driving" vision for clinical trials and what it would take to get there is both provocative and grounded. My read: the path runs through stronger domain ontologies, standardized interfaces (MCP done right), eval-driven development, and relentless simplification of agent skills.
If you’re a product leader building in regulated spaces, this discussion is a masterclass in balancing innovation with compliance. The takeaways map cleanly to AI Strategy: define platform primitives, invest in retrieval-first pipeline patterns, design for context window management, lean into eval-driven development, and operationalize regulatory compliance from day one.
To dive deeper, listen to the conversation on Spotify or Apple Podcasts, and explore Medable’s broader platform work at medable.com. I left both inspired and practically equipped—an uncommon combo in today’s AI noise.
I keep a running list of product wisdom that sounds great on a slide but quietly sabotages execution. Recently, I revisited that list after a deep conversation with a seasoned CPO from a leading security and compliance platform and reflected on how these lessons show up in my own operating rhythm. What follows is my practical playbook for scaling product organizations without losing speed, quality, or the soul of the product.
Most big-tech veterans struggle when they leap into startups because the safety net of process disappears. At a startup, the buck truly stops with you—there’s no committee to shield a decision and no process to rescue a weak plan. The mindset shift is simple to say and hard to do: own outcomes end to end, reduce your reliance on institutional scaffolding, and make decisions with incomplete information while keeping standards high.
“Great product leaders stay in the details.” I sample artifacts every week—PRDs, design flows, user research notes, postmortems—and I read customer threads to calibrate my intuition. To maintain shipping velocity as headcount grows, I instrument a few critical indicators (deployment frequency, change failure rate) and favor outcomes over output. Data guides my attention; it never replaces judgment.
As teams scale, I use a blunt rule to keep speed high: small autonomous teams, small batch sizes, short feedback loops. One clear owner, one prioritized backlog, and weekly demos to customers. We ship thin slices, not big bangs. And “Great CPOs should avoid comfort metrics”—the easy dashboards that rise when nothing meaningful is moving. I push for outcome-centric OKRs tied to customer value, not vanity charts.
Rigid hierarchies derail quality decision-making. They slow signal, encourage escalation theater, and suppress the truth from the edges. I shorten paths between PMs, engineers, designers, research, and go-to-market leads, and I strip out stage gates that don’t add learning. Above all, I refuse to “Stop making your team fetch rocks”—randomized executive requests without context. Instead, I frame clear problem statements, explicit constraints, and observable success criteria.
Revenue and product can feel at odds, but they don’t have to be. The key to a quality CPO and CRO relationship is a shared operating model: one customer narrative, a joint pipeline of problems worth solving, and a common scorecard. We meet weekly, review the same signals, and align on sequencing: what we solve now for impact, what we stage for scale, and what we sunset to reduce complexity. When trade-offs get tough, we anchor on customer value and long-term defensibility.
Who ultimately oversees the quality bar? I do—and I do it through clarity, exemplars, and consistent feedback loops, not micromanagement. When I leave feedback, I make it actionable and specific: name the user scenario, note the friction, propose a sharper decision frame, and suggest a smaller, testable slice. I expect narrative memos and crisp acceptance criteria; I offer rapid, detailed responses so momentum never stalls.
Open office hours are my forcing function for transparency and speed. Anyone can bring a thorny escalation, a design in progress, or a customer insight. Pair that with weekly 1:1s—non-negotiable for developing leaders and unblocking work—and the organization learns to surface issues early, make faster decisions, and self-correct without drama.
Here’s a glimpse into my working week: Mondays set priorities and confirm the few decisions that matter; midweek is for deep reviews across roadmap, research, and engineering readiness; Thursdays I’m with customers and partners; Fridays I write and synthesize. I leave space for unscripted time with individual contributors—because ICs are the unsung heroes of a company—and I celebrate excellent craft out loud.
The hardest leadership skill is knowing when to push and when to give space. I push on clarity, sequencing, and quality; I give space on solutions and implementation paths. I reject comfort metrics, reinforce outcomes vs. output, and keep the organization close to customers and details. If you’re stepping from big tech into a startup or scaling your product org through rapid growth, these practices will help you ship faster, decide better, and raise the quality bar without burning out your team.
“Outcomes over outputs” is the right mantra—and one I’ve championed across product teams—but turning it into daily practice is where most teams stumble.
It’s simple in theory: focus on the impact of what we build, not just shipping features. In reality, it’s rarely black and white because most teams are asked to do both—hit outcomes and deliver specific outputs—at the same time.
In a benchmark survey, 20% of product teams claim to be outcome-focused, nearly half describe themselves as working in a mix of outcomes and outputs, and about 30% are still primarily working with outputs. I’ve seen versions of this in my own org: we aspire to outcomes, but our rituals, roadmaps, and reporting still reward shipping.
Here’s how I draw the line clearly, coach my teams to avoid common traps, and negotiate better, more actionable outcomes that unlock genuine product discovery and business results.
Simple definitions we live by
An output is something you build or produce—a feature, a project, an initiative. It’s something your team ships.
An outcome is the impact of that output—a change in customer behavior or a business result.
Josh Seiden puts it well in his book Outcomes Over Output: “An outcome is a change in human behavior that drives business results.”
Shift from shipping to shaping results. This graphic clarifies outputs vs outcomes, revealing that value emerges between deliverables and impact—when features change customer behavior and move business results.
I distinguish business outcomes from product outcomes. Business outcomes are typically financial metrics that measure the health of the business (e.g. increase revenue or reduce costs) while product outcomes measure a customer behavior in the product or a sentiment about the product.
Here’s a simple example I’ve used with platform teams. Many B2B companies support a number of integrations. Integrations are outputs. Having integrations alone doesn’t create value. Customers using and finding value in those integrations—that’s an outcome. If those customers retain their subscriptions longer because of the integrations—that’s also an outcome.
Building something isn’t the same as creating value. That’s the core of this distinction, and it’s what separates empowered product teams from feature factories.
Why this distinction matters for empowered product teams
When we task teams with delivering outputs, they’re done when the software ships. When we task teams with delivering outcomes, they aren’t done until the software ships and has the expected impact.
That small shift changes almost everything about how a team works: what we measure (impact, not just delivery), how we know we’re done (measurable behavior change, not release notes), the autonomy we grant (told what to achieve, not what to build), and the planning artifacts we use (an opportunity solution tree beats a feature roadmap when we’re exploring the best path to an outcome).
When I assign outcomes, I’m giving the team latitude—and responsibility—to figure out the best path to success. That’s what opens the door for real product discovery and continuous discovery habits.
Shift your lens from shipping features to achieving impact. This side-by-side visual explains how outcome-driven teams measure success, grant more autonomy, define 'done' by results, and plan with an opportunity solution tree.
Examples: spotting outputs disguised as outcomes
Clear-cut example: “Our outcome is to deliver an Android app.” An Android app is something we build and ship. It’s clearly an output.
To get to an outcome, I ask, “What’s the value of having an Android app?” or “How will we know the Android app is successful?”
We might answer: “Having an Android app will allow us to engage more users. We’ll know it’s successful when people engage with the app on a regular basis.”
This answer uncovers the hidden outcome: engage more people. Now we can set the right scope: increase the percentage of engaged users across any platform; increase the percentage of engaged mobile users; or increase the percentage of engaged Android users.
Any of these outcomes gives us more room to explore than a fixed output. Maybe we don’t need a native app at all. We could deliver the same engagement through a mobile web experience, notifications, or email. And we’re not done when we ship—we’re done when the right people are actually engaged.
Tricky example 1: measure the value creation moment (hires, not applicants)
Move beyond shipping features to the impact that matters. This visual maps the path from build an Android app to the real goal, increase engaged users, by asking why, defining value, and owning results.
When setting outcomes, it’s tempting to choose the easiest-to-measure metric. But a good outcome measures the customer’s value creation moment.
I worked at a company that helped new college grads find their first job. When I started working there, the primary outcome was “increase job applications.” This technically is an outcome—it measures a specific behavior in the product.
But it doesn’t measure the value creation moment. A job seeker doesn’t get value when they apply for a job. They only get value when they get the job. Similarly, employers don’t get value from any job applicant, they get value when the right job applicant applies.
Many job boards try to measure qualified applicants—instead of counting any applicant, they compare the credentials of the applicant to the job description and only count qualified applicants. This is better. But it still doesn’t measure the value creation moment. Both the job seeker and the employer get value when an open job is successfully filled. The right metric is hires.
Yes, “hires” can be hard to instrument because it happens off-platform and incentives misalign. Measure it anyway, even with proxies. The easy metric isn’t always the right outcome.
Tricky example 2: measure impact, not user-generated output (the course reviews trap)
I worked with a team that helped students choose university courses. They set their outcome as: “Increase the number of course reviews on our platform.”
Confusing activity with impact? This visual breaks down four common outcome traps—measuring at the wrong moment, mistaking outputs, chasing adoption, and relying on sentiment—so teams focus on real value.
Sounds like an outcome, right? It’s a metric. You can measure it. It’s an action users take on the site—writing a review. But it’s actually an output in disguise.
Reviews are valuable when they help a student evaluate a course. They don’t create any value if a student never sees them. More reviews aren’t always better, especially if they’re clustered where nobody looks.
A better outcome is “Increase the number of course views that include reviews.” Now we’re measuring impact on the decision moment, not just the production of content.
If you can hit your metric without helping customers, you’re tracking an output, not an outcome.
Tricky example 3: measure success, not just adoption (the traction metric trap)
“Increase the percentage of users who viewed the performance report.”
This looks like a good outcome. It measures a specific behavior in the product. It’s within the team’s control. But it’s what I call a traction metric—it measures adoption of a single feature, not value to the customer.
Why teams get trapped in shipping features: a vicious trust cycle fuels micromanagement, while performance-linked outcomes push safe targets. Break the loop and refocus on customer outcomes that truly move the needle.
Two problems arise. First, people can view the report and still not find what they need. Second, we might have perfectly happy customers who don’t need the report at all. Driving usage of an unneeded feature wastes time and erodes trust.
Measure the value creation moment, not just feature adoption.
Tricky example 4: pair sentiment with behavior
I define a product outcome as a metric that measures either 1. a specific behavior in the product or 2. a sentiment about the product. But sentiment metrics—like CSAT or NPS—can be tricky on their own.
Sentiment metrics are outcomes, but they aren’t directional. They don’t tell us where to explore or set guardrails for what to avoid. So I pair a behavior with a sentiment, for example: “Increase engagement without negatively impacting satisfaction.” I use sentiment as a counterweight.
Facebook and Instagram illustrate why this matters. Meta is exceptional at driving engagement—but to a fault. Many of us don’t like these addictive products. Pairing engagement with a satisfaction guardrail prevents “engagement at all costs.”
Why getting this right is hard (and how I counter it)
Ready to move from shipping features to creating impact? This visual playbook shares five practical moves—translate metrics, partner with teams, iterate, avoid traps, and dig deeper—to turn outputs into measurable outcomes.
The trust cycle. Managers don’t trust that teams can reach outcomes on their own. So managers micromanage the outputs. Teams, in turn, don’t communicate their progress toward outcomes—they communicate their progress on features. This reinforces the manager’s belief that they need to stay involved in the details. It’s a vicious cycle.
I break it by asking teams to show their work—share assumptions, research, opportunity solution trees, and evidence behind choices—and by giving feedback on the thinking, not just the solutions.
The accountability trap. When performance reviews are tied to hitting outcomes, teams play it safe. They sandbag their targets. They disguise outputs as outcomes to guarantee “success.”
I treat outcomes as learning opportunities first. When we start on a new outcome, I set a learning goal—“learn what moves the needle on this metric”—before a performance goal—“increase X by Y%.” This creates space to explore without fear.
How I get teams started with better outcomes
Translate business outcomes to product outcomes. Business outcomes like revenue, retention, and market share are lagging indicators—by the time you see them, it’s too late to act. Product outcomes measure behavior changes within the product that lead to those business results. They’re leading indicators within the team’s control.
Negotiate outcomes with your team. Outcome-setting should be a two-way conversation. Leadership brings the cross-company context. The team brings customer insight and technical realities. Neither side dictates; we co-own the target and the constraints.
Stop celebrating shipped features and start celebrating change. This visual contrasts a feature factory mindset with a true product team, urging teams to track impact, not output, and define success by outcomes.
Expect to iterate on your metrics. Your first outcome metric probably won’t be right. That’s normal. Sonja at tails.com went through four iterations—from 90-day retention to 30-day to 5-day to behavior-based metrics—before landing on something actionable. Thomas at Bluestone Analytics iterated three or four times before finding the right metric. Iteration is the work.
Watch for common mistakes. Outputs disguised as outcomes. Traction metrics masquerading as product outcomes. Sentiment metrics without direction. Business outcomes assigned directly to product teams without translating to behavior change.
Use the right artifacts. Replace feature roadmaps with an opportunity solution tree to explore multiple paths, test assumptions, and sequence bets explicitly against a clear outcome.
Align OKRs with outcomes. If your company uses OKRs, make sure the “KR”s are true product outcomes (behavior change and value creation), not a list of features to ship.
The bottom line
When we shift from an output-first mindset to an outcome-first mindset, it doesn’t mean that outputs stop mattering. Product teams will always ship features, and the ability to do so quickly and with quality still matters. This shift simply ensures those features achieve the intended impact. We aren’t done when we ship—we’re done when what we shipped has the intended impact.
Measure success by the impact of what you ship and you’ll build a product team that learns, adapts, and creates real value. Measure success by what you ship and you’ll get a feature factory.
Quick self-check: is your “outcome” really an outcome?
Ask yourself: 1) Does it measure a behavior change or a sentiment tied to value creation? 2) Could we hit it without helping customers? 3) Is it adoption of a single feature (a traction metric) or a result that customers and the business care about? 4) Do we have a counter-metric to prevent unintended harm? If you stumble on any of these, refine it before you commit.
The world can feel like it’s spinning, and as a product leader, I feel that pressure acutely—juggling customer needs, stakeholder expectations, and the relentless news cycle. I recently listened to a powerful conversation with Teresa Torres and Petra Wille about staying grounded when everything feels “bonkers,” and it offered a practical, human way to keep showing up without losing yourself.
What resonated most was the invitation to live my values through small, consistent actions. Rather than waiting for grand gestures or perfect solutions, I’m leaning into the mindset of “Something is better than nothing.” It’s the same spirit we bring to continuous improvement in product: make a change, evaluate impact, iterate.
“Create the world you want to live in” has become a daily prompt for me. I’m applying it to how I spend my attention, time, and platform—three scarce resources for any product management leader. I’m not going to do everything perfectly, but I can make better trade-offs this week than I did last week, and I can keep improving.
Practically, that looks like reconsidering which speaking invites I accept, especially when representation is skewed. If a stage is heavily male, I now ask organizers about their plan for balance before committing. I also question travel expectations for short talks when a high-quality virtual experience is possible—good for sustainability, budgets, and energy. These choices compound, just like product roadmapping and sprint planning decisions.
Petra’s “under-complexity” lens was a wake-up call. In product, oversimplified narratives—whether a single KPI, a vanity metric, or a forced binary—usually increase fear and bad decisions. The same is true in civic discourse. To counter that, I’m seeking more nuance on purpose: reading multiple sources on the same story, listening for who’s not in the room, and noticing how the same facts can carry different meanings depending on who’s telling it.
One simple habit helps: I’ll read The New York Times and The Wall Street Journal on a headline, then follow up with Tangle by Isaac Saul, which lays out “what the left says / what the right says / editor’s take,” sometimes including perspectives from affected communities. It’s a lightweight form of personal knowledge management that improves my product judgment and my citizenship.
Another idea that stuck with me is swapping media proxies for human connection. In product, we don’t ship based on secondhand opinions—we run customer interviews, co-create with users, and build empowered product teams. The same principle applies in community: talk to someone directly affected, ask real questions, and stay curious. When conversations get heated, I try to build bridges, reduce proxies, and look people in the eye.
I’m also reflecting on platform responsibility. Even a “small” platform can snowball through weak ties inside a company or community. I’m asking: When should I speak up? Where should I draw lines? And when is “staying in your lane” actually a way to avoid necessary leadership? These are the same stakeholder management questions we navigate in product strategy—assess impact, clarify intent, and act with integrity.
Local grounding matters, too. I’ve found energy and clarity in community-level action: voting, attending public protests when it feels right, mentoring, and supporting nonprofits like World Pulse. I love the framing of “don’t mess with my neighbors”—it keeps me focused on tangible care when the internet starts to feel like reality. I’ve also seen leaders use angel investing in agriculture-related efforts as a counterbalance to “internet reality,” channeling resources into durable, real-world outcomes.
If you want to experiment this week, pick one small lever you control: where you spend money, time, attention, or your platform. Add nuance by reading at least two different perspectives before reacting. Replace proxies with people by talking to someone with lived experience. Reduce polarization by asking, “what shaped that view?” before judging it. And go local—connect with neighbors or a community group and let small actions compound.
If you’d like to hear the full conversation that inspired these reflections, you can listen on Spotify or Apple Podcasts. Here are the direct links: Spotify: https://open.spotify.com/episode/1sxEFquu73ZB9fL9gGk6Om and Apple Podcasts: https://podcasts.apple.com/kh/podcast/staying-sane/id1794203808?i=1000755696295
Resources I’m exploring and recommend: World Pulse (https://www.worldpulse.org/), The New York Times (https://www.nytimes.com/), The Wall Street Journal (https://www.wsj.com/), and Tangle by Isaac Saul (https://www.readtangle.com/ and https://www.readtangle.com/author/isaac-saul/). For builders and writers, I also appreciate Ghost (https://ghost.org/) as an open-source publishing platform. If you work in or with the MENA ecosystem, take a look at MENA Product Summit ’26 (https://www.prdkt.plus/summit26). Colleagues like Jeff Merrell (https://jeffdmerrell.com/) and grassroots efforts such as No Kings Protest (https://www.nokings.org/) offer additional perspectives and ways to get involved.
If this resonates, share it with a teammate who’s been feeling the weight of the world. I’d love to hear one small, values-aligned action you’re taking this month—what “something” will you try next?