Tag: platform scalability

  • How I Champion Platform Excellence: Lessons in Analytics, Scalability, and Product-Led Growth

    How I Champion Platform Excellence: Lessons in Analytics, Scalability, and Product-Led Growth

    I’m continually inspired by platform specialists who champion their analytics platforms end to end. When I study their work, I look for the connective tissue between strategy and execution—how behavioral analytics informs decisions, how a unified analytics platform reduces tool sprawl, and how great documentation and enablement convert insights into habit across product, engineering, and go-to-market teams.

    What consistently stands out is the rigor behind the scenes: clear data governance, privacy-by-design, and instrumentation standards that keep events trustworthy as products evolve. Platform scalability isn’t just about throughput; it’s about guardrails—naming conventions, schema versioning, and lineage—that let teams move quickly without sacrificing integrity. These are the unsung details that make insights reliable and repeatable at scale.

    I also pay close attention to how experimentation gets operationalized. Thoughtful A/B testing, well-scoped feature flags, and crisp definitions of “minimum detectable effect (MDE)” ensure that experiments produce signal instead of noise. Driver trees, opportunity solution trees, and continuous discovery keep teams anchored on outcomes, while retention analysis translates curiosity into durable growth. This is the backbone of product-led growth: small, fast bets tied to measurable behavioral shifts.

    Reliability and insight quality go hand in hand. Observability for event pipelines, anomaly detection to surface data drift, and targeted session replay help teams debug both product experience and analytics instrumentation. Paired with Web Vitals and clear ownership models, these practices shorten feedback loops, reduce blind spots, and keep platform credibility high—because trust is the real KPI behind every dashboard.

    In my own practice, I translate these lessons into roadmaps that balance discovery with delivery, and align solutions engineering, product, and design around the same north-star metrics. The result is a culture where platform champions don’t just advocate for tools—they enable outcomes. If you’re scaling an analytics stack or elevating your product strategy, these principles will help you move faster, with confidence, and make every insight count.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Built for Your Biggest Days: How We Engineer Fair, Reliable Scale Without Downtime

    I’m getting sharper, more specific questions about scale from enterprise customers every quarter, and that’s exactly how it should be. Teams want to know how our platform behaves during their highest-volume moments — the Black Friday sales, the sporting events, the production incidents — and they want confidence their growth won’t outpace the systems they depend on. We welcome those questions. They’re the right ones to ask of any critical component of your business. Today, our systems handle serious scale. At daily peak, we see over 150,000 customer requests per second coming into the platform, with more than 70,000 asynchronous requests per second flowing through the background systems. During our busiest days of the week, we handle over five million conversations and more than 100 million comments being added across the platform. We also design for individual customer spikes, not just aggregate platform traffic. We can handle a single customer workspace spiking with hundreds of comments per second, or around 100 new conversations per second. Sustained over a full day, that would map to millions of conversations from a single customer. While those numbers matter, they age quickly. Every growing software company can publish a bigger number every year, month, week. What ultimately matters is whether the architecture has clear scaling levers, whether we understand the pressure points in the system, and whether we can add capacity before customers need it. Every system has limits. Competence is knowing where they are, measuring them, and moving them before customers reach them. Here’s how we do that in practice. We build on boring foundations because at the edges, we try hard not to be clever. We use AWS for the infrastructure primitives AWS is very good at running. We do not want our engineers spending their best energy recreating S3, load balancers, queues, or commodity infrastructure patterns. We want that energy spent on the parts of the system that are specific to our customers and our product. “That is a deliberate trade-off. It gives us fewer systems to understand, deeper expertise in the ones we do run, and more leverage when we need to scale.” This extends a principle I’ve embraced for years: run less software. The point isn’t to minimize the stack for its own sake; it’s to compound expertise. When many teams build on the same small set of technologies, our tooling, observability, and operational practice all improve together. Boring technology choices aren’t a lack of ambition — they reserve our ambition for the nuanced scaling challenges that matter. The source of truth is the hard part. You can scale stateless web traffic by adding machines, add queue consumers, and add cache. Those are real problems — just not the hardest ones. The source-of-truth database is where the most important data lives, where the hardest correctness guarantees exist, and where maintenance windows often come from. It has to be correct, fast, resilient to failover, capable of large migrations, and able to keep serving traffic while we improve it. As customers grow, it cannot require a full re-architecture every time the next ceiling appears. That is why we moved to Vitess, managed by PlanetScale. The goals were clear: improve availability, reduce operational complexity, make large table migrations safer, simplify MySQL scaling, and eliminate customer downtime from routine database maintenance and failovers. When we first laid out this direction, the largest part of the migration was still ahead of us. We completed that migration in 2025, and the benefits are now part of how we operate the platform day to day. Today, our highest-scale source-of-truth data is spread across 128 shards. The database layer handles around two million requests per second, with more than ten million cache reads per second in front of it. For the largest customers, we can isolate and scale database capacity independently, including dedicating a shard to a single customer when needed. We have not come close to needing that, which is significant. The goal of architecture like this is not to run every system at the edge of its capacity, but rather to have room to move before customers need it. Vitess gives us native sharding, query routing, online schema change capabilities, connection pooling, and resharding primitives built for this kind of workload. Instead of application code carrying all of the sharding complexity, the database layer can do more of the work. That reduces cognitive load for engineers and removes whole classes of operational risk. Ultimately, this gives us practical scaling options instead of hard architectural rewrites, and lets us do routine database improvement without planned customer-impacting maintenance windows. Search is not a hidden bottleneck for us. Search underpins core product surfaces across the platform — from vector search in our AI features to realtime reporting — and if it’s slow or unhealthy, customers feel it. Scaling isn’t just adding more machines; often the better approach is making the product do less unnecessary work. Today, our Elasticsearch clusters support a much higher-throughput product than in the past, with more than 650TB of storage, more than 1.7 trillion documents, and peaks above 40,000 requests per second. We’re serving a larger product surface more efficiently, not just running a bigger cluster. More importantly, when an index gets too large or traffic distribution turns unhealthy, we don’t want a high-risk, manual migration. We reshape Elasticsearch indexes online by partitioning by customer ID, dual-writing to old and new indexes, backfilling, validating, gradually moving customers with feature flags, and deleting the old index only when we’re confident. We’ve used this pattern for years to make large search migrations safer and more incremental — a core playbook in our platform scalability and SRE practices. Fairness is non-negotiable in a multi-tenant system. A single customer’s high-volume moment should not quietly become everyone else’s latency problem. We design for this at multiple layers. For asynchronous work, we use overflow queues and queueing strategies that prevent one high-volume workload from consuming shared capacity in a way that hurts quieter tenants. AWS SQS fair queues are one example of a primitive we use extensively. They’re designed for exactly this class of problem. When one tenant creates a backlog in a shared queue, fair queues help reduce the dwell-time impact on other tenants. We also build application-level guardrails so customer isolation doesn’t depend on every engineer remembering every rule in every code path. In a large multi-tenant Rails application, the safe path must be built into the system. The focus is primarily about correctness and customer data separation, but the broader operating principle is the same: important customer boundaries should be enforced by infrastructure and application frameworks. The same thinking applies to scale. We want customer-specific load to be visible, attributable, and controlled. When a customer spike happens, we should be able to understand it as that customer’s workload, protect the rest of the platform, and add capacity where it’s actually needed. Fin adds a new dimension to scaling. Our AI Agent Fin introduces a new set of infrastructure challenges. To provide reliable AI-powered support at scale, we need to operate across multiple model providers, route across them based on capacity and latency, and protect customer-facing workloads from lower-priority work. The details differ from traditional SaaS infrastructure, but the principle is the same: understand the bottlenecks, build clear scaling levers, and monitor the customer outcome. AI providers are not commodity storage systems, and we do not design as if they are. That is why we have invested in Fin-specific reliability systems. Fin now fully resolves over two million conversations per week. At that scale, high availability cannot depend on a single model, a single provider, a single region, or a single pool of capacity. Our LLM routing layer supports cross-vendor failover, cross-model failover, latency-based routing, capacity isolation, and load testing. We also maintain buffer capacity with major providers, with headroom to handle 2x to 3x normal Fin traffic at any point. For enterprise customers, this matters because AI support volume can spike just like human support volume — and the AI layer must absorb that spike without relying on one fragile upstream path. When customers depend on Fin to absorb a spike in support demand, the AI layer needs the same operational discipline as the rest of the platform. Performance tests help, but production traffic is reality. Real customers use products in ways no synthetic test will perfectly predict: launches, incidents, seasonal patterns, gaming events, sudden changes in end-user behavior. Those moments give us data that no lab can fully reproduce. Often, a large customer event barely moves the platform-wide graphs because our customer base is broad enough that one industry’s peak aligns with another’s quiet period. Black Friday and Cyber Monday are good examples. Many ecommerce customers are at their busiest, while many B2B SaaS customers are quieter. At the aggregate platform level, the change can be much less dramatic than people expect. “That does not mean those events are unimportant. It means we need to look at both levels: the health of the overall platform and the experience of the individual customer having the spike.” Sometimes, these events teach us something specific. In one case, a very large customer used the Messenger in a way that exercised the full Messenger lifecycle even though the visible user experience did not require it. Under normal traffic, this was fine. During a major customer-side incident, their users refreshed aggressively, generating a much larger burst of Messenger traffic than the integration actually needed. The platform stayed available, but the event exposed unnecessary work in that integration path. We built a lighter-weight integration path that served the customer’s actual use case with far less work per request, making future spikes easier to absorb. We treat large customer events this way even when there’s no broad customer impact. They’re opportunities to understand real scaling properties and make the next event safer — a habit that anchors our incident management, observability, and FinOps practices. Scale is also an operating model. The infrastructure matters, but it’s not enough. You can have the right database architecture and still hurt customers if you detect issues late, recover slowly, communicate poorly, or fail to learn from incidents. “That is why our operating model starts with customer outcomes. If the customer cannot do the job they came to do, the system is unhealthy. It does not matter how many dashboards are green.” Heartbeat metrics tell us whether customers can do the core jobs they hire us to do. They cut through infrastructure noise and answer the question that matters most during an incident: are customers able to use the product successfully? This shapes how we ship. Today, we average around 250 ships to production per workday, with an average merge-to-production time under 10 minutes. That isn’t a vanity metric — it’s part of the safety model. Smaller changes are easier to understand, easier to observe, and easier to roll back. Feature flags let us separate deployment from release. Automatic rollback and heartbeat-driven detection help us recover quickly when a change hurts customers. These are the very DORA metrics we hold ourselves to in order to balance CI/CD speed with stability. “Fast shipping is not the opposite of reliability. Done properly, it is one of the ways you stay in control of change.” The bar is high. Engineers are expected to understand the impact of their changes, watch them go live, and act quickly if something looks wrong. Resuming service is not the end of an incident. We expect teams to understand the root cause, fix the contributing systems, and prevent recurrence. That’s how scale stays safe over time. Scheduled maintenance should be extraordinary. Historically, database maintenance was a main reason for maintenance windows: upgrading a database, changing instance sizes, performing failovers, or moving large tables could require customer-impacting downtime. With the move to Vitess and PlanetScale, we changed what routine database improvement looks like. We can upgrade, scale, and improve critical database infrastructure without turning that work into planned customer-impacting downtime — and we do this in practice, not just as a goal. This matters because customers rely on our platform for live operations. If their support team, Messenger, Help Desk, or AI Agent is unavailable, the impact is immediate. Scheduled maintenance cannot be treated as a casual operational convenience. “Our posture is simple: routine infrastructure improvement should not require planned customer-impacting downtime.” Scheduled maintenance should be exceptional, non-routine, clearly communicated, and minimized in frequency, duration, and customer impact. That’s the practical benefit of the architecture work: better scaling is not only about handling more traffic, but also reducing the operational moments that might inconvenience customers. What this means for customers is simple: be skeptical of vague scale claims. The question isn’t whether a vendor says they can scale — it’s whether they can explain how, where the limits are, what they measure, how they recover, and what they’ve changed after learning from production. We understand the scaling properties of our systems, have clear levers to add capacity at the right layers, design for customer isolation and fairness, monitor customer outcomes directly, and use real production events to make the next one safer. Scale is never finished. Every large customer event, traffic spike, migration, and incident teaches us something about the real behavior of the system — and we use that data to keep improving. That’s what you should expect from a platform you depend on during your busiest moments.

    Inspired by this post on The Intercom Blog.


    Book a consult png image
  • Escape the “It’s Just an LLM” Trap: Inside Operator, a Reliable, Actionable AI Agent

    Escape the “It’s Just an LLM” Trap: Inside Operator, a Reliable, Actionable AI Agent

    We just launched Operator, an Agent for your customer operations that helps you understand, manage, and improve your entire customer experience. I’ve spent years shipping AI-driven products at production scale, and this one reflects the lessons I’ve learned the hard way about what it really takes to go from a flashy demo to a dependable system your team trusts.

    To give you a clear view of just how powerful this Agent is, I want to share the technical infrastructure and engineering choices that make Operator work reliably at production scale across thousands of customer workspaces. My goal is to demystify the gap between a well-prompted LLM and a true, production-grade Agent—so you can make an informed build vs. buy decision.

    If you’re a technical leader evaluating whether to build something like this yourself, or trying to understand the difference between a well-prompted LLM and a production Agent system, this is for you.

    Escaping the “it’s just an LLM” trap

    Most engineering teams in this space start the same way: a prototype. You take a foundation model, give it API access to your support data, add a system prompt with some domain context, and you’ve got something that queries your database, summarizes tickets, and generates reports that look right. It demos convincingly—and I’ve been there, impressed in the moment, only to watch it buckle under real-world complexity.

    The problem with that prototype is that it obscures the scope of what’s actually required. It demonstrates the 10% of the system that’s straightforward to build, and it’s easy to assume the rest is just as straightforward. It isn’t. The gap between a working demo and a production system your team depends on daily is where most of the engineering investment lives. That’s precisely the gap we focused on closing.

    With Operator, we’ve invested deeply in every layer: tooling, reasoning, how the Agent takes action, and the infrastructure that makes it reliable at scale. Here’s a closer look at the architecture and why it matters for agentic AI, platform scalability, and observability.

    The tooling layer

    The first thing we had to confront was that the obvious approach (giving a model access to your APIs and letting it figure things out) doesn’t hold up in production. The model makes reasonable decisions for simple queries, but operating across thousands of customer workspaces with different configurations, data models, and usage patterns, a “figure it out” approach isn’t nearly precise enough.

    What you need is purpose-built tooling: tools that encode decisions about what data to fetch, how to structure it, what context to include, and what to leave out. Operator has over 50 of these tools and 10 skills.

    A tool is a single action that Operator takes (search content, run a query, look up a conversation). A skill chains multiple tools together to complete a whole job, like debugging a conversation end-to-end, rolling out a content update across an entire help center, and identifying the next automation opportunity. This is where AI workflows move from abstract prompts to dependable, repeatable outcomes.

    The difference between using thin wrappers around API endpoints and purpose-built tooling shows up in something as seemingly simple as a performance question. When you ask “how did Fin perform last week?”, a naive implementation runs a query and hands back a table. Operator runs a reporting tool that determines which metrics are relevant for your specific workspace, which are meaningful for your particular question, and what the numbers actually mean in context, giving you a much richer answer that you can do something tangible with.

    Developing that behavior took months of engineering. Not because any individual piece is conceptually hard, but because getting it right across the full range of customer workspaces, configurations, and edge cases is an iterative process. You build it, you test it against real conversations, you find the cases where it breaks, you fix those, and you repeat. There’s no shortcut—and in practice, this is where most DIY efforts stall.

    The intelligence layer

    The tooling layer solves what to do, but beneath it is a harder problem: understanding what’s worth doing, and why. This is the layer that makes Operator understand your business rather than just query it. Three components go into it, and in my experience they’re non-negotiable for a reliable Agent.

    1. Semantic search

    Unlike solutions that rely on keyword matching, Operator uses a system that understands what content is about, not just what words it contains. When it searches your help center, it’s using the same semantic search engine we’ve spent years optimizing for Fin itself. This is a retrieval system that’s been tuned against millions of real support conversations, with precision and recall characteristics we’ve measured and improved continuously. This retrieval-first pipeline is the backbone of grounding and dramatically reduces hallucinations.

    2. Attribute awareness

    Operator has access to your data and knows what is meaningful for different questions. It knows which metrics are actually in use in your workspace, which custom attributes carry signals, and which fields are populated versus effectively empty. We’ve built specific skills that give Operator this meta-knowledge, so when it’s investigating a performance question, it’s looking at the right things, not hallucinating insights from sparse data.

    3. Intelligent reasoning

    A well-built Agent can answer your question and anticipate what you should ask next. If you ask Operator about escalations spiking, it doesn’t just say, “escalations increased 23% week-over-week.” It’ll continue on to tell you why this happened by examining the escalated conversations and identifying that a disproportionate number involved a specific product area, before moving on to check whether the relevant help content is up to date, and, if it isn’t, proposing an update. That chain of reasoning isn’t prompt engineering. It’s encoded in the skills we’ve built, refined against the patterns we see across our entire customer base.

    The action layer

    This is where the engineering complexity increases by an order of magnitude because instead of just analyzing problems and recommending solutions, Operator takes action to solve them itself. It can update Guidance rules, draft and publish help articles, create Procedures, configure data connectors, and modify your Fin configuration. Moving from read-only insights to write-capable actions is a fundamentally different class of product and infrastructure problem—one that demands rigorous SRE practices and rock-solid safeguards.

    Every one of these actions has to be safe, reversible, and auditable. An analytics tool that occasionally returns a wrong number is frustrating. but an Agent that occasionally applies a wrong configuration change to a live support system is a different category of problem. To prevent this, we built a robust proposal system, whereby every change Operator suggests is presented as a reviewable diff. You see exactly what will change before anything is applied, with the option to accept, reject, or refine. Nothing goes live without your explicit approval.

    What else sets Operator apart

    A UI that’s both conversational and graphical, not one or the other. Operator blends conversational interaction with purpose-built graphical components. Proposal diffs show exactly what will change in an article. Inline charts visualize performance trends. Dashboards render directly inside the conversation thread. In practice, that means a knowledge manager reviews a structured diff—not a wall of LLM-generated text—and a team lead asking about weekly performance gets an accurate chart with context, not a paragraph approximating data.

    Building this hybrid experience is extremely difficult outside of a native platform integration. In a chat interface or CLI, you’re limited to text output; in a standalone dashboard, you lose conversational context. Operator does both in the same thread, so every interaction is detailed and context-rich—and importantly, actionable in the flow of work.

    It lives where your team already works. Operator is built into the same platform your team uses every day. It’s not a separate tool with a separate login, nor is it a Slack bot your engineer set up that only three people know about. It operates exactly where you are, alongside the conversations, help center articles, workflows, and data you’re working with. That tight integration closes the gap between finding a problem and fixing it: spot an outdated article while reviewing a Fin conversation, and Operator can surface the fix in the same session. Notice an escalation spike in the morning, and you can ask Operator to investigate without switching tools, waiting for a data pull, or filing a ticket.

    The compounding advantage

    Every customer using Operator teaches us something. We see which debugging approaches work across different types of support operations, learn which content structures perform better, and identify automation strategies that consistently land. Those patterns get encoded back into Operator’s skills and tools. When we discover that a particular sequence of investigation steps reliably identifies the root cause of a spike in escalations, we build that into Operator’s diagnostic skill. When we find that a specific way of structuring help articles leads to higher Fin resolution rates, we encode that into the content creation skill. Our engineering team is continuously shipping improvements based on what we observe across the entire customer base.

    A custom-built solution gives you exactly what you built, meaning it doesn’t get smarter unless you invest engineering resources into making it smarter. And that usually means taking time and talent away from your core product. I’ve watched teams underestimate the ongoing cost of eval-driven development, model upgrades, and API churn—costs that only grow as your footprint expands.

    We’re not locking the door

    Some teams want to build their own Agents. Some of our most technical customers do this. But when you do, you’re working with raw APIs and building your own tooling on top of them. When you use Operator, you’re working with a system that already knows what questions to ask, understands your data, and encodes the best practices we’ve learned from thousands of support teams. We recently launched the Fin CLI, which means you can use third-party agents like Claude Code or Cursor to interact with your Fin data and configuration. That door is open. What I hope this post has clarified is everything that goes into the build of Operator: Over 50 tools and 10 skills, purpose-built for support operations. Years of investment in semantic search. Deep integration with every layer of Fin’s stack. The proposal system. The intelligence layer. The reliability infrastructure.

    If you’d still like to move ahead with building a custom solution, here’s an honest assessment. You can build a useful read-only tool in weeks. It’ll query your data, summarize tickets, and generate reports, but turning it into a production system will take quarters. Reliability, security, edge case handling, multi-tenant data isolation, and graceful degradation are all important architectural decisions that you’ll need to get right from the start. The action layer is also where you might risk stalling out. Going from “here’s what’s wrong” to safely making changes in a production system is a fundamentally different engineering problem than analysis. Most DIY projects never get there. Finally, you’ll be maintaining it forever. Every model upgrade, API change, and new capability in your support platform means updating your custom tooling. We have a team dedicated to this. You’ll need one too.

    The economics still favor buying when a vendor has invested more in the problem than you can justify internally. What I hope this post adds is a clearer picture of what that investment actually looks like from an engineering perspective—and why it compounds into a durable advantage for your support organization.

    The investment is ongoing. The problems we’re solving at the infrastructure level today are harder than the ones we solved a year ago, and that trajectory isn’t slowing down. If you’re ready to see the difference a production-grade Agent can make, explore Operator.


    Inspired by this post on The Intercom Blog.


    Book a consult png image
  • From FinOps to Customer FDEs: How AI Agents and Platforms Unlock Smarter Cloud Spend

    From FinOps to Customer FDEs: How AI Agents and Platforms Unlock Smarter Cloud Spend

    I see the rise of Customer Forward Deployed Engineering (FDE) as a pivotal bridge between FinOps engineering, AI strategy, and measurable customer outcomes. When we align internal platforms and agentic AI with real-world use cases, we don’t just reduce cloud costs—we accelerate adoption, de-risk deployments, and create durable product value that compounds over time.

    "Hac Phan leads FinOps engineering at Amplitude, where he builds internal platforms and AI agents that help teams understand and optimize cloud spend. He now heads Amplitude's Customer Forward Deployed Engineering team." That evolution—from building internal capabilities to leading a customer-facing FDE function—captures a pattern I’ve seen repeatedly: the skills that tame complexity inside the company are exactly the skills customers need most at the edge.

    In my experience, Customer FDEs thrive when they embed with strategic accounts to translate product capabilities into concrete outcomes: lower unit economics, faster time-to-value, and cleaner governance. They partner closely with solutions engineering, product management, and customer success, using platform building blocks and AI workflows to illuminate the cost drivers that matter—then engineering the shortest path to savings and scale.

    The operating model is straightforward but disciplined. Set a clear mission (optimize cost-to-value while expanding usage), define a small set of leading indicators (time-to-first-value, cost per active workload, deployment frequency, NRR lift on FDE-supported accounts), and establish crisp handoffs with core product teams. When FDEs surface repeatable patterns, those insights should flow back into the roadmap as native features, guardrails, and in-product guidance—so every customer benefits, not just the lighthouse few.

    Tooling matters. Internal platforms that unify telemetry, usage metering, and pricing logic give FDEs the levers to diagnose and fix issues quickly. Layering AI agents on top of that foundation enables proactive recommendations—think unit-economics dashboards, anomaly detection on spend spikes, and automated playbooks that right-size workloads. With agent analytics in place, we can measure the value of each recommendation and continuously tune the system.

    I’ve seen this model turn tense, cost-focused conversations into strategic planning sessions. Instead of debating line items, we co-design architectures that scale efficiently, with platform scalability and governance built in from the start. Customers appreciate the candor and the engineering rigor; teams appreciate how those field insights sharpen product strategy.

    For leaders considering this path, start small and design for leverage. Stand up a single FDE pod focused on 2–3 high-potential customers. Codify playbooks for cloud cost optimization, instrument agent analytics from day one, and publish a weekly learning loop back to product. Within a quarter, you’ll know which interventions to automate, which to turn into product features, and which require deeper solutions engineering support.

    The broader lesson is simple: when we merge FinOps discipline with customer-embedded engineering and AI-driven insights, we create a force multiplier. Customer FDEs don’t just help accounts spend less; they help them achieve more—sustainably, transparently, and with the confidence that comes from a platform (and a team) built to scale.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • We Rebuilt Session Replay Delivery for Blazing Speed—Lighter Pages, Richer, More Reliable Data

    We Rebuilt Session Replay Delivery for Blazing Speed—Lighter Pages, Richer, More Reliable Data

    Session replay should illuminate user behavior, not slow it down. That belief drove us to rebuild the delivery layer behind our Session Replay from the ground up so it’s lighter on your pages while capturing richer, more reliable signals for behavioral analytics and product insights.

    Our objective was clear: preserve page performance and Core Web Vitals while improving data completeness under real-world conditions. We focused on reducing client-side overhead, smoothing network bursts, and scaling the pipeline so it performs consistently during long sessions, high-traffic spikes, and complex interactions—without compromising observability or user experience.

    To get there, we redesigned how events flow from the browser to our edge and storage layers. We decoupled capture from delivery, introduced adaptive batching and backpressure-aware controls, tightened compression strategies, and prioritized critical events to reduce jitter and dropped packets. The result is a delivery path that’s resilient to network variance, efficient in payload size, and friendlier to the main thread—key ingredients for platform scalability and SRE-grade reliability.

    Get a glimpse into how we overhauled Session Replay’s data delivery, and how you can expect more complete data, lower payload sizes, and more. In practice, that means steadier capture across long sessions, fewer gaps during rapid DOM changes, and leaner, faster uploads that respect the constraints of modern browsers and mobile networks. It’s an upgrade designed to protect page speed while strengthening the fidelity of what you see in replay.

    These changes elevate how product teams, analysts, and support engineers diagnose issues and optimize funnels. With higher-fidelity replay and lighter page impact, you can connect the dots faster—from anomaly detection and conversion bottlenecks to subtle UX friction—within a unified analytics platform. It’s a meaningful step forward for data-driven product strategy and for keeping your observability toolkit both accurate and performance-aware.

    While performance guided every decision, privacy and governance stayed first-class. Our delivery patterns work hand-in-hand with data governance practices to help teams maintain responsible capture boundaries while still achieving the completeness and granularity they need. This balance lets you scale replay confidently across surfaces and teams.

    We’ll continue monitoring downstream impact across Web Vitals, long tasks, error rates, and event integrity—iterating as we learn. If you rely on session replay to inform roadmaps, triage incidents, or accelerate product-led growth, you should feel the difference: a lighter footprint on your page and a stronger foundation for trustworthy insights.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Unlocking Session Replay at Scale: How Amplitude Elevates UX, Observability, and Trust

    Unlocking Session Replay at Scale: How Amplitude Elevates UX, Observability, and Trust

    I build products to translate noisy interaction data into clear, actionable decisions. Few capabilities deliver that clarity like session replay. It closes the gap between what analytics tells us and what users actually experience, empowering product, design, and SRE teams to learn faster, resolve issues sooner, and improve customer trust.

    Lew Gordon is a Senior Staff Engineer at Amplitude focusing on Session Replay. He was formerly an engineer at Twilio.

    In my practice, session replay complements Amplitude analytics and behavioral analytics by adding rich context to the unified analytics platform—turning charts into stories we can act on. When I can see the precise clicks, hesitations, and error states behind a spike or a drop, prioritization becomes straightforward and the path to product-market fit becomes easier to navigate.

    Operationally, replay deepens observability. I correlate console errors, network traces, and layout shifts with user intent, then tie those signals to Web Vitals, performance budgets, and SRE workflows. The result is a tighter feedback loop from incident to insight—one that shortens mean time to resolution and raises the bar on reliability without guesswork.

    Privacy-by-design is non-negotiable. I start with strong data governance: selective capture and redaction, explicit consent and retention policies, role-based access, and environment-aware sampling. These controls keep sensitive data protected while still providing the fidelity product and engineering need to diagnose issues and improve experiences responsibly.

    Strategically, I deploy replay where it moves the needle most: onboarding and activation moments, high-friction conversion flows, and critical paths with outsized revenue or trust impact. I track signals like rage clicks, dead clicks, scroll depth, and error states to inform product strategy and reduce UX debt, while linking improvements to activation and retention analysis, time to resolution, and DORA metrics.

    At scale, success requires platform scalability: efficient indexing, low-latency retrieval, and smooth playback across browsers and devices—all while maintaining tight CPU, memory, and bandwidth budgets. When integrated with CI/CD and experimentation, replay becomes a force multiplier for continuous discovery and confident, rapid iteration.

    My takeaway: session replay is not just a debugging tool—it’s a shared language across product, engineering, and design. With the right guardrails and operating model, it elevates decision quality, accelerates learning, and builds the trust customers feel with every interaction.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • From 70 Employees to Dominance: My Playbook for Hypergrowth, Focus, and Top-Down Goals

    From 70 Employees to Dominance: My Playbook for Hypergrowth, Focus, and Top-Down Goals

    Scaling a real-world marketplace from scrappy to dominant takes a different kind of product leadership. Reflecting on Christopher Payne’s decade leading DoorDash as President and COO — growing from roughly 70 employees to the dominant food delivery platform in the US — I’m struck by how much of that success hinged on mastering an atoms-based business while still operating with software-level rigor. As a VP of Product Management, I see the same patterns in my own work: relentless clarity on inputs, a bias for builder-executives, and a cadence that keeps leaders close to product details without becoming bottlenecks.

    Running an atoms-based business versus a pure software company forces you to obsess over operational physics: unit economics, quality control, on-time reliability, and dense local liquidity. It’s precisely where traditional “bits” executives can stumble. What’s worked for me is a simple “plate spinning” framework for executive attention: identify the five or six plates that must never stop — customer experience, marketplace health, quality and safety, product velocity, platform reliability, and P&L — then schedule recurring deep dives to keep those plates spinning. If a plate wobbles, I drop in, fix the root cause, re-instrument the inputs, and zoom back out.

    Hiring at hypergrowth speed only works when you bias toward a “builder mentality.” I look for executives who run toward fuzzy problems, write clearly, and can prove they’ve shipped value with incomplete information. Prior industry experience can be a liability when you’re reinventing the market; first-principles thinkers outlearn domain experts who try to port yesterday’s playbooks. In executive hiring, I’ve found structured work samples and narrative memos far more predictive than marathon interview loops — companies routinely spend too much time on job interviews and too little time evaluating how candidates think and execute.

    Great executives never outgrow the details. Staying close doesn’t mean micromanaging — it means sampling the customer journey and instrumenting the system so you can feel where it hurts. In my own practice, I rotate through frontline touchpoints weekly: support transcripts, NPS verbatims, failed checkout sessions, and reliability dashboards. Small signals often reveal systemic issues. A single ciabatta bread moment — the kind of edge-case substitution that seems trivial — can expose broken handoffs, unclear policies, and misaligned incentives across the marketplace.

    Top-down goal setting beats bottom-up when you’re aiming for category leadership. Bottom-up targets tend to regress to comfort; they calibrate to today’s constraints, not tomorrow’s possibilities. I set ambitious, top-down outcomes (not output), frame the non-negotiables, and map driver trees to clarify the input metrics that matter. Then I ask empowered product teams to pressure-test the plan, propose approaches, and own the how. This preserves ambition while unlocking creativity — a practical balance of clarity and autonomy that outcomes vs output OKRs were designed to achieve.

    One-size-fits-all management is a myth. Early-stage teams need hands-on coaching and fast decisions; later-stage teams need mechanisms that scale: crisp PRDs, pre-mortems, and operating cadences that separate strategy, planning, and execution. The mark of a high-functioning executive team is not uniform style — it’s high candor, fast escalation paths, and visible commitment after debate. In tough moments, a little charisma goes a long way; in practice, that’s not theatrics, it’s steady optimism, simple language, and consistent follow-through that keeps people moving forward.

    The hypergrowth skill stack for executives is surprisingly learnable: ruthless prioritization under uncertainty, narrative writing that aligns cross-functionally, structured delegation with clear “inspection points,” and a weekly rhythm that protects maker time. I leverage a cadence of business reviews (inputs > outputs), customer-scent checks, and decision logs so we can move fast without losing the thread. CEO and executive time management is the ultimate forcing function — if we can’t show where our attention maps to goals, the team won’t either.

    Some of my enduring lessons echo the best of Amazon and eBay: customer obsession beats competitor obsession, input metrics beat lagging vanity metrics, and simple mechanisms beat heroics. From Jeff Bezos’s playbook I borrow the insistence on written narratives, single-threaded ownership, and clarity on what will not change. Those principles remain the backbone of platform scalability and resilient product strategy, especially when markets get noisy.

    AI is about to flatten organizations. With agentic AI, retrieval-first pipelines, and AI workflows embedded into product development, managers can widen their span without losing fidelity. I see LLMs for product managers accelerating discovery, PRD drafting, and experiment analysis — while raising the bar on decision quality. The implication for leadership: fewer layers, more transparency, and even greater pressure to define sharp, top-down outcomes that teams can autonomously pursue.

    If I had to compress this into a playbook, it’s this: set audacious, top-down goals; keep your “plate spinning” calendar sacred; write more than you talk; hire builders, not resume archetypes; sample the customer journey every week; and build mechanisms that make the right thing easier than the heroic thing. That’s how you scale product management leadership from dozens to thousands — in atoms, in bits, and in the messy, exhilarating space where they meet.


    Book a consult png image
  • Inside Amplitude’s AI Platform: Powerful Lessons for Product Leaders Shaping Analytics

    Inside Amplitude’s AI Platform: Powerful Lessons for Product Leaders Shaping Analytics

    Every so often, a single line captures the essence of platform thinking at scale. "Vinay is a Staff AI Engineer at Amplitude. He builds the foundational AI platforms that empower internal innovation and help define the future of AI analytics." That statement crystallizes the mandate many of us share: create durable AI capabilities that compound value across teams, products, and customers.

    When I think about "foundational AI platforms" in the context of Amplitude analytics and behavioral analytics, I see more than infrastructure. I see a product strategy choice: invest in a unified analytics platform that lowers the cost of experimentation, increases the trustworthiness of insights, and speeds time-to-learning for empowered product teams. That’s the engine behind sustainable product-led growth.

    For me, the platform blueprint starts with three layers: high-quality data foundations (schema design, governance, lineage), model lifecycle rigor (evaluation, observability, versioning), and safe, self-serve interfaces that meet teams where they work. Without strong data governance and clear accountability, even the smartest gen ai features struggle to gain adoption. With them, platform scalability and reliability become a competitive advantage—not just an operational checkbox.

    Empowering internal innovation requires thoughtful constraints. I’ve seen the best teams pair self-serve tooling with guardrails: templates for use cases, bias and risk checks, and well-documented pathways from prototype to production. This balance turns AI Strategy from a slide into a system—one that helps teams decide when to build vs buy, how to measure value, and how to retire what no longer serves the roadmap.

    Looking ahead, the future of AI analytics is about making intelligence ambient. That means stitching together event data, product usage, and customer context so insights surface exactly when decisions are made. It also means bringing gen ai responsibly into the workflow—summarizing behavior, explaining anomalies, and suggesting next best actions—while maintaining transparency and auditability.

    My practical takeaways: invest early in shared components that everyone can use (feature stores, evaluation harnesses, data contracts); standardize interfaces so teams ship faster with fewer handoffs; and measure platform outcomes with product metrics, not just infrastructure metrics. Done well, this approach compounds: faster cycles, higher confidence, and a steady drumbeat of wins that reinforce a culture of learning.

    In short, building the right AI foundations is how we unlock scale, create leverage for every team, and keep our edge in a dynamic market. That one line about building foundational AI platforms isn’t just a role description—it’s a north star for any product leader serious about shaping the next era of analytics.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Why MCP Is Transforming Product Management: Field-Tested Lessons from Miro, Atlassian & More

    MCP is the acronym I keep hearing in every product conversation—and for good reason. When teams like Miro and Atlassian lean in, it signals a real shift in how we design, ship, and scale value. From my vantage point leading product at HighLevel, I see MCP less as a feature and more as an operating advantage: a way to align strategy, execution, and governance so product teams move faster with higher confidence.

    When I evaluate a platform like MCP, I start with three questions. First, does it advance our product strategy and sharpen competitive differentiation? Second, does it strengthen product-led growth by improving activation, onboarding, and retention? Third, does it help us drive outcomes vs output OKRs so we consistently measure what matters, not just what ships?

    Execution discipline makes or breaks any MCP investment. I design measurement upfront: instrument A/B testing, define activation milestones, and monitor retention cohorts. In parallel, I use Pendo for in-app guides and product tours to accelerate adoption and reduce time-to-value, then connect this data back to roadmap decisions so each release compounds learning instead of creating noise.

    On the operating model, I apply a rigorous build vs buy lens and stress-test platform scalability, reliability, and integration surfaces. Stakeholder management is critical—security, SRE, and solutions engineering must be partners from day one. I anchor teams in product trios and continuous discovery so we learn with customers in the loop, not after the fact.

    At Pendomonium 2026, Pendo CPO Rahul Jain brought together four product leaders who are building with MCP. Read or watch their conversation to learn more.

    My practical playbook for MCP: choose one high-signal use case, define clear success metrics, and run a tightly scoped pilot with visible executive sponsorship. Treat governance and data hygiene as first-class requirements. Close the loop weekly with qualitative insights from customer interviews and quantitative telemetry from experiments. Only then scale to adjacent workflows, keeping a steady focus on measurable customer value and repeatable delivery.

    Whether you’re an emerging startup or an established enterprise, the opportunity is the same: turn MCP curiosity into durable capability. With disciplined measurement, thoughtful stakeholder alignment, and a relentless outcomes mindset, MCP can become a lever for product management leadership—not just another acronym in the stack.


    Inspired by this post on Pendo – Best Practices.


    Book a consult png image
  • Commercial vs. Internal Products: Hard Truths, High Leverage, and How I Make the Call

    Commercial vs. Internal Products: Hard Truths, High Leverage, and How I Make the Call

    Internal Products Are Hard; Commercial Products Are Harder. That line captures years of hard-won lessons from leading both internal platforms and market-facing SaaS at HighLevel. I’ve seen how the two demand different muscles—even when the tech stack, talent, and timelines look the same on paper.

    When I talk about internal products, I mean services and solutions that our own employees use to take care of customers—customer-enabling tools and services, agent consoles, fulfillment and billing workflows, operations dashboards, and the underlying platforms that keep them fast, compliant, and resilient. These tools don’t generate revenue directly, but they quietly determine customer experience, gross margin, and how quickly we can ship, resolve issues, and scale.

    Commercial products, by contrast, add a second challenge layer. Beyond discovery, usability, and reliability, we must conquer positioning, pricing and packaging, competitive differentiation, sales enablement, procurement hurdles, and ongoing customer success motion. The surface area for failure is bigger, and the time-to-signal on product-market fit is slower and noisier.

    Here’s how I decide where to invest. First, I anchor on outcomes, not output. If the business priority is net revenue retention, faster onboarding, or reduced cost-to-serve, internal products often provide the highest-leverage path. If the priority is new revenue, new market entry, or a must-have differentiator, we lean commercial. I make the trade explicit in outcomes vs output OKRs so we can defend the decision when pressure mounts.

    Second, I run a clear build vs buy calculus. For internal needs, the default is buy if a mature, configurable solution exists that meets our security, data governance, and integration requirements. I only build when the workflow is core to our differentiation, the TCO of customization is lower than vendor sprawl, or we can capture unique proprietary advantage. For commercial products, I avoid embedding third-party IP in a way that caps differentiation or compresses margins as we scale.

    Third, I insist on continuous discovery. Internal audiences are not a captive market—they’re discerning experts with real jobs to do. I treat them like customers, with structured customer interviews, journey mapping, and opportunity solution trees. I rely on empowered product teams and product trios to validate problems and reduce solution risk before we commit engineering time.

    Fourth, I frame commercial vs internal work with capacity guardrails. In most planning cycles, I reserve explicit allocation for platform scalability and internal tooling, separate from feature bets. Without this, internal products become backlog filler, which guarantees we’ll pay the interest later in churn, SLA breaches, and slower delivery.

    Execution differs too. For internal products, change management is the make-or-break. I plan enablement as a first-class deliverable: clear rollouts, in-app guides, training, and feedback loops with frontline champions. I track adoption, time-to-resolution, error rate, and satisfaction for internal users with the same rigor we apply to external users.

    For commercial products, I design the discovery-to-GTM handshake early. Pricing and packaging must reflect value drivers discovered in research, not what’s easiest to meter. Sales and solutions engineering need crisp narratives, objection handling, and proof points. Customer success needs activation plans and health signals tied directly to leading indicators of retention.

    Across both, I instrument the product and process. I lean on feature flags and progressive delivery to manage risk, and I protect SLOs with error budgets so teams balance reliability with iteration speed. CI/CD isn’t a badge—it’s how we earn the right to ship continuously without eroding trust.

    Common pitfalls recur. Teams skip UX for employee tools because “they have to use it”—which backfires as shadow workflows and rework. Leaders underfund internal platforms, then wonder why velocity stalls. On the commercial side, teams over-index on features and under-invest in positioning and onboarding, leading to poor activation and elongated sales cycles.

    What’s the payoff? When we treat internal products as products, we unlock scale: shorter handling times, fewer escalations, clearer accountability, and higher customer satisfaction. When we approach commercial products with the same discovery rigor plus smart GTM, we compress time-to-value and amplify differentiation. The craft is knowing which lever to pull when—and having the discipline to measure what matters.

    My rule of thumb is simple. If the goal is operational excellence that compounds across the entire customer journey, invest in internal products with the same intensity you reserve for revenue-generating features. If the goal is market expansion or category leadership, invest in commercial products with a tight discovery-to-GTM loop. In either case, clarity of outcomes, disciplined discovery, and empowered teams win the day.


    Inspired by this post on SVPG.


    Book a consult png image
  • Never Stop Disrupting: Why the Fin API Platform Signals a New Era for Agentic AI

    Never Stop Disrupting: Why the Fin API Platform Signals a New Era for Agentic AI

    Disruption is the only sustainable strategy in product. When a platform meaningfully changes how we build and operate, I pay attention—not just as a product leader, but as someone accountable for turning AI Strategy into durable competitive differentiation. That’s why the launch of the Fin API platform stands out: it’s a concrete step toward agentic AI at enterprise scale.

    Today, I’m diving into what this launch includes, why it matters for product strategy, and how I’d navigate the build vs buy decision in this new landscape. My goal is to translate the announcement into actionable guidance for product teams, CX leaders, and forward-deployed engineers who are building the next generation of customer support and product-led experiences.

    Fin is a customer agent platform that at present resolves over 2M customer issues a week, growing at a rapid exponential pace. It’s relied on by the best brands, large and small, in every vertical you can imagine. From Atlassian and Riot Games, to smaller hot upstarts like Mercury and Polymarket. It runs on a family of models trained by its AI group. Last week, they announced Apex, which is the world’s first specialized customer service LLM. In production tests over the last 6 months, it beat every single frontier model, including those from Anthropic and OpenAI, on resolution rate, latency, hallucination rate, and cost.

    With this launch, teams can access the platform’s core capabilities and underlying models directly via API, with contracts starting at $250k per year, and usage rates that are by far the cheapest in the industry for each of the model’s subcategories. For leaders evaluating total cost of ownership, this is a meaningful data point: it shifts the economics of scaled automation from experimental to operational.

    Why now? Because builders want options. I hear from teams daily that want to design their own agents, tune prompts and policies, and integrate with bespoke CRMs, data lakes, and product surfaces. The Fin announcement meets that demand with three clear build-paths, each mapping to a different operating model and maturity stage.

    First, for the vast majority of companies, the Fin Agent Platform is the pragmatic starting point. Fin reports ~8k companies on it today. It addresses 99% of customer needs out of the box—without exhausting consulting engagements—while delivering top-tier resolution rates. If your priority is time-to-value, governance, and platform scalability, this route de-risks implementation and accelerates outcomes.

    Second, for teams that need custom surfaces or channels, the Fin Agent API lets you present Fin in unique contexts. You get the Fin platform’s orchestration and controls, but you’re free to bypass the default messenger, email, voice, or any prebuilt channel and embed the agent natively in your product. I see this as the sweet spot for product-led growth motions where conversation design and UX writing are strategic levers.

    Third, for companies building hyper-specific agents—think service plus in-product actions—the new API access to Apex and the broader collection of models is the obvious move. Unlike generalized models, these are purpose-trained for customer service scenarios and operational policies. If you have strong in-house solutions engineering, a retrieval-first pipeline, and eval-driven development in place, this path maximizes control without reinventing the model layer.

    This also opens the door for vertical specialists. Fin-like businesses focused on deep domains can emerge quickly—Fin for dentists? Why not? Fin for car dealerships? Sure. I expect startups and modern CX providers (including players like Decagon and Sierra) to carve out niches where domain data, workflows, and compliance are the real moats. That’s where differentiated AI beats generic capability.

    There’s a defensive reason to pay attention here. The software landscape is shifting fast: the moat is no longer feature parity—it’s the quality of your agents and the data flywheels powering them. Building software is simply less hard now, and I’ve watched engineering teams more than double measurable productivity as they adopt AI-assisted development. The implication is clear: the interface-and-features era is giving way to an agents-and-outcomes era.

    Serious software companies must evolve from being a features company to an agents company—and build those agents on differentiated AI. More value will accrue at the model and orchestration layers, where safety, latency, cost, and resolution quality are won. That puts a premium on prompt engineering discipline, policy routing, continuous discovery of edge cases, and rigorous offline/online evals to keep hallucination rates low while maintaining speed.

    How would I choose among the three build-paths? If you’re early or resource-constrained, start with the Fin Agent Platform to validate outcomes and align stakeholders. If you need branded experiences and tighter product integration, use the Fin Agent API to control surfaces without owning the heavy lifting. If you have strong ML ops and a mature customer support ai strategy, go model-level with Apex and companions, layering in your own guardrails, context window management, and test harnesses. In each case, balance velocity, control, and risk—your build vs buy decision should be grounded in clear metrics and an explicit product strategy.

    Where does this lead? We’ll see more companies expose specialized model families with clearer economics and stronger governance. For now, I’m excited to see what teams build with the Fin API platform—and how they turn agentic AI into measurable improvements in resolution rate, CSAT, cost-to-serve, and ultimately, customer loyalty.


    Inspired by this post on The Intercom Blog.


    Book a consult png image
  • How We Built PR Review Bots In‑House for a Fraction of the Cost—and How You Can Too

    How We Built PR Review Bots In‑House for a Fraction of the Cost—and How You Can Too

    PR review bots are all the rage, but they cost a premium. We built our own for cheap that work just as well, if not better. Here's how.

    As a VP of Product Management, I care deeply about the velocity and quality of our software delivery. The decision to build our own pull request (PR) review agents came from a simple calculus: we needed tighter control over developer experience, CI/CD integration, and cost—without sacrificing accuracy or reliability. The result was a pragmatic system that accelerates reviews, improves code quality, and pays for itself through faster feedback loops.

    Before we wrote a line of code, we defined success. Our objectives were to shorten review cycles, reduce back-and-forth on style and test coverage, and surface risks earlier—measured against DORA metrics like lead time and deployment frequency. That focus aligned the team, guided our build vs buy decision, and anchored scope to the highest-impact use cases.

    We started rules-first, AI-optional. The initial release enforced guardrails that are universally valuable: linting and formatting checks, required test coverage thresholds, commit message standards, ownership validation (CODEOWNERS), and basic security scans. These automated gates eliminated predictable review friction, freeing engineers to focus on logic and architecture rather than style debates.

    Then we layered intelligence where it mattered. We added lightweight, explainable checks for common code smells and dependency risks, plus optional natural-language summaries that turn large diffs into concise context. Where appropriate, we introduced agentic AI workflows to triage PRs by risk, draft review comments, and suggest missing tests—always keeping humans in the loop. This hybrid approach kept costs low and outcomes high.

    Integration with our CI/CD pipeline was non-negotiable. We wired GitHub/GitLab webhooks to a stateless service that queued work, executed checks in containerized workers, and posted results back as status checks and review comments. Caching, parallelization, and smart diff-scoping ensured we only computed what changed, keeping the experience snappy even on large repos.

    Adoption hinged on developer experience. We made the bot’s feedback fast, specific, and actionable, with clear remediation steps and links to documentation. Feature flags allowed teams to opt into new checks gradually. ChatOps commands enabled quick overrides for emergencies, while policy-as-code kept rules visible, versioned, and auditable.

    We treated this like any product: eval-driven development for accuracy, ongoing telemetry for false-positive rates, and explicit SLAs for response times. We instrumented outcomes end-to-end—tracking PR cycle time, comment-to-merge ratios, and rework—so we could prove the ROI and tune the system without guesswork.

    The outcome: a reliable PR review companion that runs on a shoestring budget, integrates cleanly with our workflows, and measurably improves engineering throughput. If you’re weighing build vs buy, start small with rules that deliver immediate value, then layer intelligence where it earns its keep. With a clear product strategy, you can stand up capable PR review bots quickly—and scale them as your needs grow.

    If you’re ready to try this yourself, begin with your top three friction points in code reviews, wire them into your CI/CD checks, and pilot with a single team. Iterate weekly, measure relentlessly, and let your developers be your strongest signal. You’ll be surprised how far a pragmatic, product-led approach can take you.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image