Why does the post argue that speed improves software delivery safety?

The post argues that accumulating code creates risk, while shipping small batches minimizes it. Frequent deployments keep the feedback loop tight so engineers can validate changes while the context is still fresh.

How does the deployment pipeline move from merge to production in about 12 minutes?

After code is merged to GitHub, the Rails application is built into a deployable slug in about four minutes while CI runs in parallel. The slug is deployed to pre-production, checked by automated gates, and then promoted through a staggered rollout across the fleet.

What safeguards stop unsafe code from reaching production?

The pipeline verifies clean application boot, confirms CI has passed, and runs Datadog Synthetics against critical flows before production promotion. If any approval gate fails, the release is halted before it reaches customers.

How do feature flags reduce deployment risk?

Feature flags decouple deployment from release, allowing code to reach servers without immediately turning features on for everyone. The post says flags can target all customers, a subset, or no one in under 60 seconds, which shrinks the blast radius of change.

Why are engineers expected to stay present during deploys?

A 12-minute deployment cycle lets engineers remain focused on the change they just merged and validate it as it goes live. Slack notifications, observability links, and verification prompts help engineers watch production behavior instead of relying only on green builds.

How does the recovery model support high deployment frequency?

Because releases are small and previous versions remain available on virtual machines, the system can automatically roll back when heartbeat metrics drop or critical anomalies appear. Manual rollback is also available and locks the production pipeline while engineers investigate.

The Safety of Speed: 180 Deploys a Day, 12‑Minute Releases, 99.8%+ Availability

What are heartbeat metrics in this operating model?

Heartbeat metrics are customer-outcome signals that represent whether the product is delivering core value, such as the rate at which messages and comments are created. The post emphasizes that if customer outcomes drop, infrastructure dashboards being green is not enough.

“Speed is not the enemy of safety; it is the prerequisite for it.” I live by this principle. In our organization, the average time from merging code to it being used by customers in production is just 12 minutes, and that short window is fundamental to how we build, ship, and learn.

In January 2026, we are averaging 180 ships per workday, roughly 20 deployments every hour. Conventional wisdom suggests that to increase stability, you must slow down. I believe the opposite: accumulating code creates risk; shipping small batches minimizes it. Shipping is our company’s heartbeat.

Maintaining this frequency while targeting 99.8%+ availability has required over a decade of focused investment in systems, principles, and processes. We protect the integrity of our systems through three layers of defense: an automated pipeline that is simple and reliable and removes the need for manual intervention; a shipping workflow that promotes ownership and uses guardrails as accelerants; and a recovery model that optimizes for mitigating inevitable failures. Here’s how we’ve built each layer so that velocity is our greatest source of stability.

While our platform consists of various services and frontend applications, I’ll focus here on our Ruby on Rails monolith. It is our core application and the one we deploy most frequently; we also deploy it to three different data‑hosting regions with independent pipelines. Our other services follow similar pipeline principles and safeguards, but the Rails monolith is the clearest example of how we ship at scale.

The automated pipeline is designed to move code from merge to production as fast as possible while enforcing strict safety checks. It is fully automated, and the vast majority of releases require no human intervention—critical for CI/CD at high deployment frequency.

Once an engineer merges code to GitHub, two things happen immediately. First, the build: we compile the Rails application and its dependencies into a deployable asset (a slug) in about four minutes. Second, parallel CI: our test suite runs alongside the build; through extensive optimization, parallelization, and test selection, the vast majority of CI builds finish in under five minutes.

As soon as the slug is built, it’s deployed to a pre‑production environment. CI does not block the progression of the slug to pre‑production. Deploying to pre‑production takes around two minutes. This environment serves no customer traffic, but it is connected to our production datastores, mirrors our production infrastructure variants (e.g., web serving, asynchronous worker), and is configured so that requests exercise the pre‑release code and workers.

Immediately after deployment, we run and await several automated approval gates. We verify that the application boots cleanly on hosts (boot test), confirm the parallel test suite passed (CI check), and execute functional synthetics using Datadog Synthetics on critical flows—such as loading or editing a Fin workflow. If any gate fails, the release is halted and does not go to production.
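The gate sequence above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `Gate` type, the `run_gates` function, and the three stub checks are hypothetical names, and real checks would query the boot test, the CI status, and Datadog Synthetics results.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Gate:
    """A named approval gate; `check` returns True when the gate passes."""
    name: str
    check: Callable[[], bool]


def run_gates(gates: List[Gate]) -> Optional[str]:
    """Run gates in order and return the name of the first failing gate.

    Returns None when every gate passes, meaning the release may be promoted.
    A gate that raises is treated as a failure, so errors also halt the release.
    """
    for gate in gates:
        try:
            passed = gate.check()
        except Exception:
            passed = False
        if not passed:
            return gate.name
    return None


# Hypothetical stand-ins for the three gates named in the post.
gates = [
    Gate("boot_test", lambda: True),
    Gate("ci_check", lambda: True),
    Gate("synthetics", lambda: False),  # simulate a failing synthetic check
]

failed = run_gates(gates)
print("halt release: " + failed if failed else "promote to production")
# prints "halt release: synthetics"
```

Treating an exception as a failed gate is a deliberate fail-closed choice in this sketch: a broken check should stop a release rather than wave it through.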

Once approved, we promote the code to thousands of large virtual machines. A deployment orchestrator triggers these deployments simultaneously, while a decentralized, staggered rollout avoids changing the state of the entire fleet at the same millisecond. Within each machine, a rolling restart mechanism removes a process with old code from the serving path, lets it drain gracefully, and replaces it with a fresh process running the new code. From the moment a deployment starts, the first requests are served by new code within roughly two minutes, and the vast majority of the global fleet updates transparently within six minutes. Once restarts have been triggered on every machine, the production pipeline unblocks so the next deployment can begin.
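One way to get a decentralized stagger is for each host to derive its own start delay from a stable hash, so no coordinator has to schedule individual machines. The post does not describe the mechanism used, so this is a sketch under assumptions: the `start_delay_seconds` function, the host and release identifiers, and the 120-second window are all hypothetical.

```python
import hashlib


def start_delay_seconds(host_id: str, release_id: str, window_seconds: int = 120) -> int:
    """Derive a per-host delay in [0, window_seconds) from a stable hash.

    Every host can compute this locally from its own id and the release id.
    Including the release id changes the ordering from one release to the next.
    """
    digest = hashlib.sha256(f"{release_id}:{host_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Hypothetical fleet of 1,000 hosts receiving one release.
hosts = [f"web-{i}" for i in range(1000)]
delays = [start_delay_seconds(host, "release-42") for host in hosts]
print(min(delays), max(delays))
```

Because the delay is a pure function of its inputs, a host that retries the same release computes the same delay, which keeps the rollout shape predictable.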

We treat a stalled pipeline as a high‑priority incident. If the automated system rejects three consecutive release attempts, it pages an on‑call engineer. These are pre‑production blocks, but if the shipping lane stops moving, changes pile up—and our stability relies on building and shipping in small steps. The on‑call’s job is to restore flow so that tiny, safe, frequent updates continue to keep risk low.
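The paging rule can be illustrated with a small counter. This is a sketch only: `StallDetector` is a hypothetical name, and the real system presumably integrates with a paging service rather than returning a boolean.

```python
class StallDetector:
    """Track consecutive rejected releases and signal when to page on-call."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, release_passed: bool) -> bool:
        """Record one release attempt; return True if on-call should be paged.

        A passing release resets the streak. The page fires once, at the
        moment the streak reaches the threshold.
        """
        if release_passed:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures == self.threshold


detector = StallDetector()
attempts = [False, False, True, False, False, False]
print([detector.record(ok) for ok in attempts])
# prints [False, False, False, False, False, True]
```

Note that the successful third attempt resets the streak, so only three rejections in a row trigger a page.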

Our shipping workflow is built on extreme ownership: tools assist, but the engineer is accountable for quality and the decision to merge. I insist that you are present when you ship. The practical benefit of a 12‑minute deployment cycle is that engineers remain in the zone, focused on the problem they just solved, and ready to validate behavior as it goes live.

To support this, our deployment system sends Slack notifications the moment code is submitted and as it advances through stages, embeds direct observability links to relevant dashboards and logs in every PR and message, and prompts verification so engineers actively watch the dials and test features in production. It is not acceptable to rely on green builds alone. You’re expected to watch your change go live, and if you’re not prepared to roll back, you’re not prepared to ship. We maintain a no‑blame culture: quick rollbacks and immediate reverts are signs of vigilance and ownership, not failure.

We make extensive use of feature flags to turn deployment into a non‑event. By decoupling deployment (moving code to servers) from release (turning features on), we shrink the blast radius of change. Flags can be enabled for all customers, a specific subset, or disabled for everyone in under 60 seconds through our backend UI. Engineers can group flags into beta features and run phased rollouts; we also ensure flags work consistently across non‑monolith applications. In the past three months, we created over 560 flags—and we actively manage them to avoid permanent complexity.
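To make the deploy/release split concrete, here is a minimal sketch of a flag with the three targeting modes mentioned above. The `FeatureFlag` class, its `mode` values, and `render_inbox` are hypothetical and do not reflect the actual flag system or its backend UI.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """A flag that is on for everyone, for a subset of customers, or for no one."""
    name: str
    mode: str = "off"  # one of "off", "subset", "all"
    allowed_customers: Set[str] = field(default_factory=set)

    def enabled_for(self, customer_id: str) -> bool:
        if self.mode == "all":
            return True
        if self.mode == "subset":
            return customer_id in self.allowed_customers
        return False  # "off" and any unknown mode fail closed


def render_inbox(customer_id: str, flag: FeatureFlag) -> str:
    # The new code path is already deployed; the flag decides who sees it.
    return "new inbox" if flag.enabled_for(customer_id) else "old inbox"


flag = FeatureFlag("new-inbox", mode="subset", allowed_customers={"acme"})
print(render_inbox("acme", flag), "|", render_inbox("globex", flag))
# prints "new inbox | old inbox"

flag.mode = "off"  # releasing or un-releasing is a data change, not a deploy
print(render_inbox("acme", flag))
# prints "old inbox"
```

The point of the design is that turning a feature off changes flag state only, so it does not have to wait for a build or a rollout.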

For complex refactors, especially when behavior should not change, we use Scientist, GitHub’s open‑source experimentation library. It runs candidate logic (new code) alongside existing logic (old code) in production, instruments both paths for result and timing comparisons, and returns only the existing code’s result to users. That means we can iterate on and validate new code under real load without risking the experience, then switch seamlessly when confident.
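The control/candidate pattern can be sketched in a few lines. To be clear, this is not Scientist's API (Scientist is a Ruby library); `run_experiment` and the two example functions are hypothetical and only illustrate the idea of comparing both paths while returning the old result.

```python
import time
from typing import Any, Callable, Dict, Tuple


def run_experiment(
    name: str, control: Callable, candidate: Callable, *args: Any
) -> Tuple[Any, Dict[str, Any]]:
    """Run old and new code on the same input and always return the old result.

    Candidate exceptions are captured in the observation instead of being
    raised, so a broken candidate cannot affect what the caller receives.
    """
    start = time.perf_counter()
    control_result = control(*args)
    control_seconds = time.perf_counter() - start

    start = time.perf_counter()
    try:
        candidate_result = candidate(*args)
        candidate_error = None
    except Exception as exc:
        candidate_result = None
        candidate_error = exc
    candidate_seconds = time.perf_counter() - start

    observation = {
        "name": name,
        "matched": candidate_error is None and candidate_result == control_result,
        "control_seconds": control_seconds,
        "candidate_seconds": candidate_seconds,
        "candidate_error": repr(candidate_error) if candidate_error else None,
    }
    return control_result, observation


def legacy_total(prices):
    total = 0
    for price in prices:
        total += price
    return total


def refactored_total(prices):
    return sum(prices)


result, observation = run_experiment("cart-total", legacy_total, refactored_total, [5, 10, 20])
print(result, observation["matched"])
# prints "35 True"
```

In practice the observation would be published to a metrics or logging system so that mismatches and timing differences can be reviewed before switching over.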

When engineers need to go deeper before merging, they can generate a slug and deploy it to a virtual machine, detaching a running production host from the serving path and connecting for manual testing. They can also put a pre‑release slug on a serving machine that handles a small percentage of jobs or web requests. Single‑host validation lets us slice observability to those hosts, compare against the main release, and make low‑level changes safer. Staging is a simulation; production is reality. Testing on a single production host validates assumptions with real‑world data without risking the fleet.

Our recovery model starts from a simple principle: stop monitoring systems; start monitoring outcomes. Traditional monitoring tells you if a server is healthy; we care whether customers are healthy. We rely on heartbeat metrics—vital signs that represent the core value our product provides—such as the rate at which messages and comments are created.

Unlike standard uptime checks, heartbeat metrics are binary in spirit. If message send rates dip below baseline, it does not matter if infrastructure dashboards are green. Down is down, and if customers can’t do their job, uptime percentages are irrelevant. By tracking real‑world success rates as a high‑level signal, we catch subtle degradations that traditional alerting either misses or over‑alerts on.

Because we ship in small, incremental steps and maintain previous releases on our virtual machines, our Time to Recover (TTR) is generally very fast. If a heartbeat metric drops or a critical anomaly is detected right after a ship, the system can trigger an automatic rollback, reverting to the release that was running 20 minutes ago—often restoring service before an engineer responds. For complex issues, engineers can initiate a manual rollback through our deployment UI; doing so also locks the production pipeline to prevent further releases while we investigate and remove problematic code.
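The automatic rollback decision has two parts: noticing that a heartbeat metric fell below its baseline, and choosing the release that was live 20 minutes earlier. The sketch below illustrates both under stated assumptions: the function names, the 30% drop threshold, and the minute-based release history are hypothetical, and real anomaly detection is likely more sophisticated than a comparison of means.

```python
from bisect import bisect_right
from statistics import mean
from typing import List, Sequence, Tuple


def heartbeat_dropped(
    baseline: Sequence[float], recent: Sequence[float], max_drop: float = 0.3
) -> bool:
    """True when the recent mean rate is more than max_drop below the baseline mean."""
    baseline_rate = mean(baseline)
    if baseline_rate <= 0:
        return False  # no meaningful baseline to compare against
    return mean(recent) < baseline_rate * (1 - max_drop)


def release_live_at(history: List[Tuple[int, str]], minute: int) -> str:
    """Return the release that was live at `minute`.

    `history` holds (deployed_at_minute, release_id) pairs sorted by time.
    """
    times = [deployed_at for deployed_at, _ in history]
    index = bisect_right(times, minute) - 1
    if index < 0:
        raise ValueError("no release was live at that time")
    return history[index][1]


def rollback_target(history: List[Tuple[int, str]], now_minute: int, lookback: int = 20) -> str:
    """Pick the release that was running `lookback` minutes ago."""
    return release_live_at(history, now_minute - lookback)


# Hypothetical data: messages created per minute, and a release history.
baseline = [100, 104, 98, 102]
recent = [60, 55, 58]
history = [(0, "r1"), (7, "r2"), (15, "r3"), (22, "r4"), (30, "r5")]

if heartbeat_dropped(baseline, recent):
    print("roll back to", rollback_target(history, now_minute=33))
# prints "roll back to r2"
```

Looking back by time rather than by one release matters at this deployment frequency: several releases can land inside a 20-minute window, and any of them might be the cause.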

Resumption of service is not the end. Every incident prompts an incident review, and we don’t just fix the bug. We ask, “How did the machine allow this to happen?” Then we harden the system so it cannot happen again. This loop—fast shipping, fast recovery, rigorous learning—compounds resilience over time.

This operating model aligns to DORA metrics: high deployment frequency, short lead time for changes, low change failure rate, and rapid time to restore service. It’s a CI/CD and SRE‑informed approach that converts speed into a defensive advantage rather than a liability.
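For readers who want to track the four metrics themselves, here is a rough sketch of how they could be computed from deployment records. The `Deploy` record, its minute-based fields, and the sample data are hypothetical, and the figures are illustrative rather than taken from the post.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Deploy:
    """One deployment, with times expressed in minutes since a shared origin."""
    merged_at: float
    live_at: float
    failed: bool = False
    restored_at: Optional[float] = None


def dora_summary(deploys: List[Deploy], workdays: float) -> Dict[str, Optional[float]]:
    """Compute the four DORA metrics from a list of deployment records."""
    if not deploys or workdays <= 0:
        raise ValueError("need at least one deploy and a positive number of workdays")
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.live_at for d in failures if d.restored_at is not None]
    return {
        "deploys_per_workday": len(deploys) / workdays,
        "mean_lead_time_min": sum(d.live_at - d.merged_at for d in deploys) / len(deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore_min": (
            sum(restore_times) / len(restore_times) if restore_times else None
        ),
    }


# Illustrative sample: four deploys in one workday, one of which was rolled back.
sample = [
    Deploy(merged_at=0, live_at=12),
    Deploy(merged_at=10, live_at=22),
    Deploy(merged_at=20, live_at=33, failed=True, restored_at=38),
    Deploy(merged_at=40, live_at=51),
]
summary = dora_summary(sample, workdays=1)
print(summary)
```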

Shipping 180 times a day isn’t a vanity metric; it’s a deliberate choice to protect the customer experience. With a 12‑minute window from code to customer, the feedback loop is tight and engineers retain context—and accountability—for the immediate impact of their work. Maintaining this pace requires more than fast CI; it requires judgment, extreme ownership, disciplined use of feature flags, and a recovery model that monitors outcomes. We rely on human expertise, augmented by these layers of defense, to catch issues before they turn into customer pain. We don’t ship fast despite our need for stability; we ship fast to stay in control of change.

Inspired by this post on The Intercom Blog.