Category: IT Leadership

The Safety of Speed: 180 Deploys a Day, 12‑Minute Releases, 99.8%+ Availability

“Speed is not the enemy of safety; it is the prerequisite for it.” I live by this principle. In our organization, the average time from merging code to customers using it in production is just 12 minutes, and that short window is fundamental to how we build, ship, and learn.

In January 2026, we are averaging 180 deploys per workday, roughly 20 every hour. Conventional wisdom suggests that to increase stability, you must slow down. I believe the opposite: accumulating code creates risk, and shipping small batches minimizes it. Shipping is our company’s heartbeat.

Maintaining this frequency while targeting 99.8%+ availability has required over a decade of focused investment in systems, principles, and processes. We protect the integrity of our systems through three layers of defense: an automated pipeline that is simple, reliable, and removes the need for manual intervention; a shipping workflow that promotes ownership and uses guardrails as accelerants; and a recovery model that optimizes for mitigating inevitable failures. Here’s how we’ve built each layer so that velocity is our greatest source of stability.

While our platform consists of various services and frontend applications, I’ll focus here on our Ruby on Rails monolith. It is our core application and the one we deploy most frequently; we also deploy it to three different data‑hosting regions with independent pipelines. Our other services follow similar pipeline principles and safeguards, but the Rails monolith is the clearest example of how we ship at scale.

The automated pipeline is designed to move code from merge to production as fast as possible while enforcing strict safety checks. It is fully automated, and the vast majority of releases require no human intervention—critical for CI/CD at high deployment frequency.

Once an engineer merges code to GitHub, two things happen immediately. First, the build: we compile the Rails application and its dependencies into a deployable asset (a slug) in about four minutes. Second, parallel CI: our test suite runs alongside the build; through extensive optimization, parallelization, and test selection, the vast majority of CI builds finish in under five minutes.

As soon as the slug is built, it’s deployed to a pre‑production environment. CI does not block the progression of the slug to pre‑production. Deploying to pre‑production takes around two minutes. This environment serves no customer traffic, but it is connected to our production datastores, mirrors our production infrastructure variants (e.g., web serving, asynchronous worker), and is configured so that requests exercise the pre‑release code and workers.

Immediately after deployment, we run and await several automated approval gates. We verify that the application boots cleanly on hosts (boot test), confirm the parallel test suite passed (CI check), and execute functional synthetics using Datadog Synthetics on critical flows—such as loading or editing a Fin workflow. If any gate fails, the release is halted and does not go to production.
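
The gate behavior described above can be sketched in a few lines. This is a minimal illustration, not our orchestrator's actual code: `GateResult`, `evaluate_gates`, and the gate callables are hypothetical names, and each callable stands in for a real check such as the boot test, the CI status lookup, or a synthetic run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GateResult:
    promoted: bool            # True only if every gate passed
    failed_gates: List[str]   # names of gates that failed or raised


def evaluate_gates(gates: Dict[str, Callable[[], bool]]) -> GateResult:
    """Run every approval gate; any failure halts the release."""
    failed = []
    for name, check in gates.items():
        try:
            passed = check()
        except Exception:
            # A gate that errors is treated as failed, never as a pass.
            passed = False
        if not passed:
            failed.append(name)
    return GateResult(promoted=not failed, failed_gates=failed)
```

The important design choice is that an erroring gate counts as a failure, so a broken check can never wave a release through.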

Once approved, we promote the code to thousands of large virtual machines. A deployment orchestrator triggers these deployments simultaneously, while a decentralized, staggered rollout avoids changing the state of the entire fleet at the same millisecond. Within each machine, a rolling restart mechanism removes a process with old code from the serving path, lets it drain gracefully, and replaces it with a fresh process running the new code. From the moment a deployment starts, first requests are served by new code within roughly two minutes, and the vast majority of the global fleet updates transparently within six minutes. Once restarts have been triggered on every machine, the production stage unblocks so the next deployment can begin.
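
To make the two mechanisms concrete, here is a rough sketch under stated assumptions. The hash-based delay is one common way to stagger hosts without a central coordinator; it is my illustration, not necessarily how our fleet does it. `stagger_delay_seconds`, `rolling_restart`, and the 60-second window are hypothetical.

```python
import hashlib
from typing import List


def stagger_delay_seconds(host_id: str, release_id: str, window_seconds: int = 60) -> float:
    """Derive a per-host start delay in [0, window_seconds) with no central coordinator."""
    digest = hashlib.sha256(f"{release_id}:{host_id}".encode()).digest()
    fraction = int.from_bytes(digest[:4], "big") / 2**32
    return fraction * window_seconds


def rolling_restart(workers: List[str], new_release: str) -> List[List[str]]:
    """Replace one process at a time; return the host's state after each step."""
    state = list(workers)
    history = []
    for i in range(len(state)):
        # Take one old process out of the serving path, let it drain,
        # then start a fresh process on the new release in its slot.
        state[i] = new_release
        history.append(list(state))
    return history
```

Because only one process changes at each step, a host keeps serving traffic on the remaining processes throughout the restart.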

We treat a stalled pipeline as a high‑priority incident. If the automated system rejects three consecutive release attempts, it pages an on‑call engineer. These are pre‑production blocks, but if the shipping lane stops moving, changes pile up—and our stability relies on building and shipping in small steps. The on‑call’s job is to restore flow so that tiny, safe, frequent updates continue to keep risk low.
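
The paging rule is simple enough to sketch. `PipelineWatch` is a hypothetical name, and the only figure taken from the text is the threshold of three consecutive rejections.

```python
class PipelineWatch:
    """Counts consecutive rejected releases and signals when to page on-call."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.consecutive_rejections = 0

    def record(self, accepted: bool) -> bool:
        """Record one release attempt; return True if on-call should be paged."""
        if accepted:
            # Any successful release means the lane is moving again.
            self.consecutive_rejections = 0
            return False
        self.consecutive_rejections += 1
        # Page exactly once, when the streak first reaches the threshold.
        return self.consecutive_rejections == self.threshold
```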

Our shipping workflow is built on extreme ownership: tools assist, but the engineer is accountable for quality and the decision to merge. I insist that you are present when you ship. The practical benefit of a 12‑minute deployment cycle is that engineers remain in the zone, focused on the problem they just solved, and ready to validate behavior as it goes live.

To support this, our deployment system sends Slack notifications the moment code is submitted and as it advances through stages, embeds direct observability links to relevant dashboards and logs in every PR and message, and prompts verification so engineers actively watch the dials and test features in production. It is not acceptable to rely on green builds alone. You’re expected to watch your change go live, and if you’re not prepared to roll back, you’re not prepared to ship. We maintain a no‑blame culture: quick rollbacks and immediate reverts are signs of vigilance and ownership, not failure.

We make extensive use of feature flags to turn deployment into a non‑event. By decoupling deployment (moving code to servers) from release (turning features on), we shrink the blast radius of change. Flags can be enabled for all customers, a specific subset, or disabled for everyone in under 60 seconds through our backend UI. Engineers can group flags into beta features and run phased rollouts; we also ensure flags work consistently across non‑monolith applications. In the past three months, we created over 560 flags—and we actively manage them to avoid permanent complexity.
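
A flag check along these lines is what separates deployment from release. This is a minimal sketch, not our flag service: `Flag`, `is_enabled`, and the percentage-based phased rollout are assumptions I am using for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Flag:
    name: str
    enabled_for_all: bool = False
    allowed_customers: Set[str] = field(default_factory=set)
    rollout_percent: int = 0  # 0-100, used for phased rollouts


def is_enabled(flag: Flag, customer_id: str) -> bool:
    """Decide at request time whether already-deployed code is released to a customer."""
    if flag.enabled_for_all:
        return True
    if customer_id in flag.allowed_customers:
        return True
    # Stable bucketing: the same customer always lands in the same bucket.
    digest = hashlib.sha256(f"{flag.name}:{customer_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < flag.rollout_percent
```

A freshly created flag is off for everyone, so merging code behind it changes nothing for customers until someone turns it on.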

For complex refactors, especially when behavior should not change, we use Scientist, GitHub’s open‑source experimentation library for Ruby. It runs candidate logic (new code) in parallel with existing logic (old code) in production, instruments both paths for result and timing comparisons, and keeps existing behavior user‑visible. That means we can iterate on and validate new code under real load without risking the experience, then switch seamlessly when confident.
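
The pattern behind Scientist can be sketched in Python for readers who have not used it. This is not the Scientist API; `run_experiment` and `publish` are hypothetical, and the sketch omits details the real library handles, such as randomizing which path runs first.

```python
import time
from typing import Any, Callable, Dict


def run_experiment(
    name: str,
    control: Callable[[], Any],
    candidate: Callable[[], Any],
    publish: Callable[[Dict[str, Any]], None],
) -> Any:
    """Run both code paths, publish a comparison, always return the control result."""
    started = time.perf_counter()
    control_value = control()
    control_seconds = time.perf_counter() - started

    started = time.perf_counter()
    try:
        candidate_value, candidate_error = candidate(), None
    except Exception as exc:
        # A broken candidate must never affect what the user sees.
        candidate_value, candidate_error = None, repr(exc)
    candidate_seconds = time.perf_counter() - started

    publish({
        "experiment": name,
        "matched": candidate_error is None and candidate_value == control_value,
        "control_seconds": control_seconds,
        "candidate_seconds": candidate_seconds,
        "candidate_error": candidate_error,
    })
    return control_value
```

Callers only ever receive the control result, which is what keeps the existing behavior user-visible while mismatches accumulate in the published observations.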

When engineers need to go deeper before merging, they can generate a slug, detach a running production host from the serving path, deploy the slug to it, and connect for manual testing. They can also put a pre‑release slug on a serving machine that handles a small percentage of jobs or web requests. Single‑host validation lets us slice observability to those hosts, compare against the main release, and make low‑level changes safer. Staging is a simulation; production is reality. Testing on a single production host validates assumptions with real‑world data without risking the fleet.

Our recovery model starts from a simple principle: stop monitoring systems; start monitoring outcomes. Traditional monitoring tells you if a server is healthy; we care whether customers are healthy. We rely on heartbeat metrics—vital signs that represent the core value our product provides—such as the rate at which messages and comments are created.

Unlike standard uptime checks, heartbeat metrics are binary in spirit. If message send rates dip below baseline, it does not matter if infrastructure dashboards are green. Down is down, and if customers can’t do their job, uptime percentages are irrelevant. By tracking real‑world success rates as a high‑level signal, we catch subtle degradations that traditional alerting either misses or over‑alerts on.

Because we ship in small, incremental steps and maintain previous releases on our virtual machines, our Time to Recover (TTR) is generally very fast. If a heartbeat metric drops or a critical anomaly is detected right after a ship, the system can trigger an automatic rollback, reverting to the release that was running 20 minutes ago—often restoring service before an engineer responds. For complex issues, engineers can initiate a manual rollback through our deployment UI; doing so also locks the production pipeline to prevent further releases while we investigate and remove problematic code.
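
The automatic rollback decision can be sketched as a single predicate. The thresholds here are assumptions for illustration only: `max_drop_fraction` and `watch_window_seconds` are hypothetical parameters, not our production values.

```python
def should_auto_rollback(
    current_rate: float,
    baseline_rate: float,
    seconds_since_deploy: float,
    max_drop_fraction: float = 0.3,      # hypothetical tolerance
    watch_window_seconds: float = 600,   # hypothetical post-ship watch window
) -> bool:
    """Roll back when a heartbeat metric drops sharply soon after a ship."""
    if baseline_rate <= 0:
        # Without a baseline there is nothing to compare against.
        return False
    drop = (baseline_rate - current_rate) / baseline_rate
    recently_deployed = seconds_since_deploy <= watch_window_seconds
    return recently_deployed and drop >= max_drop_fraction
```

Tying the decision to time since the last ship keeps the rollback targeted: a dip hours after a release is more likely to need a human than a revert.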

Resumption of service is not the end. Every incident prompts an incident review, and we don’t just fix the bug. We ask, “How did the machine allow this to happen?” Then we harden the system so it cannot happen again. This loop—fast shipping, fast recovery, rigorous learning—compounds resilience over time.

This operating model aligns to DORA metrics: high deployment frequency, short lead time for changes, low change failure rate, and rapid time to restore service. It’s a CI/CD and SRE‑informed approach that converts speed into a defensive advantage rather than a liability.

Shipping 180 times a day isn’t a vanity metric; it’s a deliberate choice to protect the customer experience. With a 12‑minute window from code to customer, the feedback loop is tight and engineers retain context—and accountability—for the immediate impact of their work. Maintaining this pace requires more than fast CI; it requires judgment, extreme ownership, disciplined use of feature flags, and a recovery model that monitors outcomes. We rely on human expertise, augmented by these layers of defense, to catch issues before they turn into customer pain. We don’t ship fast despite our need for stability; we ship fast to stay in control of change.

Inspired by this post on The Intercom Blog.

January 26, 2026

Real-Time Analytics for Financial-Services Contact Centers

Your contact center can have excellent reporting and still react too late. A weekly chart may explain why transfers rose, authentication failed, or members called again. It cannot recover the interaction that is already going wrong.

That is the practical case for real-time analytics in financial services: detect a useful signal while there is still time to change the outcome, then deliver a safe action to the person or system that can take it. The goal is not a faster dashboard. It is a shorter path from behavior to decision to resolution.

Key takeaways

Define real time against the decision window. A signal is timely only if it arrives before the next useful action expires.
Start with journeys that create material cost or dissatisfaction, such as lost cards, fraud disputes, loan-status requests, password resets, and payment issues.
Instrument the outcome as carefully as the interaction. Otherwise, you can see that an alert fired without knowing whether it helped.
Activate insights inside routing, agent, supervisor, and follow-up workflows. A separate analytics destination creates another queue for people to monitor.
Measure resolution, repeat demand, and guardrails. Activity metrics such as alerts generated or prompts displayed are diagnostics, not business outcomes.
Build privacy controls, consent handling, access restrictions, and auditability into the decision loop before expanding its reach.

Define real time as a decision contract

Real time is not a universal refresh rate. It is a promise that a signal will reach its decision point while an effective response is still possible. An agent-assist prompt must arrive before the conversation moves past the relevant step. A routing signal must arrive before the interaction enters the wrong queue. A proactive follow-up must arrive before the member has to contact you again.

This distinction prevents an expensive architecture mistake: streaming every event without deciding what any event should change. Some information needs immediate activation. Some belongs in a supervisor review. Some is useful only for longer-term journey redesign. Treating all three as equally urgent increases cost and noise without improving service.

Before building a pipeline, write a decision contract for each use case. The contract should connect the signal to an owner, action, deadline, guardrail, and measurable outcome.

Decision-contract field	Question to answer	Illustrative fraud-routing example
Trigger	What observable event or state starts the decision?	A potential fraud signal appears during an active interaction.
Decision	What choice becomes possible because of the signal?	Whether the interaction should receive specialized handling.
Action	What should the workflow do?	Prioritize the appropriate route and carry the available context forward.
Owner	Who or what is accountable for acting?	The routing workflow, with a supervisor responsible for defined exceptions.
Action window	When does the intervention stop being useful?	Before the interaction is transferred or the relevant verification step is completed.
Guardrail	What must never be bypassed?	Required compliance steps, authorized data access, and a clear human override.
Outcome	How will you know whether the action helped?	Resolution without an avoidable transfer, escalation, or repeat contact.

A contract also exposes weak use cases early. If nobody can name the action, the signal is probably reporting data rather than real-time decision data. If the action has no owner, it will become an ignored alert. If the outcome is merely that a prompt appeared, the team has confused delivery with impact.
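
One way to make the contract hard to skip is to encode it as a small record and check it for blanks. This is a sketch, not a prescribed schema: `DecisionContract` mirrors the table's fields, and `weak_spots` is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DecisionContract:
    trigger: str
    decision: str
    action: str
    owner: str
    action_window: str
    guardrail: str
    outcome: str


def weak_spots(contract: DecisionContract) -> List[str]:
    """Return the names of contract fields that were left blank."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name).strip()]
```

A contract that comes back with `owner` or `action` in its weak spots is a signal to stop before building the pipeline.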

The underlying platform still needs to bring together behavior across voice, chat, IVR, email, and in-app journeys. But unification is useful only when identity, journey state, and timing remain coherent across those channels. A member who fails authentication in the app and then calls should not look like two unrelated problems.

Instrument five costly journeys before the whole contact center

A complete contact-center data program is too broad a starting point. It invites months of taxonomy work before anyone changes an outcome. Begin with the five journeys most likely to concentrate cost or dissatisfaction: lost card, fraud dispute, loan status, password reset, and payment issue.

This is not a mandate to automate all five at once. Rank them using the evidence you already have: contact demand, transfers, repeat contacts, unresolved cases, authentication failures, and escalations. Choose the journey where a specific intervention is both valuable and operationally feasible.

For the chosen journey, create an outcome card before defining events:

Member intent: What is the person actually trying to complete?
Observable start: Which event shows that the journey has begun?
Resolution state: What evidence means the need was completed, not merely that the interaction ended?
Failure states: Where can authentication, routing, handoff, self-service, or follow-up break down?
Intervention: Which failure can the contact center change while the journey is active?
Outcome and guardrails: Which result should move, and which compliance or experience measures must not deteriorate?

The event model should then describe the journey rather than mirror the screens of each tool. At minimum, preserve a pseudonymous member reference, interaction reference, channel, event time, journey, journey step, authentication state, transfer or escalation state, intervention, and outcome. If intent or risk is inferred, record the version and confidence associated with that inference. If an agent accepts, dismisses, or overrides guidance, capture that response too.
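
As a rough illustration, the minimum event could look like the record below. The field names follow the list above; the type choices, example values, and the validation rule are my assumptions rather than a standard schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class JourneyEvent:
    member_ref: str                  # pseudonymous reference, never a raw identifier
    interaction_ref: str
    channel: str                     # e.g. "voice", "chat", "ivr", "email", "in_app"
    event_time: datetime
    journey: str                     # e.g. "lost_card"
    journey_step: str
    authentication_state: str
    transfer_state: str
    intervention: Optional[str] = None
    outcome: Optional[str] = None
    inferred_intent: Optional[str] = None
    inference_version: Optional[str] = None
    inference_confidence: Optional[float] = None
    agent_response: Optional[str] = None  # "accepted", "dismissed", or "overridden"

    def __post_init__(self) -> None:
        # An inference without its version and confidence cannot be reviewed later.
        if self.inferred_intent is not None and (
            self.inference_version is None or self.inference_confidence is None
        ):
            raise ValueError("inferred_intent requires inference_version and inference_confidence")
```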

Consistent definitions matter more than a large event count. Decide what a transfer is, when a new contact belongs to an existing journey, and what qualifies as resolution. Version those definitions. Otherwise, a changed IVR flow or CRM configuration can appear to improve performance simply because the instrumentation changed.

Instrument the negative space as well. If the member disappears from a self-service flow, the absence of a completion event is not enough to explain why. Capture the last meaningful step, the failure category when it is available, and whether the member moved to another channel. That is how you distinguish successful deflection from abandonment followed by a call.
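
A simple classification makes the distinction measurable. The labels and the `classify_self_service_exit` helper are hypothetical; the point is only that completion and later contact are recorded as separate facts.

```python
from typing import Optional


def classify_self_service_exit(completed: bool, later_contact_channel: Optional[str]) -> str:
    """Label how a self-service journey ended, using completion and any later contact."""
    if completed and later_contact_channel is None:
        return "successful_deflection"
    if completed:
        # The task finished, but the member still came back.
        return "completed_then_recontacted"
    if later_contact_channel is None:
        return "abandoned"
    return f"abandoned_then_{later_contact_channel}"
```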

Do not copy every transcript, recording, credential, or financial value into a broadly accessible analytics stream merely because the technology allows it. Use minimized attributes and controlled references where they are sufficient. Keep restricted evidence behind narrower permissions. Availability is not the same as permission.

Put the decision inside the workflow

The last mile determines whether real-time analytics changes performance. An insight that requires an agent to open another application, interpret a graph, and decide what it means has already lost much of its value. Activation belongs in the systems where agents, supervisors, and automated workflows already act.

Four activation patterns cover most of the useful surface area:

Routing: Use intent, journey state, or a potential risk signal to direct the interaction to the appropriate skill. High-risk transactions can be prioritized for specialized handling, but the signal should not silently become a final financial or fraud decision.
Agent guidance: Surface the next relevant step, missing compliance action, or known journey context during the interaction. Explain why the guidance appeared, avoid conflicting prompts, and give the agent a defined way to dismiss or override it.
Supervisor intervention: Alert on a material pattern with an attached playbook. The notification should identify what changed, which interactions are affected, which action is available, and when the alert expires.
Member follow-up: Trigger a relevant message or next step after an unresolved interaction. The follow-up should close a known gap, not merely create another generic communication.

Self-service requires particular care. If balance inquiries or password resets are overwhelming queues, routing eligible demand to self-service may help. But containment is not the same as resolution. Measure whether the member completed the task and whether another contact followed. A journey that exits the IVR but returns through chat has changed channels, not disappeared.

Each activation needs a safe fallback. If identity is uncertain, the signal is stale, or a dependency is unavailable, revert to the normal approved workflow. Do not let a broken analytics path invent a route or compliance step. Log the fallback so operational teams can distinguish a bad recommendation from a recommendation that never reached its destination.
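
The fallback rule can be sketched as follows. `RoutingSignal`, `choose_route`, the default route name, and the 30-second freshness limit are all assumptions for illustration; the behavior that matters is that every failure mode returns the approved default and records why.

```python
import logging
from dataclasses import dataclass
from typing import Optional, Tuple

logger = logging.getLogger("routing")


@dataclass
class RoutingSignal:
    suggested_route: str
    identity_confirmed: bool
    created_at: float  # epoch seconds


def choose_route(
    signal: Optional[RoutingSignal],
    now: float,
    default_route: str = "standard_queue",
    max_age_seconds: float = 30.0,
) -> Tuple[str, Optional[str]]:
    """Return (route, fallback_reason); fallback_reason is None when the signal was used."""
    if signal is None:
        reason = "dependency_unavailable"
    elif not signal.identity_confirmed:
        reason = "identity_uncertain"
    elif now - signal.created_at > max_age_seconds:
        reason = "signal_stale"
    else:
        return signal.suggested_route, None
    # Log the fallback so operations can tell a bad recommendation
    # from one that never reached its destination.
    logger.warning("routing fallback: %s", reason)
    return default_route, reason
```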

Alert design deserves the same product discipline as customer-facing design. Deduplicate repeated signals, suppress guidance after the relevant action window, and route exceptions to a named owner. A queue full of low-value alerts trains people to ignore the important ones.
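
Deduplication and expiry are easy to prototype. In this sketch, `AlertFilter` and the idea of an alert key (for example, journey plus interaction) are hypothetical; a production version would also need to age out old keys.

```python
class AlertFilter:
    """Delivers each alert at most once and never after its action window."""

    def __init__(self) -> None:
        self._delivered = set()

    def should_deliver(self, key: str, now: float, expires_at: float) -> bool:
        if now >= expires_at:
            # The action window has passed; guidance would only be noise.
            return False
        if key in self._delivered:
            # Repeated signal for the same situation.
            return False
        self._delivered.add(key)
        return True
```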

The technology choice comes after these workflow requirements. CRM integration should carry member and journey context forward, while the analytics layer captures behavior and evaluates interventions. Products such as Amplitude, Pendo, and Intercom may instrument digital touchpoints, but the build-versus-buy decision should turn on your decision contracts: identity reconciliation, activation latency, workflow integrations, experimentation, access control, auditability, and operational reliability.

I would not approve a platform solely because its dashboards are polished. Ask the vendor or internal platform team to demonstrate an end-to-end loop using one of your journeys: signal received, decision evaluated, workflow changed, outcome captured, and audit record produced. That sequence is the product you are buying or building.

Measure outcomes, experiment carefully, and govern the loop

Real-time analytics does not reduce operating cost by itself. It changes a decision, which changes a journey, which may change demand and resolution. Your measurement model has to preserve that chain.

Use a scorecard that separates outcomes from activity

Choose a primary outcome that matches the journey. Useful candidates include first-contact resolution, repeat-contact reduction, containment that ends in a completed task, and average time to resolution. Define the eligible population and exclusions explicitly so the metric cannot drift when channel mix changes.

Then organize the remaining measures by purpose:

Journey outcome: Was the member’s need resolved, and did it stay resolved?
Operational mechanism: Did transfers, escalations, routing failures, or authentication failures change?
Intervention delivery: Was the recommendation generated, delivered in time, accepted, dismissed, or overridden?
Experience and compliance guardrails: Were required steps completed, and did complaints, corrections, or manual exceptions increase?
System health: Was the signal complete, timely, correctly joined to the journey, and available when the workflow needed it?

Average handle time can be diagnostic, but it should not become the automatic objective. A shorter interaction that leaves the member unresolved may simply move cost into a repeat contact. Resolution and repeat demand tell you whether the system removed work or postponed it.

Test the intervention, not the existence of the data

Controlled experiments can show whether a changed IVR path, authentication step, or post-contact follow-up improves the chosen outcome. Define the minimum detectable effect before the test so the team knows which improvement would justify a decision and whether the eligible volume can support a useful result.
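
To check whether eligible volume can support a test, a standard two-proportion approximation is usually enough for a first pass. The sketch below assumes a two-sided test on a rate such as first-contact resolution; `sample_size_per_arm` is a hypothetical helper, and the default significance level and power are conventional choices, not recommendations from this post.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(
    baseline_rate: float,
    minimum_detectable_effect: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Approximate members needed per arm to detect an absolute change in a rate."""
    p1 = baseline_rate
    p2 = baseline_rate + minimum_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

Halving the minimum detectable effect roughly quadruples the required volume, which is often the deciding factor in whether a journey is a realistic candidate for a controlled test.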

Choose the unit of assignment deliberately. If the same member can return during the measurement window, assigning different experiences by interaction can contaminate the comparison. A member-level assignment may be cleaner. If the intervention changes an entire queue or supervisor workflow, individual assignment may be impractical; use a rollout design that reflects how the operation actually works.
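
Member-level assignment is commonly implemented with a stable hash, so a returning member always sees the same experience. This is a sketch under that assumption; `assign_variant` and the variant names are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(
    member_ref: str,
    experiment: str,
    variants: Sequence[str] = ("control", "treatment"),
) -> str:
    """Deterministically map a pseudonymous member reference to one variant."""
    digest = hashlib.sha256(f"{experiment}:{member_ref}".encode()).digest()
    return variants[int.from_bytes(digest[:4], "big") % len(variants)]
```

Including the experiment name in the hash keeps assignments independent across experiments, so the same members are not always grouped together.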

Do not randomize away mandatory compliance controls. When an intervention affects fraud handling, sensitive disclosures, or consequential routing, begin in observe-only mode, review false positives and overrides, and use an approved rollout. Experiment with the delivery or operational design only where compliance and legal owners confirm that variation is permissible.

Make governance part of the product

Privacy and compliance cannot sit downstream of activation. A real-time system makes decisions from live member behavior, so access controls, consent management, and audit trails belong in the initial architecture.

For every decision contract, document the permitted purpose of the data, who can access it, where it is retained, how consent is honored, what enters the audit record, and who approves changes. Do not infer that an attribute is lawful to use because it exists in the CRM. The relevant compliance and legal owners must determine acceptable use for the jurisdiction, product, and member context.

Auditability should reach beyond data access. Preserve enough context to reconstruct what signal arrived, which rule or model version evaluated it, what action was recommended, what the workflow did, whether a person overrode it, and what outcome followed. That record supports incident investigation, performance review, and defensible change management.
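
An audit record that supports this kind of reconstruction might look like the sketch below. The field names follow the sentence above; `DecisionAuditRecord` and `to_audit_line` are hypothetical, and real retention and access rules would be set by compliance owners.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionAuditRecord:
    signal: str                   # what arrived
    evaluator_version: str        # which rule or model version evaluated it
    recommended_action: str       # what was recommended
    workflow_action: str          # what the workflow actually did
    overridden_by: Optional[str]  # who overrode it, if anyone
    outcome: Optional[str]        # what followed, once known


def to_audit_line(record: DecisionAuditRecord) -> str:
    """Serialize one record as a stable, append-friendly JSON line."""
    return json.dumps(asdict(record), sort_keys=True)
```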

Run the operating cadence through a product trio spanning operations, data, and compliance. In each review, ask which decisions fired, which arrived too late, which actions were ignored, which outcomes changed, and which guardrails moved. Retire noisy signals. Refine ambiguous definitions. Promote successful interventions gradually. This keeps the program focused on decision quality instead of dashboard volume.

Your next step is small and concrete: choose the highest-cost or highest-friction journey among the initial five, write its decision contract, and run the signal in observe-only mode. When the team can trace the path from trigger to approved action to outcome, activate the narrowest useful intervention. Expand only after that loop is measurable, reliable, and governable.

References

Shivam.Consulting Blog – Stop Drowning in Dashboards: Real-Time Digital Analytics for Finserv Contact Centers

January 23, 2026

Why Codeless Product Analytics Wins: Faster Insights, Fewer Bottlenecks, Bigger PLG Results

Every quarter, I watch product teams move from gut feel to data-informed decisions—until instrumentation bottlenecks slow them to a crawl. That’s why I’ve become an advocate for codeless analytics: it removes the dependency on engineering sprints for basic event tracking and lets teams answer product questions in hours, not weeks.

Below, I explain what codeless analytics is, why and how Pendo supports it, and how I respond to the top three myths about low-code/no-code solutions.

Here’s how I frame it with my teams: codeless analytics enables product managers, designers, and customer success to tag features visually, track interactions, and analyze adoption without shipping code. The goal isn’t to replace engineered events; it’s to accelerate discovery, speed up iteration, and reduce context-switching for developers. In practice, this means cleaner prioritization, faster validation of hypotheses, and tighter product-led growth loops.

Why Pendo? In my experience, Pendo’s codeless model shortens the distance from question to insight. Visual tagging makes event setup accessible, in-app guides and product tours let us experiment with onboarding and activation, and governance controls ensure data remains trustworthy across teams. The result is a unified analytics approach where we reserve custom instrumentation for complex logic while using codeless tracking for everyday product questions.

Let’s address the top three myths I hear most often. Myth 1: “No-code is only for simple use cases.” In reality, most decisions we make weekly—feature adoption, path analysis, funnel drop-offs, and retention analysis—do not require custom code. Codeless analytics handles these well, and when we need deeper context (like server-side events), we complement it with engineered tracking. It’s a both/and, not an either/or.

Myth 2: “Codeless data isn’t accurate.” Accuracy comes from governance, not the method. I set clear standards: naming conventions, tagging reviews, ownership, and periodic audits. With disciplined process, codeless tracking yields consistent, decision-grade data. The added benefit is visibility—non-technical stakeholders can validate the instrumentation themselves, reducing misalignment.

Myth 3: “Engineers must instrument everything to scale.” Engineering time is precious; we should spend it on differentiated capabilities, not on routine click tracking. Codeless analytics scales by empowering product teams to self-serve, while engineering focuses on back-end, performance, and edge cases. When paired with a unified analytics platform and clear data contracts, this model scales cleanly across product lines.

For teams adopting this approach, I recommend a simple operating model: define your core product questions up front, tag features aligned to those questions, connect insights to in-app guides for experiments, and measure user activation and retention continuously. Whether you run Pendo alongside Amplitude analytics or within a broader unified analytics platform, the key is to keep the insight-to-action loop tight.

The future of product analytics is codeless because it puts insights where they belong—directly in the hands of the people designing the experience. When we remove bottlenecks, we learn faster, ship smarter, and drive measurable PLG impact. That’s how we turn product analytics from a reporting function into a competitive advantage.

Inspired by this post on Pendo – Best Practices.

January 22, 2026

I Built a ‘Pendo Wrapped’ in 10 Minutes with Pendo MCP to Boost Adoption and Delight Users

I set out to create a lightweight, high-impact “Pendo Wrapped” experience for our users, and I did it in under 10 minutes with Pendo MCP. As a VP of Product Management, I’m constantly looking for fast, pragmatic ways to turn product insights into moments that drive engagement. This experiment was about transforming raw analytics into a concise, celebratory year‑in‑review that motivates customers to explore more value.

When I say “Pendo Wrapped,” I mean a simple, narrative-style summary of usage highlights: what got adopted, which moments mattered, and where value showed up most clearly. Framed well, that story reinforces product‑led growth by reminding users why they chose us, nudging them toward the next best action, and strengthening activation and retention without heavy development work.

My approach was straightforward: define a clear objective (celebrate milestones and prompt the next step), choose a focused set of metrics (adoption, engagement, and activation), and target relevant segments. Then I layered the narrative on top of existing analytics using in‑app guides and product tours to deliver the experience where it matters most: inside the product.

The reason it took minutes, not hours, is that Pendo MCP let me work with what we already had (segments, saved reports, and proven guide templates), so I could spend time on the story, not the scaffolding. No code, minimal configuration, and a crisp call to action made it feel polished without being heavy.

If you want to replicate this quickly, start by selecting one user segment and three metrics that matter to them, write a two‑sentence narrative that connects those metrics to outcomes, and ship a short in‑app guide with a single, purposeful CTA. That’s enough to deliver a personalized year‑in‑review feel and spark immediate exploration, with no new infrastructure required.

What surprised me most was how a small, story‑driven touch created outsized alignment across customers and internal teams. It turned analytics into advocacy, reminded our users of the value they’re already getting, and opened the door to deeper adoption. If you’re pursuing product‑led growth, a fast “Pendo Wrapped” is one of the highest‑leverage experiments you can run this week.

Inspired by this post on Pendo – Perspectives.

January 15, 2026

4 Proven Ways to Keep Employees Informed and Engaged—from Onboarding to Lasting Adoption

Keeping employees informed and engaged isn’t just a communications challenge—it’s a product challenge. When we treat internal tools like products with clear activation moments, measurable outcomes, and continuous discovery, adoption moves from hope to habit. Over the years, I’ve seen small changes in how we onboard, communicate, and measure compound into dramatically higher engagement, better compliance, and faster time-to-value.

How do you improve onboarding, compliance, and internal communications within your employee tools? That question guides my approach end to end, from the moment someone logs in for the first time to the day they become an expert, championing best practices across their team.

First, I personalize onboarding to accelerate user activation. I map the critical first actions and design a lightweight sequence of product tours and in-app guides that surfaces only what matters right now. Progressive disclosure, clear UX writing, and thoughtful tooltip design reduce cognitive load. I measure time-to-first-value, A/B test checklist microcopy to remove friction, and use Intercom or Pendo to deliver contextual walkthroughs by role, location, and permission level. Amplitude analytics helps me validate that the guided path leads to the intended activation event and sustained usage.

Second, I make compliance effortless and measurable. Instead of long trainings, I embed micro-learnings and policy nudges directly in the flow of work, with just-in-time prompts and short, scenario-based confirmations. I segment by role to avoid alert fatigue and localize where regulations require nuance. Completion rates, quiz accuracy, and time-to-complete are tracked alongside qualitative feedback. When compliance messaging underperforms, I run A/B testing on tone, timing, and format, then iterate until adherence is both higher and faster.

Third, I orchestrate internal communications as lifecycle messaging—not announcements. Employees get targeted release notes, role-specific tips, and in-app reminders aligned to their stage: new, adopting, proficient, or champion. I avoid channel sprawl by making the primary source of truth available in the product, then reinforcing it via email or chat only when necessary. CRM integration and audience rules ensure relevance, while a champions network and office hours create human touchpoints that deepen trust and accelerate adoption.

Fourth, I close the loop with analytics and continuous discovery. I instrument key events and run retention analysis to understand which behaviors predict long-term engagement. I look at cohorts before and after a new guide or product tour, and I compare lift in user activation and feature adoption over 14-, 28-, and 90-day windows. Amplitude analytics provides the behavioral picture; surveys, interviews, and passive feedback widgets explain the why. Together, these inputs power a product-led growth approach for internal tools—observable, repeatable, and improvable.

When teams ask where to start, I pilot one persona, one workflow, and one high-value outcome. I define the activation event, instrument it, launch a single targeted in-app guide through Pendo or Intercom, and A/B test the onboarding microcopy. Two weeks later, I review retention cohorts and completion data, talk to users, and either scale the pattern or iterate. That cadence builds credibility quickly because it ties every communication to a measurable result.

The payoff is tangible: faster onboarding, higher compliance, clearer internal communications, and employees who feel supported rather than overwhelmed. With disciplined messaging, smart instrumentation, and ongoing discovery, we can turn internal tools into catalysts for performance—and transform engagement from a campaign into a culture.

Inspired by this post on Pendo – Best Practices.

January 6, 2026

How to Build AI-Enabled Cybersecurity Operations Safely

You have an alert queue full of low-context signals, analysts spending time assembling evidence, and pressure to show that AI can improve the operation. The tempting move is to add a copilot to the security console and call the problem solved.

The harder leadership decision is where AI may influence a security decision, where it may take action, and how you will know it is helping. The right goal is not an autonomous security operations center. It is a shorter, more reliable path from signal to containment, with explicit limits on what a model can do.

Design the decision loop before choosing the AI

AI-enabled cybersecurity operations are easier to manage when you separate three capabilities that vendors often bundle together:

Detection models identify patterns, anomalies, or risk signals in security telemetry.
Generative AI explains evidence, summarizes an incident, retrieves a relevant playbook, and proposes a next action.
Orchestration performs a deterministic operation such as collecting evidence, updating a ticket, isolating an endpoint, or rotating a credential.

These components should not share the same authority. An anomaly score is not proof of compromise. A fluent explanation is not an approved response. A tool call is not safe merely because the model produced valid syntax.

Map the operational loop before you evaluate a model:

Observe: collect the endpoint, identity, network, and application signals relevant to the use case.
Detect: rank suspicious activity without hiding the underlying evidence.
Enrich: add asset criticality, identity context, recent changes, and the applicable response procedure.
Decide: show the recommended action, its prerequisites, and the reason for escalation.
Act: send the approved instruction to deterministic automation with narrowly scoped permissions.
Learn: record the analyst’s disposition, edits, approval, execution result, and any reversal.

For each stage, name the owner, permitted inputs, expected output, failure mode, and fallback. If the AI service becomes unavailable, established detections and response paths should continue to work. If the model produces a poor recommendation, an analyst should be able to reject it without fighting the workflow.

This map is also the product specification. It gives security engineering, SRE, product management, and risk owners a shared object to review. It prevents the initiative from collapsing into a feature list such as summarization, chat, and automation without a defined operational result.

Start with one detection decision, not another alert stream

A strong first use case has frequent decisions, usable feedback, and enough context to evaluate the model. It should improve an existing analyst workflow instead of creating a separate queue that someone must remember to check.

Behavioral models can examine endpoint telemetry, identity signals, and network flows to find activity that fixed signatures may miss. The useful product is not the anomaly itself. It is a ranked case that tells the analyst what changed, which evidence drove the score, what asset or identity is exposed, and what decision is required.

Use these criteria to choose the first workflow:

The decision is specific. “Investigate unusual authentication behavior for a privileged identity” is testable. “Use AI to detect threats” is not.
The evidence is available at decision time. If analysts must leave the workflow and search several systems before judging the recommendation, the AI is working with incomplete context.
The disposition is captured. Confirmed threat, benign activity, insufficient evidence, and duplicate are more useful than a generic closed status.
The existing path remains visible. Analysts should be able to compare the AI-ranked case with the evidence they already trust.
A wrong answer is recoverable. Begin with prioritization and investigation support, not an irreversible action.

Do not treat a smaller alert queue as proof of better detection. A model can reduce noise by suppressing useful signals. Measure precision and recall together: precision asks how much surfaced work was relevant, while recall asks how much relevant activity the workflow found. Because missed incidents may become visible only later, define how labels will be corrected when an investigation changes the original disposition.
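A minimal sketch of measuring the two together, assuming each case records whether the workflow surfaced it and the latest corrected disposition. The `Case` fields and the sample cases are hypothetical, chosen only to show how a quieter queue can raise precision without improving recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    surfaced: bool          # did the workflow put this case in front of an analyst?
    confirmed_threat: bool  # latest disposition, after any later correction


def precision_recall(cases: list[Case]) -> tuple[float, float]:
    """Return (precision, recall) over labeled cases."""
    true_pos = sum(c.surfaced and c.confirmed_threat for c in cases)
    false_pos = sum(c.surfaced and not c.confirmed_threat for c in cases)
    false_neg = sum(not c.surfaced and c.confirmed_threat for c in cases)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Illustrative cases only: c3 is a threat the workflow missed and a later
# investigation relabeled.
baseline = [
    Case("c1", surfaced=True, confirmed_threat=True),
    Case("c2", surfaced=True, confirmed_threat=False),
    Case("c3", surfaced=False, confirmed_threat=True),
    Case("c4", surfaced=False, confirmed_threat=False),
]
# The same cases with a "quieter" model that no longer surfaces c2.
quieter = [
    Case("c1", surfaced=True, confirmed_threat=True),
    Case("c2", surfaced=False, confirmed_threat=False),
    Case("c3", surfaced=False, confirmed_threat=True),
    Case("c4", surfaced=False, confirmed_threat=False),
]

baseline_precision, baseline_recall = precision_recall(baseline)
quieter_precision, quieter_recall = precision_recall(quieter)
```

In this toy set the quieter model halves the queue and doubles precision, yet recall is unchanged because the missed threat is still missed. That is why neither number should be reported alone.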

Mean time to detect also needs a precise starting point. Decide whether the clock begins when the event occurs, when telemetry reaches the platform, or when an existing control first observes it. Otherwise, a faster model can appear to improve detection while ingestion or analyst queue time remains untouched.
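To make the choice of clock concrete, here is a small sketch that computes mean time to detect from each of the three starting points. The `Incident` fields and the timestamps are hypothetical; the point is only that the same incidents yield different numbers depending on where the clock starts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean

CLOCK_STARTS = {"occurred_at", "ingested_at", "first_observed_at"}


@dataclass(frozen=True)
class Incident:
    occurred_at: datetime        # when the event happened
    ingested_at: datetime        # when telemetry reached the platform
    first_observed_at: datetime  # when an existing control first saw it
    detected_at: datetime        # when the workflow raised the case


def mean_time_to_detect(incidents: list[Incident], clock_start: str) -> timedelta:
    """Average detection delay, measured from the named starting timestamp."""
    if clock_start not in CLOCK_STARTS:
        raise ValueError(f"unknown clock start: {clock_start}")
    delays = [
        (i.detected_at - getattr(i, clock_start)).total_seconds() for i in incidents
    ]
    return timedelta(seconds=mean(delays))


# Illustrative timestamps only.
incidents = [
    Incident(
        occurred_at=datetime(2026, 1, 6, 10, 0),
        ingested_at=datetime(2026, 1, 6, 10, 20),
        first_observed_at=datetime(2026, 1, 6, 10, 25),
        detected_at=datetime(2026, 1, 6, 10, 30),
    ),
    Incident(
        occurred_at=datetime(2026, 1, 6, 12, 0),
        ingested_at=datetime(2026, 1, 6, 12, 10),
        first_observed_at=datetime(2026, 1, 6, 12, 15),
        detected_at=datetime(2026, 1, 6, 12, 40),
    ),
]

mttd_by_clock = {start: mean_time_to_detect(incidents, start) for start in sorted(CLOCK_STARTS)}
```

Whichever definition you choose, record it with the baseline so later comparisons use the same clock.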

The launch question is therefore not “Did the model find anomalies?” Ask whether it moved the right cases forward sooner, preserved the evidence needed for judgment, and avoided pushing material risk below the analyst’s line of sight.

Give the response copilot context, not unchecked authority

Incident response is a natural place for generative AI because analysts repeatedly assemble timelines, summarize evidence, search runbooks, draft ticket updates, and prepare remediation steps. Those tasks are language-heavy, but the actions they inform can disrupt production or destroy evidence.

Use a retrieval-first flow for response recommendations:

Retrieve the approved playbook and the version that applies to the incident type.
Assemble the facts the model is permitted to see, including the alert evidence and relevant asset context.
Generate a recommendation tied to a named playbook step rather than relying on the model’s general memory.
Check prerequisites, identity permissions, environment, and action scope through policy code outside the model.
Present the evidence, proposed action, expected impact, and rollback path to the designated approver.
Execute the approved operation through a deterministic orchestration layer.
Log the retrieved material, prompt, output, approval, tool arguments, result, and subsequent reversal or escalation.

This architecture makes an important distinction: the model can propose an action, but policy and people grant authority. The model should never be able to expand its own permissions or substitute a different tool when the approved operation fails.

An authority ladder gives that distinction operational force. Use the following as a starting policy and adapt it to the blast radius of your environment:

Action class	Examples	AI role	Required control
Read-only support	Summarize evidence, retrieve a runbook, collect approved diagnostics	Generate or execute within a fixed scope	Least-privilege access, complete logging, and no mutation permissions
Reversible operational change	Update a ticket, isolate an endpoint, rotate a credential	Recommend and prepare the action	Named human approval, validated target, impact warning, and tested rollback
High-blast-radius or irreversible change	Block a production network segment, alter broad access policy, delete data or evidence	Explain and escalate only	Incident command process and approval from the responsible system owner

Endpoint isolation can interrupt legitimate work. Credential rotation can break services when dependencies are unknown. Deleting data can permanently remove forensic evidence. Put those consequences beside the approval button, and provide a safe alternative such as collecting more evidence or opening an incident bridge.
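The ladder above can be enforced in policy code that runs outside the model. The following is a sketch under stated assumptions, not a reference implementation: the action names, the `Proposal` fields, and the three return values are hypothetical, and a real gate would also verify the approver's identity and permissions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionClass(Enum):
    READ_ONLY = "read_only_support"
    REVERSIBLE = "reversible_operational_change"
    HIGH_BLAST_RADIUS = "high_blast_radius_or_irreversible"


# Hypothetical action names mapped to the classes in the table above.
ACTION_CLASSES = {
    "summarize_evidence": ActionClass.READ_ONLY,
    "retrieve_runbook": ActionClass.READ_ONLY,
    "update_ticket": ActionClass.REVERSIBLE,
    "isolate_endpoint": ActionClass.REVERSIBLE,
    "rotate_credential": ActionClass.REVERSIBLE,
    "delete_data": ActionClass.HIGH_BLAST_RADIUS,
}


@dataclass(frozen=True)
class Proposal:
    action: str
    target_validated: bool = False
    rollback_tested: bool = False
    approved_by: Optional[str] = None  # named human approver, if any


def decide(proposal: Proposal) -> str:
    """Return 'execute', 'needs_approval', or 'escalate' for a model proposal."""
    action_class = ACTION_CLASSES.get(proposal.action)
    # Unknown actions are treated like high-blast-radius ones: explain and escalate.
    if action_class is None or action_class is ActionClass.HIGH_BLAST_RADIUS:
        return "escalate"
    if action_class is ActionClass.READ_ONLY:
        return "execute"
    # Reversible change: every control in the table must be satisfied first.
    ready = (
        proposal.target_validated
        and proposal.rollback_tested
        and bool(proposal.approved_by)
    )
    return "execute" if ready else "needs_approval"
```

Because the mapping is a fixed allowlist, an action the model invents falls through to escalation instead of gaining authority by default.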

Test the copilot as a security product, not as a conversational demo. Your evaluation set should cover correct recommendations, missing prerequisites, conflicting evidence, obsolete playbooks, requests outside the user’s permission, sensitive data, malformed tool arguments, and situations that require refusal or escalation. Measure whether the recommendation is grounded in the approved playbook, whether the action is appropriate, and whether the system preserved the required approval boundary.

Begin in shadow mode, where recommendations are evaluated but cannot change systems. Move next to draft-only assistance. Permit bounded execution only after the team has defined promotion criteria, rollback behavior, and an owner who can stop the workflow.

Prompt and output logs deserve the same access discipline as other sensitive security records. They may contain identities, indicators, configuration details, or incident evidence. Apply contextual data policies before information reaches the model, restrict access to the logs, and make retention a deliberate governance decision rather than a vendor default.

Counter AI-enabled attacks by changing the process

Attackers can use generative AI for targeted spear-phishing, deepfake executive voice messages, and more evasive malware. Trying to make every employee reliably identify synthetic content is a weak control. The appearance and quality of the lure will keep changing.

Change the process that turns a convincing message into access, money movement, or sensitive disclosure:

Require an out-of-band verification step for unusual executive requests, especially when the request changes credentials, access, payment details, or normal procedure.
Do not let familiarity with a voice, writing style, profile image, or caller ID serve as identity proof.
Harden identity controls with multifactor authentication, conditional access, and continuous risk scoring.
Give help-desk and operations teams a defined escalation path when a requester applies urgency or asks them to bypass verification.
Train employees with realistic AI-generated lure patterns, then measure reporting behavior and successful compromise rather than course completion alone.
Use AI-assisted red-team exercises to test the process, and use deception controls where they can divert attacker effort without putting production data at risk.

This reframes awareness training. Employees are not expected to become media-forensics experts. They need to notice when a request crosses a risk boundary and know the exact verification step to take. Product leaders can help by removing friction from the safe path: make reporting easy, make escalation visible, and avoid punishing someone who pauses a suspicious request.

The same principle applies to detection. Do not build the defense around whether content “looks AI-generated.” Build it around identity, behavior, privilege, asset sensitivity, and the actions an attacker is attempting.

Use a 90-day plan with measurable promotion gates

A focused 90-day plan is enough to establish an operating model if you keep the scope narrow: one high-signal detection decision, one mature response playbook, and one employee risk path such as phishing. The purpose is not to automate the security operation in a quarter. It is to prove that the decision loop can become faster without weakening control.

Days 1-30: define the workflow and baseline

Map the current signal-to-action path and identify where time, context, or consistency is lost.
Name a product owner, security owner, model-risk owner, and operational approver for the workflow.
Select the detection decision, response playbook, and employee risk process in scope.
Record baseline mean time to detect, mean time to recover, queue time, disposition quality, and the existing failure modes.
Define the data the model may access, the data it must not access, and the identity under which each tool operation runs.
Write the authority ladder, fallback behavior, stop condition, and rollback procedure before connecting production tools.

Days 31-60: evaluate in shadow mode

Run the detection model beside the existing workflow and compare ranked cases with analyst dispositions.
Test response recommendations against approved playbooks, including ambiguous and adversarial cases.
Review false positives and false negatives with analysts instead of reducing model quality to one aggregate score.
Confirm that sensitive-data policies, model access controls, prompt and output logging, and audit access work as designed.
Run a tabletop exercise covering model failure, unavailable retrieval, unsafe recommendations, excessive permissions, and orchestration failure.
Set promotion criteria for model quality, operational benefit, privacy, access control, and reversibility. Use thresholds appropriate to the risk of the chosen workflow rather than copying a generic benchmark.

Days 61-90: release bounded capability

Release the detection workflow to a defined analyst group while preserving the established fallback.
Enable draft-only response assistance before allowing any system mutation.
Permit only the actions covered by the approved authority policy; keep high-blast-radius changes outside model execution.
Review analyst edits, rejections, approvals, reversals, and escalations to find where the workflow lacks context.
Compare mean time to detect and recover with the baseline, while checking that precision, recall, privacy, and control failures have not regressed.
Make the next release decision explicitly: expand, hold, narrow the scope, or stop. A pilot that exposes an unsafe assumption has still produced a useful result.

The dashboard should separate outcomes from guardrails. Detection and recovery time tell you whether the operation improved. Precision, recall, recommendation correctness, and playbook grounding tell you how the model behaved. Rejections, manual edits, reversals, unauthorized-action attempts, and sensitive-data policy violations tell you whether the workflow is safe enough to scale.

Acceptance rate alone is not a quality metric. Analysts may accept a recommendation because it is correct, because the interface makes editing difficult, or because workload encourages quick approval. Review the resulting action and later incident outcome, not only the click.

Governance must continue after launch. Assign an owner to every model-enabled workflow, control access by role and context, version the model and retrieved playbooks, retain an auditable decision record, test for drift and bias, and repeat tabletop exercises when permissions or orchestration change. A model update is a security-product release, even when it arrives through a managed vendor.

Key takeaways

Optimize the full signal-to-action loop; do not add a disconnected AI queue.
Let models detect, summarize, and recommend, while policy and named people control authority.
Ground response guidance in approved, versioned playbooks before generating remediation steps.
Use shadow mode, draft-only assistance, and bounded execution as separate promotion stages.
Measure operational outcomes alongside precision, recall, overrides, reversals, privacy failures, and unauthorized-action attempts.
Defend against convincing AI-generated lures by hardening identity and verification processes, not by expecting perfect human detection.

Your next operating review should end with three named decisions: the detection workflow you will improve, the response action the AI may only recommend, and the metric that would stop the release. Once those are explicit, AI becomes a governable capability instead of an open-ended security experiment.

References

Pendo – 3 Powerful Ways AI Is Rewriting Cybersecurity: Smarter Defense, Faster Response, Fewer Breaches

January 4, 2026

Healthcare Product Benchmarks That Matter: Actionable Metrics and Playbooks From Our Report

I rely on product benchmarks to align teams, sharpen strategy, and accelerate outcomes—especially in healthcare, where stakes are high and complexity is real. Over the years, I’ve learned that the right metrics create clarity across product, engineering, compliance, and go-to-market, enabling faster, safer decisions that translate into measurable impact.

Discover exclusive data and strategies from our Product Benchmark Report. Compare the healthcare technology industry’s performance across key product metrics.

When I evaluate a healthcare product’s health, I focus on a few essentials: activation rate and time-to-value for new users, weekly active usage and feature adoption for clinicians and admins, and cohort-based retention analysis to understand whether value compounds over time. I also look at funnel friction (onboarding drop-off, failed setup steps), support load per account, and reliability signals that influence trust—because in healthcare, trust fuels growth.

Benchmarks turn those metrics into context. They help me answer, “Are we good, or just lucky?” By comparing our numbers to industry peers, I can prioritize the few bets that matter, set OKRs around outcomes rather than output, and guide empowered product teams to focus on the highest-leverage improvements.

Operationally, I instrument products with a unified analytics platform and tools like Amplitude analytics and Pendo to track user activation, feature adoption, and in-product journeys. Pairing that with continuous discovery keeps insights fresh, while A/B testing with a minimum detectable effect (MDE) defined up front means each experiment is sized to detect the change we care about before we ship.
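As a rough illustration of how an MDE translates into experiment size, the sketch below uses the standard normal-approximation formula for comparing two proportions. The baseline rate, the default significance level, and the default power are assumptions for the example, not figures from the report.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(
    baseline_rate: float,
    mde_abs: float,
    alpha: float = 0.05,
    power: float = 0.8,
) -> int:
    """Users needed in each arm to detect an absolute lift of mde_abs.

    Two-sided test on two proportions, normal approximation.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Hypothetical activation baseline of 20%.
n_for_two_points = sample_size_per_arm(baseline_rate=0.20, mde_abs=0.02)
n_for_one_point = sample_size_per_arm(baseline_rate=0.20, mde_abs=0.01)
```

Halving the MDE roughly quadruples the required sample, which is why the threshold is worth agreeing on before the test starts rather than after the data arrives.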

In practice, my playbook for healthcare product-led growth is straightforward: simplify onboarding with targeted product tours and in-app guides, tighten the first-win loop to reduce time-to-value, and eliminate blockers surfaced by behavioral analytics. Then, reinforce the loop with lifecycle messaging, role-specific education, and clear value propositions for clinicians, operations teams, and executives.

Of course, none of this works without strong governance. Data governance and regulatory compliance aren’t just guardrails; they’re growth enablers. Clear audit trails, privacy-by-design, and reliable incident management build the trust that keeps adoption high and churn low.

If you’re ready to benchmark your roadmap against the market, this report gives you the clarity to spot gaps, the language to align stakeholders, and the metrics to execute with precision. Use it to calibrate your product strategy, guide your next set of experiments, and confidently scale what works across the healthcare technology ecosystem.

Inspired by this post on Amplitude – Perspectives.

December 29, 2025

Amplitude Browser SDK: Turn Web Vitals Into Product Decisions

You have Web Vitals in a dashboard, but the hard question is still unanswered: does a slower or less stable experience materially change activation, conversion, or retention? If your instrumentation cannot answer that, collecting more performance data will only make the dashboard busier.

The useful setup is not simply Browser SDK plus LCP, INP, and CLS. It is a measurement system that preserves the user’s real experience, attaches enough product context to explain the result, and connects performance to an outcome your team can improve.

Build the measurement contract before the dashboard

Start with the decision you want to make. A good Web Vitals implementation should tell you which experience is degraded, who encounters it, whether it is associated with a meaningful product outcome, and which intervention deserves engineering time.

I would use one normalized event, such as web_vital_observed, rather than inventing event names for every metric and route. The metric, value, page context, and audience context then become properties. That keeps the taxonomy manageable while preserving the dimensions needed for analysis.

Retain the raw measurement

Record LCP, INP, and CLS as distinct metric names with their raw values and units. LCP and INP are timing measures, while CLS is a unitless score of visual stability, so combining their values in one aggregate would be meaningless. A separate metric-name property lets one event schema support all three without pretending that they are interchangeable.

Do not put labels such as good, acceptable, or poor into the event name. If you want performance bands, derive them from the raw value during analysis or store the band as an additional property. Keeping the underlying value allows you to change a threshold without rewriting history.
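A minimal sketch of that separation, written in Python for the analysis side. The `WebVitalObserved` fields are a hypothetical subset of the contract, and the default thresholds are the ones Google publishes for Core Web Vitals at the time of writing; the band labels follow the wording used here.

```python
from dataclasses import dataclass

# (good_upper_bound, poor_lower_bound) per metric. Kept apart from the events
# so the thresholds can change without touching stored measurements.
DEFAULT_THRESHOLDS = {
    "LCP": (2500.0, 4000.0),  # milliseconds
    "INP": (200.0, 500.0),    # milliseconds
    "CLS": (0.1, 0.25),       # unitless score
}


@dataclass(frozen=True)
class WebVitalObserved:
    metric_name: str  # "LCP", "INP", or "CLS"
    value: float      # raw measurement, never a band label
    unit: str         # "ms" or "score"
    page_group: str


def performance_band(event: WebVitalObserved, thresholds=DEFAULT_THRESHOLDS) -> str:
    """Derive a band from the raw value at analysis time."""
    good_upper, poor_lower = thresholds[event.metric_name]
    if event.value <= good_upper:
        return "good"
    if event.value <= poor_lower:
        return "acceptable"
    return "poor"


event = WebVitalObserved(metric_name="LCP", value=3100.0, unit="ms", page_group="signup")
band_today = performance_band(event)

# A stricter, hypothetical internal target applied to the same stored event.
stricter = {**DEFAULT_THRESHOLDS, "LCP": (2000.0, 3000.0)}
band_under_stricter_target = performance_band(event, stricter)
```

The stored event never changes; only the interpretation does, which is the practical meaning of changing a threshold without rewriting history.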

Add context that leads to a decision

The minimum useful context is not the maximum available browser context. Attach only properties that help you isolate a problem or compare an outcome:

page_group: a stable product category such as landing page, pricing, signup, checkout, or application workspace.
device_class: enough detail to separate materially different experiences without creating a fragmented taxonomy.
geography: the approved regional level, not unnecessarily precise location data.
traffic_source: useful when acquisition channels land users on different page experiences.
user_cohort: new, returning, activated, subscribed, or another state that matters to your product.
experiment_variant and release_id: the connection between a performance change and the product change that may have caused it.
measurement_timestamp: when the experience occurred, kept separate from the time Amplitude received the event.
sampling_policy: whether the event came from full collection or a documented sample.

Prefer a controlled page group over an unrestricted URL. Raw URLs can create excessive cardinality, split one product surface across many records, and expose identifiers or query-string data that should not enter analytics. Normalize the route and redact sensitive values before transmission.
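One way to express that normalization is a small allowlist of route patterns that emits only the page group. The patterns and the example URLs are hypothetical, and the sketch is in Python to show the logic; in the browser the same mapping would run before the event is sent.

```python
import re
from urllib.parse import urlsplit

# Hypothetical route patterns; each product would maintain its own list.
PAGE_GROUP_RULES = [
    (re.compile(r"^/$"), "landing_page"),
    (re.compile(r"^/pricing/?$"), "pricing"),
    (re.compile(r"^/signup(/.*)?$"), "signup"),
    (re.compile(r"^/checkout(/.*)?$"), "checkout"),
    (re.compile(r"^/app(/.*)?$"), "application_workspace"),
]


def page_group_for(url: str) -> str:
    """Map a raw URL to a stable page group.

    Only the path is inspected, and only the group name is returned, so
    query strings and path identifiers never reach the event.
    """
    path = urlsplit(url).path or "/"
    for pattern, group in PAGE_GROUP_RULES:
        if pattern.match(path):
            return group
    return "other"


workspace = page_group_for("https://example.com/app/projects/8841/settings?token=abc")
checkout = page_group_for("https://example.com/checkout?email=a%40b.com")
```

Routes that match no rule land in a single `other` group, which keeps cardinality bounded and makes unmapped surfaces easy to spot.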

Your event contract is ready when an analyst can move from a weak metric distribution to a specific page group, audience, release, and business outcome without asking engineering to reconstruct the session.

Protect the experience from the code measuring it

A Browser SDK runs in the same environment whose performance you are trying to understand. That makes collection overhead part of the product decision. An analytics implementation that worsens loading or responsiveness is not merely inefficient; it contaminates its own measurement.

Treating the Amplitude Browser SDK as a product surface leads to five practical requirements.

Keep the client-side footprint and payload focused. Collect properties that support segmentation or governance, not every value the browser can expose.
Make telemetry fail safely. Rendering, navigation, and interaction must continue if analytics initialization, collection, or delivery fails.
Use offline queuing and retry behavior without confusing delivery time with experience time. A delayed event still belongs to the session and release in which it was measured.
Sample consistently when full collection is unnecessary. A stable sampling policy is more defensible than selectively collecting only certain devices, routes, or observed performance states.
Put schema validation and compatibility checks in CI/CD. Product releases should not silently rename properties, change units, or remove the context that existing dashboards depend on.

Sampling deserves particular care. If slow sessions are more likely to be abandoned, a delivery mechanism that captures only completed journeys can underrepresent the experience you most need to see. Keep collection independent of the outcome wherever possible, document the sampling rule, and monitor coverage by page group and device class. A sample is useful only when you know what population it represents.
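A simple way to keep the sampling decision independent of the outcome is to derive it from a stable identifier before anything is measured. The sketch below assumes a session identifier is available at page load; the function name, the salt, and the rate are hypothetical.

```python
import hashlib


def in_sample(session_id: str, sample_rate: float, salt: str = "web-vitals-v1") -> bool:
    """Deterministically decide whether a session is collected.

    The decision depends only on the identifier and the documented rate,
    not on device, route, or how the session performed.
    """
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


# The same session always gets the same answer, so retries and page
# transitions within a session stay consistent.
decision = in_sample("session-123", sample_rate=0.10)
```

Recording the rate and salt as the `sampling_policy` property lets an analyst reweight the data later and confirm which population a chart represents.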

Retries create a different risk: duplicate or chronologically misplaced observations. Use a stable measurement identifier when your implementation needs deduplication, and preserve the original measurement timestamp. Otherwise, a recovered connection can make an earlier performance problem appear to belong to a later release.
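As an illustration, deduplication can key on the measurement identifier while ordering by the original measurement time. The property names `measurement_id` and `received_timestamp` are assumptions added for this sketch, and the timestamps are made-up epoch seconds.

```python
def deduplicate(events: list[dict]) -> list[dict]:
    """Keep one event per measurement_id, ordered by when it was measured."""
    first_seen: dict[str, dict] = {}
    for event in events:
        # setdefault keeps the first delivery and ignores later retries.
        first_seen.setdefault(event["measurement_id"], event)
    return sorted(first_seen.values(), key=lambda e: e["measurement_timestamp"])


# m-1 was measured first, under release r41, but delivered an hour late
# and then retried once more.
received = [
    {"measurement_id": "m-2", "measurement_timestamp": 1_700_000_060,
     "received_timestamp": 1_700_000_061, "release_id": "r42"},
    {"measurement_id": "m-1", "measurement_timestamp": 1_700_000_000,
     "received_timestamp": 1_700_003_600, "release_id": "r41"},
    {"measurement_id": "m-1", "measurement_timestamp": 1_700_000_000,
     "received_timestamp": 1_700_003_605, "release_id": "r41"},
]

unique = deduplicate(received)
```

Because `release_id` is captured with the measurement rather than at delivery, the late event still counts against the release the user actually experienced.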

Make privacy part of the event design

Consent-aware collection, edge redaction, and regional routing should be decided before rollout. Do not send a property and hope to clean it later. Once sensitive data enters an analytics pipeline, deletion and access obligations become harder to manage across queues, retries, exports, and downstream reports.

Review each property with a simple test: does this value materially change a product decision? If a precise URL, identifier, or location does not pass that test, replace it with a stable category or leave it out.

Analyze distributions alongside product outcomes

An average Web Vital hides the pattern product teams need. One page can look acceptable on average while a valuable device segment or acquisition cohort has a consistently poor experience. Start with distributions, then segment them by page group, device, geography, traffic source, and user cohort.
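A small example of how the average can mislead. The LCP values below are invented for illustration, and the 75th percentile is used only because it is a common reporting convention for Core Web Vitals; any percentile your team agrees on works the same way.

```python
from statistics import mean, quantiles


def p75(values: list[float]) -> float:
    """75th percentile using the inclusive method from the standard library."""
    return quantiles(values, n=4, method="inclusive")[2]


# Illustrative LCP samples in milliseconds for one page group.
lcp_by_device = {
    "desktop": [1200, 1300, 1400, 1500, 1600],
    "mobile": [1800, 2200, 5200, 5600, 6000],
}

all_values = [v for values in lcp_by_device.values() for v in values]
overall_mean = mean(all_values)
p75_by_device = {device: p75(values) for device, values in lcp_by_device.items()}
```

Here the blended mean is 2780 ms, while the mobile 75th percentile is 5600 ms. The page looks tolerable in aggregate even though a large share of mobile sessions wait more than five seconds.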

Next, pair those performance distributions with funnels and cohorts. Compare activation, conversion, retention, or revenue outcomes across ranges of LCP, INP, and CLS. Keep the metrics separate, because load speed, responsiveness, and visual stability can affect different moments in a journey.

Question	Amplitude view	Decision it supports
Where is the experience degraded?	Metric distribution by page group and device class	Select the surface and audience to investigate
Does the degradation matter to the product?	Outcome rate across performance ranges	Estimate the strength and shape of the association
Which change caused an improvement?	Experiment variant compared on both the vital and the outcome	Ship, revise, or reject the intervention
Did a release create a regression?	Performance distribution trended by release	Escalate, roll back, or investigate the affected page group

Look for a cliff rather than assuming a smooth relationship. Conversion might remain similar across much of the distribution and then deteriorate after a particular range. That pattern gives you a more useful target than a site-wide average: move the affected population away from the range where the outcome changes.
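To look for that shape, bucket sessions by performance range and compare the outcome rate in each bucket. This is a sketch with invented sessions and hypothetical range edges; in practice the same grouping would be built in your analytics tool, and each bucket needs enough sessions to be trusted.

```python
from collections import defaultdict


def outcome_by_range(sessions, upper_edges):
    """Return {range_label: (session_count, conversion_rate)}.

    sessions: iterable of (lcp_ms, converted) pairs.
    upper_edges: ascending inclusive upper bounds for each range.
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [sessions, conversions]
    for lcp_ms, converted in sessions:
        label = next(
            (f"<= {edge}" for edge in upper_edges if lcp_ms <= edge),
            f"> {upper_edges[-1]}",
        )
        counts[label][0] += 1
        counts[label][1] += int(converted)
    return {label: (n, conversions / n) for label, (n, conversions) in counts.items()}


# Illustrative sessions only: (LCP in ms, completed signup).
sessions = [
    (1500, True), (1800, True), (2000, False), (2400, True),
    (3000, True), (3200, True), (3600, False), (3900, True),
    (4500, False), (5200, False), (6000, True), (7000, False),
]

rates = outcome_by_range(sessions, upper_edges=[2500, 4000])
```

In this toy data the rate is flat at 0.75 up to 4000 ms and drops to 0.25 beyond it. A pattern like that would point to moving sessions out of the slowest range, not to shaving time from sessions that are already fast.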

Do not confuse that pattern with causation. Device capability, network conditions, geography, traffic source, and user intent can affect both performance and conversion. Segmentation reduces obvious confounding, but it does not eliminate it.

Use experiments to prove the product effect

Once you find an important association, test an intervention. Image optimization, lazy-loading changes, and navigation changes are useful candidates because each can alter a specific part of the experience. Randomize the intervention, not the Web Vital, and measure two results together:

Did the treatment improve the intended LCP, INP, or CLS distribution?
Did the same treatment improve activation, conversion, retention, or another declared outcome?

A treatment that improves a performance score but leaves the product outcome unchanged may still be worthwhile for experience quality or regression prevention. It should not, however, be presented as a proven growth lever. Conversely, an outcome lift without the expected Web Vital movement means your proposed mechanism was probably incomplete.

Prioritize opportunities using four factors: the size of the affected population, the outcome gap associated with the performance range, your confidence that the relationship is actionable, and the team’s ability to change the relevant surface. This keeps a dramatic problem on a low-traffic page from automatically outranking a smaller but widespread problem in signup or checkout.

SEO can be a compounding benefit, but it should not replace the product case. Improve the experience for real users, verify the effect on their behavior, and treat search performance as a downstream outcome rather than the sole reason to optimize a synthetic score.

Turn the first week into an operating loop

Start with your top three entry pages. A one-week diagnostic is a sensible time box for establishing visibility, not a promise that you will prove causality in seven days. The first goal is to expose the distribution, validate the event quality, and identify one segment worth investigating.

Choose three entry pages and assign each to a stable page group.
Instrument LCP, INP, and CLS with the same normalized contract.
Verify coverage, missing properties, sampling behavior, timestamps, consent handling, and unexpected values before interpreting a chart.
Plot each metric’s distribution by page group and device class.
Overlay one outcome that occurs close enough to the experience to support a useful decision, such as signup completion or activation.
Select one high-impact segment and define an intervention that could plausibly change its experience.

Keep the first scope narrow. Adding every route, cohort, and outcome at once creates an instrumentation program before you have proven that the model produces decisions. Once the first three pages generate a credible hypothesis, extend the same event contract instead of creating a new one for every squad.

Define ownership before the first regression

Product should own the page groups, business outcomes, and prioritization logic. Engineering should own collection performance, delivery resilience, release metadata, and regression guardrails. Data or analytics should own schema quality, coverage checks, and the analytical definitions used in dashboards. The appropriate privacy owner should approve consent behavior, PII controls, and regional routing.

Then define product-level service objectives for LCP, INP, and CLS by key page group. Review performance distributions beside activation and retention in QBRs, and add release guardrails so a feature cannot quietly trade away responsiveness or stability. A site-wide objective is too blunt if signup and a low-traffic support page carry different user and business consequences.

Your instrumentation is operational when it has all of the following:

A versioned event contract with documented metric units and required properties.
Automated checks that catch schema drift during CI/CD.
Known coverage and sampling behavior across important page and device groups.
Consent, redaction, and routing rules applied before data leaves the browser.
A distribution view for each Core Web Vital rather than one blended score.
At least one product outcome connected to the performance experience.
A named owner and a release response for regressions.

This is where Web Vitals stop being a periodic performance project. They become a shared decision system for product, engineering, analytics, and privacy.

Key takeaways

Use one normalized Web Vitals event and preserve the raw metric value; derive performance bands without discarding the underlying measurement.
Attach stable page, audience, experiment, release, timestamp, and sampling context only when it supports analysis or governance.
Keep analytics collection lightweight, failure-tolerant, consent-aware, and protected by schema checks.
Analyze distributions by meaningful segments, then connect them to activation, conversion, retention, or revenue.
Treat correlations as hypotheses. Use an experiment to verify that a performance intervention also changes the intended product outcome.
Begin with three entry pages, one nearby outcome, and one actionable segment before expanding coverage.

On your next instrumentation ticket, require three fields beyond the SDK task: the decision the data will support, the outcome it will be joined to, and the owner who will respond when it regresses. That small change turns Web Vitals collection from telemetry into product management.

December 18, 2025

Inside the Engine Room: How I Drive Scalable Analytics APIs, Reliability, and Performance

I build and scale analytics platforms with a product mindset, and the work starts with the "middleware and compute systems that power analytics at scale." In Amplitude Analytics and other unified analytics platforms, that foundation is what makes everything else possible.

Day to day, I oversee the "APIs behind charts, cohorts, and metrics—driving performance, reliability, and platform scalability." When those APIs are fast and resilient, every product team—from growth to customer success—can trust the insights they use to ship, learn, and iterate.

From an engineering leadership standpoint, I partner closely with SRE to define SLOs and error budgets, wire CI/CD pipelines for safe deploys, and track DORA metrics so we improve speed without compromising quality. This combination reduces incident management toil and shortens MTTR while keeping data freshness and query latency within strict thresholds.

From a product management leadership lens, the goal is clarity: crisp APIs, predictable contracts, and transparent stakeholder management across data, engineering, and GTM teams. That alignment empowers product teams with reliable cohorts and metrics, accelerates experimentation, and de-risks roadmaps.

If you’re scaling analytics, invest first in the platform layer: middleware and compute, schema governance, caching strategies, and cost-aware compute. Do that well, and the visible experience—charts, cohorts, and metrics—feels effortless, even as you grow to serve billions of events with confidence.

Inspired by this post on Amplitude – Best Practices.

December 12, 2025

A Practical Governance Model for Enterprise AI Support Agents

Your AI customer service agent can pass a polished demo and still fail the first serious compliance question: Why did it give that answer, which data did it use, what did it change, and could the customer reach a person? If reconstructing one interaction requires guesswork across several systems, the deployment is not governed.

For enterprise support, governance has to live inside the product and its operating model. You need explicit limits on autonomy, deterministic routes for regulated workflows, release gates, human handoffs, and evidence that survives an audit. The goal is not to eliminate every possible failure. It is to know which failures matter, prevent the unacceptable ones, detect the rest, and respond without losing control of the customer case.

Give every decision an owner before the agent gets autonomy

An AI agent is not just a model. The governed system includes its instructions, approved knowledge, retrieval settings, identity checks, connected tools, routing rules, human workflow, logs, and vendor dependencies. Reviewing the model while ignoring those components leaves most operational risk untouched.

Start with a deployment register. Create an entry for every production agent, channel, and materially different configuration. Each entry should identify:

The customer jobs the agent may handle and the outcomes it may produce.
The countries, business units, brands, languages, and channels covered by the deployment.
The tasks the agent must refuse, defer, or transfer to a person.
The customer and company data it can read, create, update, or disclose.
The tools and system permissions available to it.
The business owner accountable for the service outcome.
The product owner accountable for behavior, evaluation, and change control.
The security, privacy, legal, and operational owners responsible for their respective controls.
The people authorized to approve a release, accept a known risk, restrict an intent, or stop the agent.

Several roles can belong to the same person in a smaller organization. Accountability still cannot be shared so broadly that nobody can make a decision during an incident.

Then build a control register beside the deployment register. For every material risk, record the control, the test that proves the control works, the evidence retained, and the owner who reviews a failure. A statement such as “the agent should avoid inappropriate refunds” is a policy aspiration. A scoped refund permission, an approval rule, a test set, and a logged decision form a control.

My practical test is simple: if a team cannot name the owner, test, and evidence for a claimed safeguard, that safeguard should not be used to justify greater autonomy.
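
That test can be made mechanical. The sketch below models one control-register row and reports which of owner, test, and evidence is still missing. The `Control` class and its field names are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    """One row of a control register; field names are illustrative."""
    risk: str
    control: str
    test: str = ""      # the test that proves the control works
    evidence: str = ""  # what is retained as proof
    owner: str = ""     # who reviews a failure

    def gaps(self) -> list[str]:
        """Name whichever of owner, test, and evidence is still missing."""
        fields = {"owner": self.owner, "test": self.test, "evidence": self.evidence}
        return [name for name, value in fields.items() if not value.strip()]


def can_justify_autonomy(control: Control) -> bool:
    """A safeguard counts only when owner, test, and evidence are all named."""
    return not control.gaps()
```

A row that holds only the sentence "the agent should avoid inappropriate refunds" would report all three gaps, which is the difference between an aspiration and a control.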

Translate service obligations into controls the agent can prove

Compliance requirements usually describe customer outcomes, not model architecture. Your control design has to connect those outcomes to specific events in the support journey.

Spain offers a useful stress test. A customer-service measure that was still moving through its final approval stages when described includes a three-minute call-answer target for 95% of calls, access to a person on request, a 15-day complaint deadline that shortens to five days for undue charges, centralized complaint tracking, annual external audits, and language and accessibility obligations. Those provisions do not automatically apply to every company or jurisdiction. Counsel must confirm the measure’s current status, scope, and application before you treat any of them as a legal requirement.

The broader design lesson is durable: the obligation follows the customer journey across automation and human support. It does not disappear because an AI agent handled the first interaction.

Service obligation	Product control	Evidence to retain
Reachability and response time	Measure the full journey from contact initiation through automated handling, queueing, and human connection. Define overflow behavior for outages and demand spikes.	Channel timestamps, queue events, routing outcomes, abandoned contacts, and performance segmented by incident period.
Human access on request	Recognize an explicit request for a person, expose a visible handoff path, and provide a fallback when the primary human channel is unavailable.	Handoff test results, transfer attempts, completion status, queue time, callback records, and failed-transfer alerts.
Complaint deadlines	Create a case immediately, apply the correct policy-based category and due date, assign an owner, and escalate before the deadline.	Case identifier, classification, policy version, creation time, due date, ownership changes, customer communications, and resolution time.
Unified complaint tracking	Carry one system-of-record identifier across chat, voice, email, messaging, and human follow-up instead of creating disconnected cases.	A linked timeline of every automated and human interaction, action, status change, and final disposition.
Language and accessibility support	Maintain a capability matrix by channel and route unsupported needs to an appropriate alternative rather than improvising.	Evaluation results by supported language and accessibility path, routing outcomes, and unresolved coverage gaps.
Separation of service and sales	Restrict promotional content and sales tools in workflows where service calls cannot be used for selling.	Tool permissions, prompt and policy versions, sampled interactions, blocked-action records, and exception approvals.
External auditability	Version releases, preserve control tests, document changes, and connect incidents to corrective action.	A release evidence package containing scope, approvals, risk decisions, evaluation results, configurations, incidents, and remediation.

Do not ask the language model to infer the applicable legal rule from a customer’s free-text message. Resolve jurisdiction, account type, service category, contractual status, and channel through trusted account data and deterministic policy logic. The agent can explain the resulting process, but it should not invent the rule that governs it.
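
A minimal sketch of what deterministic policy logic can look like follows. The policy table, the `AccountContext` type, and the calendar-day arithmetic are all assumptions for illustration. The day counts echo the figures quoted earlier, and whether they apply, and whether they mean calendar or working days, is exactly what counsel would need to confirm.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative policy table. Keys, day counts, and calendar-day arithmetic
# are assumptions, not confirmed legal requirements.
COMPLAINT_DEADLINE_DAYS = {
    ("ES", "undue_charge"): 5,
    ("ES", "general"): 15,
}


@dataclass(frozen=True)
class AccountContext:
    """Trusted account data, never parsed from the customer's message."""
    jurisdiction: str
    complaint_category: str


def resolve_due_date(ctx: AccountContext, opened: date) -> date:
    """Look the rule up deterministically; refuse to guess when none matches."""
    key = (ctx.jurisdiction, ctx.complaint_category)
    if key not in COMPLAINT_DEADLINE_DAYS:
        raise LookupError(f"no approved policy for {key}; route to a person")
    return opened + timedelta(days=COMPLAINT_DEADLINE_DAYS[key])
```

The agent can then explain the due date the function returns, while an unmatched context raises an error that the workflow can turn into a human handoff instead of an improvised answer.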

Set autonomy by consequence, not conversational fluency

A natural answer can make a workflow feel safer than it is. Fluency says little about whether the agent authenticated the customer, selected the right policy, disclosed protected information, or performed the intended system action.

Assign autonomy at the intent-and-action level. A workable classification looks like this:

Inform: The agent answers from approved, versioned knowledge without changing customer data. Outage information, published policies, and basic troubleshooting often fit here.
Prepare: The agent gathers details or drafts a request, but a trusted system or person validates it before anything is committed.
Execute with confirmation: The agent performs a permitted, recoverable action only after authentication, validation, and an explicit customer confirmation. The interface should show what will change before execution.
Human approval required: The action has material financial, contractual, privacy, safety, or service-continuity consequences. The agent may collect context and recommend a next step, but it cannot make the final decision.
Prohibited: The task falls outside the approved purpose, requires inaccessible evidence, or carries a consequence the organization is unwilling to automate.

For each intent, evaluate four separate failure paths: a wrong answer, an inappropriate disclosure, an unauthorized action, and a missed escalation. They need different controls. Approved retrieval can reduce unsupported answers, but it does not enforce account authorization. A confirmation screen can prevent accidental execution, but it does not make a prohibited action acceptable.
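
The classification above can be expressed as a small gate. This is a sketch under stated assumptions: the intent names and the `INTENT_AUTONOMY` register are hypothetical, and it addresses only the unauthorized-action path, not the other three failure paths.

```python
from enum import Enum


class Autonomy(Enum):
    INFORM = "inform"
    PREPARE = "prepare"
    EXECUTE_WITH_CONFIRMATION = "execute_with_confirmation"
    HUMAN_APPROVAL_REQUIRED = "human_approval_required"
    PROHIBITED = "prohibited"


# Hypothetical intent register; unknown intents fall through to PROHIBITED.
INTENT_AUTONOMY = {
    "outage_status": Autonomy.INFORM,
    "address_change": Autonomy.EXECUTE_WITH_CONFIRMATION,
    "contract_cancellation": Autonomy.HUMAN_APPROVAL_REQUIRED,
}


def may_execute(intent: str, authenticated: bool, validated: bool, confirmed: bool) -> bool:
    """Allow a state-changing action only for the execute-with-confirmation
    class, and only after authentication, validation, and explicit
    customer confirmation."""
    level = INTENT_AUTONOMY.get(intent, Autonomy.PROHIBITED)
    if level is not Autonomy.EXECUTE_WITH_CONFIRMATION:
        return False
    return authenticated and validated and confirmed
```

Defaulting unregistered intents to `PROHIBITED` is a deliberate choice in this sketch: a new intent gains autonomy only when someone adds it to the register.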

Use least-privilege tool access as the hard boundary. If an agent only needs to read shipment status, do not give it a general customer-record role. If it can issue a bounded credit, encode the allowed conditions and limit in the transaction service rather than relying only on a prompt. Instructions shape behavior; permissions limit impact.
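
To make the bounded-credit point concrete, here is a minimal sketch of a check that would live in the transaction service rather than in the prompt. The limit, the allowed reasons, and the `CreditRequest` type are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical limit and reasons; real values belong to the business owner.
MAX_AGENT_CREDIT = Decimal("25.00")
ALLOWED_REASONS = frozenset({"late_delivery", "service_outage"})


@dataclass(frozen=True)
class CreditRequest:
    account_id: str
    amount: Decimal
    reason: str


def authorize_credit(request: CreditRequest) -> tuple[bool, str]:
    """Enforced inside the transaction service, so the outcome does not
    depend on what the agent's instructions say."""
    if request.reason not in ALLOWED_REASONS:
        return False, "reason not permitted for agent-issued credit"
    if request.amount <= 0 or request.amount > MAX_AGENT_CREDIT:
        return False, "amount outside the agent's bounded limit"
    return True, "approved"
```

Even if a prompt injection persuades the agent to request a larger credit, the service rejects it, and the rejection itself becomes a blocked-action record.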

Vendor assurance belongs in this assessment, but it answers only part of the question. AIUC-1 certification, for example, includes independent third-party audits and quarterly adversarial testing across more than a thousand enterprise risk scenarios, with coverage spanning areas such as security, customer safety, reliability, privacy, and accountability. That can provide useful evidence about a vendor’s control environment. It does not certify your prompts, connected systems, customer policies, permissions, or human escalation design.

Procurement should therefore collect evidence and define the shared-responsibility boundary. Ask which products, models, subprocessors, and hosting arrangements are in scope; how material changes are communicated; what interaction and administrative logs can be exported; how customer data is retained and protected; what happens when a model or safety layer changes; and which incident information the vendor will provide. Keep the answers with the deployment record. A certification logo without scope and current evidence is not an operating control.

Run releases, evidence, and incidents as one control loop

A launch review is necessary, but it cannot carry the full governance load. Agent behavior can change when the model, system instructions, knowledge base, retrieval settings, safety classifiers, tool APIs, routing logic, or customer policies change. Every material change needs an owner, a risk assessment, proportionate regression testing, and a recoverable release.

Use the following release loop:

Freeze the scope. Record supported intents, prohibited tasks, data access, tools, regions, languages, channels, human routes, and known limitations.
Build evaluations from the control register. Include normal cases, ambiguous requests, missing information, authentication failures, conflicting policies, attempts to obtain protected data, adversarial instructions, tool failures, repeated requests for a person, unsupported languages, and downstream-system outages.
Define pass and fail before testing. Mark unacceptable outcomes explicitly. An average quality score can hide a rare but severe privacy disclosure or unauthorized action.
Gate production on evidence. Require the named approvers to review failed cases, accepted residual risks, fallback behavior, monitoring coverage, and rollback readiness.
Release with bounded exposure. Limit the first deployment by intent, permission, channel, customer population, or geography according to the risk. Expand only when production evidence supports it.
Monitor behavior and control health. Track not just answer quality, but handoff completion, prohibited-action attempts, tool errors, unsupported requests, complaint-clock failures, overrides, repeated contacts, and missing audit events.
Feed failures back into the system. Connect every meaningful incident or near miss to a corrected control, a new evaluation case, and a documented release decision.
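
The warning about average scores in the loop above can be shown with a short release gate. This is a sketch, assuming hypothetical outcome labels and a 0.9 average threshold. The point is that a single unacceptable outcome blocks the release however high the average is.

```python
from dataclasses import dataclass
from statistics import mean

# Outcome labels treated as unacceptable are illustrative assumptions.
UNACCEPTABLE = frozenset({"privacy_disclosure", "unauthorized_action", "missed_escalation"})


@dataclass(frozen=True)
class EvalResult:
    case_id: str
    score: float         # quality score between 0.0 and 1.0
    outcome: str = "ok"  # "ok" or a failure label


def release_gate(results: list[EvalResult], min_average: float = 0.9) -> tuple[bool, list[str]]:
    """Pass only when the average clears the bar AND no unacceptable
    outcome occurred. Returns (passed, blocking case ids)."""
    blockers = [r.case_id for r in results if r.outcome in UNACCEPTABLE]
    average_ok = bool(results) and mean(r.score for r in results) >= min_average
    return average_ok and not blockers, blockers
```

With 99 perfect cases and one privacy disclosure, the average is 0.99 and the gate still fails, returning the offending case for the approvers to review.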

Periodic adversarial testing matters because the threat and model landscape changes. AIUC-1 itself is described as evolving quarterly alongside new threat patterns and technical progress. Your internal cadence does not have to copy a certification program, but it should be driven by system risk, material changes, observed failures, and emerging attack paths rather than by the anniversary of the original approval.

Make each consequential interaction reconstructable

For a consequential interaction, an authorized reviewer should be able to determine what the customer asked, which identity and policy context applied, which knowledge version was used, what the agent produced, which tools it called, what changed, whether a person became involved, and how the case ended.

A useful event record normally includes the channel and timestamps; authenticated account context; resolved policy or jurisdiction context; intent and risk class; instruction, model, retrieval, and knowledge versions; tool requests and responses; the customer-facing answer; confirmation events; escalation requests and outcomes; case identifiers and due dates; safety or policy decisions; human overrides; and final disposition.
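
One way to keep those records honest is a completeness check that flags missing audit fields. The sketch below assumes a flat dictionary per interaction. The field names are illustrative shorthand for the items listed above, and conditional items such as confirmations or escalations are left out because they exist only when that step happened.

```python
# Field names are assumptions that mirror the list above; they are not a
# standard schema.
REQUIRED_FIELDS = (
    "channel", "timestamps", "account_context", "policy_context",
    "intent", "risk_class", "versions", "tool_calls",
    "customer_answer", "final_disposition",
)


def missing_audit_fields(record: dict) -> list[str]:
    """List required fields that are absent or None. An empty list of tool
    calls is valid: it records that no tool was used."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]
```

Running a check like this over a sample of consequential interactions gives a direct measure of the "missing audit events" signal that monitoring is meant to track.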

Do not respond by retaining every raw conversation forever. A larger data store is not automatically a better compliance system. Apply purpose limitation, access controls, redaction, approved retention periods, deletion rules, and legal holds to the evidence itself. Security and privacy owners should be able to explain both why an event is captured and when it is removed.

Package the evidence by release, not only by department. The package should connect the approved scope, risk assessment, control register, evaluation results, configuration versions, vendor evidence, exceptions, monitoring, incidents, and corrective changes. That structure lets an auditor trace a requirement to a control and then to proof without assembling the story from scattered screenshots.

Treat an AI failure as an operational incident

Your incident process should cover more than security breaches. A privacy disclosure, unauthorized account change, systematically wrong billing answer, missing human transfer, broken complaint timer, or unsupported-language dead end can all require containment.

Pre-authorize the response team to disable a tool, intent, channel, or release without waiting for a full governance meeting. The playbook should preserve relevant evidence, identify affected interactions, protect unresolved customer cases, route demand to a safe alternative, assess notification or remediation obligations with the appropriate legal and privacy owners, correct the control, add regression tests, and require approval before autonomy is restored.

Do not silently patch the prompt and delete the trail. That may make the next conversation look better while leaving impacted customers, complaint deadlines, and the underlying control failure unresolved.

Key takeaways

Govern the complete support system – model, knowledge, tools, permissions, routing, people, and evidence – rather than reviewing the model in isolation.
Map each applicable service obligation to a product control, a repeatable test, retained evidence, and a named owner.
Assign autonomy by the consequence of each intent and action. Fluency is not evidence that an action is safe.
Use deterministic policy logic and least-privilege permissions for hard boundaries; do not expect prompts to carry legal or transactional controls alone.
Treat vendor certifications as scoped evidence about vendor controls, not as certification of your deployment.
Retest material changes and convert production failures into new controls and regression cases.
Preserve enough evidence to reconstruct consequential interactions while still enforcing privacy, access, and retention rules.

Start with one high-volume intent that already reaches customer data or a business system. Trace it from the first message through authentication, policy selection, answer or action, human handoff, case closure, and retained evidence. Assign an owner, control, test, and evidence record at every consequential step. Where you cannot complete that chain, reduce the agent’s autonomy before you increase its reach.

December 8, 2025

25 High-Impact Career Paths for Software Engineers Beyond Coding: My Real-World Playbook

I’ve spent years helping talented engineers explore what’s next when pure coding no longer feels like the only—or best—path. From hiring across cross-functional teams to mentoring career pivots, I’ve seen firsthand how engineering strengths translate into high-leverage roles that shape product, strategy, and growth.

Software engineers can put their skills to work in alternative roles such as product manager, data scientist, business analyst, and 22 more.

When an engineer moves into product management, they’re not starting from scratch—they’re redirecting problem-solving, systems thinking, and customer empathy toward outcomes. In practice, that means mastering product discovery, strengthening stakeholder management, and getting fluent in product roadmapping and sprint planning, so decisions are guided by impact rather than “outputs vs outcomes” confusion. I’ve watched this transition unlock empowered product teams and clearer prioritization across complex backlogs.

Data-oriented paths are equally compelling. If you enjoy experimentation and evidence-based decisions, roles in analytics or data science reward rigor. Think A/B testing, identifying the minimum detectable effect (MDE), and using tools like Amplitude analytics to translate behavioral signals into product bets. Pair that with retention analysis and you’ll become indispensable to growth conversations.

Business-facing roles such as business analyst or product marketing manager are ideal if you’re energized by customer problems and market narratives. Your engineering fluency sharpens value propositions, product positioning, and go-to-market strategy in a way that resonates with both buyers and builders. In my teams, the best bridges between product and revenue often came from former engineers who could articulate trade-offs with clarity.

If operational excellence is your edge, consider SRE, DevOps, or cybersecurity. The same instincts that push you toward clean CI/CD pipelines and resilient architectures translate well into incident management, threat detection and response, and privacy-by-design practices. These roles reward systems thinking and the ability to balance reliability with delivery speed.

For engineers who love community and storytelling, developer evangelism is a natural fit. You’ll translate complex concepts into actionable guidance, from in-app guides and product tours to UX writing and documentation. The best evangelists I’ve worked with turn feedback loops into product insight, strengthening activation and product-led growth without heavy sales pressure.

Customer-facing technical roles—solutions engineer, forward deployed engineer, or technical consultant—let you stay close to the product while solving real-world problems. You’ll drive onboarding quality, user activation, and adoption while surfacing insights that influence roadmaps. Done well, this work tightens the loop between customer outcomes and product decisions.

AI-centered roles are expanding rapidly. If you’re curious about AI Strategy, retrieval-first pipelines, or the practical use of LLMs for product managers, you can bring an engineer’s discernment to a noisy space. The most valuable contributors here pair pragmatic architecture choices with clear risk management and measurable business value, not hype.

Leadership tracks remain a strong option too. The IC to manager transition isn’t about title; it’s about raising the ceiling for others. You’ll coach empowered product teams, shape organizational development, and align initiatives to defensible metrics—think DORA metrics for flow, leading indicators for value, and OKRs that measure outcomes over output.

If you’re exploring a pivot, start small and intentional. Run “career A/B tests” by taking on cross-functional projects, shadowing adjacent roles, or shipping a lightweight portfolio that demonstrates the new muscle. Join a ProductCon session, practice conference networking, and refine a narrative that links your engineering foundation to the outcomes your target role owns.

Finally, map your personal unfair advantages—domain knowledge, systems thinking, customer empathy, or operational rigor—to the roles that value them most. With focus, you can reposition your engineering experience into a differentiated story that accelerates your next chapter. The breadth of options is real, and with a deliberate plan, you’ll turn curiosity into conviction—and conviction into impact.

Inspired by this post on Product School.

November 24, 2025

Mastering Data Governance in the AI Era: Move Fast, Reduce Risk, and Unlock Trusted Insights

Every week, I’m in conversations with product leaders, engineers, and security teams who are trying to ship AI features faster without compromising trust. The tension is real: stakeholders want velocity, customers want transparency, and regulators want accountability. That’s exactly where modern data governance earns its keep.

New AI pressures are redefining what good governance takes. Learn how to build better frameworks, move fast with confidence, and keep your data from being a black box.

In my role leading product management, I’ve learned that robust data governance isn’t a compliance checkbox—it’s a strategic capability. When we treat governance as a product, we architect for clarity, safety, and speed. That means aligning AI Strategy with day-to-day delivery so teams know what they can ship, when, and why.

Here’s the practical blueprint I rely on. First, establish ownership and a shared language. Create a living data catalog, lineage maps, and clear data classifications so teams know which assets are sensitive, regulated, or eligible for training LLMs. Second, harden privacy-by-design and least-privilege access. Bake PII detection, secrets management, and role-based policies directly into your workflows. Third, bring quality and observability to the forefront: instrument data contracts, monitor drift, and track model performance across environments. Finally, implement model governance end to end—dataset cards, model cards, bias testing, human-in-the-loop review, and a repeatable evaluation harness.

To move fast with confidence, make governance invisible and automated. Treat policies as code in CI/CD, gate deployments with pre-merge checks, and fail builds that violate data contracts. Log prompts and outputs responsibly, route unsafe patterns to red-teaming, and use a retrieval-first pipeline to anchor models on verified sources rather than fragile context stuffing. This is how we scale AI product development while keeping audit trails complete and costs in check.
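
Here is a minimal sketch of what "policies as code" can mean in practice: a check that compares the datasets a training job wants against their catalog classification and fails the build on a violation. The classification labels, the catalog shape, and both function names are assumptions for illustration, not a specific tool's API.

```python
# Hypothetical classification labels that may be used for training.
TRAINING_ELIGIBLE = frozenset({"public", "internal"})


def policy_violations(catalog: dict[str, str], training_inputs: list[str]) -> list[str]:
    """Return one message per dataset that a training job may not use."""
    violations = []
    for name in training_inputs:
        classification = catalog.get(name)
        if classification is None:
            violations.append(f"{name}: not in the data catalog")
        elif classification not in TRAINING_ELIGIBLE:
            violations.append(f"{name}: classified {classification}, not eligible for training")
    return violations


def ci_exit_code(violations: list[str]) -> int:
    """A non-zero exit code is what fails the build in most CI systems."""
    return 1 if violations else 0
```

Treating an uncatalogued dataset as a violation keeps the catalog authoritative: a new source has to be classified before anything can train on it.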

Avoiding the black-box problem starts with transparency. Document assumptions, training data sources, and known limitations—then expose explanations where it matters in the product experience. Pair this with a unified analytics platform to tie telemetry, feature flags, and user feedback to model changes. When something goes sideways, your observability, incident management playbooks, and threat detection and response processes should make root-cause analysis fast and defensible.

If you’re building your program from scratch, use a 30-60-90 approach. In the first 30 days, inventory systems, classify data, and map high-risk use cases. By day 60, formalize RACI for governance, deploy access controls, and set up your evaluation pipeline with golden datasets and measurable acceptance thresholds. By day 90, operationalize incident response, conduct tabletop exercises, and wire governance outcomes into OKRs—think time-to-approval for high-risk changes, reduction in production incidents, and model evaluation pass rates.

This playbook pays off in board conversations and with customers. You can articulate your AI risk management posture, show measurable progress on regulatory compliance, and demonstrate how governance accelerates—not hinders—delivery. Most importantly, your teams gain the confidence to experiment, knowing there’s a safety net that protects users, the brand, and the business.

If your organization is wrestling with how to balance innovation and control, start small, codify what works, and scale with intent. With the right foundations in data governance, AI becomes an engine for durable advantage—not a source of sleepless nights.

Inspired by this post on Amplitude – Perspectives.

November 21, 2025