Author: Shivam Tiwari

How I Use Novus, the First Product Agent, to Turn Rapid Releases into Measurable Wins

In a world of relentless CI/CD and accelerating release trains, product leaders like me can’t afford lagging signals or fuzzy readouts on what’s truly moving the needle. I need immediate, trustworthy feedback that connects code shipped to outcomes achieved and customer value created.

Coding agents compress weeks of development into hours, but the faster your codebase changes, the harder it is to know what’s actually helping end-users.

That tension is exactly why I brought Novus into my product toolbox. Over 600 product teams already use Novus to keep up with the pace of development: it is a first-of-its-kind product agent that sets itself up automatically, monitors product data, and recommends what to do next.

From my chair, that promise matters only if it translates into clear decisions. With Novus, I’ve been able to tighten the loop between experimentation and learning: it pairs eval-driven development with behavioral analytics and observability so I can see how a release influences activation, engagement, and retention—without spelunking through fragmented dashboards. The agentic AI backbone reduces the manual stitching I used to do across events, cohorts, and funnels, letting me focus on prioritization and product strategy instead of report wrangling.

Day to day, Novus fits naturally into our AI workflows. It surfaces anomalies early, clarifies trade-offs, and frames next-best actions in the language of outcomes. Because it works within a unified analytics platform, I can maintain continuous discovery at scale while preserving the rigor of Agent Analytics: hypotheses are explicit, telemetry is consistent, and results are traceable. That’s the operating cadence I expect from modern product management leadership.

If your roadmap moves faster than your learning loops, a product agent can be the missing link between speed and certainty. Novus helps me convert rapid releases into measurable wins, keeping the team aligned and confident about what to build next—and just as importantly, what to stop doing.

Inspired by this post on Pendo – Best Practices.

June 17, 2026
Stop Forcing Organizational Change: How I Create Impactful Product Habits Without Burnout

Organizational change is exhausting—so I stopped trying to force it. After years of leading product teams, I’ve learned that trying to fix the people and processes around me is almost always wasted energy. If you’re eager to champion a better way of working inside a resistant organization, there’s a more sustainable path that actually drives results.

Here’s my starting point: individuals can’t change their organizations. I’m often asked to “train the PMs” or “install discovery practices,” but without executive sponsorship, organizational pain, and urgency, nothing moves. I now decline those well-intentioned requests and focus instead on creating the conditions for change.

My readiness check is simple and ruthless. Pain — organizational pain felt by leadership, not just you. Urgency — there has to be a cost to inaction. Awareness — people need to know solutions exist. If I can’t articulate these three clearly, I narrow the scope to what my team and I can control and demonstrate.

Practically, I elevate organizational pain by making it visible and quantifiable: missed outcomes vs output OKRs, customer churn tied to unmet needs, increased operational load from legacy workflows, or cycle time and deployment friction that slow learning. I create urgency by modeling cost-of-delay and showing the trade-offs we’re already making. And I build awareness by running small, transparent experiments that show there’s a credible alternative—continuous discovery, empowered product teams, and product trios solving for outcomes, not output.
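To make the cost-of-delay point concrete, here is a minimal sketch of the arithmetic. The option names and figures are made up for illustration, and the CD3 ranking (cost of delay divided by duration) is one common way to compare options, not necessarily the model I use in every case.

```python
def cost_of_delay(weekly_value, weeks_delayed):
    """Value forgone by shipping `weeks_delayed` weeks later than possible."""
    return weekly_value * weeks_delayed


def cd3(weekly_value, duration_weeks):
    """Cost of Delay Divided by Duration: higher scores are more urgent."""
    return weekly_value / duration_weeks


# Hypothetical options: (estimated weekly value in dollars, effort in weeks).
options = {
    "fix onboarding drop-off": (20_000, 4),
    "rebuild export": (30_000, 10),
}

# Rank options so the most urgent one comes first.
ranked = sorted(options, key=lambda name: cd3(*options[name]), reverse=True)

# Cost of postponing the top option by one quarter (13 weeks).
quarter_delay_cost = cost_of_delay(options[ranked[0]][0], 13)
```

The smaller item ranks first here because it returns value sooner per week of effort, which is the trade-off this kind of model is meant to expose.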

“Organizational change starts with you — but it starts with you changing you, not your organization.” I take that literally. I refine my own discovery habits, make my assumptions explicit, and raise the quality bar on evidence. Whether it’s adopting AI responsibly in our workflow or redesigning how we do customer interviews, I change me first and let the results speak.

Show your work, don’t advocate your conclusions. Instead of arguing for “the right way,” I surface the pain, share how I reached my conclusion, and let others draw their own insights. I circulate decision logs that link customer evidence to product decisions, include short snippets from interviews, and map outcomes to proposals. That transparency lowers defenses, builds stakeholder buy-in, and shifts the conversation from opinion to observable facts.

Working within constraints, not against them. Stuck in a rigid, feature-factory process? You don’t have to change quarterly planning to do great discovery. Add customer context. Frame features around outcomes. Layer in the habits without touching the formal process. I’ve embedded discovery into existing rituals: adding customer insights to PRDs, tying features to measurable outcomes, and using thin-slice experiments that fit inside current delivery cadences. Over time, those habits compound.

The ripple effect is real. Teams that do great work and show it publicly become the ones everyone wants to emulate. That’s how influence actually spreads. I make results visible—brief Looms walking through our reasoning, dashboards that track outcome movement, and internal write-ups that highlight how the work changed a customer behavior. Visibility turns quiet wins into organization-wide momentum.

If you want a place to start this week, try this: define a sharp outcome, run three quick customer interviews, share your notes and decision rationale openly, and ship one small experiment tied to that outcome. Use the data to refine your next step and repeat. In a month, you’ll have a trail of evidence, not a pitch deck—and that’s what shifts minds.

In the end, sustainable change comes from consistent practice, not fiery advocacy. Focus on outcomes, make the pain and cost-of-inaction undeniable, and keep showing your work. The organization will move when it’s ready—your job is to make “ready” happen sooner by modeling what good looks like and making it impossible to ignore.

Inspired by this post on Product Talk.

June 16, 2026
Why Product Engineers Are Transforming Software Delivery: Ownership, Speed, and Real Impact

I’ve watched the rise of product engineering up close, and it’s reshaping how we build software. The old model of rigid handoffs and separate functions is giving way to small, empowered product teams where engineers own the customer problem end to end. That shift isn’t just cultural—it’s a performance advantage that compounds with every release.

I often summarize it this way: “Product engineers are taking over. They ship code, talk to users, and own outcomes—no handoff required. Here’s what the role is, and why it matters now.”

When I say “product engineer,” I’m describing a builder who goes beyond writing code. I expect them to partner in product trios with product management and design, participate in continuous discovery, and make decisions grounded in product strategy and real customer insight. They don’t toss features over a wall; they own the problem, the solution, and the measurable outcome.

Why now? Modern delivery practices like CI/CD and feature flags compress feedback loops, while behavioral analytics and session replay make customer friction visible in real time. As expectations rise for quick iterations and clear value, teams that reduce handoffs and align around outcomes outperform on DORA metrics such as deployment frequency and lead time for changes.

Day to day, a strong product engineer blends discovery and delivery. They join customer interviews, review support tickets, analyze usage patterns, and run A/B testing to validate hypotheses. Then they ship code in small, safe increments, instrument telemetry, and watch adoption and retention signals to confirm they’re moving the numbers that matter.

Team shape matters. I favor compact, cross-functional squads anchored by product trios, each with explicit outcomes vs output OKRs. Product engineers often operate like forward deployed engineers, partnering with customer success and solutions engineering to learn at the edge of real-world usage. This proximity to customers turns ambiguity into insight—and insight into product leverage.

Accountability is concrete. We track DORA metrics for delivery health and pair them with product outcomes such as activation, time-to-value, and Net Revenue Retention (NRR) drivers. The combination keeps us honest about both how fast we move and whether what we ship truly works for customers.
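As a rough illustration of the delivery side, the sketch below computes two of the DORA metrics mentioned earlier, deployment frequency and lead time for changes, from a list of deployments. The `Deployment` record and its field names are hypothetical; in practice the timestamps would come from your CI/CD system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record: when a change was committed and when it reached production."""
    committed_at: datetime
    deployed_at: datetime


def deployment_frequency(deployments, window_days):
    """Average number of production deployments per day over the window."""
    return len(deployments) / window_days


def median_lead_time(deployments):
    """Median time from commit to production deploy (lead time for changes)."""
    return median(d.deployed_at - d.committed_at for d in deployments)


# Illustrative data for a one-week window.
deploys = [
    Deployment(datetime(2026, 6, 1, 9), datetime(2026, 6, 1, 15)),   # 6 hours
    Deployment(datetime(2026, 6, 2, 10), datetime(2026, 6, 3, 10)),  # 24 hours
    Deployment(datetime(2026, 6, 4, 8), datetime(2026, 6, 4, 10)),   # 2 hours
]

frequency = deployment_frequency(deploys, window_days=7)
lead_time = median_lead_time(deploys)  # timedelta of 6 hours for this data
```

The median is used rather than the mean so that one slow change does not dominate the picture.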

The hiring profile is distinct. I look for engineers who are curious about the “why,” comfortable with trade-offs, and energized by customer conversations. They can navigate architectural complexity, but they also translate user feedback into crisp product bets. Many grow into natural facilitators of discovery rituals and developer evangelism across the organization.

If you’re getting started, pilot a single squad. Establish clear outcomes vs output OKRs, invest in CI/CD and feature flags, and commit to continuous discovery with weekly customer interviews. Give the team ownership of a KPI tied to product strategy, and measure progress with DORA metrics plus usage and retention signals. The early wins—fewer handoffs, faster learning, tighter feedback loops—build momentum quickly.

In short, product engineers thrive where accountability, autonomy, and user empathy meet. They reduce wasteful coordination, shorten the path from insight to impact, and ensure we ship code that customers actually adopt. That’s why this role is reshaping how software gets built—and why the teams that embrace it will set the pace for everyone else.

Inspired by this post on Pendo – Perspectives.

June 15, 2026
Salesforce to Acquire Fin for ~$3.6B: Powerful AI Synergy, Product Strategy Takeaways

I’m processing a milestone moment for SaaS, AI strategy, and product leadership. One statement captures the news with clarity: “We’re excited to share that we just signed an agreement for Salesforce to acquire Fin for ~$3.6B. The transaction is expected to close in the fourth quarter of Salesforce’s fiscal year 2027.” As a product leader, I see this as a high-conviction bet on agentic AI, Customer Agents, and CRM integration at massive scale.

The backstory matters, and it’s remarkable: “Fin started as Intercom 15 years ago. We changed our name to cap our transformation just weeks ago. We were a darling of the SaaS era and invented so many of the patterns you see in software today. Nearly four years ago, in need of a reboot, we jumped on weeks-old modern LLMs to create and define the category we know as Customer Agents today.” That arc—from SaaS pioneer to LLM-powered category creator—illustrates how bold pivots, shipped with urgency and clear product strategy, can reset the trajectory of a company and a market.

From a product management lens, this deal reinforces a few truths: category creation rewards those who move first with conviction; “reboots” succeed when they’re anchored in genuine customer value; and modern LLMs, applied through disciplined roadmapping and eval-driven development, can unlock step-change outcomes in customer support AI strategy and product-led growth. It also signals the rising centrality of agentic AI and operational AI workflows inside the CRM.

The leadership dimension is just as instructive. As the announcement framed it: “Salesforce invented modern software and SaaS. And Marc Benioff is like the final boss of tech founder CEOs. In seat for 27 years, he’s one of the last of his era. Still pushing, pivoting, placing big bets.” That ethos—placing big, principled bets while adapting the operating model—sets the tone for what sustained product management leadership looks like at scale.

Customer continuity and acceleration are clearly emphasized: “To our customers: Over the past few years we’ve been shipping intensely. Including recently our groundbreaking model, Apex, and our paradigm-defining internal agent, Operator. With the resources of Salesforce this will only accelerate. And yet little will practically change. I’ll still be CEO, Des will still be running R&D, we’ll both still be committed to continuing to lead this category. Thank you very sincerely and deeply for your belief in us.” For practitioners, the signal is strong: continued focus on shipping, sharper execution readiness, and tighter integration paths inside the Salesforce ecosystem.


There’s a human heartbeat here too: “While this is not the end, it is a major, pivotal, special, and emotional moment for us.” Moments like this remind me that building enduring products is equal parts craft and courage—powered by teams who commit to the long game, navigate uncertainty, and still ship relentlessly.

Strategically, I expect near-term priorities to center on secure data flow and governance, deep CRM integration, and unifying telemetry for Agent Analytics across channels. On the roadmap, I’d anticipate tighter alignment between LLM safety, retrieval-first pipelines, and enterprise-grade observability—plus thoughtful go-to-market strategy enabling sales-led growth to complement product-led growth. The real unlock comes when Customer Agents are natively orchestrated with Service, Sales, and Marketing workflows—measured with clear outcomes vs output OKRs and reinforced by robust knowledge management.

For fellow product leaders, the takeaways are actionable: define category boundaries with crisp value propositions; balance speed with governance; invest in eval-driven development and continuous discovery; and keep your product trios aligned around measurable customer outcomes. Above all, build the operating cadence—metrics, rituals, and talent—that lets you compound small wins into durable differentiation.

And I appreciate the spirit of this closing line: “And now, time to get back to work. See you at our next product launch in a few weeks. (:” That’s the mindset that turns a headline into execution: celebrate briefly, then ship the next proof point.

Inspired by this post on The Intercom Blog.

June 15, 2026
Claude Code for Product Managers: Accelerate Prototypes, Validate Faster, Ship with Confidence

I build products under constant pressure to learn faster without breaking trust. Claude Code has become a pragmatic addition to my AI product toolbox because it helps me move from idea to evidence with less friction—while keeping engineering, design, and compliance in the loop.

“Claude Code for Product Managers explained: what it is, why it matters, and how it helps PMs prototype, validate, and move faster.” That line captures the essence. In practice, I use it to turn ambiguous problem statements into tangible artifacts—API stubs, SQL queries, test data, and lightweight prototypes—that sharpen conversation and accelerate decision cycles.

What is it in PM terms? A code-aware assistant that helps me prototype safely and quickly. I can generate example API calls, transform messy CSVs for retention analysis, draft instrumentation plans for Amplitude analytics, or spin up a mock service to validate an integration. Because it understands structure, it’s effective at scaffolding small utilities (e.g., a data cleaner or a CLI harness) that make discovery and validation faster.

Day to day, Claude Code reduces handoffs. If I’m exploring a new partner integration, I’ll have it produce a set of curl commands and a Postman collection, then annotate each step with acceptance criteria and expected responses. When I’m shaping a feature, I lean on it to outline event taxonomies and feature flags so that engineering can wire telemetry without guesswork. For insights work, I’ll ask it to propose SQL for cohort, funnel, and retention analysis, always verifying against source schemas before anything touches production.

Speed is only useful when it improves signal quality. I anchor the workflow in continuous discovery: small hypotheses, thin-slice prototypes, and fast instrumentation. Claude Code helps me estimate A/B testing readiness (including minimum detectable effect), generate smoke tests for critical user paths, and structure an eval-driven development loop so we learn from every iteration. It also supports context window management by summarizing long PRDs into the few constraints a prototype must respect.
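For the A/B readiness check, a minimal sketch of the sample-size arithmetic behind a minimum detectable effect looks like this. It assumes a two-sided test on two conversion rates with the standard normal approximation; the function name and default thresholds are illustrative choices, not something prescribed by any tool.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Users needed in each variant to detect an absolute lift of `mde_abs`.

    Uses the normal approximation for comparing two proportions.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Baseline conversion of 10%, aiming to detect a 2-point absolute lift.
needed = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)

# Halving the detectable effect requires a much larger sample.
needed_small_effect = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.01)
```

With these inputs the answer is a few thousand users per variant, which is usually enough to tell whether a thin-slice experiment is feasible on current traffic before anyone builds it.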

Governance matters. I apply AI readiness and AI risk management principles: never paste secrets or PII, isolate sandboxes, and log prompts as docs-as-code for auditability. I prefer a retrieval-first pipeline that feeds approved product docs, OpenAPI specs, and design tokens so generations stay grounded. When tools are integrated, I favor the Model Context Protocol (MCP) to constrain capabilities and maintain least-privilege access. Human-in-the-loop review is non-negotiable—especially for anything that might influence customer data or pricing.

The best outcomes show up in product trios. I’ll facilitate a live session with design and engineering: we co-create prompts, compare alternatives, and converge on a thin slice we can ship. That collaboration keeps us empowered, reduces interpretation drift, and turns Claude Code into an accelerant rather than a sidecar. Over time, the trio curates a reusable prompt library for PRD outlines, experiment checklists, and integration playbooks.

Getting started is straightforward: define a safe environment, assemble your authoritative corpus (requirements, specs, taxonomies), and codify a few high-value templates—API exploration, instrumentation plans, sandbox data generators, and acceptance tests. Track impact with simple, objective metrics: cycle time from hypothesis to instrumented prototype, time-to-first-signal, and the proportion of decisions made with data versus opinion.

There are pitfalls. Hallucinated fields can creep into API calls, schema drift can break generated queries, and “clever” refactors may miss edge cases. I mitigate this by grounding generations in current specs, asking for unit tests alongside any code, and validating against a staging environment before anyone talks about production. Treat Claude Code as a collaborator, not an oracle.
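One cheap guard against hallucinated fields is to compare a generated request body with the field list from the current spec before it is sent anywhere. The sketch below assumes the allowed and required field names have already been extracted from the spec; all names shown are hypothetical.

```python
def check_payload(payload, allowed_fields, required_fields):
    """Return (unknown, missing) field names for a generated request body."""
    unknown = sorted(set(payload) - set(allowed_fields))
    missing = sorted(set(required_fields) - set(payload))
    return unknown, missing


# Hypothetical field lists taken from an approved API spec.
ALLOWED = {"customer_id", "plan", "seats"}
REQUIRED = {"customer_id", "plan"}

# A generated payload that includes a field the spec does not define.
generated = {"customer_id": "c_1", "plan": "pro", "loyalty_tier": "gold"}

unknown, missing = check_payload(generated, ALLOWED, REQUIRED)
```

Here `unknown` flags `loyalty_tier`, which is the signal to go back to the spec rather than to staging.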

If your mandate is to learn faster, de-risk bets, and ship with confidence, Claude Code is worth adopting. Used thoughtfully, it compresses the distance between questions and answers, elevates product discovery, and lets teams validate more ideas with fewer meetings—without compromising on governance or quality.

Inspired by this post on Product School.

June 12, 2026
Beyond Black‑Box Scores: Custom AI That Elevates Trust & Safety Without Burnout

What do you do when off-the-shelf moderation scores aren't good enough—and the alternative is paying human contractors to spend their days reviewing traumatizing content at scale? I’ve wrestled with that exact trade-off in enterprise environments, and it’s why I was eager to unpack how custom AI can raise the bar on trust and safety without compromising accuracy, latency, or the well-being of our teams.

In this episode of Just Now Possible, I sit down with Nikki Marinsek (Data Scientist), Brian McCaffrey (Software Engineer), and Dan Means (Machine Learning Engineer) from Musubi, an AI-native trust and safety toolkit for content platforms. Musubi builds custom-trained ML models and LLM-powered moderation tools that adapt to each platform's unique policies—from dating apps to social networks to AI inference endpoints. As a product leader, I’m drawn to their blend of eval-driven development, agentic AI, and pragmatic deployment pipelines that actually meet real-world SLAs.

We walk through their full journey, from a first prototype on tabular data to the discovery that the system was sometimes catching issues human moderators missed. That insight became a forcing function to formalize evaluation, calibrate thresholds, and design feedback loops that help humans and models converge. Just as importantly, they built a policy optimizer that uses agentic flows so non-technical trust and safety teams can iterate on LLM moderation policies without needing a data scientist in the room.

If you’ve ever had to balance latency, accuracy, and cost at scale, you’ll appreciate how Musubi tests trade-offs across traditional ML, embedding-driven classification, and LLMs. Their approach mirrors the patterns I expect in high-throughput stacks: cache and pre-compute where possible, contain worst-case latencies, and push evaluation tooling to customers so policy changes are safe, observable, and fast to deploy.

What resonated most with me is their core product strategy: put eval tools directly in customers’ hands. When teams can benchmark AI against humans, referee disagreements using “LLM as judge,” and make policy gaps visible, trust increases and operational drift decreases. That’s the foundation for durable product strategy in sensitive domains like content moderation, fraud management, and risk scoring.


Guests: Nikki Marinsek, Data Scientist, Musubi; Brian McCaffrey, Software Engineer, Musubi; Dan Means, Machine Learning Engineer, Musubi.

In this episode:
Why off-the-shelf moderation scores fail and how custom-trained models fix that
How Musubi combines traditional ML with LLMs for different moderation tasks
The discovery that AI can outperform human moderators, and how to communicate that to clients
Using AI as a judge to referee disagreements between AI and human decisions
How Musubi onboards new customers with "reverse demos"
What custom model training actually means: fine-tuning, feature engineering, and reusable deployment pipelines
The policy optimizer: an agentic flow that helps customers iterate on their LLM moderation policies
Why pushing eval tools directly to customers is a core product strategy
How Musubi is building flexible orchestration workflows for non-technical trust and safety teams

From a product management lens, a few highlights stand out. First, the disciplined separation of concerns: use traditional ML for high-precision, low-latency pattern detection and LLMs for nuanced policy interpretation. Second, invest in golden sets and policy loops early so you can quantify improvement and avoid subjective debates. Third, productize customization—create reusable deployment pipelines, parameterized policies, and self-serve evaluation—so each customer’s “custom model” still scales like a platform.
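To show what a golden set buys you, here is a minimal sketch that scores both a model and a human reviewer against the same agreed labels. The data is invented for illustration and this is not Musubi's tooling; it only demonstrates how benchmarking against humans becomes a number rather than a debate.

```python
def precision_recall(predictions, golden):
    """Precision and recall of boolean 'violates policy' decisions against golden labels."""
    pairs = list(zip(predictions, golden))
    true_pos = sum(1 for p, g in pairs if p and g)
    false_pos = sum(1 for p, g in pairs if p and not g)
    false_neg = sum(1 for p, g in pairs if not p and g)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Illustrative golden set: True means the item violates policy.
golden = [True, True, False, False, True, False]
model = [True, True, False, True, True, False]    # one false positive
human = [True, False, False, False, True, False]  # one missed violation

model_precision, model_recall = precision_recall(model, golden)
human_precision, human_recall = precision_recall(human, golden)
```

In this toy data the model catches every violation but over-flags once, while the human never over-flags but misses one, which is exactly the kind of trade-off a threshold or policy change should be judged against.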

I also appreciated the onboarding tactic of "reverse demos." Rather than a canned walkthrough, the team invites customers to bring real policies and edge cases, then instruments the workflow live. That move builds credibility, accelerates discovery, and surfaces the fastest paths to value—an approach I recommend whenever you’re selling complex AI workflows to non-technical stakeholders.

If you’re navigating cost and latency trade-offs, the conversation goes deep on techniques like embedding-driven classification, fine-tuning vs. training, and when to route decisions through LLM adjudication. My takeaway: treat the router, the evaluator, and the policy as first-class products. When those elements are observable and testable, you can raise quality without exploding compute costs or creating operational bottlenecks.
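A router in this sense can start as something very small. The sketch below assumes a fast classifier that returns a violation score between 0 and 1 and sends only the uncertain middle band to LLM adjudication; the thresholds and labels are hypothetical and would need calibrating against a golden set.

```python
def route(score, allow_below=0.2, block_above=0.9):
    """Decide how to handle an item given a fast classifier's violation score."""
    if score >= block_above:
        return "block"       # confident violation: no LLM call needed
    if score <= allow_below:
        return "allow"       # confident pass: no LLM call needed
    return "llm_review"      # uncertain: spend LLM compute only here


# Illustrative scores from a hypothetical fast classifier.
scores = [0.05, 0.95, 0.50, 0.15, 0.70]
decisions = [route(s) for s in scores]

# Share of traffic that incurs LLM cost and latency.
llm_share = decisions.count("llm_review") / len(decisions)
```

Because the thresholds are plain parameters, widening or narrowing the review band is an observable, testable change rather than a retraining exercise.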

Resources & Links: Musubi — AI-powered trust and safety toolkit for content platforms. Maven AI Evals Course — AI evals course.

Chapters:
00:00 Meet the Team
01:18 Why Everyone Wears Product
02:32 What Musubi Builds
04:51 AI for Human Moderation
09:59 Adversaries and Asymmetry
11:48 Early Days and Low Latency
13:35 First Prototype Slice
15:33 Traditional ML Meets LLMs
19:52 Benchmarking Against Humans
23:09 LLM as Judge and Policy Gaps
29:53 From Prototype to Platform
31:15 Customer Onboarding Reverse Demos
36:08 Custom Models Per Customer
38:05 Fine Tuning vs Training
39:14 Embedding Driven Classification
40:04 Cost and Latency Tradeoffs
43:21 Productizing Customization
49:16 Scaling Prototypes to Production
51:58 Golden Sets and Policy Loops
56:17 Coaching Customers Safely
01:02:06 Gamified Feedback Signals
01:06:19 Agentic Toolkit Roadmap
01:09:05 Workflow Orchestration Future
01:12:06 Wrap Up and Thanks

Ultimately, this is a playbook for modern trust and safety: align your models to your policies, make evals a habit not an event, and empower non-technical teams with agentic workflows and transparent metrics. That’s how we move beyond black-box scores to systems we can measure, manage, and trust.

Inspired by this post on Product Talk.

June 11, 2026

A Practical Model for Amplitude Behavioral Web Intelligence

Amplitude behavioral web intelligence is most useful when it is treated as a connected evidence system, not a collection of isolated visualizations. Aggregate analytics can locate a problem, page-level overlays can narrow it to an interface region, and session evidence can show the surrounding user experience.

The practical payoff is a shorter path from an observed performance gap to a focused experiment. The two source articles support that model from different angles: one describes the combined use of analytics, session replay, heatmaps, and zoning, while the other concentrates on placing engagement and revenue context directly over the page being evaluated.

Behavioral web intelligence works as an evidence stack

The broader Shivam.Consulting Blog overview of Session Replay, Heatmaps, and Zoning Insights presents the capabilities as complementary. Funnels, cohorts, and driver analysis reveal quantitative patterns; heatmaps summarize where attention concentrates or fades; zoning connects defined interface regions with outcomes; and replay supplies contextual evidence about individual sessions.

The companion article about Zoning Insights overlays examines a more specific part of that stack. It reports that engagement and revenue metrics can appear over a live site, placing behavioral information in the same visual frame as calls to action, navigation paths, and high-intent sections. It also recommends pairing this view with session replay and Web Vitals to consider behavioral, experiential, and performance signals together.

Taken together, the articles describe a progression from detection to diagnosis. Analytics identifies where a journey or outcome appears weak. Zoning and heatmaps focus attention on relevant page areas. Replay and performance signals provide possible explanations. A controlled experiment then determines whether the proposed change improves the defined outcome. No individual layer completes that chain by itself.

Match each lens to the question it can answer

A common analytical mistake is asking one tool to provide a conclusion beyond its evidence. The following decision map separates the roles reported in the two articles from the judgments a team still has to make.

Evidence lens	Question it helps answer	Appropriate use	Important limit
Funnels, cohorts, and drivers	Where does behavior differ or an outcome underperform?	Locate a journey stage, segment, or event that merits investigation.	An aggregate pattern does not explain the user experience behind it.
Heatmaps	Where does attention concentrate or dissipate?	Identify engagement hotspots and areas that may deserve design scrutiny.	Visible concentration alone does not establish user intent or business impact.
Zoning Insights	How are specific interface regions associated with engagement or outcomes?	Compare page areas and focus discussion on elements tied to activation, conversion, retention, or revenue context.	An observed association is not, by itself, proof that the region caused the outcome.
Session replay	What happened around a moment of friction?	Inspect representative sessions for confusing copy, a mismatched call to action, or an unexpected path.	A small set of sessions should not be treated as prevalence data.
Web Vitals	Could page performance be part of the experience?	Consider technical performance alongside behavioral friction.	A performance signal does not automatically explain the user’s decision.
A/B testing	Does a proposed change improve the predefined result?	Validate a focused intervention against a success measure.	An experiment is only as useful as its hypothesis, instrumentation, and outcome definition.

Turn page observations into testable product decisions

A disciplined workflow begins with an outcome rather than a page element. Both articles anchor analysis to goals such as activation and retention, while the zoning-focused post also emphasizes conversion and revenue context. This prevents a visually prominent interaction from being mistaken for a strategically important one.

The next move is to locate the behavioral break in the relevant funnel or journey. Teams can then examine the associated page through zoning and heatmap evidence, looking for interface regions whose engagement patterns are relevant to the selected outcome. Replay can be sampled around the same step or segment to identify plausible friction in context. Where appropriate, Web Vitals can indicate whether performance deserves a place in the hypothesis.
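Locating the behavioral break can be illustrated with a minimal, tool-agnostic sketch. It assumes the funnel has already been exported as ordered step counts; the step names and numbers are invented, and nothing here uses or represents an Amplitude API.

```python
def step_conversion(funnel):
    """Conversion rate between each pair of consecutive funnel steps.

    `funnel` is an ordered list of (step_name, users_reaching_step).
    """
    rates = []
    for (prev_name, prev_users), (name, users) in zip(funnel, funnel[1:]):
        rates.append((f"{prev_name} -> {name}", users / prev_users))
    return rates


def weakest_step(funnel):
    """The transition with the lowest conversion rate."""
    return min(step_conversion(funnel), key=lambda item: item[1])


# Illustrative exported funnel counts.
funnel = [("landing", 1000), ("pricing", 400), ("signup", 300), ("activated", 90)]

transition, rate = weakest_step(funnel)
```

In this example the weakest transition is from signup to activated, so that is the page or flow to examine next with zoning, heatmap, and replay evidence. A low rate only says where to look; it does not say why users stop.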

The resulting hypothesis should connect an observed behavior, a proposed explanation, and a measurable change. For example, a team might observe weak progression at a value-related step, find limited engagement with its primary action, and see replay evidence suggesting that the action is unclear. That combination justifies a targeted test; it does not yet prove the explanation.

Success should be defined before the experiment is run. The first source describes instrumenting events and setting success criteria upfront, while both sources position A/B testing as a way to validate improvements rather than merely confirm opinions. Keeping the intervention narrow also makes the result easier to interpret and connect back to the original evidence.

Shared context improves alignment, but not automatically rigor

The zoning-focused article argues that placing metrics over the live interface reduces tab-switching and gives growth, product, design, marketing, engineering, and conversion stakeholders a common frame of reference. The broader article similarly links the combined evidence to product trios and continuous discovery. The synthesis is organizational as much as analytical: the interface becomes a shared workspace for discussing behavior and prioritizing experiments.

That proximity can accelerate decisions, but it can also make a visual association feel more conclusive than it is. A revenue figure displayed beside a page region remains context, not automatic causal attribution. Heatmap intensity does not reveal why attention occurred, and a memorable replay does not show how often the same behavior happens. Teams still need aggregate measures, representative sampling, clear event definitions, and experiments that can challenge the preferred explanation.

The source articles are favorable practitioner-oriented accounts rather than comparative evaluations. They provide no benchmarks, experimental results, or comparisons with alternative platforms. They also do not discuss implementation governance. In practice, teams evaluating replay and detailed behavioral data should separately define appropriate privacy controls, access rules, retention practices, and instrumentation ownership before making the workflow routine.

Key takeaways

- Use aggregate behavioral analytics to find the problem before inspecting individual pages or sessions.
- Treat heatmaps and Zoning Insights as prioritization and diagnostic lenses, not standalone proof of causation.
- Use session replay to develop explanations for a measured pattern, then return to quantitative evidence to assess their scope.
- Connect page regions and experiments to predefined activation, conversion, retention, or revenue-related goals.
- Give cross-functional teams the same visual evidence while preserving clear distinctions between observation, hypothesis, and validation.

The next step for a web team is to choose one consequential journey, connect its aggregate pattern to page and session evidence, and test the smallest change capable of resolving the uncertainty. Repeating that loop can turn behavioral web intelligence into a decision practice rather than another reporting layer.

References

June 11, 2026

Secure System Access for AI Agents: A Phased Control Model

An AI agent becomes operationally valuable when it can move beyond explaining a process and complete the underlying work. That same transition gives the agent access to sensitive data and consequential actions, so integration must be designed as both a product capability and a security boundary.

The practical objective is not maximum access. It is the smallest dependable set of permissions that lets an agent resolve a well-defined workflow, supported by deterministic controls, observable outcomes, and a clear path to human intervention.

System access changes both the value and the risk

Without backend access, an agent can describe how to update an account, check a renewal, or report a damaged order. With access to a CRM, billing platform, or order-management system, it can potentially retrieve the relevant record and complete the request during the conversation. The Intercom article presents this shift from answering to acting as a central difference between basic AI adoption and mature deployment.

The article cites Intercom’s 2026 Customer Service Transformation Report, which found that 87% of teams with mature AI deployments reported improved metrics, compared with 62% overall. It also reports that 82% of senior leaders said their teams had invested in AI during the preceding year, while only 10% said they had reached mature deployment. These source-reported figures suggest an integration gap, but they do not independently establish that system access caused the reported improvements or that an integration is secure.

Security therefore cannot be added after the workflow succeeds. A customer-facing interface may remove the need to visit a separate application, but it must not remove identity and authorization checks. The agent still needs a trustworthy way to associate the request with the correct customer, determine what that customer is permitted to do, and constrain the backend operation accordingly.

Choose workflows where access justifies its complexity

Not every automated conversation benefits equally from deeper integration. Intercom reports the results of rebuilding four fixed, scripted Tasks as Procedures with system access. Over the 12 months through May 2026, the reported resolution rate for its bounce-list workflow rose from 9.3% to 79.9%, while bug reporting increased from 9.2% to 66.5%. Email forwarding moved from 44.9% to 66.5%, but Messenger installation rose only from 67% to 69.2%.

The variation is more instructive than the headline gains. According to the article, the bounce-list process required multi-step reasoning, dynamic branches, and error recovery. Bug reporting still ended in a human handoff, but the procedure improved that handoff by pre-triaging the issue, surfacing possible GitHub matches, extracting relevant URLs, and requesting impersonation access. Messenger installation was already a comparatively linear process, leaving less room for improvement.

A suitable first integration is therefore not merely a popular support topic. It should be high-volume and repeatable, have an identifiable system owner, and depend on live data or actions that materially change the outcome. Existing APIs improve feasibility, but the security review should also consider data sensitivity, reversibility, authorization complexity, and the consequences of acting on an ambiguous request.

Use an access ladder instead of a single launch

The phased approach described by Intercom can also serve as a security model. Each stage expands capability only after the workflow and its controls have produced enough evidence to justify the next step.

| Stage | Agent capability | Appropriate use | Control emphasis |
| --- | --- | --- | --- |
| No integration | Guide, troubleshoot, check policy, triage, and route | Discover where explanations repeatedly lead to manual work | Evaluate answer quality, routing accuracy, and escalation behavior |
| Read-only access | Retrieve approved fields such as order or subscription status | Resolve information requests without changing a record | Restrict endpoints, records, and fields; verify customer authorization |
| Write access | Update records or initiate actions such as cancellations or refunds | Complete bounded workflows after earlier stages are dependable | Validate inputs, limit action scope, record outcomes, and require approval where consequences warrant it |
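
The ladder in the table above can be written down as an explicit capability map, so that a workflow's current stage decides which operations are even possible. The sketch below is a minimal illustration; the stage names and operation labels (`AccessStage`, `"read"`, `"write"`, and so on) are assumptions made for this example and do not come from the Intercom article.

```python
from enum import IntEnum


class AccessStage(IntEnum):
    """Hypothetical stages mirroring the three rows of the access ladder."""
    NO_INTEGRATION = 0
    READ_ONLY = 1
    WRITE = 2


# Operations each stage may perform (illustrative labels, not a real API).
ALLOWED_OPERATIONS = {
    AccessStage.NO_INTEGRATION: frozenset({"guide", "route"}),
    AccessStage.READ_ONLY: frozenset({"guide", "route", "read"}),
    AccessStage.WRITE: frozenset({"guide", "route", "read", "write"}),
}


def is_permitted(stage: AccessStage, operation: str) -> bool:
    """Return True only if the workflow's current stage allows the operation."""
    return operation in ALLOWED_OPERATIONS[stage]
```

Keeping the map in configuration rather than in the agent's instructions means that promoting a workflow from one stage to the next is a reviewable change.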

Mock responses can test branching logic before an API is ready, as the Intercom article notes. It also proposes a temporary human-in-the-loop step when an integration is still several engineering sprints away. These methods can validate the workflow and expose missing requirements, but simulated success should not be treated as proof that production identity, authorization, failure recovery, and audit controls are ready.
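
As a rough illustration of testing branching logic against mock responses, the sketch below passes the backend lookup into the workflow as a function, so a canned stand-in can be swapped for the real API later. The order IDs, status strings, and function names are hypothetical and chosen only for this example.

```python
from typing import Callable, Optional

# A lookup returns an order status string, or None if the order is unknown.
OrderLookup = Callable[[str], Optional[str]]


def handle_order_query(order_id: str, lookup: OrderLookup) -> str:
    """Branch on the backend response; escalate anything unexpected."""
    status = lookup(order_id)
    if status is None:
        return "escalate: order not found"
    if status == "shipped":
        return "reply: your order has shipped"
    if status == "processing":
        return "reply: your order is being prepared"
    return "escalate: unrecognized status"


def mock_lookup(order_id: str) -> Optional[str]:
    """Canned responses standing in for an API that is not ready yet."""
    canned = {"A1": "shipped", "A2": "processing", "A3": "on_hold"}
    return canned.get(order_id)
```

A mock like this exercises every branch, including the unknown-order and unrecognized-status paths, but it says nothing about how the production system authenticates the customer or behaves under failure.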

Put deterministic controls around probabilistic decisions

Plain-language workflow instructions can guide an agent, but security-critical constraints should not depend solely on the model interpreting those instructions correctly. A safer architecture places enforceable controls between the agent and each backend system.

| Control | Practical design implication |
| --- | --- |
| Dedicated identity | Give the agent its own service identity rather than borrowing a staff account, so permissions and activity remain attributable. |
| Least privilege | Allow only the endpoints, operations, records, and fields required by the selected workflow. |
| Read and write separation | Keep retrieval permissions distinct from mutation permissions and grant write access only when the use case requires it. |
| Independent policy enforcement | Validate identity, authorization, limits, and required inputs outside the model before executing an operation. |
| Bounded actions | Prefer narrow, purpose-built operations over unrestricted database or administrative access. |
| Human approval and escalation | Route ambiguous, exceptional, sensitive, or difficult-to-reverse cases to an authorized person. |
| Auditability and monitoring | Record the request, decision, tool call, result, and escalation so failures and unusual patterns can be investigated. |
| Safe failure behavior | Prevent retries, timeouts, or partial completion from producing duplicated or inconsistent changes. |
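
Several of these controls can be combined in a small gate that sits between the agent and a backend action. The following is a minimal sketch under stated assumptions: the refund scenario, the `max_amount` limit, and the idempotency-key scheme are hypothetical, and a production gate would persist its state rather than hold it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class RefundRequest:
    customer_id: str       # identity verified upstream, not asserted by the model
    order_owner_id: str    # owner of the order, read from the system of record
    amount: float
    idempotency_key: str   # unique per intended action, reused on retries


@dataclass
class RefundGate:
    max_amount: float = 100.0                      # hypothetical per-action limit
    _seen: set = field(default_factory=set)        # keys of actions already executed
    audit_log: list = field(default_factory=list)  # (key, decision) pairs

    def decide(self, req: RefundRequest) -> str:
        """Apply deterministic checks and record the decision for audit."""
        if req.customer_id != req.order_owner_id:
            decision = "deny"        # requester does not own the record
        elif req.idempotency_key in self._seen:
            decision = "duplicate"   # a retry must not repeat the action
        elif req.amount <= 0 or req.amount > self.max_amount:
            decision = "escalate"    # outside the bounded action, needs a person
        else:
            self._seen.add(req.idempotency_key)
            decision = "execute"
        self.audit_log.append((req.idempotency_key, decision))
        return decision
```

None of these checks depends on how the model interpreted its instructions: the model can propose a refund, but only the gate decides whether it runs.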

The integration request should document the workflow in plain language, identify every read and write point, name the system owner, and specify the minimum required fields. It should also define how success and harm will be measured: not only whether the agent completed the conversation, but whether it selected the correct record, performed the authorized action once, protected restricted data, and escalated when it lacked sufficient confidence or permission.

This framing also improves the business case. Engineering is being asked to expose a narrowly scoped capability with explicit boundaries, rather than to provide broad access to a general-purpose agent. Leadership can then compare measurable workflow value with implementation effort and residual risk.

Key takeaways

- System access creates value when it lets an agent complete work, but it simultaneously expands the security boundary.
- The best initial workflow is frequent, bounded, operationally meaningful, and owned by a team that can approve its data and actions.
- Progress from no integration to read-only retrieval and then to narrowly scoped write operations; do not treat access as an all-or-nothing decision.
- Enforce identity, authorization, field restrictions, action limits, and audit logging outside the model’s natural-language instructions.
- Evaluate correctness, unauthorized-action risk, failure recovery, and handoff quality alongside resolution rate.

The strongest long-term pattern is a portfolio of small, governed capabilities rather than one broadly privileged agent. Each successful workflow can supply the evidence needed to extend access deliberately, while keeping the consequences of error visible and contained.

References

Intercom — Win Executive Buy-In for AI Agent System Access: Unlock Actions, Boost Resolution, Cut Costs

June 11, 2026

How Agentic Analytics Reshapes Product Development Roadmaps
Agentic, analytics-driven product development changes the role of product data. Instead of waiting for teams to interpret dashboards and debate a backlog, an agent can help detect behavioral friction, estimate opportunities, propose interventions, and monitor whether a release improves the intended outcome.

The practical payoff is not an automatically generated roadmap. It is a tighter decision system in which evidence, experiments, delivery controls, and human judgment reinforce one another. The two source articles approach that system from complementary angles: one describes the operating loop around Amplitude Wave, while the other emphasizes the engineering and organizational foundations required to make agentic recommendations dependable.

The product agent is a decision loop, not a smarter dashboard

Traditional analytics tools help teams inspect funnels, cohorts, journeys, activation, and retention. The article about Amplitude Wave describes a more proactive model: an agent continuously scans behavioral data for friction, proposes a next-best improvement, supports validation through A/B testing, and uses feature flags to control rollout. After launch, the loop continues by monitoring activation, retention, and downstream revenue rather than treating deployment as the finish line.

The companion article makes a similar distinction between reporting and agency. It presents agentic systems as capable of proposing, testing, and learning, provided that recommendations remain connected to rigorous behavioral analytics. Synthesized together, the sources describe four linked functions: observation identifies where behavior diverges from an intended journey; prioritization weighs the size, risk, and confidence of an opportunity; experimentation tests whether a proposed change causes improvement; and monitoring determines whether to expand, revise, or retire that change.

This framing matters because an agent that only generates feature ideas adds another opinion to roadmap planning. An agent that connects ideas to observed behavior, controlled tests, and post-release measurement can instead reduce the distance between a weak signal and a defensible product decision.

Reliable recommendations depend on an analytics and evaluation stack

Both sources put instrumentation ahead of automation. The Wave article calls for clearly defined events, models that connect those events to user and account journeys, explicit success metrics, and governance around data quality and privacy. Without that foundation, an agent can produce confident explanations from incomplete or misleading evidence.

The second article extends the foundation into three technical capabilities. It advocates a unified analytics platform that brings quantitative behavior together with qualitative context, evaluation harnesses that test prompts, policies, and models for regressions, and a retrieval-first pipeline that grounds an agent in trusted organizational information. These layers address different failure modes: analytics establishes what users did, retrieval supplies relevant business context, and evaluations test whether the agent behaves reliably as its components change.

Interoperability broadens the evidence available to the system. The Wave article points to CRM integration, session replay, and support systems as useful connections for relating product behavior to customer value and go-to-market effects. CI/CD, experimentation tools, and feature flags then connect analysis to controlled delivery. The resulting architecture is less a standalone AI feature than a chain of evidence and controls spanning discovery, development, release, and measurement.

That chain also establishes a sensible boundary for automation. Behavioral correlations may justify investigation, but they do not by themselves establish causality. A/B testing can provide stronger causal evidence when it is appropriate and well designed; qualitative context can explain why a pattern may be occurring; and human review can catch strategic, ethical, or operational considerations that product telemetry does not represent.
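
To make the step from correlation to a controlled comparison concrete, the sketch below applies a standard two-proportion z-test to conversion counts from a control and a variant. It is an illustration rather than a method described in either source, and it assumes independent users, reasonably large samples, and a pooled standard error.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

A small p-value indicates that the observed difference would be unlikely if the change had no effect. It does not replace the design questions around randomization, eligible population, and guardrail metrics.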

Roadmaps become portfolios of measurable opportunities

When agents can surface evidence-backed opportunities, roadmap discussions can move away from ranking requested features in isolation. The unit of planning becomes an outcome-linked opportunity: a behavioral problem, the users or accounts affected, the metric expected to move, the evidence supporting the hypothesis, and the safest way to test it.
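
One way to keep that unit of planning explicit is to record each opportunity with the same fields every time. The sketch below is an assumption-laden illustration: the `Opportunity` class and its field names are invented for this example and mirror the elements listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opportunity:
    problem: str            # the behavioral problem observed
    affected_segment: str   # users or accounts affected
    target_metric: str      # the metric expected to move
    evidence: tuple         # references to the supporting analyses
    test_plan: str          # the safest way to test the hypothesis

    def is_plannable(self) -> bool:
        """Ready for roadmap discussion only when every element is present."""
        return all([
            self.problem,
            self.affected_segment,
            self.target_metric,
            self.evidence,
            self.test_plan,
        ])
```

An entry with no evidence or no test plan is still a feature request, not yet an outcome-linked opportunity.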

This does not eliminate product strategy. It makes strategy more explicit. Teams still decide which customers and outcomes matter, what constraints apply, and which trade-offs are acceptable. The agent can help maintain a current view of behavioral evidence and shorten the analysis cycle, but it cannot derive organizational priorities from telemetry alone.

The sources also connect this operating model to empowered product teams, product trios, continuous discovery, and outcomes-versus-output OKRs. In that environment, an agent is best treated as a participant in the discovery and delivery workflow: it can surface anomalies, assemble relevant context, suggest hypotheses, and track results, while the team remains accountable for framing the problem and authorizing consequential decisions.

The Wave article illustrates the intended scale of intervention with an onboarding example. It reports that an agent identified drop-off around a confusing configuration step; targeted in-app guidance and tooltips were then released behind feature flags, followed by a material improvement in activation with limited engineering effort. The report is a useful illustration of the loop, but it provides no numerical effect size or independent validation. It therefore supports the workflow concept more strongly than any general claim about expected results.

Governance determines how much autonomy an agent earns

Automation should expand according to demonstrated reliability and the reversibility of the action. Early implementations can begin in an advisory role, identifying friction and preparing evidence for a team to review. A later stage can allow the agent to configure draft experiments or recommend feature-flag settings. Direct changes to production warrant a higher threshold because errors can affect customers, revenue, privacy, and trust.

The Wave article explicitly calls for policies governing data use, review thresholds for automated changes, privacy-by-design, and human checkpoints for high-impact decisions. The engineering-focused article complements those controls with eval-driven development, including tests intended to detect reliability and safety regressions across prompts, policies, and models. Together, these ideas suggest that autonomy should be earned through observable performance rather than granted because an agent appears persuasive.

A practical adoption sequence follows from the synthesis. First, define the outcome and the decisions the agent may inform. Next, verify event quality and journey models before asking the system to prioritize opportunities. Then connect recommendations to a controlled experimentation and release process. Finally, evaluate both product impact and agent behavior, expanding permissions only when the evidence supports it. This sequence keeps the initial scope narrow while creating a path toward a more capable product-development system.

Key takeaways
- An agentic product workflow should connect behavioral observation, opportunity prioritization, experimentation, controlled delivery, and post-release measurement.
- High-quality event data is necessary but insufficient; grounded retrieval, qualitative context, and evaluation harnesses make recommendations more dependable.
- Roadmaps become more evidence-driven when teams plan around measurable opportunities rather than treating feature requests as predetermined commitments.
- Human judgment remains essential for strategy, causal interpretation, risk assessment, and high-impact release decisions.
- Agent autonomy should increase only as evaluations, governance controls, and observed performance justify broader permissions.

The near-term opportunity is to build a disciplined learning loop before pursuing full autonomy. Organizations that make their data trustworthy, their outcomes explicit, and their release controls measurable will be better positioned to let product agents take on more consequential work without weakening accountability.

References
- Shivam.Consulting Blog — Inside Amplitude Wave: The Proactive AI Product Agent That Reveals What to Build Next
- Shivam.Consulting Blog — Why Agentic, Data-Driven Product Development Excites Me—and How It Redefines Roadmaps

June 10, 2026

A Layered Playbook for Package Supply Chain Security
Package supply chain security is not simply a matter of choosing reputable libraries. The practical challenge is controlling an expanding dependency graph, the code that executes during installation, the resources that installed software can reach, and the automated tools allowed to make those decisions.

A useful defensive model follows the path an attack must take: enter through a package or dependency, execute in the development environment, discover valuable information, and transmit it elsewhere. Organizing safeguards around that sequence produces a stronger posture than relying on any single scanner, sandbox, or package reputation signal.

Package risk grows through the dependency graph

Developers usually evaluate the packages they select directly. The less visible risk lies in transitive dependencies: packages installed because another dependency requires them. The source article illustrates the scale of this effect by reporting that installing Jest brought in 266 packages. That example is not evidence that those dependencies were malicious; it shows how one deliberate choice can create hundreds of additional trust relationships.

This changes the unit of review. The relevant question is not only whether a named package appears legitimate, but whether its complete dependency graph is proportionate to the job. A small utility that introduces unfamiliar native modules, unrelated capabilities, or an unexpectedly broad tree deserves more scrutiny than its simple interface might suggest.

Manifests such as package.json, pyproject.toml, and requirements.txt make dependency installation repeatable. Repeatability alone, however, does not guarantee safety. If version ranges or unresolved transitive dependencies allow later releases to enter automatically, two installations based on the same manifest can produce different risk profiles. Pinning direct and transitive versions converts an evolving external graph into a more deliberate, reviewable input.
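
A simple check can flag requirement lines that are not pinned to one exact version. The sketch below is a simplified illustration: it assumes a requirements.txt that lists every installed package (for example, one produced by `pip freeze`), and it does not handle URL or editable requirements.

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.2.3".
# Wildcard versions such as "==1.*" are deliberately not treated as pinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,\s-]+\])?==[^=\s*]+$")


def unpinned_requirements(lines):
    """Return requirement lines that are not pinned to one exact version."""
    flagged = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()      # drop comments and whitespace
        if not line or line.startswith("-"):     # skip blanks and pip options
            continue
        requirement = line.split(";", 1)[0].strip()  # ignore environment markers
        if not PINNED.match(requirement):
            flagged.append(requirement)
    return flagged
```

Running a check like this in a build step turns "pin everything" from a convention into something a reviewer can see fail.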

Match defenses to the stages of a package attack

The source article says an analysis covering more than 230,000 malicious-code incidents found a recurring pattern: malicious code first needs an entry point, then searches the device for sensitive data, and finally uses a network connection to exfiltrate what it finds. This reported pattern suggests three distinct control points.

Reduce risky entry and automatic execution

A waiting period for newly published packages can reduce exposure to releases that have not yet attracted community scrutiny. The article recommends installing only packages that are at least seven days old. That is a risk filter, not a guarantee: an older malicious package can remain undetected, while a legitimate urgent fix may occasionally justify an exception.
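
The waiting period and its exception path can be reduced to a small, testable rule. In the sketch below, the seven-day threshold comes from the source article; the function name and the `approved_exception` flag are assumptions, and the publication timestamp is expected to be supplied by whatever registry metadata the team trusts.

```python
from datetime import datetime, timedelta, timezone

MIN_AGE = timedelta(days=7)  # waiting period recommended in the source article


def old_enough(published_at: datetime, now: datetime,
               approved_exception: bool = False) -> bool:
    """True if the release has aged past the waiting period,
    or if a documented exception has been approved."""
    return approved_exception or (now - published_at) >= MIN_AGE
```

Passing `now` explicitly keeps the rule deterministic in tests and makes the exception a visible argument, not a silent bypass.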

Installation scripts require separate treatment because they may execute before a developer has inspected the installed code. Disabling automatic install hooks by default creates a decision point. A package that depends on a post-install action can still be used, but the script, its purpose, and the capabilities it invokes should be reviewed first.
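
For npm packages, the install-time hooks are declared under `scripts` in package.json, and npm can be told to skip them with its `ignore-scripts` setting. As a sketch of the review step, the helper below lists the `preinstall`, `install`, and `postinstall` entries of a manifest so a reviewer can read them before allowing them to run. The function name is hypothetical, and the check only covers hooks declared in the manifest it is given.

```python
import json

# npm lifecycle scripts that run as part of installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict:
    """Return the install-time scripts declared in a package.json document."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}
```

An empty result means the manifest declares no install-time hook; a non-empty one is the list of commands to examine before making an exception.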

Constrain access after installation

Pre-install review cannot catch every problem. The next layer limits what package code can inspect or modify if it does execute. Sandboxed folders and isolated development environments can reduce the blast radius, but the source cautions that isolation by itself does not prevent malicious code from entering. Access boundaries therefore complement package controls rather than replace them.

Limit unnecessary network egress

Stolen information has less value to an attacker if malicious code cannot transmit it. Restricting unnecessary outbound connectivity addresses the final stage of the reported pattern. This layer matters because a package may evade provenance review and execute inside an environment despite earlier controls. Entry controls, resource boundaries, and egress restrictions together create independent opportunities to interrupt the attack.
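
Egress limits are usually enforced at the network layer, for example by a firewall or proxy, but the decision itself is a plain allowlist lookup. The sketch below shows only that decision logic; the hosts in `ALLOWED_HOSTS` are an assumed example of what a development environment might need, not a recommendation from the source.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: only the registries this environment installs from.
ALLOWED_HOSTS = frozenset({
    "registry.npmjs.org",
    "pypi.org",
    "files.pythonhosted.org",
})


def egress_allowed(url: str) -> bool:
    """Allow an outbound request only if its host is explicitly listed."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS
```

Defaulting to deny means a new destination requires a deliberate change to the list, which is the point of the control.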

Provenance is a decision process, not a trust badge

No single popularity or identity signal proves that a release is safe. The source proposes evaluating maintainer history, download patterns, repository activity, signed releases, and consistency across registries. Their value comes from comparison: a sudden change in maintainership, an unusual release pattern, or a mismatch between repository and registry information may warrant investigation even when each signal looks plausible in isolation.

Context also matters. Dependency behavior should be compared with the package’s stated purpose. A capability that is normal for a database driver may be difficult to justify in a formatting utility. This purpose-to-capability test helps teams focus limited review time on anomalies rather than treating every dependency as equally suspicious.

These checks work best when they lead to a clear disposition: approve the package and lock the reviewed version, replace it with a narrower dependency, inspect it more deeply, or decline it. Provenance information without a decision rule can become documentation that does not change behavior.
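
A decision rule of this kind can be made explicit in a few lines. The sketch below is one possible mapping from signals to dispositions; the signal names, their ordering, and the choice of which signal triggers which outcome are assumptions for illustration, and each team would set its own.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceSignals:
    maintainer_changed_recently: bool
    release_signed: bool
    registry_matches_repository: bool
    capabilities_fit_purpose: bool


def disposition(s: ProvenanceSignals) -> str:
    """Map review signals to one explicit outcome (hypothetical rule order)."""
    if not s.registry_matches_repository:
        return "reject"            # published artifact does not match its source
    if not s.capabilities_fit_purpose:
        return "replace"           # look for a narrower dependency
    if s.maintainer_changed_recently or not s.release_signed:
        return "investigate"       # plausible, but needs a closer look
    return "approve_and_pin"       # lock the reviewed version
```

Whatever the exact rules, every review ends in a named outcome that can be logged and audited.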

AI coding agents must inherit the same installation policy

AI-assisted development introduces a governance problem as much as a technical one. A coding agent may be able to select and install a package while pursuing a larger task, compressing several human decisions into one automated action. If the agent can also reach broad areas of the file system and use the network, a malicious dependency it installs has a larger potential blast radius.

The source describes workflows in which Claude searches, creates, and edits files across a broad knowledge system, including notes derived from downloaded PDFs. That breadth provides productivity value, but it also makes one-folder isolation impractical for the reported workflow. The proposed response is disciplined configuration: hooks require the agent to follow the same package-age, install-script, provenance, and dependency rules expected of a human developer.

This principle is more durable than a rule tied to one assistant. Package policy should apply consistently whether an installation is initiated by a developer, an AI agent, a local automation script, or a build process. The initiator may change; the acceptable evidence, permissions, and exceptions should not.

Key takeaways
- Review the full dependency graph, because the packages selected directly represent only part of the installed attack surface.
- Use a waiting period for new releases as one filter, while preserving a documented path for justified exceptions.
- Prevent install scripts from running automatically until their purpose and behavior have been examined.
- Combine provenance checks with a purpose-to-capability test and an explicit approve, investigate, replace, or reject decision.
- Pin direct and transitive versions, then run recurring audits to detect issues discovered after installation.
- Apply the same package rules to coding agents, automation, local development, and build environments.
- Layer installation controls, resource constraints, and network egress limits so that one missed signal does not determine the outcome.

A mature package security posture will increasingly depend on making these controls routine and machine-enforceable. As development becomes more automated, the teams best positioned to move quickly will be those that turn package trust from an informal judgment into a consistent operating policy.

References
- Shivam.Consulting Blog – Stop Package Breaches Before They Start: My Proven Playbook to Block Common Entry Points

June 10, 2026

AI Agent Product Development: From Workflow to Autonomy
AI agent product development is not primarily a model-selection exercise. It is the work of turning a business outcome into a bounded system that can retrieve information, use tools, make decisions, and escalate safely.

The practical payoff comes from sequencing those capabilities carefully. A focused workflow, explicit measures, controlled access, and continuous evaluation provide a more credible path to value than attempting broad autonomy at launch.

Key takeaways
- Define the business outcome and proof of success before choosing prompts, models, or tools.
- Begin with a repeatable workflow whose inputs, outputs, and failure conditions can be judged clearly.
- Increase capability in stages: relevant retrieval, limited tools, read-only integrations, controlled actions, and then broader autonomy.
- Treat privacy, governance, evaluation, observability, and human escalation as product requirements from the beginning.
- Scale only when operational quality and the intended business outcome remain stable in production.

Start with a decision contract, not an agent concept

An agent initiative becomes testable when the team can state what decision or task the system will handle, what information it requires, what it must never do, and how success will be measured. This creates a decision contract between the product, its users, and the organization operating it.

The supplied source recommends anchoring an AI strategy to one measurable outcome before writing a prompt or selecting a model. It gives lead response time, first-contact resolution, and time-to-first-value as possible measures. Those examples illustrate an important distinction: the agent is a means of changing workflow performance, not the outcome itself.

This framing also makes AI readiness concrete. Instead of asking whether an organization is generally ready for agents, a product team can examine one workflow: Is the required data available? Are the inputs sufficiently consistent? Can acceptable output be recognized? Are the constraints and escalation conditions explicit? A negative answer identifies product work to complete; it does not automatically call for a more capable model.
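
The readiness questions above can be captured alongside the contract itself, so that a "no" shows up as a named gap. This is a minimal sketch; the `DecisionContract` class and its field names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecisionContract:
    task: str
    success_metric: str
    data_available: bool
    inputs_consistent: bool
    output_recognizable: bool
    constraints_explicit: bool

    def readiness_gaps(self) -> list:
        """Return the readiness questions that are still answered 'no'."""
        checks = {
            "data_available": self.data_available,
            "inputs_consistent": self.inputs_consistent,
            "output_recognizable": self.output_recognizable,
            "constraints_explicit": self.constraints_explicit,
        }
        return [name for name, ok in checks.items() if not ok]
```

Each returned gap is a piece of product work to schedule, which keeps the conversation away from reaching for a more capable model.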

A useful initial scope therefore has clear boundaries and frequent enough repetition to produce evidence. The source identifies support-ticket triage, inbound-lead qualification, and account-note summarization as examples. Their significance is not that every organization should adopt them, but that they offer observable inputs and outputs. That makes errors easier to classify and improvements easier to evaluate.

Design capability as an autonomy ladder

The core architectural question is not whether an agent can perform an action. It is what evidence should be required before the product is allowed to perform that action without review. Treating capability as an autonomy ladder gives the team intermediate states between a passive assistant and an unrestricted operator.

The source proposes a retrieval-first pipeline that introduces only relevant knowledge into the context window. In product terms, retrieval is part of the experience contract: the system should receive the information needed for the task without being burdened by unrelated material. This can improve the conditions for relevant responses, although retrieval does not eliminate the need to evaluate the final behavior.

Tool access should be similarly bounded. The source recommends a small, explicit tool catalog, with the agent’s role, constraints, and escalation routes documented. It also points to Model Context Protocol as a way to standardize tool invocation across services. Standardization can make integrations more consistent, but it does not decide which tools the agent should receive or what permissions those tools should carry; those remain product and risk decisions.

Systems of record deserve special caution. The source advises beginning with read-only CRM integration and adding actions only after reliability has been demonstrated. This suggests a practical progression: first observe and recommend, then prepare an action for approval, and only later execute eligible actions within defined limits. Each step creates new failure consequences, so each should have its own evidence threshold.

Prompt engineering belongs inside this broader capability design. A prompt can express the agent’s role and boundaries, but predictable operation also depends on retrieved context, tool definitions, permissions, timeouts, escalation logic, and the surrounding user experience. Managing only the prompt would leave much of the product’s actual behavior outside the team’s control.

Make trust an executable product requirement

Agent risk becomes manageable when broad principles are translated into system behavior. Privacy-by-design should affect what data enters the workflow. Data governance should determine which sources and actions are permitted. Human oversight should appear as an explicit escalation path rather than an informal promise that someone can intervene.

The source calls for regression evaluations covering safety, accuracy, and bias, alongside logs of agent actions, rate limits, timeouts, and risk scoring for high-impact operations. Together, these controls form a layered safety model. Evaluations test expected behavior before and during release; operational limits constrain runtime exposure; logs support diagnosis and accountability; and risk gates determine when automation must stop or seek approval.

Uncertainty should also have a designed destination. According to the source, the default response for high-stakes or uncertain situations should be human escalation. A useful handoff needs more than a generic error message: the receiving person should be able to understand the request, the context used, the action considered, and why the system declined to continue. Handoff quality is therefore part of the product experience as well as the risk model.
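
A handoff with those four elements can be validated before it reaches a person. The sketch below assumes a hypothetical `Handoff` record; the field names follow the paragraph above and are not taken from the source.

```python
from dataclasses import dataclass, asdict


@dataclass
class Handoff:
    request: str            # what the user asked for
    context_used: list      # records or documents the agent consulted
    action_considered: str  # what the agent was about to do
    reason_declined: str    # why it stopped and escalated


def missing_fields(handoff: Handoff) -> list:
    """Return the names of handoff fields that were left empty."""
    return [name for name, value in asdict(handoff).items() if not value]
```

Tracking how often `missing_fields` is non-empty gives a rough, countable proxy for handoff quality.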

This approach avoids treating guardrails as a final compliance checkpoint. When controls are defined alongside workflow requirements, they influence architecture, permissions, interface design, analytics, and release criteria. Trust then becomes something the team can test and operate, rather than a claim attached to the launch.

Use two evidence loops to decide when to scale

An agent can appear technically competent without improving the business outcome that justified it. Product development therefore needs two connected evidence loops: one for operational quality and another for workflow impact.

For operational quality, the source recommends monitoring precision, latency, containment, and handoff quality through agent analytics. These measures answer different questions. Precision concerns whether outputs or decisions are correct enough for the task. Latency affects whether the agent fits the pace of the workflow. Containment indicates how often work remains within the automated path. Handoff quality examines whether escalation preserves context and enables a productive recovery.
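
As a rough sketch of how these measures might be computed from logged interactions, the function below summarizes a list of records. The record keys (`correct`, `latency_ms`, `escalated`) are hypothetical, and "precision" is simplified here to the share of interactions judged correct, which is looser than the classification-metric definition.

```python
from statistics import median


def summarize(interactions: list) -> dict:
    """Summarize logged interactions.

    Each interaction is a dict with keys:
      correct (bool), latency_ms (number), escalated (bool).
    """
    n = len(interactions)
    if n == 0:
        raise ValueError("no interactions to summarize")
    return {
        # Share of interactions whose output was judged correct.
        "precision": sum(i["correct"] for i in interactions) / n,
        # Share of interactions that stayed in the automated path.
        "containment": sum(not i["escalated"] for i in interactions) / n,
        "median_latency_ms": median(i["latency_ms"] for i in interactions),
    }
```

Handoff quality is left out because it usually needs human review or a structured rubric, not a count.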

The business loop returns to the original outcome, using outcomes-versus-output OKRs to avoid equating shipped features with value. A team might improve a prompt, add a tool, or increase containment while leaving the target workflow unchanged. That is useful diagnostic progress, but it is not yet evidence that the product investment is working.

The source also recommends A/B testing prompts and tools and considering minimum detectable effect when sizing experiments. Experimentation is most informative when the changed component, eligible population, success measure, and guardrails are defined in advance. Otherwise, movement in a downstream metric can be difficult to attribute to the agent change.
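Minimum detectable effect drives experiment size directly. The sketch below uses the standard normal-approximation formula for comparing two proportions. The function name, the default significance level and power, and the choice of a two-sided test are assumptions for this example rather than recommendations from the source.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute lift of `mde`.

    baseline: current success rate (for example, resolution rate) in (0, 1)
    mde:      smallest absolute change worth detecting, e.g. 0.05
    """
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)
```

Because the required sample grows with the inverse square of the effect, halving the minimum detectable effect roughly quadruples the traffic a prompt or tool test needs.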

Qualitative learning completes the loop. The source describes product trios spanning product management, design, and engineering, supported by continuous discovery, weekly transcript review, and the conversion of failure modes into test cases. It also recommends keeping prompts, tools, and evaluations versioned through a docs-as-code approach. This connects discovery to engineering discipline: observed failures become reproducible evaluations, evaluated changes become versioned releases, and releases can be compared or reversed.
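A failure spotted in transcript review can be captured as a small, versioned test case. The sketch below shows one hypothetical shape for such a case. The field names and the simple substring check are assumptions for illustration, and a real evaluation would likely use richer assertions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical regression case derived from an observed failure."""
    case_id: str
    prompt_version: str   # ties the case to a versioned prompt
    user_input: str
    must_include: str     # text the reply is expected to contain


def run_case(case: EvalCase, agent: Callable[[str], str]) -> bool:
    """Return True when the agent's reply contains the expected text."""
    reply = agent(case.user_input)
    return case.must_include.lower() in reply.lower()
```

Stored next to prompts and tool definitions in version control, cases like this let a team rerun past failures against every new release.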

Scope and autonomy should expand only when both loops support the decision. Stable technical metrics without workflow impact suggest that the use case or experience needs reconsideration. Business improvement accompanied by unsafe or unreliable behavior suggests that scaling is premature. Evidence across both dimensions supports a measured move into adjacent tasks or higher-impact actions.
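The decision rule in this paragraph can be written as a small function. This is a sketch with hypothetical labels. The case where neither loop shows evidence is not covered in the text, and the "hold" outcome for it is an assumption.

```python
def scaling_decision(operational_ok: bool, workflow_impact: bool) -> str:
    """Map the two evidence loops to a next step (labels are illustrative)."""
    if operational_ok and workflow_impact:
        return "expand"                    # measured move into adjacent tasks
    if operational_ok:
        return "reconsider_use_case"       # stable metrics, no workflow change
    if workflow_impact:
        return "fix_reliability_first"     # impact with unsafe behavior
    return "hold"                          # assumption: no evidence either way
```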

Build the next release around earned autonomy

The durable pattern for AI agent products is earned autonomy: every increase in access or authority follows evidence from a narrower operating state. As evaluations accumulate and real workflow performance becomes visible, teams can make expansion decisions based on demonstrated capability rather than the apparent fluency of a demo.

References
- Shivam.Consulting Blog — Kickstart AI Agents with Confidence: 5 Proven Practices I Use to Ship Impact Fast
June 10, 2026
Why We Made Fin the Most Open Agent: Instant HubSpot & Freshdesk Support With 76% Resolutions

I’ve spent my career pairing product strategy with customer reality, and nothing is clearer right now than the demand for openness and speed. Today, we’re announcing that Fin can be used as a Service Agent on top of HubSpot and Freshworks, meaning you can use the world’s best Agent without migrating off your helpdesk.

HubSpot and Freshdesk customers can now:

- Get Fin live, integrated, and working seamlessly in less than an hour.
- See a 76% average resolution rate.
- Use Fin across all customer channels (voice, email, chat, social, and more).
- Resolve complex queries that require reading and writing to third-party systems.
- Configure everything to follow the unique policies of their business.

This launch is a very visible step in a journey we’ve been on from day one: building an open, customer-first platform that plays well with the rest of your stack. We’ve long known that businesses want flexibility in how they configure their customer-facing tech stack. Since the very beginning, we have built Fin as an open platform, with APIs, MCPs, a CLI, and open access to Apex, our proprietary trained model that delivers best-in-class performance.

To make things easy for our customers, we have extensive public documentation of our product on our website, in our help center, and in our developer docs. We are the only Agent company in our space to do this; others hide most details behind sign-in screens, which we don’t believe is the right thing to do.

Open Agent platforms will win because customers refuse to be boxed into closed ecosystems. We now believe our category has reached a stage where customers demand open platforms, where those who open up are more likely to win, and where those who remain closed and protectionist will accelerate their demise.

We are operating in a fast-changing world, and customers do not want to be locked into a single vendor or closed ecosystem. They want the ability to experiment, to swap things in and out, and to move everything with ease, technically and commercially.

In an open world, where businesses can easily swap vendors, the best product will win. We are happy to compete on that front, confident that Fin delivers the best customer experience and the highest performance.

From a product management lens, this openness is powered by agentic AI patterns paired with robust CRM integration. Under the hood, we use Model Context Protocol (MCP), well-documented APIs, and orchestrated AI workflows to read from and write to third-party systems. That’s how Fin handles true multi-channel work—including voice AI agent scenarios—while giving teams the observability they need through Agent Analytics.

If you are a HubSpot or Freshdesk customer, you can now have Fin integrated and live within an hour, without needing any help from us. We’re here if you want us, but as part of our commitment to building an open platform, we’ve designed everything to be self-serve: start in minutes or watch a quick demo of how everything works.

Fin for HubSpot

Fin for Freshdesk

Inspired by this post on The Intercom Blog.

June 9, 2026

Author: Shivam Tiwari
