Tag: Model Context Protocol (MCP)

  • Claude Code for Product Managers: Accelerate Prototypes, Validate Faster, Ship with Confidence

    I build products under constant pressure to learn faster without breaking trust. Claude Code has become a pragmatic addition to my AI product toolbox because it helps me move from idea to evidence with less friction—while keeping engineering, design, and compliance in the loop.

    “Claude Code for Product Managers explained: what it is, why it matters, and how it helps PMs prototype, validate, and move faster.” That line captures the essence. In practice, I use it to turn ambiguous problem statements into tangible artifacts—API stubs, SQL queries, test data, and lightweight prototypes—that sharpen conversation and accelerate decision cycles.

    What is it in PM terms? A code-aware assistant that helps me prototype safely and quickly. I can generate example API calls, transform messy CSVs for retention analysis, draft instrumentation plans for Amplitude analytics, or spin up a mock service to validate an integration. Because it understands structure, it’s effective at scaffolding small utilities (e.g., a data cleaner or a CLI harness) that make discovery and validation faster.

    Day to day, Claude Code reduces handoffs. If I’m exploring a new partner integration, I’ll have it produce a set of curl commands and a Postman collection, then annotate each step with acceptance criteria and expected responses. When I’m shaping a feature, I lean on it to outline event taxonomies and feature flags so that engineering can wire telemetry without guesswork. For insights work, I’ll ask it to propose SQL for cohort, funnel, and retention analysis, always verifying against source schemas before anything touches production.

    Speed is only useful when it improves signal quality. I anchor the workflow in continuous discovery: small hypotheses, thin-slice prototypes, and fast instrumentation. Claude Code helps me estimate A/B testing readiness (including minimum detectable effect), generate smoke tests for critical user paths, and structure an eval-driven development loop so we learn from every iteration. It also supports context window management by summarizing long PRDs into the few constraints a prototype must respect.

    Governance matters. I apply AI readiness and AI risk management principles: never paste secrets or PII, isolate sandboxes, and log prompts as docs-as-code for auditability. I prefer a retrieval-first pipeline that feeds approved product docs, OpenAPI specs, and design tokens so generations stay grounded. When tools are integrated, I favor the Model Context Protocol (MCP) to constrain capabilities and maintain least-privilege access. Human-in-the-loop review is non-negotiable—especially for anything that might influence customer data or pricing.

    The best outcomes show up in product trios. I’ll facilitate a live session with design and engineering: we co-create prompts, compare alternatives, and converge on a thin slice we can ship. That collaboration keeps us empowered, reduces interpretation drift, and turns Claude Code into an accelerant rather than a sidecar. Over time, the trio curates a reusable prompt library for PRD outlines, experiment checklists, and integration playbooks.

    Getting started is straightforward: define a safe environment, assemble your authoritative corpus (requirements, specs, taxonomies), and codify a few high-value templates—API exploration, instrumentation plans, sandbox data generators, and acceptance tests. Track impact with simple, objective metrics: cycle time from hypothesis to instrumented prototype, time-to-first-signal, and the proportion of decisions made with data versus opinion.

    There are pitfalls. Hallucinated fields can creep into API calls, schema drift can break generated queries, and “clever” refactors may miss edge cases. I mitigate this by grounding generations in current specs, asking for unit tests alongside any code, and validating against a staging environment before anyone talks about production. Treat Claude Code as a collaborator, not an oracle.
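    One lightweight way to catch hallucinated fields before a request ever reaches staging is an allowlist check against the current spec. The sketch below is illustrative only: `SPEC`, its field names, and `check_payload` are hypothetical stand-ins, not part of Claude Code or any real API.

```python
# Hypothetical spec excerpt; in practice this would be loaded from the
# team's current OpenAPI document rather than hard-coded.
SPEC = {
    "required": {"email", "plan"},
    "optional": {"referral_code"},
}


def check_payload(payload: dict, spec: dict) -> dict:
    """Report fields the spec does not define and required fields that are missing."""
    allowed = spec["required"] | spec["optional"]
    return {
        "unknown": sorted(set(payload) - allowed),
        "missing": sorted(spec["required"] - set(payload)),
    }


# "plan_tier" is a made-up field of the kind a model might invent.
generated = {"email": "a@example.com", "plan_tier": "pro"}
report = check_payload(generated, SPEC)
```

    A non-empty `unknown` or `missing` list is a signal to regenerate the call against the spec rather than patch it by hand.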

    If your mandate is to learn faster, de-risk bets, and ship with confidence, Claude Code is worth adopting. Used thoughtfully, it compresses the distance between questions and answers, elevates product discovery, and lets teams validate more ideas with fewer meetings—without compromising on governance or quality.


    Inspired by this post on Product School.


  • Kickstart AI Agents with Confidence: 5 Proven Practices I Use to Ship Impact Fast

    I’ve spent the last few years guiding teams as we bring AI agents into real customer workflows, and I’ve learned that success isn’t about hype—it’s about disciplined product thinking. The payoff is huge when you get it right: faster execution, lower costs, and happier customers. The path there, however, requires clarity, tight scope, robust guardrails, and relentless iteration.

    AI agents will completely change the way you work, but they’re still tools that need to be learned. Discover five best practices for getting started with AI agents.

    First, I anchor our AI Strategy to a single, measurable outcome. Before writing a prompt or choosing a model, I define the job-to-be-done and the success metric that proves value—think lead response time, first-contact resolution, or “time-to-first-value.” This outcome framing is how I assess AI readiness: we translate a business goal into a scoped workflow, identify the required data, and write down constraints. It keeps us from building cool demos that never move a KPI.

    Second, I start small with one high-signal, repeatable workflow. I look for processes with clear inputs and outputs where the agent can be judged objectively—triaging support tickets, qualifying inbound leads, or summarizing account notes. Then I wire a retrieval-first pipeline that brings only the most relevant knowledge into the context window, reducing hallucinations and speeding responses. If the workflow touches systems of record, I begin with read-only CRM integration and gradually add actions once the agent proves reliable.
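    As a rough illustration of "retrieve first, then prompt", the sketch below ranks a tiny knowledge base by word overlap and keeps only the top matches. Word overlap is a deliberately simple stand-in for whatever retriever a team actually uses (embeddings, a search index), and `KNOWLEDGE`, `score`, and `retrieve` are hypothetical names.

```python
def score(query: str, doc: str) -> int:
    """Count distinct query words that also appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return up to k documents with a non-zero overlap score, best first."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    return [d for d in ranked[:k] if score(query, d) > 0]


# Hypothetical knowledge base for a support workflow.
KNOWLEDGE = [
    "refund policy refunds are issued within 14 days",
    "shipping times vary by region",
    "how to reset a password",
]

context = retrieve("what is the refund policy", KNOWLEDGE, k=2)
```

    Only the entries in `context` would be placed in the prompt; documents with no overlap never reach the context window.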

    Third, I design the agent’s capabilities with intentional prompt engineering and tool use. I document the system role, constraints, and escalation paths, and I give the agent a small, explicit tool catalog instead of an all-you-can-eat toolbox. When appropriate, I standardize tool invocation with Model Context Protocol (MCP) so the agent can call reliable functions consistently across services. This keeps behavior predictable and auditable as we expand AI workflows.
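    A small, explicit tool catalog can be as simple as a dictionary the agent runtime consults before every call. This is a plain-Python sketch of the idea, not the MCP SDK; `Tool`, `CATALOG`, `invoke`, and `lookup_account` are hypothetical names, and the account lookup is stubbed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    read_only: bool
    run: Callable[[dict], dict]


def lookup_account(args: dict) -> dict:
    # Stubbed result; a real tool would query the system of record.
    return {"account_id": args["account_id"], "status": "active"}


CATALOG = {
    t.name: t
    for t in [Tool("lookup_account", "Fetch account status by id", True, lookup_account)]
}


def invoke(name: str, args: dict, allow_writes: bool = False) -> dict:
    """Run a tool only if it is in the catalog and permitted in the current mode."""
    tool = CATALOG.get(name)
    if tool is None:
        raise PermissionError(f"tool not in catalog: {name}")
    if not tool.read_only and not allow_writes:
        raise PermissionError(f"write tool blocked in read-only mode: {name}")
    return tool.run(args)
```

    Anything the agent asks for that is not in `CATALOG` fails loudly, which keeps the tool surface small and auditable.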

    Fourth, I bake in AI risk management from day one. That means privacy-by-design, clear data governance, and eval-driven development with regression tests for safety, accuracy, and bias. I log every agent action for observability, add rate limits and timeouts, and use risk scoring to gate high-impact operations. When the agent is uncertain or the stakes are high, it escalates to a human by default. These guardrails earn stakeholder trust and prevent fire drills later.
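    The gating rule described above can be sketched as a single decision function. The risk scores, thresholds, and action names here are assumptions for illustration; a real system would calibrate them against its own evals.

```python
# Hypothetical risk scores on a 1 (harmless) to 5 (high impact) scale.
RISK_BY_ACTION = {"read_record": 1, "send_email": 3, "issue_refund": 5}


def decide(
    action: str,
    confidence: float,
    max_auto_risk: int = 3,
    min_confidence: float = 0.8,
) -> str:
    """Return "execute" or "escalate"; unknown actions get the highest risk."""
    risk = RISK_BY_ACTION.get(action, 5)
    if risk > max_auto_risk or confidence < min_confidence:
        return "escalate"
    return "execute"
```

    Defaulting unknown actions to the highest risk means a newly added tool escalates to a human until someone scores it explicitly.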

    Fifth, I measure, learn, and scale with evidence. I A/B test prompts and tools, use the minimum detectable effect (MDE) to size experiments, and monitor Agent Analytics for precision, latency, containment, and handoff quality. I pair these with OKRs that target outcomes rather than outputs, so the team focuses on real results, not feature counts. When the metrics hold steady in production, I broaden the scope to adjacent tasks and raise autonomy in small, safe increments.
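    To make the MDE step concrete, here is a minimal sketch of the standard normal-approximation sample-size formula for comparing two conversion rates. It assumes a two-sided test, equal traffic per arm, and an MDE expressed as an absolute lift; `sample_size_per_arm` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(
    baseline: float, mde_abs: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Example: 10% baseline containment, looking for a 2-point absolute lift.
n = sample_size_per_arm(0.10, 0.02)
```

    With these inputs the answer lands in the low thousands of users per arm, and halving the MDE roughly quadruples it, which is why the MDE is worth agreeing on before the experiment starts.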

    On the team side, I organize product trios (PM, design, engineering) with a continuous discovery cadence. We review conversation transcripts weekly, capture failure modes, and turn them into test cases. A lightweight docs-as-code approach keeps prompts, tools, and evals versioned, so we can roll forward and back without drama. This is how we move fast without breaking trust.

    If you’re just starting, pick one workflow, set a clear metric, instrument it end to end, and let your evals and users teach you where to go next. Agentic AI rewards focus and discipline. The moment you see an agent reliably shave hours off a process—or rescue an interaction at 2 a.m.—you’ll know the investment is compounding.


    Inspired by this post on Amplitude – Perspectives.


  • Why We Made Fin the Most Open Agent: Instant HubSpot & Freshdesk Support With 76% Resolutions

    I’ve spent my career pairing product strategy with customer reality, and nothing is clearer right now than the demand for openness and speed. Today, we’re announcing that Fin can be used as a Service Agent on top of HubSpot and Freshdesk, meaning you can use the world’s best Agent without migrating off your helpdesk.

    HubSpot and Freshdesk customers can now:

    Get Fin live, integrated, and working seamlessly in less than an hour.

    Delivering a 76% average resolution rate.

    Across all customer channels (voice, email, chat, social, and more).

    Resolving complex queries that require reading and writing to third-party systems.

    With everything fully configurable to follow the unique policies of every individual business.

    This launch is a very visible step in a journey we’ve been on from day one: building an open, customer-first platform that plays well with the rest of your stack. We’ve long known that businesses want flexibility in how they configure their customer-facing tech stack. Since the very beginning, we have built Fin as an open platform, with APIs, MCP support, a CLI, and open access to Apex, our proprietary trained model that delivers best-in-class performance.

    To make things easy for our customers, we publish extensive documentation of our product on our website, in our help center, and in our developer docs. We are the only Agent company in our space to do this. Others hide most details behind sign-in screens, which we don’t believe is the right thing to do.

    Open Agent platforms will win because customers refuse to be boxed into closed ecosystems. We now believe our category has reached a stage where customers demand open platforms, where those who open up are more likely to win, and where those who remain closed and protectionist will accelerate their own demise.

    We are operating in a fast-changing world, and customers do not want to be locked into a single vendor or closed ecosystem. They want the ability to experiment, to swap things in and out, and to move everything with ease, technically and commercially.

    In an open world, where businesses can easily swap vendors, the best product will win. We are happy to compete on that front, confident that Fin delivers the best customer experience and the highest performance.

    From a product management lens, this openness is powered by agentic AI patterns paired with robust CRM integration. Under the hood, we use Model Context Protocol (MCP), well-documented APIs, and orchestrated AI workflows to read from and write to third-party systems. That’s how Fin handles true multi-channel work—including voice AI agent scenarios—while giving teams the observability they need through Agent Analytics.

    If you are a HubSpot or Freshdesk customer, you can now have Fin integrated and live within an hour, without needing any help from us. We’re here if you want us, but as part of our commitment to building an open platform, we’ve designed everything to be self-serve: start in minutes or watch a quick demo of how everything works.

    Fin for HubSpot

    Fin for Freshdesk


    Inspired by this post on The Intercom Blog.


  • Parallel Agents Are the Next Level of Vibe Coding: Faster, Smarter, and More Reliable AI

    I’ve spent the past year watching single-agent systems hit their ceiling in production, and it’s clear to me that the next inflection point is here: parallel agents. This isn’t a fad or a framework-of-the-month; it’s a practical evolution that lets us ship AI that’s faster, more consistent, and easier to reason about in real-world products.

    When I say “vibe coding,” I mean the product craft of shaping AI behavior through prompts, examples, and constraints to achieve a specific user experience—long before we overinvest in code or brittle rules. Parallelism upgrades that craft. Or, as I’ve been framing it with teams, “the next level of vibe coding: Why parallel agents change everything.”

    Speed is the first win. By fanning out work to specialized agents—research, reasoning, tool-calling, formatting—we shrink latency without sacrificing depth. In customer-facing AI workflows, a structured fan-out/fan-in pattern routinely beats single-agent pipelines on responsiveness while returning richer results.
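    The fan-out/fan-in shape is easy to see in a few lines of `asyncio`. In this sketch the specialists are stubs that sleep instead of calling a model, and the role names and the `specialist` and `fan_out_fan_in` functions are hypothetical.

```python
import asyncio


async def specialist(role: str, query: str, delay: float) -> tuple[str, str]:
    """Stand-in for a specialized agent; the sleep simulates model or tool latency."""
    await asyncio.sleep(delay)
    return role, f"{role} notes on: {query}"


async def fan_out_fan_in(query: str) -> dict[str, str]:
    # Fan out: schedule all specialists so they run concurrently.
    calls = [
        specialist("research", query, 0.05),
        specialist("reasoning", query, 0.05),
        specialist("formatting", query, 0.05),
    ]
    # Fan in: wait for every result, then merge them by role.
    results = await asyncio.gather(*calls)
    return dict(results)


merged = asyncio.run(fan_out_fan_in("refund eligibility"))
```

    Because the three stubs wait concurrently, total latency tracks the slowest specialist rather than the sum of all three.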

    Quality is the second win. Diverse agents produce diverse reasoning paths, which we can reconcile through consensus, self-consistency checks, or a lightweight reranker. Patterns like race-and-rerank and specialist-swarm lift answer accuracy meaningfully, especially when paired with a retrieval-first pipeline to ground outputs in verifiable context.
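    The simplest reconciliation step is a majority vote over the agents' answers. The sketch below assumes answers are short strings that can be compared after light normalization; free-form answers would need a reranker or a judge model instead. `consensus` is an illustrative name.

```python
from collections import Counter


def consensus(answers: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and the share of agents that agreed."""
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


# Three hypothetical agent outputs for the same question.
answer, agreement = consensus(["Eligible", "eligible ", "Not eligible"])
```

    The agreement ratio doubles as an uncertainty signal: a low value is a reasonable trigger for a wider fan-out or a human handoff.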

    Reliability is the third win. Parallel agents let me isolate risky steps, run guarded fallbacks, and degrade gracefully when tools misbehave. With Agent Analytics and eval-driven development in place, we instrument each hop, spot regressions quickly, and keep a clean chain of custody for every decision the system makes.

    Under the hood, I lean on the Model Context Protocol (MCP) to standardize tool access and keep agents composable. That separation of concerns pays off: prompt engineering stays focused on intent and role, while the platform handles authentication, quotas, and observability. It’s how we scale without turning orchestration into spaghetti.

    A pragmatic rollout looks like this: start with a retrieval-first pipeline, add a planner-executor split, then introduce parallel specialists where latency or accuracy bottlenecks appear. Gate each addition with offline evals, follow with A/B testing in production, and adjust fan-out width dynamically based on uncertainty signals.

    Costs stay sane when we treat agents like any other product surface. Put budgets on fan-out width, cache aggressively, and route to smaller models when confidence is high. When uncertainty spikes, expand the swarm, validate with multiple tools, and pay for certainty only when it’s business-critical.
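    A cost policy like this can start life as a small routing function. The thresholds, the "small" and "large" model tiers, and the name `plan_call` are assumptions for illustration, not recommended values.

```python
def plan_call(confidence: float, max_width: int = 4) -> dict:
    """Pick a model tier and fan-out width from a confidence score in [0, 1]."""
    if confidence >= 0.9:
        # High confidence: one call to the cheaper model is enough.
        return {"model": "small", "width": 1}
    if confidence >= 0.6:
        # Medium confidence: a modest fan-out, capped by the budget.
        return {"model": "large", "width": min(2, max_width)}
    # Low confidence: spend the full fan-out budget.
    return {"model": "large", "width": max_width}
```

    Keeping `max_width` as an explicit parameter makes the fan-out budget something a team can set per workflow rather than a property buried in orchestration code.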

    The organizational shift is just as important. Product trios can now own end-to-end AI workflows, not just prompts. With clear metrics, a shared library of agent roles, and routine post-launch reviews, teams ship improvements weekly instead of quarterly—and they do it with confidence because the feedback loops are visible and fast.

    If you’ve been blocked by the fragility of single-agent systems, parallel agents unlock a new product frontier. They elevate vibe coding from artful prototype to dependable platform: faster by design, higher quality through diversity, and safer because every step is measured. That’s how we turn impressive demos into durable product strategy.


    Inspired by this post on Pendo – Best Practices.


  • Mastering MCP: Battle-tested Playbooks from Miro, Atlassian, and What I’ve Learned

    Everybody’s talking about MCP for good reason. In my product org, it has moved AI from clever demos to dependable, value-creating workflows. When I compare notes with builders at Miro, Atlassian, and beyond, the same patterns surface again and again: nail your retrieval strategy, design for safe tool use, instrument like you mean it, and let real workflows—not novelty—set the agenda.

    When I say MCP, I’m talking about the Model Context Protocol (MCP): a practical way to connect LLMs to tools, data, and actions so agents can retrieve, reason, and execute inside real product experiences. It sounds simple; in practice, it forces you to align AI workflows with everything from data governance and observability to UX writing and change management. That alignment is where the wins live.

    Here is the playbook I now use, refined through our launches and lessons I’ve absorbed from teams at Miro and Atlassian who are shipping agentic AI into complex, high-trust environments.

    Start with a retrieval-first pipeline. Fancy tools won’t matter if the model’s context is stale, noisy, or mis-scoped. I treat retrieval as a product in itself: define authoritative sources, normalize them with docs-as-code discipline, and tag them for permission-aware filtering. Then I enforce context window management so the agent sees the smallest, highest-signal slice of data needed for the task. Good retrieval shrinks hallucination risk and makes every downstream tool call more accurate and cheaper.
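    Permission-aware filtering and context budgeting can be combined in one small packing step. This sketch assumes chunks arrive already scored by an upstream retriever and uses character count as a crude proxy for tokens; `Chunk`, `build_context`, and the group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float         # relevance score from an upstream retriever
    allowed_groups: set  # groups permitted to read this chunk


def build_context(chunks: list[Chunk], user_groups: set, budget_chars: int) -> list[str]:
    """Keep only chunks the user may read, then pack the best ones into the budget."""
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    selected, used = [], 0
    for chunk in sorted(visible, key=lambda c: c.score, reverse=True):
        if used + len(chunk.text) <= budget_chars:
            selected.append(chunk.text)
            used += len(chunk.text)
    return selected


chunks = [
    Chunk("roadmap details", 0.9, {"pm"}),
    Chunk("salary bands", 0.95, {"hr"}),
    Chunk("release notes", 0.5, {"pm", "eng"}),
]
context = build_context(chunks, {"pm"}, budget_chars=20)
```

    Filtering happens before ranking, so a high-scoring chunk the user cannot read never competes for space in the context window.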

    Map one golden path before you expand. In our first MCP rollout, we picked a single, high-frequency workflow with a clear outcome: summarize a Miro board into action items and push them to Jira without manual copy-paste. That forced us to design the end-to-end contract between retrieval, reasoning, and action. Only after the golden path hit reliability targets did we layer on variants (e.g., Confluence summaries, epic splitting, backlog grooming). Narrow to go fast; broaden to scale impact.

    Instrument like a modern platform, not a demo. I apply eval-driven development to MCP agents: offline test suites for intent classification and tool selection, online shadow evals to watch live drift, and post-deployment regression checks that fail closed when data contracts or tool permissions change. Pair that with granular observability so I can trace each agent turn: prompts, retrieved chunks, tool inputs, and outputs with latency and error codes. Without this, you’re guessing. With it, you can iterate weekly.

    Design for least privilege and graceful failure. MCP makes it easy to connect many tools; that’s exactly why you must scope access narrowly and log everything. In our stack, every tool call has a scope, a human-readable rationale, and an audit trail. If a call fails, the agent must recover: retry with backoff, fall back to read-only, or ask the user for consent or missing context. Teams at Atlassian have repeatedly emphasized how permission hygiene and auditable behavior build trust at enterprise scale; my experience matches that.
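    The recovery ladder (retry with backoff, then fall back to read-only) can be sketched as a wrapper around any tool call. The function and parameter names are illustrative, and `ConnectionError` stands in for whatever transient failure a real tool raises.

```python
import time
from typing import Callable


def call_with_recovery(
    write_call: Callable[[], dict],
    read_only_call: Callable[[], dict],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Try the write call with exponential backoff, then fall back to read-only."""
    for attempt in range(retries):
        try:
            return {"mode": "write", "result": write_call()}
        except ConnectionError:
            if attempt < retries - 1:
                # Delay doubles on each retry: base_delay, 2 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    return {"mode": "read_only", "result": read_only_call()}
```

    Returning the mode alongside the result lets the agent tell the user that it could only read, not write, and ask for consent before trying again.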

    Adopt CI/CD for prompts and policies. I treat prompts, tool schemas, and guardrails as versioned artifacts behind feature flags. That lets us canary changes to a small cohort, A/B test prompt variants, and roll back instantly if we see regressions. In practice, this is the difference between shipping one-off AI features and running a resilient AI platform. You’ll feel the leverage as soon as you ship your second iteration.
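    A canary for prompt versions only needs a stable way to assign users to buckets. The sketch below hashes the user id with a flag name; the flag name `prompt-v2` and the functions `bucket` and `prompt_version` are hypothetical, and a feature-flag service would normally own this logic.

```python
import hashlib


def bucket(user_id: str, flag: str) -> int:
    """Map a user and flag name to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def prompt_version(user_id: str, canary_percent: int) -> str:
    """Serve the candidate prompt to roughly canary_percent of users."""
    return "candidate" if bucket(user_id, "prompt-v2") < canary_percent else "stable"
```

    Because the assignment is deterministic, a user sees the same prompt version across sessions, and setting `canary_percent` to 0 is an instant rollback.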

    Make tool choice explicit and inspectable. Agentic AI can feel opaque; I push for transparent tool arbitration. The agent must explain why it’s about to use a tool, show the proposed inputs, and surface the expected side effects. For power users, a reveal panel with retrieved sources, tool candidates, and confidence signals turns a black box into a glass box—especially valuable in collaborative canvases like Miro and structured work hubs like Jira and Confluence.

    Optimize for latency budgets users can feel. Multi-hop reasoning and multiple tool calls can degrade experience quickly. We set strict latency budgets by task type, apply caching on stable retrievals, parallelize safe calls, and prefetch likely context from the user’s session. If the task will exceed budget, the agent tells the user what it’s doing and delivers progressive results. Miro teams talk about protecting flow; Atlassian teams prioritize continuity in tickets and docs. Same principle: respect momentum.
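    Enforcing a latency budget can be sketched with `asyncio.wait_for`. Here `slow_tool` is a stub that sleeps, and the progress message is placeholder copy. Note that `wait_for` cancels the call when the budget expires, so a production version that keeps working in the background would need a different pattern.

```python
import asyncio


async def slow_tool() -> str:
    await asyncio.sleep(0.2)  # simulates a slow multi-hop call
    return "full answer"


async def answer_within_budget(budget_s: float) -> str:
    """Return the full result if it arrives in time, otherwise a progress message."""
    try:
        return await asyncio.wait_for(slow_tool(), timeout=budget_s)
    except asyncio.TimeoutError:
        return "Still working: fetching the remaining context."


fast = asyncio.run(answer_within_budget(1.0))
late = asyncio.run(answer_within_budget(0.01))
```

    The point of the pattern is that the user always hears something within the budget, even when the full answer is not ready.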

    Treat prompt engineering as UX writing with systems thinking. The most reliable prompts combine plain-language intent, domain constraints, and crisp tool contracts. We align style and tone with our brand, and we embed microcopy that teaches users how to ask for the best results. Tooltips, in-app guides, and examples reduce churn and boost user activation without retraining the model.

    Meld product strategy with AI feasibility. I start roadmapping by outcomes, not model tricks: time saved in backlog grooming, higher-quality meeting notes in Confluence, or fewer context switches across Miro boards. Then I map feasibility: retrieval coverage, tool maturity, safety constraints, and the eval harness needed to prove gains. This keeps the team focused on value propositions customers feel, not only on what’s technically novel.

    Staff the right trio and the right rituals. My most effective MCP teams operate as empowered product teams: a product manager who owns outcomes and risk posture, a forward-deployed engineer who shapes tool schemas and platform scalability, and a designer who sweats conversational flows and recovery states. Weekly eval reviews replace vague demo days. We ship small, learn fast, and document what changed.

    Measure what matters, not just what’s easy. Beyond engagement, I track success with a ladder of metrics: task success rate, time-to-completion versus baseline, user edits per output, defect rates caught by evals, and downstream business impact (activation, retention, and net revenue retention). When a workflow moves these needles for a defined segment, I know we’re ready to scale or cross-sell.

    Expect tool sprawl and plan for governance. MCP’s superpower is extensibility; its weakness is the same. We maintain a curated tool catalog with owner, scope, schema version, and deprecation policy. We lint schemas in CI, require backward-compatible changes, and sunset unused tools quarterly. This reduces the blast radius of change and keeps the platform evolvable.
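    A schema lint for backward compatibility can begin as a short diff over two versions of a tool's input schema. This sketch assumes JSON-Schema-style dictionaries with `properties` and `required` keys and checks only three kinds of breaking change; the `issue_key` and `priority` fields are made up for the example.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """List changes that would break existing callers of a tool schema."""
    problems = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    for name in sorted(old_props):
        if name not in new_props:
            problems.append(f"removed property: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"changed type: {name}")
    newly_required = set(new.get("required", [])) - set(old.get("required", []))
    for name in sorted(newly_required):
        problems.append(f"new required property: {name}")
    return problems


V1 = {"properties": {"issue_key": {"type": "string"}}, "required": ["issue_key"]}
V2 = {
    "properties": {"issue_key": {"type": "string"}, "priority": {"type": "string"}},
    "required": ["issue_key", "priority"],
}
```

    Run in CI, a non-empty result from `breaking_changes(V1, V2)` would block the merge until the new field is made optional or the schema version is bumped.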

    Bring your ecosystem with you. The best results come from integrating into existing systems of record and systems of collaboration. At Miro, collective context lives in boards; at Atlassian, it lives in tickets, docs, and runbooks. Your MCP strategy should amplify those collaborative truths—pull the right context at the right moment and write back where people already work.

    A 30-day MCP starter blueprint I recommend looks like this.

    Week 1: pick one golden path, map permissions, define success metrics, and assemble your eval harness.

    Week 2: build a retrieval-first pipeline and a minimum set of tools with least-privilege scopes.

    Week 3: wire agent reasoning with transparent tool arbitration, ship to an internal pilot behind feature flags, and instrument everything.

    Week 4: harden with evals, optimize latency, tighten UX microcopy, and open a limited beta with a product tour and clear feedback loops.

    Looking ahead, the next frontier is composable agents that coordinate across products and teams without stepping on governance landmines. With a disciplined retrieval-first pipeline, strong observability, and eval-driven development, that future is within reach. MCP isn’t magic; it’s a platform pattern. Treated that way, it compounds.

    If you’re wrestling with where to start, choose one workflow users do every day and make it unambiguously better. When your agent quietly handles the busywork and your metrics confirm the lift, skeptics turn into champions. That’s been my experience—and it’s the common thread I keep hearing from builders at Miro, Atlassian, and beyond.


    Inspired by this post on Pendo – Best Practices.

