Every week I meet marketers who are working harder than ever—more campaigns, more content, more dashboards—yet seeing less movement on metrics that matter. The surge of AI tooling has amplified activity, not necessarily impact. That’s the focus problem: we confuse motion with momentum, and our backlogs look great while our outcomes stall.
Learn how AI agents for marketing can help you prioritize impact so you can do important work, instead of just more work.
In my role leading product and growth teams, I’ve learned that AI only compounds value when it is pointed squarely at outcomes. If we don’t define what “good” looks like, agentic AI will simply scale busywork. The antidote is a disciplined operating model that connects strategy to execution and instruments agents with clear success criteria.
First, anchor your program with outcomes vs output OKRs. Choose one or two measurable business outcomes—such as qualified pipeline, conversion rate, or activation—and make everything else subordinate. This provides the compass agents need to make effective trade-offs when speed and volume tempt you to do “one more thing.”
Second, map a driver tree from the target outcome down to the controllable levers: audience segments, offers, channels, messaging, and experience friction. This traceability shows where agents can move the needle fastest—whether that’s accelerating research, sharpening positioning, or eliminating handoffs that slow experimentation.
Third, design a small, agentic AI workforce aligned to those levers. For example: a Research Agent that synthesizes market insights and past performance; a Copy Agent that generates on-brief, on-brand variants; a Distribution Agent that adapts content to each channel and schedules posts; and an Analytics Agent that runs A/B tests, summarizes results, and flags anomalies. Keep human oversight where judgment matters most—strategy, brand voice, and high-stakes decisions.
Fourth, instrument rigor from day one with Agent Analytics and eval-driven development. Define offline evals for brand consistency, factuality, safety, and response time; pair them with online experiments that quantify lift on your target outcomes. Set a minimum detectable effect (MDE) so you stop shipping changes that cannot plausibly move the metric.
Fifth, operationalize your AI workflows. Standardize prompts, inputs, and handoffs; templatize briefs and acceptance criteria; and keep a change log so improvements compound rather than reset. Use short, frequent feedback loops to prune low-impact work and double down on what demonstrably advances your objectives.
I’ve seen teams reclaim focus and momentum when they treat agents as teammates, not toys. The magic isn’t in producing more assets—it’s in consistently choosing the next best action in service of a clear outcome. When you combine outcome clarity, a driver tree, targeted agents, and tight evals, AI becomes a force multiplier for marketing impact.
If you’re feeling overwhelmed by AI’s possibilities, start small: commit to one outcome, one driver you believe is material, and one agent designed for that job. Prove lift, codify the workflow, then scale. Velocity is only valuable when it’s pointed in the right direction.
Inspired by this post on Amplitude – Best Practices.
Disruption is the only sustainable strategy in product. When a platform meaningfully changes how we build and operate, I pay attention—not just as a product leader, but as someone accountable for turning AI Strategy into durable competitive differentiation. That’s why the launch of the Fin API platform stands out: it’s a concrete step toward agentic AI at enterprise scale.
Today, I’m diving into what this launch includes, why it matters for product strategy, and how I’d navigate the build vs buy decision in this new landscape. My goal is to translate the announcement into actionable guidance for product teams, CX leaders, and forward-deployed engineers who are building the next generation of customer support and product-led experiences.
Fin is a customer agent platform that at present resolves over 2M customer issues a week, growing at a rapid exponential pace. It’s relied on by the best brands, large and small, in every vertical you can imagine. From Atlassian and Riot Games, to smaller hot upstarts like Mercury and Polymarket. It runs on a family of models trained by its AI group. Last week, they announced Apex, which is the world’s first specialized customer service LLM. In production tests over the last 6 months, it beat every single frontier model, including those from Anthropic and OpenAI, on resolution rate, latency, hallucination rate, and cost.
With this launch, teams can access the platform’s core capabilities and underlying models directly via API, with contracts starting at $250k per year, and usage rates that are by far the cheapest in the industry for each of the model’s subcategories. For leaders evaluating total cost of ownership, this is a meaningful data point: it shifts the economics of scaled automation from experimental to operational.
Why now? Because builders want options. I hear from teams daily that want to design their own agents, tune prompts and policies, and integrate with bespoke CRMs, data lakes, and product surfaces. The Fin announcement meets that demand with three clear build-paths, each mapping to a different operating model and maturity stage.
First, for the vast majority of companies, the Fin Agent Platform is the pragmatic starting point. Fin reports ~8k companies on it today. It addresses 99% of customer needs out of the box—without exhausting consulting engagements—while delivering top-tier resolution rates. If your priority is time-to-value, governance, and platform scalability, this route de-risks implementation and accelerates outcomes.
Second, for teams that need custom surfaces or channels, the Fin Agent API lets you present Fin in unique contexts. You get the Fin platform’s orchestration and controls, but you’re free to bypass the default messenger, email, voice, or any prebuilt channel and embed the agent natively in your product. I see this as the sweet spot for product-led growth motions where conversation design and UX writing are strategic levers.
Third, for companies building hyper-specific agents—think service plus in-product actions—the new API access to Apex and the broader collection of models is the obvious move. Unlike generalized models, these are purpose-trained for customer service scenarios and operational policies. If you have strong in-house solutions engineering, a retrieval-first pipeline, and eval-driven development in place, this path maximizes control without reinventing the model layer.
This also opens the door for vertical specialists. Fin-like businesses focused on deep domains can emerge quickly—Fin for dentists? Why not? Fin for car dealerships? Sure. I expect startups and modern CX providers (including players like Decagon and Sierra) to carve out niches where domain data, workflows, and compliance are the real moats. That’s where differentiated AI beats generic capability.
There’s a defensive reason to pay attention here. The software landscape is shifting fast: the moat is no longer feature parity—it’s the quality of your agents and the data flywheels powering them. Building software is simply less hard now, and I’ve watched engineering teams more than double measurable productivity as they adopt AI-assisted development. The implication is clear: the interface-and-features era is giving way to an agents-and-outcomes era.
Serious software companies must evolve from being a features company to an agents company—and build those agents on differentiated AI. More value will accrue at the model and orchestration layers, where safety, latency, cost, and resolution quality are won. That puts a premium on prompt engineering discipline, policy routing, continuous discovery of edge cases, and rigorous offline/online evals to keep hallucination rates low while maintaining speed.
How would I choose among the three build-paths? If you’re early or resource-constrained, start with the Fin Agent Platform to validate outcomes and align stakeholders. If you need branded experiences and tighter product integration, use the Fin Agent API to control surfaces without owning the heavy lifting. If you have strong ML ops and a mature customer support ai strategy, go model-level with Apex and companions, layering in your own guardrails, context window management, and test harnesses. In each case, balance velocity, control, and risk—your build vs buy decision should be grounded in clear metrics and an explicit product strategy.
Where does this lead? We’ll see more companies expose specialized model families with clearer economics and stronger governance. For now, I’m excited to see what teams build with the Fin API platform—and how they turn agentic AI into measurable improvements in resolution rate, CSAT, cost-to-serve, and ultimately, customer loyalty.
I believe the future of product design isn’t about replacing designers—it’s about giving every team access to one. That’s why Banani grabbed my attention. It’s an AI product designer that doesn’t just generate code—it generates design. For solo founders, stretched design teams, and early-stage startups, that shift matters: it raises the design floor without lowering the creative ceiling.
I spent time with Vlad Solomakha (CEO & Co-founder), Vova Kovalchuk (CTO & Co-founder), and Vlad Ostapovats (Founding Growth) to unpack how they took Banani from a Figma plugin proof-of-concept to a canvas-first AI design tool generating hundreds of thousands of designs per week. Vlad brings a decade of design experience and a precise north star: AI should produce beautiful, tasteful design rather than average, undifferentiated UI.
The architectural choices stood out. They engineered their agent to handle parallel screen edits, manage per-screen context across canvases with hundreds of frames, and make surgical edits without regenerating entire screens. This is the kind of agentic AI work that product leaders have been waiting for: concrete advances in context window management, tool orchestration, and prompt engineering that translate into higher throughput without sacrificing quality.
Equally important is how they addressed the "gulf of specification"—the mismatch between how designers think visually and how agents understand text. Banani’s canvas-first approach acknowledges that design is spatial, hierarchical, and iterative. Rather than forcing a chat-first UX, they center the canvas and let the agent do production work while keeping the designer firmly in control. In practice, this narrows intent ambiguity, speeds up iteration, and preserves taste.
The team made another pivotal bet: Why Banani doesn’t compile running applications — just HTML/CSS mockups — and how that shapes everything. By decoupling the design artifact from runnable code, they optimize for velocity, taste, and exploration. In my experience, this separation is the right product strategy for early discovery and gen ai for product prototyping—move fast on aesthetics and flows, then converge on implementation once you’ve validated the direction.
I also appreciated their pragmatic evaluation approach. Instead of traditional evals, they spin up 10 screens from one prompt to compare models. It’s hands-on, outcome-based, and aligned with eval-driven development in real product environments. They’re relentlessly discerning about when to work around model limitations versus when to wait for the models to improve—an essential discipline when building at the edge of what’s possible.
Under the hood, context engineering and specialized agent tools do the heavy lifting. Per-screen history with shared project context enables precise, reversible changes across large canvases. The result: fewer destructive regenerations, more reliable design intent preservation, and a workflow that feels like collaborating with a strong mid-level designer who’s exceptionally fast and consistent.
If you want a quick tour, I recommend jumping to a few highlights: 20:13 Product Tour Canvas First AI, 33:40 Gulf of Specification, 42:54 Agent Architecture Under Hood, 48:48 State History Context Tricks, and 56:04 Navigating Busy Canvases. Each segment reveals a different layer of the system design and product thinking behind Banani’s canvas-first UX.
For product leaders, this is a compelling blueprint for raising the design floor while protecting the last mile of craft. It aligns with empowered product teams, continuous discovery, and LLMs for product managers who need leverage without losing judgment. If you’re exploring agentic AI in design, this is a thoughtful, execution-focused model worth studying and trialing on your next product tour or redesign.
Resources worth exploring: Banani and TL Draw. To hear the full conversation, you can listen on Spotify or Apple Podcasts. Then, pressure-test the approach inside your own product development lifecycle and see how a canvas-first AI designer reshapes your team’s velocity and quality bar.
“Is product management dead?” I hear this question at almost every conference hallway chat. After listening to the latest Product Builders – All Things Product Podcast with Teresa Torres & Petra Wille, I’m more convinced than ever: product management isn’t dead—it’s evolving fast, and the leaders will be those who embrace the shift.
Listen to this episode on: Spotify | Apple Podcasts
The core take resonated deeply with my day-to-day at HighLevel: product management isn’t dying—“the traditional product trio (PM, design, engineering) is collapsing into something new.” The center of gravity is shifting from swim lanes to outcomes, from rigid handoffs to fluid collaboration, and from role definitions to capabilities that actually ship value.
AI is raising the baseline across the board. That “80/20 shift: AI handles patterns, humans handle hard problems” is real on my teams. With LLMs like “GPT 5.2” and “Opus 4.5,” coding agents such as “Claude Code” and “Codex,” and tools like “Replit” and “Lovable,” we’re compressing cycle time on the repeatable 80%. The bottleneck is no longer typing code or drafting copy—it’s selecting the right problems, crafting sharp product strategy, and making confident trade-offs.
This is why the future belongs to “product builders” — people with a shared foundation across disciplines and deep expertise in one area. I look for teams that can shape, prototype, validate, and iterate in tight loops, blending continuous discovery with empowered product teams. The baseline expands, the craft deepens.
Functional expertise still matters—more than ever—because the hard parts are getting harder. We need leaders who can weigh platform scalability against time-to-value, protect privacy-by-design, apply AI risk management, and navigate data governance while sustaining product-market fit. When AI accelerates execution, judgment becomes the differentiator.
For leaders, this creates a clear mandate: “What product leaders must do to create safe AI infrastructure.” In practice, that means building guardrails early—security reviews tailored to AI workflows, QA harnesses that include eval-driven development, model performance observability, and human-in-the-loop review systems. You can’t bolt this on later without paying a tax in velocity and trust.
Hiring signals are already shifting. “How job descriptions and hiring expectations are already shifting” shows up in my reqs: we emphasize cross-functional range, fluency with AI workflows, prompt engineering literacy, and the ability to frame measurable outcomes. We still want craft depth—design systems, systems thinking in engineering, rigorous discovery—but we prize people who move seamlessly from discovery to delivery.
In the episode, I appreciated the crisp framing of why product management isn’t dying—but changing. The rise of the “product builder” foundation reframes team topology and unlocks smaller, more cross-functional squads. AI changes the baseline skill set across product teams, and ignoring it is a career risk. If you’re not learning AI tools, you’re falling behind.
My key takeaways were straightforward and actionable. Smaller, more cross-functional teams are likely. Deep expertise still matters—especially for complex trade-offs. Leaders need guardrails: security, QA, and review systems built for an AI-driven workflow. And if you work in product, design, or engineering, this episode is your signal to start upskilling now.
“The risk of ignoring AI in your craft” is not hypothetical. I encourage PMs to carve out weekly lab time for hands-on experiments with LLMs for product managers, build lightweight prototypes with Replit or Lovable, and pressure-test opportunity solution trees with data-informed discovery. Pair with your engineers on agentic AI use cases, and integrate model evals into your CI/CD pipelines.
“Mentioned in the episode” were several resources worth exploring: “Product at Heart” (June, Hamburg), “Replit,” “Lovable,” “Every,” “Petra’s Coaching Packages,” and “coding agents (Claude Code, Codex) and LLMs (GPT 5.2, Opus 4.5).” These are great jumping-off points for your own product builder toolkit.
My recommendation: queue up the episode on your commute, then pick one workflow to augment with AI before the week ends. Replace a handoff with a shared canvas. Automate a repetitive analysis. Ship a scrappy prototype. Momentum compounds.
Have thoughts on this episode? Leave a comment below. I’d love to hear how your teams are evolving your product trios, what AI workflows are sticking, and where governance has been most challenging.
PR review bots are all the rage, but they cost a premium. We built our own for cheap that work just as well, if not better. Here's how.
As a VP of Product Management, I care deeply about the velocity and quality of our software delivery. The decision to build our own pull request (PR) review agents came from a simple calculus: we needed tighter control over developer experience, CI/CD integration, and cost—without sacrificing accuracy or reliability. The result was a pragmatic system that accelerates reviews, improves code quality, and pays for itself through faster feedback loops.
Before we wrote a line of code, we defined success. Our objectives were to shorten review cycles, reduce back-and-forth on style and test coverage, and surface risks earlier—measured against DORA metrics like lead time and deployment frequency. That focus aligned the team, guided our build vs buy decision, and anchored scope to the highest-impact use cases.
We started rules-first, AI-optional. The initial release enforced guardrails that are universally valuable: linting and formatting checks, required test coverage thresholds, commit message standards, ownership validation (CODEOWNERS), and basic security scans. These automated gates eliminated predictable review friction, freeing engineers to focus on logic and architecture rather than style debates.
Then we layered intelligence where it mattered. We added lightweight, explainable checks for common code smells and dependency risks, plus optional natural-language summaries that turn large diffs into concise context. Where appropriate, we introduced agentic AI workflows to triage PRs by risk, draft review comments, and suggest missing tests—always keeping humans in the loop. This hybrid approach kept costs low and outcomes high.
Integration with our CI/CD pipeline was non-negotiable. We wired GitHub/GitLab webhooks to a stateless service that queued work, executed checks in containerized workers, and posted results back as status checks and review comments. Caching, parallelization, and smart diff-scoping ensured we only computed what changed, keeping the experience snappy even on large repos.
Adoption hinged on developer experience. We made the bot’s feedback fast, specific, and actionable, with clear remediation steps and links to documentation. Feature flags allowed teams to opt into new checks gradually. ChatOps commands enabled quick overrides for emergencies, while policy-as-code kept rules visible, versioned, and auditable.
We treated this like any product: eval-driven development for accuracy, ongoing telemetry for false-positive rates, and explicit SLAs for response times. We instrumented outcomes end-to-end—tracking PR cycle time, comment-to-merge ratios, and rework—so we could prove the ROI and tune the system without guesswork.
The outcome: a reliable PR review companion that runs on a shoestring budget, integrates cleanly with our workflows, and measurably improves engineering throughput. If you’re weighing build vs buy, start small with rules that deliver immediate value, then layer intelligence where it earns its keep. With a clear product strategy, you can stand up capable PR review bots quickly—and scale them as your needs grow.
If you’re ready to try this yourself, begin with your top three friction points in code reviews, wire them into your CI/CD checks, and pilot with a single team. Iterate weekly, measure relentlessly, and let your developers be your strongest signal. You’ll be surprised how far a pragmatic, product-led approach can take you.
Inspired by this post on Amplitude – Perspectives.
I just watched one of the most significant leaps in customer service AI in years. Last week, a quiet but seismic release landed in CX: Fin introduced Apex, a vertical model purpose-built for support that raises the bar on speed, accuracy, and cost. As a product leader, this is exactly the kind of breakthrough that changes roadmaps, vendor strategies, and what customers can expect from modern service operations.
It’s a brand new model for Fin called Apex, and it’s objectively the highest performing, fastest, and cheapest model for customer service. It beats the very best models in the industry including GPT-5.4 and Opus 4.5.
In this analysis, I’ll unpack why the launch matters for the customer service agent category, what it signals for frontier labs and open‑weight ecosystems, and how leaders should rethink their AI Strategy, build vs buy decisions, and eval-driven development roadmaps.
Fin was already the highest performing and most sophisticated agent in the customer service space, consistently beating impressive competitors like Decagon and Sierra at an average win rate in the 70s. It operates at tremendous scale, now resolving almost 2M customer issues per week, a number that’s growing at an exponential clip. In its short life it’s grown to nearly $100M in recurring revenue.
As of last week, ~100% of all (English language, chat and email) customer conversations are now running on Apex. Since day 1, the Fin engine has comprised a system of models, and last year the team began replacing off‑the‑shelf models with custom ones trained on proprietary data. The core answering model had been a frontier labs offering—initially versions of GPT and more recently Sonnet 4.0. Now, that core answering model is Apex 1.0.
This model resolves customer issues at a materially higher rate than any other model available. One of their largest customers in the gaming space saw the resolution rate improve overnight from 68% to 75% (i.e. a reduction in unresolved conversations of 22%). The team notes they had never seen a jump this large from a single improvement since they started Fin.
Just as important, it’s dramatically faster, has fewer hallucinations, and is far cheaper than other available models—exactly the attributes operations leaders weigh most when deploying agents at scale. In practice, these are the levers that unlock higher CSAT, tighter SLAs, and better unit economics.
Achieving all three simultaneously is extraordinarily hard. Credit goes to foundational research from a 60‑person AI group run by Fergal Reid, and, crucially, to domain‑specific proprietary evals drawn from billions of human and agent interactions produced by the Fin resolution engine—already hand‑tuned to be the most effective in the category. That creates a flywheel: an eval‑driven development loop that trains models to keep improving at the edge of the system’s abilities. In other words, Apex 1.0 looks like the tip of the iceberg.
Zooming out, service is one of the few categories where generative AI has already delivered commercial impact at scale (alongside coding, and arguably the legal industry). With TAMs measured in the hundreds of billions, competition is intense and well capitalized. The pattern I’ve seen repeatedly is clear: winners in these spaces must become full‑stack AI companies. As features become ~free to build, durable competitive differentiation shifts under the hood—to proprietary data, post‑training, inference efficiency, and the quality of the eval loop.
Fin Apex raises the bar for finance-ready AI, highlighting a -65% cut in hallucinations and a quicker first token at 3.7s (0.6s faster), compared with Sonnet 4.6, Opus 4.5, and GPT-5.4 in side-by-side charts.
That’s why competitors will need to release their own models. Many appear to be just starting to hire the talent to do so, which likely gives Fin at least a year of head start. For product leaders, this is a strong signal to revisit build vs buy assumptions, and to quantify when owning your post‑training pipeline and evals becomes the rational move.
Honestly, 2–3 years ago I expected AI application differentiation to live mostly in what we built around third‑party models. The AI game humbles all of us; today it’s obvious that vertical models paired with proprietary evals create compounding moats.
In a podcast interview last week, Andrej Karpathy said:
"I do think we should expect more speciation in the intelligences. The animal kingdom is extremely [diverse] in the brains that exist. And there’s lots of different niches of nature… And I think we should be able to see more speciation. And you don’t need this oracle that knows everything. You kind of speciate it. And then you put it on a specific task. And we should be seeing some of that because you should be able to have much smaller models that still have the cognitive core."
The frontier labs still have the very best models, but open‑weight models aren’t far behind—making pre‑training look increasingly like a commodity. The frontier is moving to post‑training, which is precisely what we see with Apex (and Cursor’s Composer 2), and what we should expect to dominate going forward.
Labs now face a dual reality. On one hand, horizontal general‑purpose models can over‑serve specific verticals (e.g., customer service doesn’t need an oracle that knows everything). On the other, open‑weight models are good enough that high‑quality, domain‑specific post‑training can produce superior models for special‑purpose jobs—and in the ways that matter for those jobs. In service, soft factors like judgement, pleasantness, and attentiveness matter alongside hard factors like resolution effectiveness, speed, and cost.
I’m still bullish on the labs. Many organizations remain heavy customers of Anthropic—whether as part of multi‑model systems or through deep usage of Claude Code in engineering teams (see this example of Claude Code adoption). Yet classic disruption (à la the late, great Clay Christensen) is now at their door. The way out is to disrupt themselves by building cheaper specialized models too, which likely requires acquiring the evals—or the companies with the evals—needed for each task. Expect creative data partnerships, M&A consolidation, and a wave of hyper‑specific model providers that compete head‑to‑head with the labs.
In the meantime, Fin appears to be the only vendor in its space with a custom model that’s also objectively superior to everything else out there. I’m excited to see it deployed broadly for end customers, and I’m watching closely for the next announcement that will accelerate that rollout. For product leaders, the message is clear: the age of vertical models and agentic AI is here—bring your evals, or bring your checkbook.
Are you an AI product manager or want to become one? This guide cuts through the noise and shows where the PM role is really heading with AI.
I’ve spent the last few years scaling AI initiatives across complex SaaS products, and I’ve learned that “AI product manager” isn’t a vanity title—it’s a capability set. The role evolves traditional product management with new responsibilities across data, model behavior, risk, and continuous learning systems. My goal here is to demystify what matters, so you can lead with clarity, build with confidence, and deliver measurable outcomes.
First, let’s separate hype from reality. An effective AI Strategy starts with the customer problem, not the model. I anchor roadmaps around clear use cases, then evaluate whether we need a retrieval-first pipeline, agentic AI, or conventional automation. “Build vs buy” is no longer a procurement question; it’s a lifecycle question about iteration speed, quality control, data governance, and long-term unit economics.
Discovery also looks different. I still run continuous discovery and customer interviews, but I augment them with behavioral analytics and targeted experiments to validate feasibility, risk, and value. I practice privacy-by-design and AI risk management from day one, and I define guardrails for acceptable model behavior alongside success metrics. When high stakes are involved, I document data provenance and align with regulatory compliance standards to protect customers and the business.
Execution shifts from shipping static features to operating learning systems. In product roadmapping and sprint planning, I account for context window management, prompt engineering, and the realities of LLMs for product managers: latency, cost, drift, and failure modes. I use feature flags, A/B testing, and eval-driven development to move from offline model evals to online impact with a minimum detectable effect (MDE) worth the release risk. Observability, anomaly detection, and incident management aren’t optional—they’re how we earn trust.
Collaboration expands beyond engineering and design. I work closely with data science on evaluation frameworks, with solutions engineering to de-risk complex enterprise deployments, and with customer success to close the loop on model performance in the wild. Our outcomes vs output OKRs emphasize activation, time-to-value, and sustained retention over vanity accuracy metrics.
Tooling is now strategic advantage. My AI product toolbox includes prompt libraries with versioning, synthetic data generation where appropriate, and a disciplined approach to model and prompt regression tests. I standardize AI workflows—intake, evaluation, deployment, and monitoring—so teams can ship faster without cutting corners. This is how empowered product teams scale safely.
Career-wise, I look for—and coach—PMs who can frame trade-offs crisply: explain when to fine-tune vs use retrieval, when to embed agents, and when not to use AI at all. Show me driver trees that connect model metrics to business outcomes, a clear risk register, and a plan for continuous discovery. If you can tell a compelling story backed by transparent evaluation and customer value, you’re already ahead.
Here’s the bottom line: the “AI product manager” that matters in 2026 is a product leader who can turn uncertainty into systematized learning. If you focus on real customer problems, rigorous evaluation, responsible design, and iterative delivery, you won’t just carry the title—you’ll create durable competitive differentiation.
I’ve watched too many AI agent deployments celebrate velocity while overlooking the one thing that determines long-term success: whether real users are actually getting value. Dashboards tend to spotlight model upgrades, prompt tweaks, and launch counts, yet they rarely quantify task completion, trust, or time-to-value. That blind spot isn’t technical—it’s human.
Enterprises are spending 93% of their AI budget building agents and almost none know if those agents are actually working for users. Pendo Agent Analytics closes the gap.
In my product reviews, I look for evidence that agentic AI is improving outcomes across the customer journey, not just the demo path. Without behavioral analytics and observability, teams optimize for throughput instead of resolution, for novelty instead of reliability. This is where eval-driven development, A/B testing, and rigorous cohort analysis become non-negotiable: they translate agent performance into user impact we can measure and improve.
Here’s the pattern that works for me: define user-centric success metrics first, then let the AI follow. I prioritize signals like successful task completion, low-friction activation, reduced escalations, and sentiment lift—tied directly to product-led growth indicators such as retention and expansion. When these metrics move in the right direction, I know the agent is creating compounding value, not just answering faster.
Practically, I operationalize this with an analytics spine that captures end-to-end agent interactions: intents, prompts, responses, clarifying turns, handoffs, and final outcomes. I segment by persona, journey stage, and account tier to uncover where agents delight and where they degrade trust. With this foundation, I can run controlled experiments, spot anomalies early, and connect improvements in agent behavior to improvements in business performance.
Pendo Agent Analytics closes the loop by making these user outcomes visible and actionable. Instead of guessing whether an agent helped or hindered, I can analyze where users stall, which prompts or skills drive completion, and how interventions like in-app guides or product tours change behavior. That visibility lets me tune models and experiences in days, not quarters—and gives stakeholders confidence that our AI investments are paying off for customers.
If you’re scaling agents today, start small but instrument deeply: map top user intents, define offline and online evals, A/B test prompts and policies, monitor regressions, and tie every improvement to activation, adoption, and retention. The result is a durable feedback loop that keeps agents aligned with user value as your surface area grows.
AI agents are not a destination—they’re a capability. When we anchor that capability to clear user outcomes and measure it with the right analytics, we stop flying blind and start compounding advantage. That’s how we turn promising demos into dependable products.
For years, I’ve watched product, growth, and data teams burn cycles stitching together manual dashboards and reports, then slogging through replay review just to validate a hunch. That overhead slows discovery and delays decisions. The promise here is different: "Discover how Amplitude AI Agents help product, growth, and data teams turn questions into action without manual dashboards, reports, or replay review." As someone obsessed with decision velocity and evidence-based product strategy, that shift is exactly what I’ve been waiting for.
In practice, I think about "Amplitude AI Agents" as always-on data analysts embedded in our workflow. Instead of queuing requests or context-switching into tooling, I can ask targeted questions, get synthesized insights, and move directly to action. This is a powerful example of agentic AI meeting behavioral analytics in a unified analytics platform—removing friction between inquiry and impact while keeping teams focused on outcomes, not artifacts.
What changes for my day-to-day? I can interrogate customer behavior in real time, pressure-test hypotheses from discovery interviews, and quickly understand whether activation, retention, or monetization is the current constraint. If I’m probing a driver tree for activation or a retention analysis for a specific cohort, I can get to a decision faster—without waiting on someone to build a bespoke dashboard. That means more cycles spent shaping product strategy and fewer sunk into report wrangling.
This matters beyond speed. When product, growth, and data leaders anchor discussions in the same source of truth, we shorten the distance from signal to decision. That alignment is the backbone of product-led growth and continuous discovery: shared context, faster feedback loops, and clearer trade-offs. It also reduces the long tail of analytics debt—those one-off reports and stale views that quietly accumulate across teams.
Of course, adopting any AI workflow in analytics demands governance. I hold these systems to the same bar I set for my teams: clarity of assumptions, consistent metric definitions, and auditable reasoning. Pairing "Amplitude analytics" with strong data governance, CI/CD for analytics definitions, and lightweight evals helps ensure the recommendations we act on are reliable, reproducible, and explainable. AI should accelerate our judgment, not replace it.
The strategic shift is simple and profound: move from building dashboards to making decisions. With always-on analysis, we can spend less time instrumenting analytics theater and more time delivering customer value. That is how we translate insights into impact—and why I’m excited to operationalize this capability across our product trios and go-to-market partners.
Inspired by this post on Amplitude – Best Practices.
Every day, I challenge my teams to make one small, meaningful improvement—something so lightweight it’s impossible to ignore and easy to repeat. That tiny daily motion compounds, and over time it reshapes customer experience, operational quality, and team culture.
That’s the essence of Kaizen, the Japanese philosophy of continuous improvement. Developed in post-war Japan and popularized by companies like Toyota, Kaizen proves that small, steady changes lead to significant long-term results. In product management and customer support, this approach transforms big ambitions into daily behaviors that actually stick.
Crucially, Kaizen isn’t passive or unstructured. It thrives on three principles I reinforce across my org. First, small changes reduce resistance—when you lower the activation energy, teams move faster. Second, improvement is continuous, not occasional; instead of waiting for quarterly reviews or major releases, you ask: “What can we improve right now?” Third, everyone participates—the people closest to the work are best positioned to improve it. That’s how momentum spreads.
In practice, the cycle is simple: identify a small problem, test the change, measure the result, refine, and repeat. The point isn’t radical transformation in a single swing; it’s steady progress guided by data and observation—a rhythm that aligns beautifully with eval-driven development and continuous discovery.
At Intercom, we apply this same philosophy to how we manage our Agent Fin through a process we call the “Fin Flywheel”. Here’s how this works.
Train: Teach Fin how to handle and resolve the most complex customer queries.
Test: Run fully simulated customer conversations from start to finish to see exactly how Fin will behave before going live.
Deploy: Launch Fin across all channels so customers get consistent support wherever they reach out.
Analyze: Use AI-powered insights to review and improve Fin’s performance so it can deliver better customer experiences.
This isn’t a one-time setup; it’s a continuous loop where every interaction feeds ongoing improvement. Rather than deploying AI and assuming it will perform as expected, improvement is built into the system itself. The more Fin is used, the better it gets. That’s the hallmark of agentic AI done right—tight feedback loops, purposeful conversation design, and clear Agent Analytics that illuminate what to tune next.
But continuous improvement doesn’t stop with AI. Within our Human Support operations, I emphasize the same mindset that drives great LLMs for product managers: you instrument the experience, learn from real usage, and close gaps fast. We operate with a simple mindset: the first time that you solve a customer issue should be the last time it happens.
When a conversation reaches a human, we pause to diagnose and prevent recurrence. Why did this reach me? Why couldn’t Fin resolve it? How can we prevent this from happening again? Those questions anchor a culture of root-cause thinking and accelerate product-led growth by removing friction at the source.
To make this effortless, we’ve built a lightweight, AI-powered way to log suggestions in the moment—no long explanations or heavy admin required. Ideas are reviewed quickly and implemented by subject matter experts or by the team themselves. This keeps the flywheel spinning: insights flow in, fixes go out, and measurable outcomes improve.
The result is a frontline that evolves from reactive problem-solvers into a proactive improvement engine. The people closest to customers spot friction, suggest fixes, and see their insights shaped into meaningful change. It’s continuous discovery embedded in everyday work, not a side project.
Kaizen demonstrates that lasting progress doesn’t come from occasional transformation; it comes from intentional, everyday refinement. The “Fin Flywheel” applies that philosophy to AI. Our Human Support continuous improvement process applies it to human insights. Together, they create a shared system where both people and AI learn continuously from customer interactions.
When improvement is built into the mechanics of how you work, it stops being a one-off project and becomes an ingrained capability. Over time, those small daily improvements don’t just add up—they compound into a sustainable, data-driven advantage that elevates customer experience and differentiates your customer support ai strategy.
I remember the exact moment our product crossed the threshold from scripted automation to truly agentic AI. The excitement was real—so was the pit in my stomach when our dashboards went dark. Our trusted analytics and observability stack, which had served us flawlessly for traditional software, suddenly couldn’t explain what the agent was doing, why it made certain choices, or how to reproduce outcomes across runs.
"The moment our product became a AI agent, our entire observability stack became irrelevant—not something you want as an analytics company. Here's what we did."
Why does this happen? Agentic AI doesn’t behave like conventional apps. Instead of deterministic flows and neatly tagged events, we face non-deterministic trajectories, tool-use chains, evolving prompts, context window dynamics, and policy guardrails that influence outcomes in real time. Clicks and pageviews give way to tokens, tool calls, and conversation turns. Without purpose-built observability, you can’t do credible product discovery, measure behavioral analytics, or run eval-driven development with confidence.
That’s why we built Agent Analytics. We needed a unified lens to trace every step of an AI workflow—from user intent to model prompts, function calls, retrievals, tool outputs, and final responses—while capturing latency, cost, guardrail hits, fallbacks, and outcome tags. We instrumented runs end-to-end, added experiment support for prompt engineering and policy variants, and wired in evaluations so we could turn subjective quality into objective signals the team could act on.
The impact on product management was immediate. We shortened iteration cycles by making failure states obvious and reproducible, turned ambiguous feedback into structured data, and gave engineers and designers a shared source of truth for conversation design and AI workflows. With visibility into containment, escalation, autonomy ratio, and step-level success, we could ship confidently, rollback safely, and align roadmap bets to measurable outcomes—not anecdotes.
Building this capability demanded more than logging. We invested in data governance and privacy-by-design to mask sensitive content while preserving semantic context, and we separated human-identifiable data from model telemetry. We treated prompts and policies like code—versioned, diffable, and safely rolled out behind feature flags and CI/CD—so we could experiment without risking regressions in production.
What should every team measure? Start with outcome quality (task success, resolution, containment), reliability (tool success rate, guardrail triggers, fallbacks), performance (time-to-first-token, total latency, step-level latency), and efficiency (tokens and cost per successful task). Add groundedness checks for retrieval steps, regression evals for core journeys, and post-release anomaly detection to catch drift before users do. These metrics become your operating system for agent performance and your compass for product strategy.
If you’re building or scaling AI agents, you need Agent Analytics before you hit your first incident. It’s the difference between guessing and knowing—between reactive firefighting and proactive iteration. With the right observability, your team can move faster, manage risk intelligently, and translate agent behavior into business outcomes that compound over time.
Inspired by this post on Amplitude – Best Practices.
What if AI could help reduce the 10-plus years it takes to get a new drug to market? That question has shaped much of my own product strategy thinking, and it’s exactly why I was drawn to Medable’s bold move with Agent Studio. It’s a rare look inside an enterprise AI platform built for one of the most regulated industries in the world—and a team that’s still figuring it out in real time.
In this episode of Just Now Possible, Teresa Torres talks with four members of the Medable team: Luke Bates (Product Leader, Agent Studio), Jen Brown (Product Manager), Matt Schoolfield (Product Designer), and Fiachra Matthews (Principal Architect). Listening through a product management lens, I focused on how their choices reflect a modern agentic AI strategy that balances speed, safety, and scale.
Medable does something uniquely hard: enabling global clinical trials across 100+ languages and accelerating drug-to-market timelines. That scope demands more than clever prompts—it requires a durable platform approach. Their answer is Agent Studio, a no-code/low-code platform for configuring and deploying agents across the clinical trial lifecycle.
What impressed me most was how clearly the platform’s primitives map to repeatable value: models, skills, knowledge bases, MCP connectors, versioning, and trigger types. In my experience, platforms win when these building blocks are composable, governed, and observable—exactly the direction Medable is taking.
You’ll also hear about the two agents they’ve built on top of it: an ETMF agent that automates document classification across 80,000-plus documents per year, and a CRA agent that monitors patient safety and data quality across 13 different clinical systems. For a domain where errors carry real human consequences, this is the right mix of automation and oversight.
Under the hood, their architecture choices echo what I’ve seen work in other high-stakes environments. They walk through RAG approaches at scale: embeddings vs. markdown hierarchies vs. just-in-time MCP retrieval, and explain Why they built custom MCPs with an authentication and credentialing wrapper. They also detail Context window management with sub-agents and automatic tool filtering—critical to keep agents focused and reliable as complexity grows.
Data alignment is often the unsung hero of agent reliability. I appreciated how they described How they built a unified ontology layer to map terminology across 13 different clinical data systems. Equally important, they show their paper trail: How they document agent intent → specification → test evidence to satisfy regulatory bodies. In a GXP context, this kind of lineage isn’t “nice to have”—it’s the price of admission.
Discover how Medable's Agent Studio reimagines clinical operations, shrinking drug-to-market timelines from a decade to a year with no-code agents, automated eTMF document classification, unified data monitoring, and human-in-the-loop validation.
Strategically, I love that Medable chose a platform approach to agents instead of one-off builds. They outline Three deployment models: Medable-built products, services-led custom builds, and self-serve platform access. This mirrors a healthy platform business model: prove value with first-party solutions, extend via services for complex needs, and unlock scale with self-serve—while keeping governance centralized.
Reliability is a theme throughout. They describe Evaluation design in a GXP-regulated environment: golden datasets, production monitoring, and the challenge of human feedback as ground truth. We also get a concrete picture of what human-in-the-loop really looks like when clinical decisions are on the line—tight feedback cycles, auditable interventions, and clear escalation paths.
Looking forward, they don’t shy away from ambition. The "full self-driving" vision for clinical trials and what it would take to get there is both provocative and grounded. My read: the path runs through stronger domain ontologies, standardized interfaces (MCP done right), eval-driven development, and relentless simplification of agent skills.
If you’re a product leader building in regulated spaces, this discussion is a masterclass in balancing innovation with compliance. The takeaways map cleanly to AI Strategy: define platform primitives, invest in retrieval-first pipeline patterns, design for context window management, lean into eval-driven development, and operationalize regulatory compliance from day one.
To dive deeper, listen to the conversation on Spotify or Apple Podcasts, and explore Medable’s broader platform work at medable.com. I left both inspired and practically equipped—an uncommon combo in today’s AI noise.