When a customer reports a stolen credit card, the frontline response seems straightforward: freeze it. But that’s just the visible tip of a much larger customer support iceberg. Underneath sits the real work: dispute filings, fraud investigations, merchant communications, proactive outreach, and follow-ups that unfold over days across multiple systems. Most AI support tools only touch the surface; they don’t coordinate or close the loop. That gap is exactly where my product instincts kick in, and it is why this story matters.
I recently listened to a conversation with Jack Taylor (Product Engineer) and Ibrahim Faruqi (AI Engineer) from Gradient Labs, an AI-native startup building agents that automate the full scope of customer support in fintech. Their approach resonated with the challenges I see every day in customer support automation: fragmented workflows, regulatory complexity, and the need for human-in-the-loop moments. Gradient Labs has architected a platform with three coordinating agents—"inbound, back office, and outbound"—all built on a shared foundation of "natural language procedures, modular skills, and configurable guardrails."
What impressed me most was that they let "non-technical subject matter experts define agent behavior through natural language procedures," with no coding required. That’s a powerful way to remove engineering bottlenecks, accelerate iteration, and keep the domain experts (those closest to fraud, disputes, and compliance) directly in control. In my experience, this design choice alone can compress lead times from weeks to hours, and it aligns well with continuous discovery and eval-driven development.
At the heart of their platform is orchestration. They architected "a state machine orchestrator that manages turns, triggers, and skill selection across long-running conversations." That "turn" architecture is built for the messy reality of async, multi-day support. They treat skills as "modular agent capabilities" that are "scoped deterministically per turn," which keeps the system predictable and auditable. They also confront a nuanced challenge most teams dodge: defining "done" for outbound agents "when the customer isn't the one ending the conversation." That’s where deterministic criteria, timers, and clearly scoped outcomes matter as much as the model beneath.
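To make the turn idea concrete, here is a minimal sketch of how I picture a state machine that scopes skills per turn. It is not Gradient Labs' implementation: the states, triggers, skill names, and the `run_turn` function are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    INTAKE = auto()
    INVESTIGATING = auto()
    AWAITING_CUSTOMER = auto()
    DONE = auto()


# Hypothetical scoping table: the skills the agent may use in each state.
SKILLS_BY_STATE = {
    State.INTAKE: ("freeze_card", "collect_details"),
    State.INVESTIGATING: ("file_dispute", "contact_merchant"),
    State.AWAITING_CUSTOMER: ("send_reminder",),
    State.DONE: (),
}

# Hypothetical transition table: (state, trigger) -> next state.
# "timer_expired" gives an outbound flow an explicit way to reach DONE
# even when the customer never replies.
TRANSITIONS = {
    (State.INTAKE, "customer_message"): State.INVESTIGATING,
    (State.INVESTIGATING, "dispute_filed"): State.AWAITING_CUSTOMER,
    (State.AWAITING_CUSTOMER, "customer_message"): State.DONE,
    (State.AWAITING_CUSTOMER, "timer_expired"): State.DONE,
}


@dataclass
class Conversation:
    state: State = State.INTAKE
    history: list = field(default_factory=list)


def run_turn(conv: Conversation, trigger: str) -> tuple:
    """Run one turn: look up the skills in scope, log the turn, then advance."""
    allowed = SKILLS_BY_STATE[conv.state]
    conv.history.append((conv.state.name, trigger, allowed))
    # Unknown triggers leave the state unchanged.
    conv.state = TRANSITIONS.get((conv.state, trigger), conv.state)
    return allowed
```

In a real system the model would choose among the `allowed` skills for that turn. The point of the sketch is that the set it chooses from comes from a lookup table rather than from the model, and that every turn leaves an audit record in `history`.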
Compliance is not an afterthought—it’s baked into the core. Gradient Labs "Built guardrails as binary classifiers with eval pipelines, tuning for high recall on critical regulatory checks." In regulated domains, optimizing for recall on high-stakes checks is the right call; you can tolerate a few extra reviews, but you can’t miss a potential fraud signal. More broadly, they frame "Guardrails as classification problems: balancing recall and precision for regulatory compliance." That mindset is exactly how I like to merge AI risk management with product velocity.
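Here is a rough sketch of what "tuning for high recall" can look like in practice, assuming a guardrail that outputs a score per conversation and a small human-labeled eval set. The function names and the sample numbers are my own illustration, not anything described in the episode.

```python
def recall_precision(scores, labels, threshold):
    """Compute (recall, precision) when flagging every score >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, labels))
    fp = sum(f and not y for f, y in zip(flagged, labels))
    fn = sum((not f) and y for f, y in zip(flagged, labels))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def pick_threshold(scores, labels, min_recall):
    """Return the highest threshold whose recall still meets min_recall.

    Recall can only drop as the threshold rises, so the first candidate
    that satisfies the target (scanning from high to low) flags the
    fewest conversations while still meeting the recall floor.
    """
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(scores, labels, t)
        if recall >= min_recall:
            return t
    return min(scores)


# Illustrative eval set: guardrail scores and human labels (True = violation).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

threshold = pick_threshold(scores, labels, min_recall=1.0)
recall, precision = recall_precision(scores, labels, threshold)
```

On this toy data, demanding full recall pushes the threshold down until the lowest-scoring true violation is caught, and precision drops because one harmless conversation is flagged along with it. That is the trade the paragraph above describes: a few extra reviews in exchange for not missing a critical case.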
Crucially, they avoid the trap of fully autonomous optimism. "Ask a Human: a tool call that brings humans into the loop for approvals or missing APIs" gives the system a safety valve for novel or high-risk cases. Because it is modeled as a tool call, approvals, policy exceptions, and data gaps fit into the workflow without derailing it.
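As a sketch of the pattern, here is one way a skill could hand off to a person instead of acting on its own. Everything here is an assumption for illustration: the `HumanQueue` class, the `refund_skill` function, and the approval limit are hypothetical and are not taken from the episode.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class HumanRequest:
    request_id: int
    conversation_id: str
    question: str
    answer: Optional[str] = None


class HumanQueue:
    """In-memory stand-in for wherever human requests would be routed."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.pending = {}

    def ask(self, conversation_id, question):
        req = HumanRequest(next(self._ids), conversation_id, question)
        self.pending[req.request_id] = req
        return req

    def answer(self, request_id, answer):
        req = self.pending.pop(request_id)
        req.answer = answer
        return req


def refund_skill(amount, queue, conversation_id, approval_limit=100.0):
    """Refund small amounts directly; ask a human above the (hypothetical) limit."""
    if amount <= approval_limit:
        return {"status": "refunded", "amount": amount}
    req = queue.ask(conversation_id, f"Approve refund of {amount:.2f}?")
    # The workflow pauses here and resumes when queue.answer() is called.
    return {"status": "waiting_for_human", "request_id": req.request_id}
```

The useful property is that the skill returns a normal result either way, so the surrounding workflow can treat "waiting for a human" as one more state rather than as a failure.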
Quality doesn’t happen by accident. They designed "an auto-eval system that samples conversations for human review to catch edge cases and build labeled datasets," with pipelines that "flag conversations for manual review and feed labeled datasets." That closed-loop evaluation flow is the backbone of sustainable performance in agentic AI. Combine this with targeted instrumentation (think CSAT, first contact resolution, deflection rate, time to resolution, and escalation rate) and you get a real Agent Analytics discipline, not just logs and dashboards.
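A minimal sketch of the sampling step, under my own assumptions: every conversation carries `guardrail_flagged` and `escalated` fields, risky conversations always go to review, and the rest are sampled at a fixed rate. The field names and the `select_for_review` function are hypothetical.

```python
import random


def select_for_review(conversations, sample_rate=0.1, seed=0):
    """Pick conversations for human review.

    Flagged or escalated conversations are always selected; the rest
    are sampled at random so reviewers also see ordinary traffic.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    selected = []
    for conv in conversations:
        if conv["guardrail_flagged"] or conv["escalated"]:
            selected.append((conv["id"], "flagged"))
        elif rng.random() < sample_rate:
            selected.append((conv["id"], "random_sample"))
    return selected


conversations = [
    {"id": "a", "guardrail_flagged": True, "escalated": False},
    {"id": "b", "guardrail_flagged": False, "escalated": False},
    {"id": "c", "guardrail_flagged": False, "escalated": True},
]
review_queue = select_for_review(conversations, sample_rate=0.1)
```

Keeping the reason next to each selected conversation matters for the labeled dataset later: labels from the random sample estimate overall quality, while labels from flagged conversations are biased toward problems and are better suited to tuning the guardrails themselves.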
The "iceberg" metaphor is more than a catchy visual. It’s a blueprint for scoping multi-agent platforms that work across the entire customer journey. With "inbound, back office, and outbound" agents coordinating on complex tasks like fraud disputes, the system can transition cleanly from intake to investigation to resolution—without dropping context or asking customers to repeat themselves. This is what genuine customer support automation looks like when it’s grounded in real operations.
Under the hood, the team leans into robust design choices that matter at scale: the "Complexities of Natural Language Input" are managed with explicit state and skill scoping, "Deterministic Skill Execution" reduces flakiness, and "Customer-Specific Guardrails" ensure compliance remains aligned to each client’s policies. Add their focus on "APIs and Customer Tools Integration" and the result is a platform that can actually take action—not just answer questions.
If you’re building in this space, here’s how I’d apply these lessons. Start by mapping the iceberg: enumerate back-office steps, approvals, and SLAs that follow the initial customer touchpoint. Capture those steps as "natural language procedures" owned by SMEs. Implement a "state machine orchestrator" to manage "turns, triggers, and skill selection" across multi-day workflows. Treat "guardrails as classification problems" and tune for high recall on high-stakes checks. Introduce "Ask a Human" early to handle missing APIs or policy exceptions. Finally, operationalize learning with "auto-eval pipelines" and tight, eval-driven development loops. That’s how multi-agent platforms deliver measurable outcomes in fintech support.
If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. The episode also references an earlier Incident.io episode and includes a thoughtful take on the future of multi-agent systems.
In short: this is a shift from simple Q&A bots to agents that can coordinate, comply, and complete. It’s the kind of multi-agent platform work that moves the needle for customer support in fintech—and a compelling template for any product leader scaling agentic AI and AI workflows beyond the tip of the iceberg.
Inspired by this post on Product Talk.











