In competitive markets, I see two options: try to win the game competitors set, or choose to play a different game. In the "Customer Agents" category, I’ve watched too many glossy, fabricated demos—especially around voice—mask the real challenges. Voice is just extremely hard. We all know the future of customer experiences will be Agent-driven voice, yet most of us haven’t actually spoken with a modern AI Agent when calling a business because the tech hasn’t been truly ready in the wild. Today, the bar moves.
What changed? There’s a live, public demo of cutting-edge voice tech you can stress test yourself—no smoke, no mirrors. I recommend taking it for a spin: https://fin.ai/voice. It’s fast, natural, and, yes, very, very good.
For context, yesterday brought Apex Flash, their newest and fastest model, built for the unique demands of low latency channels like voice. Today comes Fin Voice 2, a major upgrade to Fin Voice with over 20 new features, and the first product built on Apex Flash.
Here are the three things that stood out to me—and why they matter for customer support AI strategy and product strategy.
First — thanks to Apex Flash, Fin Voice 2 is now the fastest, most natural Agent for phone, with higher resolution rates and customer satisfaction scores than ever before. Apex Flash is trained on millions of customer experience interactions, fine tuned for customer service, and can be configured to understand all your knowledge and follow all your policies. The result is higher resolution at significantly lower latency—the best of both worlds for voice AI agent performance.
Speed and naturalness here aren’t accidental. Most voice AI products are slow because they convert speech to text, send it to a general model, get a text answer, and then convert it back to speech. Fin Voice 2 was designed to work differently, separating the real time layer that handles speech processing, and the layer that generates answers. That architecture is purpose-built for the demands of customer service on voice.
Powered by Apex Flash, Fin Voice 2 raises the bar on quality and speed—boosting resolution rates and guidance following while cutting time to first audio and semantic search latency, with a lift in CSAT too.
Second — Fin Voice 2 can handle complex queries end to end: taking actions in external systems, verifying callers’ identities, processing refunds, booking appointments, and more. Phone is a high-stakes channel, and Fin adapts to customers across emotional states, clarifies when needed, and confirms key details before taking action. Most of the time, Fin can resolve the query in full, and when it can’t, it seamlessly hands off to the human team, maintaining full customer context and history. You also get multiple improvements to call quality, plus proactive outbound calls to follow up on unresolved issues—all orchestrated by robust AI workflows.
Third — Fin Voice 2 gives you total control with industry-leading tools to configure and manage how Fin behaves. You get rich, detailed insights into call behavior and quality, the most common topics of calls, and one-click recommendations to improve. As with everything in Fin, you can fully self-serve and then manage it all with ease, without requiring professional services. Many vendors only let you set up their voice agent under supervision; with Fin, you get everything you need to iterate fast.
If you haven’t tried the demo yet, go check it out: https://fin.ai/voice. If you prefer to wait, don’t be surprised when you end up speaking with it at a favorite brand soon.
From a product management lens, this is what matters: latency is a feature customers feel; transparency builds trust in enterprise AI; and control is non-negotiable for CX leaders. The combination of a purpose-built, agentic AI architecture, measurable gains in resolution and CSAT, and true self-serve configuration signals that voice is moving from prototype theater to production reality. That’s the different game I want our industry to play.
Turning a rambling stream of consciousness into a clean task list while someone is still talking has been a longtime product dream of mine. With Ramble, Todoist brought that dream to life by using live audio AI to capture tasks in real time—no transcription step required. The result is a voice-to-task flow that feels natural, fast, and surprisingly disciplined.
As I listened to the Doist team—Ernesto Garcia (Front-end Product Engineer), Thomas Jost (Backend Software Engineer), and Hugo Fauquenoi (Product Manager)—walk through their approach, I heard a blueprint for building pragmatic GenAI features. What began as a two-to-three month AI exploration became one of their most technically deliberate releases: a “Gemini-powered pipeline that makes tool calls while the user is still speaking, surfacing tasks on screen in real time without any text output from the model.”
The breakthrough started with user research. People weren’t merely dictating tasks; they were doing a “brain dump” first—often into pen and paper or even ChatGPT voice—and only then committing items to Todoist. Meeting users where they already are reframed the problem: don’t force structure upfront; capture fluid thought and translate it into actionable tasks instantly.
That insight led to a bold architectural choice: skip transcription entirely and process raw audio directly with a Gemini live audio model. By removing the brittle middleman of text, the team reduced latency and kept the model focused on one job—turning intent into structured actions. It’s a crisp example of AI workflows designed for reliability over novelty.
The real magic is in the real-time “tool calls.” As the user speaks, the model triggers add task, edit task, and delete task operations immediately. For high-friction contexts like driving, they paired visual task cards with subtle sound effects as confirmation cues. It’s thoughtful conversation design that respects attention and safety without sacrificing speed.
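To make the tool-only pattern concrete, here is a minimal sketch that assumes the model emits tool calls as plain dictionaries while audio streams in. The tool names mirror the add, edit, and delete operations described above; the `TaskStore` class, the call shape, and the field names are hypothetical and are not Todoist's or Gemini's actual API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TaskStore:
    """In-memory task list that applies tool calls one at a time as they arrive."""
    tasks: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def apply(self, call: dict) -> None:
        name, args = call["name"], call.get("args", {})
        if name == "add_task":
            self.tasks[next(self._ids)] = args["content"]
        elif name == "edit_task":
            # Ignore edits to tasks that no longer exist.
            if args["id"] in self.tasks:
                self.tasks[args["id"]] = args["content"]
        elif name == "delete_task":
            self.tasks.pop(args["id"], None)
        else:
            raise ValueError(f"unknown tool: {name}")


# Hypothetical calls for: "buy milk, call Sam... actually email Sam, and skip the milk"
stream = [
    {"name": "add_task", "args": {"content": "Buy milk"}},
    {"name": "add_task", "args": {"content": "Call Sam"}},
    {"name": "edit_task", "args": {"id": 2, "content": "Email Sam"}},
    {"name": "delete_task", "args": {"id": 1}},
]

store = TaskStore()
for call in stream:
    store.apply(call)  # a UI would re-render task cards after each call
```

Because every model output is a tool call, there is no free text to parse, and each call can drive a visual or audio confirmation on its own.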
Teaching the model to capture tasks literally—without over-interpreting or trying to complete the work—required careful prompt engineering for voice and temperature tuning. Drawing a bright line between “capture versus do” kept the experience trustworthy. In my own AI Strategy work, I’ve found that establishing explicit agentic guardrails early prevents unintended autonomy later.
Dates were the sleeper challenge. The team had to inject the current date, normalize to days vs. months, and always output dates in English for the natural language parser—while preserving the user’s original language for everything else. If you’ve ever shipped date handling across locales, you’ll appreciate how many edge cases hide in “Taming Dates and Time.”
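One way to picture the date handling is a small helper that builds the date portion of the system prompt. This is a sketch under assumptions: the function name and the exact prompt wording are invented for illustration, and the team's real prompt is not shown in the source.

```python
from datetime import date

# Hard-coded English names so the output never depends on the host locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def build_date_context(today: date) -> str:
    """Return prompt lines that anchor relative dates and fix the date language."""
    return "\n".join([
        f"Today is {WEEKDAYS[today.weekday()]}, {today.isoformat()}.",
        "Write every due date in English (for example 'tomorrow' or 'next Monday').",
        "Keep the task content in the language the user spoke.",
    ])


prompt = build_date_context(date(2025, 3, 14))
```

Injecting the current date gives the model an anchor for phrases like "next Friday", while the English-only rule keeps the downstream natural language parser on a single input format.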
Quality didn’t hinge on intuition alone. They built an LLM-judge eval system using real employee recordings from 100+ people across 35 countries in 20+ languages to catch prompt regressions. That’s eval-driven development done right: representative data, repeatable scoring, and tight feedback loops as models and prompts evolve.
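The regression check at the heart of such an eval loop can be sketched as follows. Everything here is an assumption for illustration: the case format, the `run_model` and `judge` callables (stubbed below, where a real system would call the model and an LLM judge), and the tolerance value.

```python
from collections import defaultdict


def pass_rates(cases, run_model, judge):
    """Score each recorded case and return the pass rate per language."""
    totals, passes = defaultdict(int), defaultdict(int)
    for case in cases:
        output = run_model(case["audio"])
        totals[case["language"]] += 1
        if judge(case["expected"], output):
            passes[case["language"]] += 1
    return {lang: passes[lang] / totals[lang] for lang in totals}


def regressions(baseline, candidate, tolerance=0.02):
    """List languages whose pass rate dropped by more than the tolerance."""
    return sorted(
        lang for lang, rate in candidate.items()
        if rate < baseline.get(lang, 0.0) - tolerance
    )


# Stubs standing in for the real model and the LLM judge.
cases = [
    {"language": "en", "audio": "a1", "expected": ["Buy milk"]},
    {"language": "es", "audio": "a2", "expected": ["Comprar leche"]},
]
fake_model = {"a1": ["Buy milk"], "a2": []}.get
exact_judge = lambda expected, output: expected == output

rates = pass_rates(cases, fake_model, exact_judge)
flagged = regressions({"en": 1.0, "es": 1.0}, rates)
```

Breaking results down by language is a design choice that fits the multilingual recording set: a prompt change that helps English can silently hurt another language, and an aggregate score would hide that.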
For project and label matching, they chose direct context injection over RAG. Instead of building a retrieval pipeline, they injected the full project/label list into the system prompt. With smart context window management and a sharply constrained task schema, this was both simpler and more accurate. Sometimes the fastest path to product-market fit is removing moving parts, not adding them.
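Direct context injection can be as simple as rendering the user's lists into the prompt. The sketch below assumes a hypothetical `render_catalog` helper and a hypothetical `max_items` cap as a crude form of context window management; the wording of the instructions is also invented.

```python
def render_catalog(projects, labels, max_items=200):
    """List projects and labels verbatim so the model can match them by name."""
    lines = ["Match each task to one of these projects, or leave the project empty:"]
    lines += [f"- {name}" for name in projects[:max_items]]
    lines.append("Allowed labels:")
    lines += [f"- {name}" for name in labels[:max_items]]
    return "\n".join(lines)


catalog = render_catalog(["Work", "Home"], ["urgent", "errand"])
```

With the full list in view, the model picks from known names instead of relying on a retrieval step that might miss the right project.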
One product principle stood out: easy correction beats perfect first-time accuracy. Natural language interfaces earn trust when users can fix misfires in a tap or two. That bias toward quick recovery over false precision is how you ship AI that feels useful from day one.
Looking ahead, the roadmap is compelling: multimodal task capture from images and text blobs, Apple Watch support, and automation integrations. As voice AI agent patterns mature, this “tool-only architecture” sets a solid foundation for going from capture to coordinated execution—without losing the simplicity that makes Ramble shine.
If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. It’s a masterclass in building focused GenAI features that trade cleverness for clarity—and still delight.
Resources & Links: Todoist • Doist • Google Vertex AI (Gemini)
Every update we shipped this month removed a specific constraint on what teams can do with Fin. In my world, the demo-to-production gap shows up as complexity, control, and confidence. Can the agent handle the query that actually matters? Will it sound right on a call? Can the team deploy it without filing an engineering ticket? Can managers understand what it’s doing? That’s the bar I hold us to.
This month, we delivered answers to all four. Here’s how.
Procedures and Simulations (0:51). The hardest problem in AI-powered customer service isn’t answering FAQs—it’s executing complex queries with real business logic and real consequences if anything goes wrong. Think billing refunds, multi-step flows, and actions that must be right the first time.
We made it dramatically easier to build and manage Fin for those complex queries—without pulling in an engineer. You can author in natural language, test every step in simulation, and deploy with confidence.
The workflow starts with AI drafting the procedure from your existing source material. You edit in natural language, with structured hooks to pull in live data, apply business logic, and add code for deterministic control where you need it. That’s how you handle multi-step flows with the precision that matters when things go wrong.
Simulations are the test environment. Define a test case, pass in the data Fin would receive in a real conversation, and watch it work through each step. You see what Fin is doing, why, and whether it’s meeting the criteria you set. Full transparency at every point. I’ve run these end-to-end myself, and there’s a particular confidence that comes from watching it work before it goes anywhere near a customer.
For a deeper look at Procedures and Simulations, head to fin.ai/procedures.
Fin Voice: three major updates. When something’s off in chat, it can take a few exchanges to notice; on a call, it’s immediate. Pronunciation, noise handling, and tone all matter because they’re the customer’s first impression.
Pronunciation rules (4:18). Fin has high out-of-the-box pronunciation accuracy, but it doesn’t know your brand—your product names, your industry terminology, the way your company uses certain words. Alihan Zinna, Staff ML Scientist, showed this with an IKEA example: without pronunciation rules, Fin mispronounced both “IKEA” and a product name; after adding rules, both were corrected and sounded natural.
New natural voices (5:48). We’ve added 11 new voices tuned to a range of brand tones so you can choose one that sounds like it truly belongs to your company—not a generic AI assistant.
Background noise reduction (6:28). People call from airports, shops, and busy offices. Fin now monitors background noise continuously and increases noise reduction when the environment demands it. No configuration needed. As Alihan put it, “This is one of those things customers really notice when it’s not working. The goal was to make it invisible. That’s what we built.”
Shopify setup experience (8:21). Fin began as a Service Agent and is quickly becoming a Customer Agent—working across the whole lifecycle to support, sell, and guide, even before a customer has an issue. The revamped Shopify setup is a clear step forward.
Shopify catalogs are complex—thousands of products, variants, and dynamic inventory—and connecting all of that to an agent has historically been painful. We removed the friction.
Setup now takes three steps: first, connect your store. Second, install the Messenger directly in Shopify—no code, just a few clicks. Third, deploy Fin. Total time: under two minutes. We timed it live.
What that unlocks is real. In the demo, a first-time snowboarder asked for recommendations. Fin searched the catalog, reasoned about attributes that matter to a beginner (there’s no “beginner” tag in the catalog), personalized suggestions by height and weight, and added a board to the cart.
Even better, one customer updated their website copy to promote a sale. Fin immediately picked up the new context and began recommending sale items, nudging shoppers to add more to the cart to access a discount—no extra configuration required. It read the situation and acted.
Three steps, and you have a real-time shopping assistant that knows your store and sells on your behalf.
Helpdesk improvements (12:31). Fin works with any helpdesk, but many teams consolidate to take advantage of our native Intercom helpdesk integration. We’ve shipped 19 helpdesk improvements in 2026 so far; two from this month stand out.
11 new call metrics. Hold time, outbound dial time, missed and declined calls, call terminating party, and more. These give leaders the visibility to analyze workload distribution and call handling quality in detail.
Holiday office hours. Teams no longer need to manually update office hours for every public holiday. This was the most upvoted request in our community, and we shipped it.
Across the board, we removed the constraints that hold teams back: the complexity ceiling in automation, the quality ceiling in voice, the setup barrier in Shopify, and the operational overhead in the helpdesk.
We closed out the month with a Star Wars–style crawl of 22 additional updates. All features mentioned here are live and available now. Explore more at fin.ai/updates. More to come—see you next month.
What happens when you treat an AI agent not as a chatbot, but as a full teammate on your sales team – one that can jump on video calls, demo your product, make phone calls, and follow up over days?
I recently dug into this question with the team behind ShowMe, an AI-native startup building digital sales reps for inbound teams. Founded in April 2025, ShowMe has engineered a multi‑agent system that combines conversation agents for live voice and video interactions, evaluator agents that score every call for quality and sentiment, and creator agents that ingest customer documentation to build tailored playbooks. A workflow layer orchestrates the entire lead‑to‑close journey across days, not minutes—exactly the kind of agentic AI approach I expect to see become standard in revenue workflows.
What stood out to me first was the origin story: a glaring conversion gap on a previous website, and the realization that a purpose‑built AI could fill it. The initial MVP was refreshingly pragmatic—start with a voice agent, pair it with product videos, and back it with a simple RAG knowledge base. That retrieval‑first pipeline let the team ship quickly, validate real user behavior, and then scale sophistication where it mattered.
Then came a pivotal affordance shift: adding a realistic avatar via HeyGen. It wasn’t just eye candy; it changed how prospects engaged. The video-call UX established trust and made the AI’s capabilities legible at a glance. Prospects behaved as if they were with a human rep—interrupting, probing, and asking for demos—because the surface area invited that behavior.
On the architecture side, the team decomposed a single sales conversation into multiple specialized sub‑agents—greeting, qualifying, pitching—to manage latency, memory constraints, and model limitations. Deterministic workflows handle the happy paths reliably, while a smart orchestrator is emerging to break out of rigid paths when context demands it. Confidence scoring and frustration detection kick in for real‑time human handoff decisions, a must for revenue‑critical moments where a missed nuance can cost pipeline.
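The interplay between staged sub-agents and handoff rules can be sketched as a small routing function. The stage names come from the description above; the `TurnSignals` type, the thresholds, and the `"human_handoff"` label are hypothetical and are not ShowMe's implementation.

```python
from dataclasses import dataclass

STAGES = ["greeting", "qualifying", "pitching"]


@dataclass
class TurnSignals:
    confidence: float   # 0..1, how sure the agent is of its last answer
    frustration: float  # 0..1, estimated from the prospect's last turn


def next_step(stage, signals, stage_done,
              min_confidence=0.6, max_frustration=0.7):
    """Pick the next sub-agent, or hand off to a human when signals look risky."""
    if signals.confidence < min_confidence or signals.frustration > max_frustration:
        return "human_handoff"
    if stage_done and stage != STAGES[-1]:
        return STAGES[STAGES.index(stage) + 1]
    return stage


step = next_step("greeting", TurnSignals(0.9, 0.1), stage_done=True)
```

Checking the handoff condition before any stage transition keeps the deterministic happy path simple while ensuring a risky turn never advances the conversation.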
Training the system to sell like your team is where it gets powerful. ShowMe ingests sales transcripts and training materials to teach company‑specific sales skills, then uses creator agents to assemble tailored playbooks. Conversation agents stay focused on live interactions, while evaluator agents continuously score calls for quality and sentiment. The result: repeatable, compliant, and brand‑consistent selling—without flattening personalization.
Quality isn’t an afterthought—it’s operationalized. Early deployments run with customer-driven evaluation loops where 100% of conversations are reviewed, tapering to about 5% over time as confidence increases. Feedback becomes automated tests to prevent prompt regression, and production quality is proven with POCs, A/B rollouts, dashboards, and CRM logging. This is eval-driven development applied to go‑to‑market: measurable, auditable, and continuously improving.
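A tapering review rate like this can be sketched in a few lines. The 100% and 5% endpoints come from the description above; the linear shape, the eight-week window, and both function names are assumptions made for illustration.

```python
import hashlib


def review_rate(weeks_live, start=1.0, floor=0.05, taper_weeks=8):
    """Share of conversations to review, tapering linearly from start to floor."""
    if weeks_live >= taper_weeks:
        return floor
    return start - (start - floor) * weeks_live / taper_weeks


def should_review(conversation_id: str, rate: float) -> bool:
    """Deterministically sample a conversation so reruns pick the same ones."""
    digest = hashlib.sha256(conversation_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


rate_now = review_rate(weeks_live=4)
```

Hash-based sampling means the same conversation is always in or out of the review set at a given rate, which keeps audits reproducible.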
I also appreciate how they treat the agent as a coworker, not a widget. Onboarding happens via Slack, weekly reporting aligns with sales leadership rhythms, and tight CRM integration keeps data flowing both ways. That mindset unlocks adoption because it fits how sales teams actually operate—and it creates real Agent Analytics you can manage.
From a product perspective, several pragmatic details matter. Real‑time voice and avatar demos rely on latency tricks and a library of video clips to keep interactions snappy. The conversation agent evolved from a basic Q&A bot into guided sales discovery, balancing personalization with the ever-present risks of hallucination. Guardrails, human‑in‑the‑loop, and clearly defined handoff rules are non‑negotiables in high‑stakes sales workflows.
Looking ahead, the roadmap makes sense: move toward self-serve PLG setup, add smarter orchestration that adapts beyond deterministic flows, and expand into adjacent roles like customer success. For product leaders building in GenAI, the pattern here is instructive: start with inbound value, design AI workflows that align to proven sales motions, and use rigorous evals to earn the right to automate more.
If you want to go deeper into the build, the live demos, and the full multi-agent orchestration, listen to this episode on Spotify or Apple Podcasts. For more on the stack, explore ShowMe and the avatar platform HeyGen.
Support teams in Spain just got the clearest signal yet that the old way of doing things won’t cut it anymore. As I look at the details, I see more than a regulatory hurdle—I see a blueprint for the modernization many of us have been pushing toward for years.
The signal arrives in the form of one of the most ambitious customer service regulations in Europe—a law designed to strengthen consumer protections and set clear expectations for fair, transparent, and personalized customer service. Among its measures: new protections against spam calls, stronger transparency requirements, safeguards around personalized interactions, and measurable standards for speed, accessibility, and complaint handling within customer support.
It’s a significant shift, especially for large enterprises and essential-service providers. While the initial reaction might be anxiety about audits and penalties, the larger opportunity is hard to ignore: this law compels us to build modern, resilient support operations that scale, perform, and earn trust.
Spain is often an early mover in consumer-protection regulation, and this shift could signal what future standards across the EU might look like. For EMEA leaders, this is a moment to reevaluate operating models, invest in automation thoughtfully, and ensure customer experience improvements directly support regulatory compliance.
Below, I break down what the law requires, what it means in practice, and how AI Agents like Fin can help teams meet regulatory expectations while delivering faster, more personal support at scale.
The law applies in full to providers of regulated services, including water, energy, passenger transport, postal services, pay-audiovisual media, and electronic communications, and also to any company (or group) that meets certain size and turnover thresholds, even if their core business falls outside those sectors.
Large companies (those with more than 250 employees and over €50 million in turnover) also hold additional obligations, particularly around multilingual support in Spain’s co-official language regions.
While the law is still moving through its final approval stages, the direction is clear: a broad set of obligations will apply to reinforce consumer rights, ensuring consumers can reach support quickly, speak to a human when needed, get clear information during outages or service disruptions, and have complaints handled promptly and on time.
1. 95% of support calls must be answered within three minutes
This raises the bar significantly for responsiveness, especially during spikes, outages, billing cycles, or seasonal surges. Most support systems are not built for this level of agility. In my experience, you can’t hire your way to this metric sustainably—you have to design for it.
2. Customers must be able to speak to a human on request
Automation is allowed, but it cannot be the only option. At any point during a call, a customer must be able to transfer to a human if they ask for one. Companies cannot trap customers in automated loops. The practical implication: every workflow needs a reliable, audited escape hatch to a person.
3. Support lines must be free of charge
Premium-rate numbers are prohibited. Customer service cannot generate revenue for the business, nor may it be used to upsell products. This cleanly separates service from sales and reduces consumer friction.
4. Essential services must offer 24/7 support for continuity issues
Electricity, water, gas, telecoms, and transport providers must be reachable at all hours when customers need to report service interruptions. That means coverage, triage, and routing must be always-on.
5. Complaints must be resolved within 15 days – or within five days for undue charges
This halves the previous general complaint window of 30 days and adds a much faster path for billing-error complaints. Companies must maintain records, assign tracking numbers, and ensure timely follow-up. Your case management discipline will make or break this requirement.
6. No spam calls or unwanted commercial pressure
Companies must identify business calls with a designated prefix, and customer-service calls with a different one. Telecom operators will be required to block calls that do not use these codes. Additionally, contracts obtained via unsolicited calls will be legally null and void, protecting consumers from being pressured into commitments they never intended to make.
7. Companies must maintain a unified complaint-tracking system
All complaints, claims, and incidents must be recorded in a centralized system to ensure traceability. If your data is fragmented across tools, this is a call to centralize and standardize intake.
8. Companies must pass annual external audits
These audits assess whether customer service processes are meeting the required standards. In practice, that means consistent processes, measurable outcomes, and reliable evidence.
9. Better linguistic and accessibility rights
Large companies operating in regions with co-official languages must be able to provide support in those languages. They must also ensure their customer service is accessible for vulnerable consumers, such as those with disabilities or older adults. Multilingual and accessible by design is the new default.
10. Fairer contract renewals
Companies must provide customers with 15 days’ notice prior to automatic renewal of online subscriptions and make cancellation simple. This is both a compliance and customer trust win.
Most support systems weren’t built for this level of speed or operational rigor. But the steps required to comply are the same ones that make service better for customers—and better for the teams delivering it. That’s why I view AI as an essential capability, not a bolt-on.
With the regulatory expectations clear, the question becomes: what does a modern, compliant support operation look like? For me, it blends human empathy with intelligent automation, proving auditability without sacrificing experience.
This is where AI plays a meaningful role. Not as a replacement for humans, but as a reliable front line that can handle a wide range of queries, including the most complex ones that require real depth, while keeping queues under control.
Adopting an AI Agent like Fin helps teams build a support model that meets regulatory expectations and improves customer experience across all your channels. Here’s how.
Many organizations will struggle to meet the three-minute standard during normal times, let alone during spikes or busy seasons, without unsustainably scaling their teams. Fin can help by reducing the number of calls that reach your phone lines, and Fin Voice will ensure the ones that do are handled quickly.
Reducing avoidable call volume before it reaches the queue
Many of the queries teams receive are predictable: outage updates, billing questions, account changes, and other repeatable issues. Fin can resolve these instantly across several channels, including live chat, SMS, email, and WhatsApp, using the content and processes your team already maintains. I’ve seen this alone cut peak-time pressure dramatically.
Answering the phone immediately
For customers who do call, Fin Voice can pick up straight away. It provides natural, conversational responses based on your existing knowledge and helps your team stay responsive during busy periods.
Making it easier to reach a human during spikes
When queues build up, Fin can capture the reason for the call, gather details, and prioritize the most urgent issues. If you offer callback options, Fin can help schedule them quickly so customers avoid long wait times, which is key for staying compliant during peak periods.
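To track the three-minute standard, a team could compute its service level from answer times. This is a rough sketch: the function names are hypothetical, and details such as whether exactly 180 seconds counts as compliant or how abandoned calls are treated are assumptions, not something the law's text is quoted on here.

```python
def service_level(answer_seconds, threshold_s=180):
    """Fraction of calls answered within the threshold (three minutes by default)."""
    if not answer_seconds:
        raise ValueError("no calls to measure")
    within = sum(1 for s in answer_seconds if s <= threshold_s)
    return within / len(answer_seconds)


def meets_target(answer_seconds, target=0.95):
    """True when the share of timely answers reaches the 95% target."""
    return service_level(answer_seconds) >= target


# 19 fast answers and one slow one out of 20 calls.
waits = [30] * 19 + [400]
level = service_level(waits)
```

Running this per hour or per day, rather than per month, shows whether spikes are where compliance slips.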
The law requires customers to reach a real person whenever they request one. Fin supports this by keeping the path to a human clear and dependable: every interaction includes an option to speak to a person, and that option is accessible until the issue is resolved; when chosen, Fin hands over full context so human teams don’t start from scratch; if you show team availability or wait times, Fin can surface that information for customers; escalations can be prioritized to ensure faster pickup; alerts can notify on-call staff when urgent issues arise. On the phone, Fin Voice follows the same principle. Callers can request a transfer at any moment, and Fin routes the call to the right team with context intact.
Essential-service providers must be reachable at any hour when customers need to report service interruptions. Fin can help you meet this requirement without building a full overnight staffing model.
Always-on answers and triage
Fin provides first-line support at any hour of the day or night. Fin Voice brings this capability to the phone, giving callers immediate help even when your human team is offline. Fin can also direct customers to the latest updates you’ve published, such as outage information or status pages.
Routing urgent issues to the right people
When an issue requires human judgment, Fin gathers the necessary details and routes it to the appropriate on-call team using your existing after-hours processes. Teams can set up notifications so urgent issues are seen quickly.
Proactively surface what matters most
With AI Insights, Fin can also monitor for emerging patterns in customer conversations through Trending Topics. This means that if there’s a sudden spike in reports about a specific outage or a recurring question about a new process, Fin can flag these trends in real time. Your team is alerted to what’s top-of-mind for customers, so you can prioritize updates, publish targeted FAQs, or escalate critical issues, ensuring your support stays relevant and responsive, even overnight.
Complaints and outages often create the biggest spikes in volume, and the new law increases pressure to respond quickly, keep customers informed, and maintain complete records. This is exactly where structured AI intake adds value.
A more structured complaint intake
Fin can recognize when a customer is lodging a complaint, gather required information, and initiate a record in your existing system with a clear ID assigned from the outset.
Clear ownership and deadline alignment
Your team can then use your case-management tools to apply the 15-day resolution timeline (or five days for undue charges). Fin’s structured intake helps ensure that ownership and next steps are visible, rather than buried in unstructured notes.
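The deadline logic itself is simple to encode in a case-management rule. The sketch below assumes calendar days counted from the date the complaint is received; whether the law counts calendar or business days is not stated here, and the complaint type names are hypothetical.

```python
from datetime import date, timedelta

# Resolution windows in days, per the timelines described above.
DEADLINE_DAYS = {"general": 15, "undue_charge": 5}


def resolution_due(received: date, kind: str = "general") -> date:
    """Return the latest resolution date, assuming calendar days from receipt."""
    return received + timedelta(days=DEADLINE_DAYS[kind])


due = resolution_due(date(2026, 1, 20))
due_fast = resolution_due(date(2026, 1, 20), "undue_charge")
```

Stamping the due date at intake means every case carries its own deadline, which makes overdue complaints easy to query.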
Faster, more consistent outage communications
During service interruptions, Fin can share the latest published information, provide estimated fix times when available, and direct customers to live updates. On the phone, Fin Voice can triage incident-related calls quickly so callers aren’t waiting for a human agent just to receive basic information.
While multilingual support is only mandatory for large companies operating in co-official language regions, it remains essential for meeting consumer expectations. Fin helps by supporting multilingual, natural language interactions across voice and other channels; operating within channels that support accessibility features, like channels compatible with screen readers or commonly used messaging apps; and offering “request a call” paths and collecting the necessary information up front so teams can follow up quickly for customers who prefer phone support.
The law prohibits customer service interactions from generating additional revenue or being used to offer new products. With Guidance, you can set Fin up to stay firmly within these boundaries by shaping how it responds, which topics it should avoid, and what it should prioritize when a customer is seeking help or lodging a complaint.
The law raises expectations around documentation and audit readiness. Fin helps by making customer interactions more structured and consistent: when a conversation involves a complaint, Fin can ensure the required information is captured and a clear ID assigned; that ID can follow the interaction so it remains easy to trace; consistent intake gives you better visibility into key metrics regulators care about, like response times, time to first human contact, escalation volume, and whether complaints are resolved within required timelines; transcripts, summaries, and metadata can be retained until cases are resolved, supporting audit requirements; and Insights can help you identify trending topics, optimize processes, and measure service quality. Many organizations also maintain internal compliance playbooks outlining processes and owners, and Fin’s structured intake helps keep these practices reliable.
Spain’s new customer service law raises the bar on speed, access, and accountability. It’s natural to worry about how your team will cope, especially if your support operation has grown organically across tools and regions. I’ve seen how quickly burnout and chaos can set in when expectations rise faster than capacity.
The reality is that meeting these expectations through people alone would put unsustainable pressure on already stretched support teams. The risk of burnout and operational chaos is real, which is why an AI Agent like Fin can bring welcome relief.
By handling everything from high-volume, repetitive questions to many of the deeper, more involved issues customers raise, Fin keeps queues manageable and prevents the strain from falling entirely on your human team, helping everyone stay above water as expectations rise.
For companies operating across the EU, adapting early to Spain’s stricter expectations can build resilience for whatever comes next—whether that ends up being driven by regulation or customer demand. Now is the time to align compliance, AI strategy, and customer experience into a single, measurable operating model.
I love real-world AI that ships, scales, and actually solves painful customer problems. This story checks every box. As a product leader who has brought agentic AI to production environments, I was captivated by how a small, focused team at Perk took a no-code voice AI prototype and turned it into a system that reliably makes 10,000+ calls per week to prevent failed hotel payments.
What happens when you combine a real customer problem, a no-code prototype, and a team willing to listen to every single call?
Steven Payne (Product Manager), Gabriel Stock (Senior Engineering Manager), and Philipe Steiff (Senior Software Engineer) from Perk share how they built a voice AI agent that calls hotels to verify virtual credit card payments, preventing travelers from arriving to find their rooms unpaid. This is a textbook example of linking operational pain to a high-leverage AI solution.
What started as a hackathon experiment in Make.com became a production system handling over 10,000 calls per week across multiple languages. Along the way, the team learned hard lessons about prompt engineering for voice (numbers, pronunciation, and a very "Karen-like" first version), how to break a single monolithic prompt into structured conversation stages, and why listening to actual calls beats any amount of theorizing.
From a product management perspective, this approach aligns perfectly with eval-driven development and continuous discovery. Structure the problem, instrument aggressively, ship safely, then listen—deeply—to real interactions. In my own teams, I’ve seen that nothing accelerates iteration on agentic AI like closing the loop between qualitative call reviews and quantitative evals.
They built a working prototype without writing a single line of backend code.
They structured the call into discrete stages (IVR, booking confirmation, payment) to improve reliability.
They created two eval systems: one for call success classification, another for conversational behavior.
They scaled from five calls a day to more than 10,000 per week while maintaining quality.
This is a detailed look at building AI for real-time human interaction—where the stakes are high and the feedback is immediate.
Guests: Steven Payne, Product Manager, Perk; Gabriel Stock, Senior Engineering Manager, Perk; Philipe Steiff, Senior Software Engineer, Perk.
What stood out to me was how Perk's team identified an AI use case by connecting prior experimentation with a real operational problem. Their choice of Make.com for prototyping, which let them ship to production without touching backend code, underscores how far no-code can take you when paired with crisp problem framing. The evolution from a single prompt to structured conversation stages (IVR handling, booking confirmation, payment request) is exactly how you harden agent behavior for production.
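The move from one monolithic prompt to stages can be sketched as a small state machine. This is my own illustration, not Perk's implementation (they built theirs in Make.com): the `Stage` names follow the three stages mentioned above, while the prompt wording, `advance`, and `prompt_for` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    IVR = "ivr"
    BOOKING_CONFIRMATION = "booking_confirmation"
    PAYMENT_REQUEST = "payment_request"
    DONE = "done"


# Hypothetical per-stage prompts: each stage gets one narrow instruction
# instead of a single prompt that covers the whole call.
STAGE_PROMPTS = {
    Stage.IVR: "Navigate the phone menu until a person at reception answers.",
    Stage.BOOKING_CONFIRMATION: "Confirm the hotel can find the booking by its reference.",
    Stage.PAYMENT_REQUEST: "Ask the hotel to charge the virtual card and confirm the result.",
}

# Allowed forward transitions between stages.
NEXT_STAGE = {
    Stage.IVR: Stage.BOOKING_CONFIRMATION,
    Stage.BOOKING_CONFIRMATION: Stage.PAYMENT_REQUEST,
    Stage.PAYMENT_REQUEST: Stage.DONE,
}


def advance(stage: Stage, goal_met: bool) -> Stage:
    """Stay in the current stage until its goal is met, then move forward."""
    if stage is Stage.DONE or not goal_met:
        return stage
    return NEXT_STAGE[stage]


def prompt_for(stage: Stage) -> str:
    """Return the instruction for the active stage (empty once the call is done)."""
    return STAGE_PROMPTS.get(stage, "")
```

The point of the design is that the model only ever sees the instruction for the stage it is in, so a failure in IVR handling cannot leak into how it asks for payment.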
Breaking up the agent's task dramatically improved reliability. They also built two eval systems: classification for success rates and LLM-as-judge for conversational behavior. Even with automation, the team still listens to calls manually—a practice I strongly endorse for uncovering edge cases, trust issues, and UX nuances that dashboards can’t show.
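Here is a rough sketch of how the two eval systems might fit together, assuming each call already has an outcome label and a transcript. The outcome labels, the rubric questions, and the `ask_judge` callable (a stand-in for whatever model call you use) are all hypothetical and not taken from Perk's setup.

```python
from collections import Counter
from typing import Callable

# Hypothetical outcome labels for the classification eval.
OUTCOMES = ("payment_confirmed", "payment_failed", "no_answer", "unclear")


def success_rate(labels: list[str]) -> float:
    """Share of calls classified as payment_confirmed."""
    unknown = set(labels) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if not labels:
        return 0.0
    return Counter(labels)["payment_confirmed"] / len(labels)


# Hypothetical rubric for the conversational behavior eval.
RUBRIC = (
    "Was the agent polite?",
    "Did it read the booking reference clearly?",
    "Did it avoid interrupting the hotel staff?",
)


def judge_behavior(transcript: str, ask_judge: Callable[[str], str]) -> dict[str, bool]:
    """Ask a judge model one yes/no rubric question at a time."""
    results = {}
    for question in RUBRIC:
        prompt = f"Transcript:\n{transcript}\n\nQuestion: {question}\nAnswer yes or no."
        # Treat any answer that starts with "yes" as a pass.
        results[question] = ask_judge(prompt).strip().lower().startswith("yes")
    return results
```

Keeping the two evals separate matters: a call can succeed on outcome while still behaving badly, and only the second eval (or a human listening) will catch that.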
Prompt engineering for voice (numbers, booking references, and text-to-speech markup) was non-trivial. Expanding to German revealed that prompts written in the target language improve results. And, as often happens with operations-heavy rollouts, the project uncovered other operational problems the team didn't know existed, which is valuable signal for the roadmap.
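To show why booking references are tricky for voice, here is a small sketch of two common ways to get a reference read one character at a time. Both helpers are hypothetical and not from Perk's system. The `say-as` element comes from SSML, and `interpret-as="characters"` is a commonly supported value, but support varies by text-to-speech provider, so treat it as an assumption to verify.

```python
from xml.sax.saxutils import escape


def spell_out(reference: str) -> str:
    """Turn 'ab-12x' into 'A, B, 1, 2, X' so plain TTS pauses between characters."""
    return ", ".join(ch for ch in reference.upper() if ch.isalnum())


def as_ssml(reference: str) -> str:
    """Wrap a reference in SSML say-as markup, escaping XML special characters."""
    return f'<say-as interpret-as="characters">{escape(reference)}</say-as>'
```

The plain-text version works with any voice because it only relies on punctuation for pacing. The SSML version is cleaner when the provider honors it.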
Resources & Links:
Perk
Make.com: no-code automation platform used for the prototype
Twilio: voice/telephony provider
ElevenLabs: text-to-speech provider (used in early experiments)
Chapters:
00:00 Introduction to the Team
01:54 Understanding Perk's Mission
02:59 Challenges in Travel Booking
07:27 AI Solutions for Customer Care
09:52 Prototyping with AI and Voice
17:00 Implementing AI in Production
25:51 Learning Through Trial and Error
26:40 Prompting Challenges and Solutions
27:58 Iterating on Prompts and Evaluations
30:08 Scaling and Production Challenges
32:43 Advanced Evaluation Techniques
35:32 Real-World Applications and Success
49:07 Future Directions and Expansion
53:53 Conclusion and Team Reflections
My product takeaways: Start with clear operational pain and measurable outcomes (e.g., payment verification). Use no-code to validate quickly, then progressively harden. Treat voice AI like any production system: break it into discrete stages, add guardrails, and measure both outcome and behavior. Pair automated evals with hands-on call reviews. And when going multilingual, write prompts in the target language, because the Perk team saw better results that way.
If you’re exploring agentic AI for operations, this is the blueprint: tight scoping, Make.com for speed, Twilio for telephony, structured prompts for control, and an eval-driven loop to scale quality with confidence.