I created this practical guide to help product managers cut through the hype and apply AI where it genuinely moves the needle—faster discovery, clearer strategy, sharper execution, and measurable outcomes.
A practical guide to AI tools for product managers: tested picks, what each tool is best for, copy-paste prompts, workflows, and screenshot checklists.
Leading product management at HighLevel, I’ve pressure-tested dozens of gen AI solutions across product discovery, roadmap planning, delivery, and go-to-market. In this guide, I map an AI product toolbox to core PM jobs-to-be-done so you can move from experimentation to repeatable impact with confidence.
Expect clear recommendations on where each tool excels—LLMs for product managers, research synthesis for customer interviews, behavioral analytics for opportunity sizing, and lightweight automation for in-app guides and product tours. I connect these tools to proven practices like continuous discovery, outcomes vs output OKRs, and product roadmapping and sprint planning so you can operationalize AI inside your existing workflows.
I also share the evaluation criteria I use before rollout—AI Strategy alignment, data governance and privacy-by-design, AI risk management, observability, and total cost of ownership. This eval-driven development approach helps teams avoid technology FOMO while creating defensible, trustworthy workflows that scale.
To accelerate adoption, I’ve included copy-paste prompts (including prompt engineering patterns for both chat and voice), retrieval-first pipeline blueprints to ground your models in product docs and decision logs, and conversation design tips for support and success use cases. You’ll see step-by-step AI workflows that tie directly to journey mapping, opportunity solution trees, and Kano Model trade-offs.
Every workflow comes with screenshot checklists you can use for onboarding or stakeholder management, making it easy to align ICs and leaders on the same operating picture. Whether you’re optimizing A/B testing, retention analysis, or QBRs vs OKRs, these checklists turn good intentions into repeatable rituals.
Use this guide as your field companion to ship faster with higher confidence—reducing cycle time, improving signal in discovery, and building momentum for product-led growth. If you’re ready to translate generative AI into reliable PM leverage, start with the workflows, adapt the prompts, and make them your own.
Today, I’m spotlighting Fin for Sales, a new role for Fin Customer Agent that runs your inbound sales motion end-to-end. From my vantage point leading product management and collaborating closely with revenue teams, this is a meaningful evolution in how we capture, qualify, and convert high-intent demand with precision and speed.
The promise here is simple and powerful: a single Customer Agent with shared context, memory, and business goals that supports the entire journey from first touch to close. Fin for Sales brings Fin to the start of the customer journey so it can engage prospects, guide them through your funnel, and ensure the best opportunities reach your sales team without delay.
At a high level, here’s what stands out to me in practice. Fin engages every prospect instantly at the moment intent is highest. It runs discovery like your best rep with clear pricing guidance, product education, and objection handling. It qualifies and routes in real time using your playbook and syncs full context to your CRM. And it closes deals while you sleep by booking meetings, starting trials, and steering buyers to the right next step—boosting MQLs, pipeline, and early close/win rates.
Fin engages every prospect instantly. It starts the right conversation when interest peaks, re-engages before prospects go cold, and works on every channel, in every language, 24/7. In my experience, that immediacy is the difference between a lead that converts and a lead that disappears.
Introducing Fin for Sales, a conversational assistant that qualifies prospects in real time. The chat compares Free vs Pro, spotlights reporting and Salesforce integrations, and invites users to book a call.
Fin runs discovery like your best rep. It explains pricing, guides product discovery, handles objections, and personalizes each interaction based on who the prospect is and what they care about. This is where thoughtful conversation design and consistent playbook execution really compound.
Fin qualifies and routes in real time. Using your playbook, it collects and enriches data about your prospects and sends qualified leads to your sales team or down self-serve paths, while syncing full context to your CRM. Your team never works the wrong lead. That’s the operational rigor revenue leaders crave.
Fin closes deals while you sleep. It can book meetings, start trials, and guide buyers to the right next step. Early customers are already seeing impressive results, increasing MQLs, growing pipeline and seeing close/win rates of nearly 50% in the first month. That’s the kind of lift that reshapes go-to-market strategy and forecasting confidence.
Fin for Sales links customer agent insights with Salesforce, turning live conversations into rich profiles and lead scores. View key details, intent and opportunity signals, and guided next steps like booking a meeting.
Why this matters: most online sales experiences still rely on forms, queues, and follow-ups—exactly when prospects want clarity and momentum. Hiring enough reps to cover every time zone, channel, and hour is unrealistic, and even the best teams burn cycles on leads that were never going to convert. I’ve watched high-intent demand slip through the cracks simply because the response wasn’t fast, consistent, or contextual enough.
Revenue leaders need a system that meets every inbound interaction immediately, without sacrificing quality, and routes only the right opportunities to sales. Incremental automation doesn’t fix the core issue; an agentic approach does. Fin for Sales closes that gap by pairing instant engagement with disciplined qualification and crisp handoffs.
How it works in the moment: when a prospect is actively exploring your site, any delay—a form, a queue, a “we’ll get back to you”—erodes intent. Fin engages in real time through the Spotlight Messenger, a new interface built specifically for sales conversations. It can proactively start a conversation based on context like the page someone is on or how they’re browsing, and it offers smart suggestions to kick-start engagement.
Fin for Sales schedules meetings directly in chat. A sleek widget shows a March 2026 calendar with selectable time slots and a clear Confirm booking CTA, streamlining lead capture and speeding up sales follow-ups.
Prospects who might have waited—or never reached out—now get answers immediately. Fin also works across channels including messenger and email, so buyers can engage however they prefer. Whether someone is browsing your pricing page at 2am or comparing features during a lunch break, Fin responds instantly and relevantly so no lead is left behind.
To move prospects toward a decision, Fin guides personalized discovery conversations that clarify needs and accelerate choices. Four pillars make this consistent and trustworthy. Playbook: you brief Fin in natural language on desired outcomes and scenarios; it follows your rules, handles objections with approved guidance, and stays on track. Knowledge: it draws from your product knowledge base to answer pricing, features, and plan fit, and can reuse what you’ve already trained for customer service—no duplicate setup. Enrichment: once Fin learns a user’s email or name, it enriches that data with outside sources to improve qualification, personalization, and routing. Memory: if Fin recognizes a returning visitor, it remembers context so the buyer never starts over.
As conversations progress, Fin surfaces the opportunities most likely to close. It qualifies like your best SDR—asking about use case, budget, fit, and timing—and applies your existing playbook to identify the strongest opportunities. Details captured in conversation, plus enrichment, produce a complete picture that’s structured and synced into your CRM for immediate sales action. And when a lead isn’t a fit, Fin gracefully disqualifies or redirects to self-serve resources, ensuring your pipeline stays focused.
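To make the qualify-and-route idea concrete, here is a minimal sketch of a rules-based router. It is purely illustrative and not how Fin is implemented: the `Lead` fields, the `route` function, and the `sales_min_size` threshold are all hypothetical names I chose to mirror the use case, budget, fit, and timing questions described above.

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """Signals gathered in conversation plus enrichment (hypothetical fields)."""
    use_case_fit: bool      # the stated use case matches what the product does
    has_budget: bool        # the prospect confirmed a budget
    ready_within_90d: bool  # buying timeline is within roughly a quarter
    company_size: int       # employee count, e.g. from enrichment


def route(lead: Lead, sales_min_size: int = 50) -> str:
    """Return 'sales', 'self_serve', or 'disqualify' for a lead."""
    # No use-case fit means the lead is redirected, whatever else is true.
    if not lead.use_case_fit:
        return "disqualify"
    # Count the remaining qualification signals: budget and timing.
    score = int(lead.has_budget) + int(lead.ready_within_90d)
    # Only fully qualified, large-enough accounts go to a human rep.
    if score == 2 and lead.company_size >= sales_min_size:
        return "sales"
    return "self_serve"


print(route(Lead(use_case_fit=True, has_budget=True,
                 ready_within_90d=True, company_size=200)))
```

In practice the playbook would be written in natural language rather than code, but writing the rules down this explicitly is a useful way to check that sales and product agree on what "qualified" means.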
When a lead is ready to act, Fin closes. It books meetings via tools like Chili Piper or Calendly, guides qualified buyers into trials or subscriptions, and routes opportunities to your sales team with full context. Crucially, it passes the full conversation history and an AI-generated summary so reps pick up exactly where the buyer left off—no repeated questions, no lost nuance. For self-serve motions, Fin can guide prospects from discovery to trial signup or even paid conversion, automatically assigning the right path.
Real results underscore the model’s value. Fin is already delivering measurable results for early customers across different company sizes, sales motions, and go-to-market models. Attio, an AI CRM built for scaling go-to-market intelligently, deployed Fin to replace their traditional form-and-wait inbound flow with real-time conversational engagement. In three months, Fin handled over 1,600 conversations with website visitors, qualified more than 50 leads for sales, and routed over 30 applicants into their startup program. One returning prospect engaged with Fin, had their questions answered in real time, and converted to a paying customer at six times Attio’s average contract value.
Fellow, an AI-powered meeting assistant and management platform, started by deploying Fin overnight, a window where no human was online and prospects waited up to 18 hours for a reply. In January alone, Fin booked 18 meetings the team would never have reached, converting at around 48%. Importantly, the human team maintained its booking rate while Fin added net-new meetings—proof that automation layered on top of strong human coverage can be additive, not cannibalistic.
Fin for Sales is built on the same AI platform that powers the highest-performing Agent in customer service, which keeps the end-user experience consistent. If a prospect asks a support question mid-sales conversation, Fin can handle it—no handoffs to other vendors, no lost context. It shares knowledge and memory across its platform, always knows whether it’s talking to a prospect or a customer, and moves between roles as needed. Setup follows the same Fin Flywheel: Train, Test, Deploy, Analyze. Describe your sales playbook, qualification criteria, and routing rules in natural language; test in preview; deploy live; and use Analyze to understand performance and iterate quickly.
Fin for Sales is available today, and there’s more coming. I share the conviction that the future is a single Customer Agent, vertically integrated down to the model layer, orchestrating customer experience across the entire lifecycle. If you want to see it in action, go to fin.ai/sales and talk to Fin—then imagine that instant, high-quality engagement running across your inbound sales engine, every hour of every day.
Your product deserves a support experience that does more than point users to a help article. In my work leading product teams, I’ve seen how an intelligent, in-product assistant can reduce friction, accelerate user activation, and create the kind of product-led growth that traditional support channels struggle to deliver. The bar is higher now: customers expect immediate, context-aware help that feels proactive, measurable, and trustworthy.
When I evaluate support solutions, I look for three capabilities: an assistant that truly knows the user’s context, can act on their behalf to resolve issues end-to-end, and can prove the impact with rigorous measurement. Anything less is just another interface to your knowledge base. The shift to agentic AI makes this possible—if it’s grounded in behavioral analytics and integrated with your unified analytics platform.
Learn more about Amplitude AI Assistant. Our in-product support agent knows your users, acts on their behalf, and measures whether it actually helped.
That promise resonates with how I design AI Strategy: start with data fidelity, not dialog. When an assistant is wired into Amplitude analytics and behavioral analytics, it can understand where a user is in the journey, the features they have (or haven’t) adopted, and which nudges or in-app guides historically drive success. This is the foundation for precise, contextual help—surfacing the right product tours at the right moments and removing guesswork.
Knowing users isn’t enough; the assistant must act. With agentic AI, the assistant can execute safe, auditable steps on a user’s behalf (updating settings, triggering a workflow, or guiding a multi-step configuration) rather than handing a to-do back to the customer. Done well, this reduces time-to-value and support tickets while aligning with a thoughtful customer support AI strategy that respects permissions, privacy-by-design, and clear guardrails.
Equally important is measurement. I expect every AI touchpoint to demonstrate lift: faster time-to-resolution, higher feature adoption, improved retention, and lower churn. This is where robust A/B testing, Agent Analytics, and retention analysis come in—so we can quantify the assistant’s contribution against meaningful product outcomes, not vanity metrics. If we can’t measure it, we can’t manage it.
Operationally, I advise teams to pilot with narrowly scoped, high-impact journeys and iterate with tight feedback loops. Instrument the assistant’s actions and outcomes, set minimum detectable effect thresholds for experiments, and continually refine prompts and playbooks. Tie insights back to your unified analytics platform so learnings inform roadmap choices and reinforce a durable product-led growth motion.
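Setting a minimum detectable effect is easier to reason about with numbers in hand. The sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many users each arm of an experiment needs. The function name and the default `alpha` and `power` values are my own assumptions, not something prescribed by Amplitude or any specific tool.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per arm to detect an absolute lift of `mde_abs`.

    Uses the two-sided, two-proportion normal approximation:
    n = (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


# Example: 20% baseline resolution rate, looking for a 2-point absolute lift.
print(sample_size_per_arm(baseline=0.20, mde_abs=0.02))
```

With a 20% baseline and a 2-point MDE this lands at roughly 6,500 users per arm. Halving the MDE roughly quadruples the requirement, which is why I push teams to agree on the threshold before the pilot starts.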
In short, the next generation of in-product support will be built on data-rich context, agentic execution, and rigorous proof of value. That’s the standard I hold my teams to—and the experience users deserve when they ask for help.
Inspired by this post on Amplitude – Best Practices.
I’m constantly studying how AI is elevating product organizations, and Amplitude offers a compelling example of how to turn data into durable, customer-centered outcomes.
Spencer Whittaker is a senior AI product manager at Amplitude. He focuses on using AI to advance Amplitude's mission of helping companies build better products.
From my vantage point leading product teams, that focus translates into practical AI Strategy across behavioral analytics and Amplitude analytics: turning raw event streams into decision-ready insights that accelerate product-led growth and continuous discovery.
In my own roadmap reviews, the highest-impact patterns are consistent: pair A/B testing with eval-driven development, coach PMs on LLMs for product managers to sharpen problem framing, and amplify signal quality through thoughtful instrumentation and journey mapping. When these practices come together, empowered product teams ship with confidence and reduce time-to-learning.
Equally important are the guardrails: clear build vs buy criteria for gen AI components, privacy-by-design and data governance from day one, and a crisp measurement model that ties experiments to activation, retention analysis, and customer success outcomes.
Practically, this means instrumenting hypotheses with the right metrics, setting a minimum detectable effect (MDE) where relevant, and looping insights back into the opportunity solution tree so the next sprint is smarter than the last. This disciplined rhythm separates hype from durable value.
Seeing peers push this mission forward reinforces a core belief of mine: when AI helps teams find the right problems faster, we build products people truly love—and we do it responsibly, repeatably, and at scale.
Inspired by this post on Amplitude – Best Practices.
At Intercom, shipping is our heartbeat. We push code to production hundreds of times a day, and I’ve seen firsthand how that pace sharpens our product instincts and forces clarity in our CI/CD practices.
Engineers, engineering managers, designers, and PMs all contribute to this, safely. The average time from merging code to it running in production is 12 minutes. For me, that’s not just a vanity metric—it’s a DORA-style signal that our release pipeline and observability are aligned with the velocity our customers expect.
I’ve long held a belief that might sound counterintuitive: speed is not the enemy of safety. It’s a prerequisite for it. Accumulating code creates risk. Shipping small batches minimizes it. The faster you ship, the smaller each change is, the easier it is to catch problems, and the easier it is to roll back when something goes wrong, because the context is still fresh in your head. That small-batch discipline underpins how I approach AI workflows and risk management across product teams.
Today, over 93% of our pull requests (PRs) across our two main codebases are Agent-driven. And over 19% are auto-approved with no human reviewer in the loop. When I first saw those numbers at scale, I asked the same question you might be asking: are we trading rigor for speed? The answer lives in the data.
I want to focus on that second number, and why I think it makes us safer. Most people hear “AI is approving our pull requests” and think that’s reckless. I thought so once, too—until I looked at the outcomes that actually matter.
Last year, our CTO Darragh Curran set an explicit goal: double the productivity of our entire R&D organization within 12 months. Because the faster we can build and ship, the faster our customers get the capabilities they need. Ambitious? Absolutely. But the operational clarity that comes from such a target is invaluable for product leaders.
Nine months later, we did it. The results were significant across the board, but here’s the stat that crystallized it for me: downtime from breaking code changes dropped 35%, even as our deployments doubled. Shipping faster made us safer. As we modernize how we build and ship software, we systematically surface bottlenecks and tackle them. One of the biggest we found? PR review.
Humans simply don’t have the time or mental capacity to properly review the volume of AI-generated code we’re now producing. I’ve watched great engineers get stuck in review queues, or worse, feel pressure to rubber-stamp under time constraints—an anti-pattern I’ve battled in multiple orgs.
When an AI Agent can produce a working implementation in minutes, waiting hours or days for a human to review it is an impedance mismatch. The production line is moving faster than the quality gate can keep up. When that happens, one of two things follows: either the queue backs up and velocity drops, or, more dangerously, humans start rubber-stamping. Glancing at a diff, skimming the description, clicking approve. Some companies are drifting into this failure mode silently. We chose to confront it head-on and built a rigorous solution.
PR review, done properly, is complex. A good reviewer evaluates the problem statement, aligns the diff to intent, checks for safety and logical issues, applies deep product context, and scans for performance and anti-patterns. No single human can cover all of that on every PR at high deployment frequency. The truth, borne out by data, is that the human baseline is weaker than we often assume.
AI is accelerating code reviews: our data shows median merge time drops from 75.8 minutes with human review to just 14.6 minutes with AI approval—about 5.2x faster—while maintaining strong safety checks.
So we asked ourselves: what if we could do better?
Our PR review Agent doesn’t treat code review as a single task. It decomposes it into separate sub-jobs, each handled by an independent sub-Agent. One assesses the quality of the problem description. Another checks whether the diff actually aligns with the stated intent. Another reviews for safety concerns. Another checks for logical correctness. Another reviews against best practices and known anti-patterns. And so on. As a product leader, this is exactly the kind of agentic AI architecture I look for: specialized, auditable steps that strengthen the overall control plane.
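The decomposition described above can be sketched as a list of independent checks whose findings are combined into one verdict. This is an illustration of the pattern only, not Intercom’s implementation: `PullRequest`, `Finding`, the two toy checks, and the all-must-pass rule are hypothetical, and in a real system each check would be a separate sub-Agent call rather than a string test.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PullRequest:
    description: str
    diff: str


@dataclass
class Finding:
    check: str    # name of the sub-check that produced this finding
    passed: bool
    note: str     # human-readable reason when the check fails


def check_description(pr: PullRequest) -> Finding:
    # Toy stand-in for "is the problem statement good enough to review against?"
    ok = len(pr.description.split()) >= 10
    return Finding("description_quality", ok, "" if ok else "description too thin")


def check_safety(pr: PullRequest) -> Finding:
    # Toy stand-in for a safety review: flag obviously destructive SQL.
    risky = "DROP TABLE" in pr.diff.upper()
    return Finding("safety", not risky, "destructive SQL in diff" if risky else "")


# Each entry runs independently, so checks can be added, removed, or audited alone.
CHECKS: list[Callable[[PullRequest], Finding]] = [check_description, check_safety]


def review(pr: PullRequest) -> tuple[bool, list[Finding]]:
    """Run every check and approve only if all of them pass."""
    findings = [check(pr) for check in CHECKS]
    return all(f.passed for f in findings), findings


approved, findings = review(PullRequest("fix", "DROP TABLE users;"))
print(approved, [f.check for f in findings if not f.passed])
```

Keeping each finding as its own record is what makes the overall decision explainable: a rejected PR points to the specific lens that failed rather than a single opaque verdict.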
The result is that every PR is reviewed as if a dozen of our most tenured and knowledgeable engineers were all looking at it simultaneously, each bringing their own specialist lens. In the past, getting that breadth of review on a single PR was impossible. Now it’s the default. And unlike ad hoc human review, this system is consistent and tireless.
A human reviewer typically focuses on the actual code changes, the diff. Our Agent goes deeper. It traces execution paths, following the implications of a change through the codebase. This is something humans rarely had time to do, even when they wanted to.
While testing our new PR review Agent on a set of historical PRs, we found it flagging a one-line text copy change as incorrect. On the surface, it looked completely harmless, just a text update. We assumed it was a mistake, but it wasn’t. Our Agent caught that the new copy contradicted an existing validation mechanism elsewhere in the codebase. No human reviewer would have realistically found this unless they happened to have written that validation code very recently. Our Agent catches this kind of thing consistently, every time, because it’s always tracing execution.
The review isn’t generic either. It’s grounded in Intercom-specific guidance that our engineers have built and continue to refine, encoding the same context, standards, and product knowledge they’d apply if they were reviewing the PR themselves. When the Agent reviews a PR, engineers flag whether the review comments were helpful or not, and that feedback continuously sharpens the guidance. It’s a flywheel: the more our engineers invest in teaching the system how to think about our codebase, the better every subsequent review gets. This is eval-driven development in action.
Automated approval is also never forced. Any engineer can request a human review on any change, at any time. The system is a tool, not a mandate. At Intercom, shipping code doesn’t end at merge. The engineer who ships a change is expected to watch it go live, monitor its behaviour in production, and be ready to roll back if something isn’t right. AI approval doesn’t change that. The human who ships the code remains accountable for the outcome.
The naive take on AI-approved PRs is that it’s just a rubber-stamp LLM call so that humans don’t have to bother. A convenience feature. That misses what’s actually happening. Our Agent is strict. It won’t approve large PRs. If a change is too big, too complex, or too broad in scope, it flags it and requires it to be broken down. That design nudges engineers toward smaller, well-scoped changes—the safest way to ship, review, test, and, if needed, roll back.
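A size gate like this is simple to express. The sketch below is a hypothetical version: the `size_gate` name and the `max_files` and `max_lines` limits are assumptions for illustration, since the source does not state the actual thresholds the Agent uses.

```python
def size_gate(files_changed: int, lines_changed: int,
              max_files: int = 10, max_lines: int = 200) -> tuple[bool, list[str]]:
    """Return (eligible, reasons); an empty reasons list means the PR is small enough."""
    reasons = []
    if files_changed > max_files:
        reasons.append(f"touches {files_changed} files (limit {max_files})")
    if lines_changed > max_lines:
        reasons.append(f"changes {lines_changed} lines (limit {max_lines})")
    return (not reasons, reasons)


print(size_gate(files_changed=3, lines_changed=40))
print(size_gate(files_changed=25, lines_changed=900))
```

Returning the reasons alongside the verdict matters: the point is to tell the author how to split the change, not just to block it.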
This matters enormously for safety. Small changes are easier to review, easier to test, easier to understand, and, critically, easier to roll back when something goes wrong. This is the same principle that has always underpinned our shipping culture, but now the PR review Agent actively enforces it. As someone who’s owned incident management and SRE partnerships, I can’t overstate how powerful this is.
A snapshot of our code review results: AI-authored pull requests are reverted far less often than human-written ones, roughly 9 to 10 times less often across both stacks (0.53% vs 5.39% in backend and 0.22% vs 2.00% in frontend), signaling safer merges.
It’s tempting to look at a goal like “>50% AI-approved PRs” and worry we’re optimizing for a metric rather than an outcome. I see it differently. The real goal is to remove a bottleneck that, if left unchecked, pushes people toward rubber-stamping. By elevating the review bar and keeping batch sizes small, we protect both speed and stability.
We didn’t assume AI review would be good enough; we actively ran an experiment. Our hypothesis was that AI review could match or outperform human review quality, measured by outcomes: were the changes correct? Did they cause problems in production? How quickly were they reviewed and approved?
We started with a controlled pilot of over 100 PRs through the AI approval pipeline. The results: zero reverts of AI-approved PRs, and a 6–16x improvement in time-to-approval at the 75th percentile. Since then, the system has scaled significantly. In the first four weeks of broader rollout, 497 PRs went fully autonomous, with Claude writing the code and our AI approval system reviewing, approving, and shipping to production.
Beyond the approval pipeline itself, we also looked more broadly at how AI-authored code performs in production compared to human-authored code. AI-authored backend code had a revert rate of 0.53%, compared to 5.39% for human-authored. On the frontend, it was 0.22% versus 2.00%.
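For readers who want to check the comparison themselves, the arithmetic on the revert rates quoted above is short. The percentages come straight from the text; the dictionary layout and function names are just my own scaffolding.

```python
# Revert rates in percent, as quoted in the text.
revert_rates = {
    "backend": {"ai": 0.53, "human": 5.39},
    "frontend": {"ai": 0.22, "human": 2.00},
}


def times_lower(stack: str) -> float:
    """How many times lower the AI-authored revert rate is than the human one."""
    rates = revert_rates[stack]
    return rates["human"] / rates["ai"]


def point_gap(stack: str) -> float:
    """Absolute gap between the two rates, in percentage points."""
    rates = revert_rates[stack]
    return round(rates["human"] - rates["ai"], 2)


for stack in revert_rates:
    print(stack, round(times_lower(stack), 1), point_gap(stack))
# backend 10.2 4.86
# frontend 9.1 1.78
```

So the relative gap is roughly 10x on the backend and about 9x on the frontend. These are observational rates rather than a controlled comparison, so the types of change each group ships may differ.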
AI-authored code, reviewed and approved through our automated pipeline, is being reverted at a fraction of the rate of human-authored, human-approved code. I don’t expect the pilot’s zero-revert record to hold forever, but the evidence shows the quality bar our Agent holds is at least as high as the one humans were holding, and in many cases higher. And here’s the humbling perspective: the product changes that caused outages in the past? They were all reviewed and approved by humans. Human review is not a guarantee of safety. It never was.
Everything I’ve described—the sub-Agent architecture, the traceability, the labeling, the data—wasn’t just built for speed. It was built for auditability. Every AI-approved PR is labelled, logged, and queryable. The review comments, the approval decision, the test results, the merge event: all recorded. The evidence an auditor expects to see is the same whether a human or an AI approved the change. The “who” may change, but the “what” doesn’t. That’s how you meet SOC 2, HIPAA, ISO 27001, ISO 42001, and AIUC-1 without compromising agility.
We engaged our auditors, Schellman, early, before we scaled. We proactively worked with them to confirm that our automated review processes and the evidence they produce meet the requirements of our compliance frameworks, including SOC 2, HIPAA, ISO 27001, ISO 42001, and AIUC-1, among others. We think AI-driven change management can meet and exceed the standards that human-driven processes set, and we want to help prove that. In my experience, when you build for safety, compliance follows—never the other way around.
You can only go so far with PR review as a safety mechanism, no matter how good the reviewer is, human or AI. Only in production do you discover the unknown unknowns. The majority of Intercom’s largest outages weren’t even caused by changes to product code at all. They were infrastructure issues, unanticipated customer usage patterns, or third-party outages. PR review, whether human or AI, was never going to catch those. That’s why, in parallel, we’re also working on an Agent that proactively diagnoses issues in production. We’ll share more on this soon.
Speed has always been at the core of how we build at Intercom, not in spite of safety, but because of it. And we’re getting even faster with AI. It’s easy to assume that AI-approved PRs would lead to a drop in quality and safety, but our data proves otherwise. Our heartbeat is just getting stronger. For product leaders, this is the blueprint: pair agentic AI with small batches, robust observability, and clear accountability, and you make shipping both faster and safer.
AI headlines are everywhere, and many claim to know exactly what’s coming next. In product management, I’m often asked to make single-point predictions about gen AI and LLMs for product managers. I resist that temptation because confident forecasts are seductive and usually wrong.
Listening to Teresa Torres and Petra Wille unpack why certainty fails reinforced what I practice with my product trios: scenario planning. Instead of betting on one future, I explore several plausible ones, define the signals that would confirm or disconfirm each, and translate those insights into product strategy and product roadmapping and sprint planning we can adapt as evidence evolves.
Their argument mirrors what I see with customers and stakeholders: people are bad at predicting the future, and overconfidence creates fragility. Early adopters don’t represent everyone, so when we extrapolate from enthusiasts to the mainstream, we waste time and erode trust by building the wrong things.
Here’s how I apply this to avoid technology FOMO and make sharper AI Strategy decisions. I treat every bold claim as one possible future, then ask, “what else could happen?” I push extremes—AI everywhere vs. AI as invisible utility; GUIs vanish vs. GUIs evolve; centralized vs. edge compute—and hunt for the needs that stay true across scenarios. Those invariants anchor empowered product teams to outcomes, not outputs, and they help us stage bets responsibly.
My key takeaways: Confident predictions are often wrong. Early adopters don’t represent everyone. Treat predictions as one possible future. Scenario planning > trying to be right. Focus on patterns, not hype.
In short: We’re in a period of change—but no one can predict exactly how it plays out. Strong predictions often ignore uncertainty.
A better approach in practice: Treat every prediction as a scenario. Ask: what else could happen? Use multiple futures to guide decisions.
As you evaluate roadmaps, watch for traps like “My experience = everyone’s future” thinking, over-indexing on early adopters, and ignoring real-world constraints like budgets, compliance, and change management.
Tactically, we run quick scenario exercises, push ideas to extremes to explore implications, and extract the underlying insight (not the exact prediction). This complements continuous discovery and helps us write outcomes vs output OKRs that are resilient to uncertainty.
00:00 – The problem with future predictions
04:00 – Why experts get it wrong
06:00 – Scenario planning explained
12:00 – Early adopters vs. reality
20:00 – AI, GUIs, and extreme takes
27:00 – Using scenarios in product work
34:00 – Final thoughts
Resources & Links:
Follow Teresa Torres: https://ProductTalk.org
Follow Petra Wille: https://Petra-Wille.com
Mentioned in this episode:
Claude Code
What did I miss—or what scenarios are you considering for your team? Leave a comment below and let’s compare notes.
I’m fascinated by how fast truly AI-native companies can move when the problem is urgent, the founders have deep domain credibility, and the culture is built around customer obsession from day one. Artemis, an AI-native security platform, just emerged from stealth with $70M in combined seed and Series A funding, assembled a 30-person team in seven months, and made a bold promise to “stay on a texting basis with every customer, even at scale.” As a product leader, I see this as a masterclass in AI Strategy, go-to-market focus, and disciplined execution in cybersecurity.
At its core, Artemis is operating in what I’d call an “AI vs AI” security war: increasingly, we’re defending against adversaries who leverage models just as aggressively as we do. That shifts the job from rule-writing to intelligence orchestration, threat detection and response at machine speed, and continuous evaluation. It also explains why AI-native companies are outperforming their AI-enabled counterparts—when intelligence is the product, the org must be built around model quality, data pipelines, and rapid iteration, not as a bolt-on.
Founder-market fit is the early signal I look for, and here it’s unmistakable. Shachar Hirshberg’s “AWS and Palo Alto” playbook and Dan Shiebler’s path “From Twitter to Abnormal” create a rare combination: deep infrastructure and enterprise security know-how paired with production-grade machine learning at scale. When those experiences intersect, you get crisp problem statements, faster learning loops, and credibility with the exact ICP that feels the pain first.
Timing the leap to build is more art than science, but I listen for three cues: customers describing the problem in quantified terms, a wedge that can deliver value within one buying cycle, and a data advantage that compounds. Artemis clearly identified a high-urgency buyer and ignored adjacent segments that would dilute focus—an underrated act of courage that accelerates product-market fit.
Hiring for AI fluency is a different exercise than hiring for traditional software roles. I don’t just screen for model familiarity; I screen for product thinking under uncertainty, a bias for eval-driven development, and the ability to explain tradeoffs to security teams. Practical prompts help: “How would you diagnose precision/recall tradeoffs under evolving threat patterns?” or “Show me how you’d design a red/blue evaluation harness for a new detection.” The best candidates can translate model metrics into business outcomes and customer trust.
Building a 30-person AI-native team in stealth requires ruthless clarity on the handful of roles that compound: forward deployed engineers who can ship with customers, solutions engineering that feeds learning back into the model, and product managers who treat data as the primary surface area. Culture-wise, I anchor on two rituals: weekly customer debriefs with actual artifacts (alerts, misclassifications, escalations) and a written log of hypotheses, evals, and next bets—so the entire team can reason from the same evidence.
AI implementation reshapes the dashboard. Beyond the usual business KPIs, I watch a second layer: model precision/recall by scenario, alert fatigue reduction, time-to-first-signal on emerging threats, drift and data freshness, and latency under load. When these improve, downstream product metrics—activation, expansion, NRR—almost always follow. Observability isn’t an afterthought; it’s the control center for trust in AI-driven cybersecurity.
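To make "precision/recall by scenario" concrete, here is a minimal sketch. It assumes each alert has already been labeled with a scenario name, the model's verdict, and the ground truth; the record shape and the scenario names are illustrative assumptions, not anything Artemis has described.

```python
from collections import defaultdict


def precision_recall_by_scenario(records):
    """Compute precision and recall per scenario.

    records: iterable of (scenario, predicted_threat, actual_threat) tuples,
    where the last two items are booleans. This shape is a hypothetical one.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for scenario, predicted, actual in records:
        c = counts[scenario]  # touch the key so all-negative scenarios still appear
        if predicted and actual:
            c["tp"] += 1
        elif predicted and not actual:
            c["fp"] += 1
        elif actual and not predicted:
            c["fn"] += 1

    report = {}
    for scenario, c in counts.items():
        flagged = c["tp"] + c["fp"]   # everything the model alerted on
        real = c["tp"] + c["fn"]      # everything that was truly a threat
        report[scenario] = {
            "precision": c["tp"] / flagged if flagged else None,
            "recall": c["tp"] / real if real else None,
        }
    return report


if __name__ == "__main__":
    sample = [
        ("phishing", True, True),
        ("phishing", True, False),
        ("phishing", False, True),
        ("account_takeover", False, False),
    ]
    for name, metrics in precision_recall_by_scenario(sample).items():
        print(name, metrics)
```

Returning None instead of 0 when a scenario has no alerts or no real threats keeps an empty denominator from reading as a bad score on the dashboard.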
ICP discipline is non-negotiable. Artemis focused on the segment with the highest urgency-to-adopt and the clearest data pathways, and deliberately ignored a seemingly attractive adjacent ICP that would slow learning. I’ve made that trade myself: it feels painful in the short term but pays off in faster cycles, cleaner roadmap decisions, and better founder-led GTM.
Closing the first customers is where the magic happens—and where the most surprising signals of early product-market fit emerge. It’s rarely about feature breadth. It’s about whether customers escalate, volunteer data, and invite your team into their workflows. In founder-led sales, the most valuable insights come from the objections you lose on. I document every “no,” cluster them by root cause, and turn the top two into experiments within a sprint.
I also believe the first product should make founders a little uncomfortable—just enough to prove the thesis in the messiest, fastest path possible. In AI security, that often means prioritizing the smallest end-to-end loop that can stop or downgrade a real threat, even if the initial UX is rough. If the loop works, you’ll earn the right to harden it.
Co-founder dynamics matter as much as the roadmap. I liked the question “Should we be arguing more?” because it reframes conflict as a system. My rule: disagree in writing with a time box, escalate only the principle in dispute (not the plan), and commit to the decision with a pre-agreed review point. This keeps speed without calcifying bad calls.
On structure, I’m convinced AI-native beats AI-enabled for this market. Organize around data, evaluations, and deployment rather than traditional feature teams. Blend product, research, and solutions into durable, customer-facing units. Consider forward deployed engineers who can ship safely in live environments and bring back the sharpest, most actionable learning. It’s the only way to keep pace with adversaries that iterate as fast as you do.
The broader landscape provides context and competition. I benchmark capabilities and go-to-market motions against players like Abnormal, CrowdStrike, and Palo Alto Networks, with respect for the automation lineage from Demisto (now Cortex XSOAR). Cloud scale and data gravity from Amazon Web Services (AWS) matter, while model innovations from OpenAI and Anthropic raise the offensive and defensive bar. And Artemis is staking a claim in that intersection—where security outcomes, model excellence, and frontline customer intimacy meet.
If you care about AI risk management, threat detection and response, and building empowered product teams that can win in this “AI vs AI” environment, the lessons here are clear: hire for AI fluency, not just titles; instrument the model like a business; let founder-led GTM shape your roadmap; and keep the customer close enough that you can text them—because that’s how you outlearn the market.
Scaling a real-world marketplace from scrappy to dominant takes a different kind of product leadership. Reflecting on Christopher Payne’s decade leading DoorDash as President and COO — growing from roughly 70 employees to the dominant food delivery platform in the US — I’m struck by how much of that success hinged on mastering an atoms-based business while still operating with software-level rigor. As a VP of Product Management, I see the same patterns in my own work: relentless clarity on inputs, a bias for builder-executives, and a cadence that keeps leaders close to product details without becoming bottlenecks.
Running an atoms-based business versus a pure software company forces you to obsess over operational physics: unit economics, quality control, on-time reliability, and dense local liquidity. It’s precisely where traditional “bits” executives can stumble. What’s worked for me is a simple “plate spinning” framework for executive attention: identify the five or six plates that must never stop — customer experience, marketplace health, quality and safety, product velocity, platform reliability, and P&L — then schedule recurring deep dives to keep those plates spinning. If a plate wobbles, I drop in, fix the root cause, re-instrument the inputs, and zoom back out.
Hiring at hypergrowth speed only works when you bias toward a “builder mentality.” I look for executives who run toward fuzzy problems, write clearly, and can prove they’ve shipped value with incomplete information. Prior industry experience can be a liability when you’re reinventing the market; first-principles thinkers outlearn domain experts who try to port yesterday’s playbooks. In executive hiring, I’ve found structured work samples and narrative memos far more predictive than marathon interview loops — companies routinely spend too much time on job interviews and too little time evaluating how candidates think and execute.
Great executives never outgrow the details. Staying close doesn’t mean micromanaging — it means sampling the customer journey and instrumenting the system so you can feel where it hurts. In my own practice, I rotate through frontline touchpoints weekly: support transcripts, NPS verbatims, failed checkout sessions, and reliability dashboards. Small signals often reveal systemic issues. A single ciabatta bread moment — the kind of edge-case substitution that seems trivial — can expose broken handoffs, unclear policies, and misaligned incentives across the marketplace.
Top-down goal setting beats bottom-up when you’re aiming for category leadership. Bottom-up targets tend to regress to comfort; they calibrate to today’s constraints, not tomorrow’s possibilities. I set ambitious, top-down outcomes (not output), frame the non-negotiables, and map driver trees to clarify the input metrics that matter. Then I ask empowered product teams to pressure-test the plan, propose approaches, and own the how. This preserves ambition while unlocking creativity — a practical balance of clarity and autonomy that outcomes vs output OKRs were designed to achieve.
One-size-fits-all management is a myth. Early-stage teams need hands-on coaching and fast decisions; later-stage teams need mechanisms that scale: crisp PRDs, pre-mortems, and operating cadences that separate strategy, planning, and execution. The mark of a high-functioning executive team is not uniform style — it’s high candor, fast escalation paths, and visible commitment after debate. In tough moments, a little charisma goes a long way; in practice, that’s not theatrics, it’s steady optimism, simple language, and consistent follow-through that keeps people moving forward.
The hypergrowth skill stack for executives is surprisingly learnable: ruthless prioritization under uncertainty, narrative writing that aligns cross-functionally, structured delegation with clear “inspection points,” and a weekly rhythm that protects maker time. I leverage a cadence of business reviews (inputs > outputs), customer-scent checks, and decision logs so we can move fast without losing the thread. CEO and executive time management is the ultimate forcing function — if we can’t show where our attention maps to goals, the team won’t either.
Some of my enduring lessons echo the best of Amazon and eBay: customer obsession beats competitor obsession, input metrics beat lagging vanity metrics, and simple mechanisms beat heroics. From Jeff Bezos’s playbook I borrow the insistence on written narratives, single-threaded ownership, and clarity on what will not change. Those principles remain the backbone of platform scalability and resilient product strategy, especially when markets get noisy.
AI is about to flatten organizations. With agentic AI, retrieval-first pipelines, and AI workflows embedded into product development, managers can widen their span without losing fidelity. I see LLMs for product managers accelerating discovery, PRD drafting, and experiment analysis — while raising the bar on decision quality. The implication for leadership: fewer layers, more transparency, and even greater pressure to define sharp, top-down outcomes that teams can autonomously pursue.
If I had to compress this into a playbook, it’s this: set audacious, top-down goals; keep your “plate spinning” calendar sacred; write more than you talk; hire builders, not resume archetypes; sample the customer journey every week; and build mechanisms that make the right thing easier than the heroic thing. That’s how you scale product management leadership from dozens to thousands — in atoms, in bits, and in the messy, exhilarating space where they meet.
Product teams rarely fail because they don’t ship enough features; they fail because they don’t learn fast enough. That’s the core tension I manage every day: when to build to learn and when to build to earn. Navigating that balance is how we protect focus, accelerate time-to-value, and ultimately deliver durable business impact.
Over the years, I’ve seen at least two major ways to develop product: build to learn and build to earn. The first is discovery-led and evidence-seeking; the second is delivery-led and value-capturing. Both are essential. The real craft is knowing which mode to be in, when to switch, and how to keep stakeholders aligned around outcomes instead of output.
The project model remains the default in many organizations—even in the age of AI—and it’s all about output. Stakeholders or executives assemble a prioritized roadmap of features and projects, and teams ship against it. This can create momentum, but without clear outcome metrics and customer validation, it’s easy to drift into a feature factory that looks productive while missing the mark on user value and business results.
When I build to learn, I emphasize continuous discovery. That means using customer interviews to surface unmet needs, running lightweight prototypes to test desirability and usability, and deploying A/B testing to quantify impact. I map assumptions, risks, and opportunities with an opportunity solution tree, and I timebox experiments so we learn fast and cheap. The standard is evidence, not opinions—especially my own. The goal is simple: reduce uncertainty before we scale.
When I build to earn, the objective shifts to capturing value with confidence. Here I align teams to outcomes vs output OKRs, commit to clear acceptance criteria, and ensure product roadmapping and sprint planning reflect the highest-leverage bets we validated in discovery. Delivery excellence matters: crisp definition, reliable release trains, observability, and a strong feedback loop to confirm we’re moving activation, conversion, or retention in the intended direction.
Deciding when to transition from learning to earning is all about thresholds of evidence. I look for leading indicators that our solution reliably solves the target problem, shows a measurable lift in key behaviors, and can be delivered with acceptable risk. If we can’t articulate the expected outcome and how we’ll measure it, we’re not ready to scale. If we can, we invest, monitor impact, and keep guardrails in place to avoid scope drift.
The operating model that makes this sustainable is simple and disciplined. I rely on empowered product teams organized as product trios (product, design, engineering) to run dual tracks of discovery and delivery. We socialize learning with stakeholders early and often to strengthen trust and stakeholder management. We elevate strategy by linking every roadmap item to a problem statement, a testable hypothesis, and a quantified outcome—no orphan features, no vanity launches.
In the AI era, speed can tempt us back into shipping-by-idea. I use gen AI for product prototyping and insight synthesis, and I lean on LLMs for product managers to accelerate discovery work—without treating AI as a shortcut to validation. Our AI Strategy clarifies where AI augments discovery, where it powers the product, and how we evaluate risk, so we move faster without compromising rigor or ethics.
My rule of thumb: spend just enough time building to learn to achieve conviction, then shift decisively to building to earn—while preserving a small discovery cadence to keep learning alive. This rhythm protects focus, compounds insight, and makes growth more predictable. It’s how we avoid the output trap, deliver meaningful outcomes, and create products that customers love and the business celebrates.
Churn is the silent tax on growth, and I treat churn prediction as a core product capability—not a side project. Over the years, I’ve led teams through multiple implementations across different data maturities and go-to-market motions, and the same question keeps returning at kickoff: what’s the smartest path to impact now and defensibility later?
“Should you build or buy your churn prediction model?” The right answer depends on time-to-value, data readiness, available talent, and whether churn prediction is a true differentiator for your product strategy or simply a must-have capability to power customer success and product-led growth.
When speed and coverage matter most, I start by evaluating category platforms that pair behavioral analytics with activation. Vendors in this category emphasize immediate business outcomes and ship integrations, in-app guides, and workflow triggers that help you act on risk signals fast, without waiting months for model training or data engineering. Pendo, for example, positions its Software Experience Management platform as a way to “increase revenue, cut costs, and reduce risk” by optimizing the entire software experience to drive adoption and improve engagement.
Buying makes sense when you need rapid time-to-value, opinionated best practices, and a unified analytics platform to operationalize insights through product tours, in-app guides, and CRM integration. In these cases, I’m optimizing for coverage, consistent signal quality, and ease of activation for customer success—so the team can focus on interventions, not infrastructure.
Building is compelling when churn prediction is a source of competitive differentiation or you have proprietary signals others can’t access. If your product generates unique behavioral data, requires custom anomaly detection or explainability constraints, or must blend usage telemetry with domain-specific risk scoring, a tailored model can raise precision and unlock novel retention levers.
My hybrid approach has become a reliable playbook: buy first to establish a strong baseline and close the activation loop, then selectively build where proprietary data and context yield outsized gains. I use retention analysis to identify high-signal behaviors, then iterate with A/B testing and a clear minimum detectable effect (MDE) to validate uplift before committing engineering capacity.
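For sizing such a test, a minimal sketch of the standard two-proportion sample-size approximation is useful. It assumes the metric is a churn rate, the MDE is an absolute reduction in that rate, and the test is two-sided; the function name and the defaults (5% significance, 80% power) are illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Approximate accounts needed in each arm of a two-arm test.

    baseline_rate: current churn rate in the control group (e.g. 0.10).
    mde_abs: smallest absolute reduction worth detecting (e.g. 0.02).
    """
    p1 = baseline_rate
    p2 = baseline_rate - mde_abs  # assumed churn rate in the treated group
    if not (0 < p2 < p1 < 1):
        raise ValueError("need 0 < baseline_rate - mde_abs < baseline_rate < 1")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # 10% baseline churn, detect a drop to 8%
    print(sample_size_per_arm(0.10, 0.02))
```

With a 10% baseline and a 2-point MDE this comes out to roughly 3,200 accounts per arm. Halving the MDE roughly quadruples the requirement, which is why I agree on the MDE before committing engineering capacity.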
Total cost of ownership is non-negotiable. I account for more than license or training costs: ongoing data engineering, feature pipeline maintenance, model monitoring for drift, and AI risk management all add up. Strong data governance, privacy-by-design, and regulatory compliance must be baked in—whether I build, buy, or blend both.
Activation determines real ROI. Predictions that don’t flow into customer success workflows, lifecycle messaging, or in-product nudges rarely move Net Revenue Retention (NRR). I prioritize tight integrations that enable targeted experiments (journey mapping, contextual tooltips, and timely outreach) to reduce friction and increase user engagement at the moments that matter.
My quick decision test: buy if time-to-value and adoption are the immediate goals; build if proprietary signals and explainability are core strategic assets; blend if you want fast wins now with room to differentiate later. Answering the build vs. buy question through this lens consistently improves retention, accelerates product-led growth, and keeps teams focused on the customer experience rather than plumbing.
Turning a rambling stream of consciousness into a clean task list while someone is still talking has been a longtime product dream of mine. With Ramble, Todoist brought that dream to life by using live audio AI to capture tasks in real time—no transcription step required. The result is a voice-to-task flow that feels natural, fast, and surprisingly disciplined.
As I listened to Ernesto Garcia (Front-end Product Engineer), Thomas Jost (Backend Software Engineer), and Hugo Fauquenoi (Product Manager) of the Doist team walk through their approach, I heard a blueprint for building pragmatic GenAI features. What began as a two-to-three-month AI exploration became one of their most technically deliberate releases: a “Gemini-powered pipeline that makes tool calls while the user is still speaking, surfacing tasks on screen in real time without any text output from the model.”
The breakthrough started with user research. People weren’t merely dictating tasks; they were doing a “brain dump” first—often into pen and paper or even ChatGPT voice—and only then committing items to Todoist. Meeting users where they already are reframed the problem: don’t force structure upfront; capture fluid thought and translate it into actionable tasks instantly.
That insight led to a bold architectural choice: skip transcription entirely and process raw audio directly with a Gemini live audio model. By removing the brittle middleman of text, the team reduced latency and kept the model focused on one job—turning intent into structured actions. It’s a crisp example of AI workflows designed for reliability over novelty.
The real magic is in the real-time “tool calls.” As the user speaks, the model triggers add task, edit task, and delete task operations immediately. For high-friction contexts like driving, they paired visual task cards with subtle sound effects as confirmation cues. It’s thoughtful conversation design that respects attention and safety without sacrificing speed.
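A tool-only flow like this can be pictured as a small dispatcher that applies each call the moment it arrives. The sketch below is an illustration, not Doist's implementation: the call shape ({"name": ..., "args": ...}), the TaskList class, and the field names are all hypothetical, and the model side is left out entirely.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TaskList:
    """In-memory stand-in for the task store the UI renders from."""
    tasks: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def add_task(self, content, due=None):
        task_id = next(self._ids)
        self.tasks[task_id] = {"content": content, "due": due}
        return task_id

    def edit_task(self, task_id, **changes):
        self.tasks[task_id].update(changes)  # KeyError if the task is unknown
        return task_id

    def delete_task(self, task_id):
        del self.tasks[task_id]
        return task_id


# The model may only capture tasks; anything else is rejected.
ALLOWED_TOOLS = {"add_task", "edit_task", "delete_task"}


def apply_tool_call(task_list, call):
    """Apply one tool call as soon as it is received."""
    name = call["name"]
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"unsupported tool: {name}")
    return getattr(task_list, name)(**call.get("args", {}))


if __name__ == "__main__":
    tasks = TaskList()
    stream = [
        {"name": "add_task", "args": {"content": "buy milk"}},
        {"name": "add_task", "args": {"content": "call dentist", "due": "tomorrow"}},
        {"name": "edit_task", "args": {"task_id": 1, "content": "buy oat milk"}},
        {"name": "delete_task", "args": {"task_id": 2}},
    ]
    for call in stream:
        apply_tool_call(tasks, call)
    print(tasks.tasks)
```

Keeping the allowed tool set this small is one way to enforce a capture-only boundary in code rather than relying on the prompt alone.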
Teaching the model to capture tasks literally—without over-interpreting or trying to complete the work—required careful prompt engineering for voice and temperature tuning. Drawing a bright line between “capture versus do” kept the experience trustworthy. In my own AI Strategy work, I’ve found that establishing explicit agentic guardrails early prevents unintended autonomy later.
Dates were the sleeper challenge. The team had to inject the current date, normalize to days vs. months, and always output dates in English for the natural language parser—while preserving the user’s original language for everything else. If you’ve ever shipped date handling across locales, you’ll appreciate how many edge cases hide in “Taming Dates and Time.”
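As a rough illustration of the date-injection idea, the sketch below builds the date portion of a system prompt. The wording of the instructions and the function name are my assumptions; the team's actual prompt was not shared.

```python
from datetime import date

# Hard-coded English names so the output never depends on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def build_date_context(today: date) -> str:
    """Return prompt text that anchors relative dates to a known 'today'."""
    weekday = WEEKDAYS[today.weekday()]
    return (
        f"Today is {weekday}, {today.isoformat()}. "
        "Write every due date in English (for example 'tomorrow' or "
        "'next Friday'), even when the user speaks another language. "
        "Keep the task text itself in the user's original language."
    )


if __name__ == "__main__":
    print(build_date_context(date.today()))
```

Passing the date in as a parameter, rather than reading the clock inside the function, makes the prompt reproducible in evals.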
Quality didn’t hinge on intuition alone. They built an LLM-judge eval system using real employee recordings from 100+ people across 35 countries in 20+ languages to catch prompt regressions. That’s eval-driven development done right: representative data, repeatable scoring, and tight feedback loops as models and prompts evolve.
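The shape of such a regression check can be sketched in a few lines. Everything here is an assumption for illustration: the case format, the tolerance, and especially judge_fn, which in a real system would call an LLM to grade the output and here is any callable returning True or False.

```python
def run_regression_eval(cases, capture_fn, judge_fn,
                        baseline_pass_rate, tolerance=0.02):
    """Score a prompt or model version against a fixed set of cases.

    cases: list of dicts with "id", "input", and "expected" keys (hypothetical).
    capture_fn: the system under test; maps an input to captured tasks.
    judge_fn: grader taking (expected, actual) and returning True/False.
    """
    passed, failures = 0, []
    for case in cases:
        actual = capture_fn(case["input"])
        if judge_fn(case["expected"], actual):
            passed += 1
        else:
            failures.append(case["id"])

    pass_rate = passed / len(cases) if cases else 0.0
    return {
        "pass_rate": pass_rate,
        # Flag only drops larger than the tolerance, to absorb judge noise.
        "regressed": pass_rate < baseline_pass_rate - tolerance,
        "failures": failures,
    }


if __name__ == "__main__":
    cases = [
        {"id": "en-1", "input": "buy milk and call mom",
         "expected": ["buy milk", "call mom"]},
        {"id": "en-2", "input": "email the report",
         "expected": ["email the report"]},
    ]
    # Deterministic stand-ins for the model and the judge.
    fake_capture = lambda text: text.split(" and ")
    exact_judge = lambda expected, actual: expected == actual
    print(run_regression_eval(cases, fake_capture, exact_judge,
                              baseline_pass_rate=0.95))
```

Returning the failing case IDs alongside the pass rate matters as much as the score, because that list is what turns a regression alert into something a prompt author can debug.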
For project and label matching, they chose direct context injection over RAG. Instead of building a retrieval pipeline, they injected the full project/label list into the system prompt. With smart context window management and a sharply constrained task schema, this was both simpler and more accurate. Sometimes the fastest path to product-market fit is removing moving parts, not adding them.
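Direct context injection can be as simple as the sketch below, which writes the user's full project and label lists into the system prompt and then checks what the model returns against the same lists. The prompt wording and both function names are hypothetical, and the approach assumes the lists are small enough to fit comfortably in the context window.

```python
def build_system_prompt(projects, labels):
    """Embed the user's projects and labels directly in the prompt."""
    project_lines = "\n".join(f"- {name}" for name in projects)
    label_lines = "\n".join(f"- {name}" for name in labels)
    return (
        "Capture tasks exactly as spoken. Do not complete them.\n"
        "Assign a project or label only if it appears in these lists.\n\n"
        f"Projects:\n{project_lines}\n\n"
        f"Labels:\n{label_lines}\n"
    )


def resolve_name(candidate, known_names):
    """Map a model-supplied name onto a known one, or None if it is not listed."""
    lookup = {name.casefold(): name for name in known_names}
    return lookup.get(candidate.strip().casefold())


if __name__ == "__main__":
    projects = ["Work", "Home"]
    labels = ["urgent", "errands"]
    print(build_system_prompt(projects, labels))
    print(resolve_name("work", projects))    # known project
    print(resolve_name("Garden", projects))  # not in the list
```

The validation step is the cheap safeguard that makes skipping retrieval tolerable: if the model invents a project, the task can still be captured without one and corrected by the user.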
One product principle stood out: easy correction beats perfect first-time accuracy. Natural language interfaces earn trust when users can fix misfires in a tap or two. That bias toward quick recovery over false precision is how you ship AI that feels useful from day one.
Looking ahead, the roadmap is compelling: multimodal task capture from images and text blobs, Apple Watch support, and automation integrations. As voice AI agent patterns mature, this “tool-only architecture” sets a solid foundation for going from capture to coordinated execution—without losing the simplicity that makes Ramble shine.
If you want to hear the full conversation, you can listen on Spotify or Apple Podcasts. It’s a masterclass in building focused GenAI features that trade cleverness for clarity—and still delight.
Resources & Links: Todoist • Doist • Google Vertex AI (Gemini)
Every so often, a single line captures the essence of platform thinking at scale. "Vinay is a Staff AI Engineer at Amplitude. He builds the foundational AI platforms that empower internal innovation and help define the future of AI analytics." That statement crystallizes the mandate many of us share: create durable AI capabilities that compound value across teams, products, and customers.
When I think about "foundational AI platforms" in the context of Amplitude analytics and behavioral analytics, I see more than infrastructure. I see a product strategy choice: invest in a unified analytics platform that lowers the cost of experimentation, increases the trustworthiness of insights, and speeds time-to-learning for empowered product teams. That’s the engine behind sustainable product-led growth.
For me, the platform blueprint starts with three layers: high-quality data foundations (schema design, governance, lineage), model lifecycle rigor (evaluation, observability, versioning), and safe, self-serve interfaces that meet teams where they work. Without strong data governance and clear accountability, even the smartest gen AI features struggle to gain adoption. With them, platform scalability and reliability become a competitive advantage, not just an operational checkbox.
Empowering internal innovation requires thoughtful constraints. I’ve seen the best teams pair self-serve tooling with guardrails: templates for use cases, bias and risk checks, and well-documented pathways from prototype to production. This balance turns AI Strategy from a slide into a system—one that helps teams decide when to build vs buy, how to measure value, and how to retire what no longer serves the roadmap.
Looking ahead, the future of AI analytics is about making intelligence ambient. That means stitching together event data, product usage, and customer context so insights surface exactly when decisions are made. It also means bringing gen AI responsibly into the workflow (summarizing behavior, explaining anomalies, and suggesting next best actions) while maintaining transparency and auditability.
My practical takeaways: invest early in shared components that everyone can use (feature stores, evaluation harnesses, data contracts); standardize interfaces so teams ship faster with fewer handoffs; and measure platform outcomes with product metrics, not just infrastructure metrics. Done well, this approach compounds: faster cycles, higher confidence, and a steady drumbeat of wins that reinforce a culture of learning.
In short, building the right AI foundations is how we unlock scale, create leverage for every team, and keep our edge in a dynamic market. That one line about building foundational AI platforms isn’t just a role description—it’s a north star for any product leader serious about shaping the next era of analytics.
Inspired by this post on Amplitude – Perspectives.