Tag: retrieval-first pipeline

4 Critical AI Risks Every CIO Must Tackle Now—and a Practical Playbook to Mitigate Them

I spend a lot of time with CIOs and IT leaders who are moving fast on generative AI. The momentum is real, but so are the risks. When AI touches core workflows, data, and customer experiences, we need a clear, pragmatic plan that blends AI Strategy with disciplined product management leadership and IT governance.

Learn about the risks that AI poses to IT teams, and how they can mitigate them.

Here are the four risks I see most often—and the playbook I use to de-risk delivery while preserving speed and innovation.

Risk #1: Shadow AI and data leakage. Teams experiment with unapproved tools, and sensitive data ends up in prompts, logs, or third-party services. Without strong data governance and privacy-by-design, even a small proof of concept can create outsized exposure.

How I mitigate it: start with an AI acceptable-use policy, data classification, and clear guardrails on what can be prompted. Deploy a redaction layer and secrets management before any model call. Favor a retrieval-first pipeline so models reason over vetted internal knowledge rather than raw or personal data. Conduct vendor due diligence and DPAs up front, and centralize audit logs to support regulatory compliance and incident response.
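
To make the redaction step concrete, here is a minimal Python sketch of a filter that runs before any model call. The patterns, the placeholder labels, and the `redact` function name are illustrative assumptions, not a vetted rule set; a real deployment would derive them from the organization's own data classification.

```python
import re

# Hypothetical patterns for illustration only; real rules should come from
# the organization's data-classification policy.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace sensitive matches with typed placeholders and count each kind."""
    counts: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts


prompt = "Contact jane.doe@example.com, SSN 123-45-6789, key sk-abcdefghijklmnop1234"
clean, counts = redact(prompt)
print(clean)   # the redacted prompt is what would be sent to the model
print(counts)  # the counts can go to the central audit log
```

Returning the counts alongside the cleaned text is a deliberate choice: the audit log can record that something was removed without storing the sensitive value itself.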

Risk #2: Hallucinations and unreliable outputs. LLMs are probabilistic; they can fabricate citations, numbers, or steps. In customer support and internal operations, this erodes trust and creates rework—especially when teams assume model answers are authoritative.

How I mitigate it: adopt eval-driven development with task-specific test sets, reference answers, and pass/fail thresholds that gate CI/CD. Ground models with retrieval, constrain outputs with schemas, and keep a human in the loop for high-risk actions. A/B testing, error taxonomies, and continuous monitoring turn model behavior into a measurable, improvable set of metrics: a kind of Web Vitals for AI reliability.
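
As a rough illustration of an eval gate, the sketch below scores a model against a small test set with reference answers and compares the pass rate to a threshold. Everything here is assumed for the example: the test cases, the `stub_model` stand-in for a real model call, and the simple substring check, which a real suite would likely replace with task-specific graders.

```python
from typing import Callable

# Hypothetical test set: (input, reference answer) pairs for one task.
TEST_SET = [
    ("What is the refund window?", "30 days"),
    ("Which plan includes SSO?", "Enterprise"),
    ("What is the support email?", "help@example.com"),
]


def run_evals(model: Callable[[str], str], cases, threshold: float) -> tuple[float, bool]:
    """Return (pass rate, gate decision) using a case-insensitive substring check."""
    passed = sum(
        1 for question, reference in cases
        if reference.lower() in model(question).lower()
    )
    rate = passed / len(cases)
    return rate, rate >= threshold


def stub_model(question: str) -> str:
    """Stand-in for a real model call; it answers two of the three cases."""
    canned = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "Which plan includes SSO?": "SSO ships with the Enterprise plan.",
    }
    return canned.get(question, "I am not sure.")


rate, ok = run_evals(stub_model, TEST_SET, threshold=0.9)
print(f"pass rate: {rate:.2f}, gate: {'pass' if ok else 'fail'}")
```

In a pipeline, the script that wraps this would exit with a non-zero status when the gate fails, which is what blocks the release.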

Risk #3: Expanded attack surface. Prompt injection, data exfiltration, supply chain risks in model providers, and insecure connectors can undermine existing cybersecurity controls. Traditional threat models often miss these new interaction patterns.

How I mitigate it: treat AI as a first-class asset in threat detection and response. Implement input/output filtering, allow/deny lists, content moderation, and strict isolation of tools and connectors. Red team prompts and tools regularly, rotate credentials, and codify runbooks with SRE and incident management for fast containment. Apply least privilege to agents, APIs, and vector stores, and monitor for anomalous tool-use.
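
One way to picture least privilege for agents is a per-agent tool allowlist that denies everything else and records the attempt. The sketch below is a simplified assumption of how such a policy object might look; `AgentPolicy`, the tool names, and the agent name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    """Hypothetical per-agent policy listing the only tools the agent may call."""
    name: str
    allowed_tools: frozenset
    denied_calls: list = field(default_factory=list)

    def authorize(self, tool: str) -> bool:
        """Allow only listed tools; record every other attempt for review."""
        if tool in self.allowed_tools:
            return True
        self.denied_calls.append(tool)
        return False


support_agent = AgentPolicy("support-bot", frozenset({"search_kb", "create_ticket"}))
print(support_agent.authorize("search_kb"))    # a listed tool
print(support_agent.authorize("delete_user"))  # not listed, so denied and recorded
print(support_agent.denied_calls)
```

The recorded denials are the raw material for the anomalous tool-use monitoring mentioned above: a spike in denied calls is a cheap early signal of prompt injection.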

Risk #4: Compliance, bias, and auditability gaps. As AI scales, questions about explainability, fairness, data residency, and retention move from theoretical to board-level. Without traceability, it’s hard to satisfy audits or respond to regulators.

How I mitigate it: embed privacy-by-design from the first sprint—data minimization, consent, purpose limitation, and retention controls. Maintain model cards, versioning, and lineage for prompts, datasets, and parameters. Centralize audit logs, set policies for high-risk use cases, and run periodic compliance reviews with security and legal. Cross-functional communities of practice keep changes aligned across product, engineering, and IT Leadership.
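
Lineage for prompts, datasets, and parameters can start as something very small. The following sketch, with hypothetical field names and example values, builds an audit-log entry and attaches a short fingerprint so that any output can be traced back to the exact configuration that produced it.

```python
import hashlib
import json


def lineage_record(prompt_template: str, model: str, params: dict, dataset_version: str) -> dict:
    """Build an audit-log entry with a stable fingerprint of the inputs that shaped an output."""
    record = {
        "prompt_template": prompt_template,
        "model": model,
        "params": params,
        "dataset_version": dataset_version,
    }
    # sort_keys yields a canonical string, so identical inputs always hash the same
    canonical = json.dumps(record, sort_keys=True)
    record["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return record


# Example values are placeholders, not real model or dataset names.
entry = lineage_record("Summarize: {ticket}", "example-model-v1", {"temperature": 0.2}, "kb-2025-12")
print(entry["fingerprint"])
```

Because the fingerprint changes whenever the prompt, parameters, or dataset version change, it doubles as a version identifier in the central audit log.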

Operationally, I anchor AI initiatives to outcomes vs output OKRs, use empowered product teams and product trios to balance feasibility, value, and risk, and integrate model changes into CI/CD with quality gates. This creates a repeatable mechanism to ship safely, learn quickly, and scale what works.

If you’re standing up new AI workflows or hardening what you already have in production, this playbook gives you a practical path: drive adoption confidently, protect your data, and stay compliant while maintaining competitive velocity.

The bottom line: AI risk management isn’t a brake on innovation—it’s how we earn the right to go faster.

Inspired by this post on Pendo – Perspectives.

January 3, 2026
10 AI Business Models You Need Now: Proven Playbooks Turning Algorithms into Revenue

I’ve spent the past few product cycles re-architecting roadmaps around one simple reality: AI is no longer just a feature—it’s a business model. The companies winning market share are those that treat models, data, and workflows as monetizable assets with defensible moats, not science projects.

AI business models are rewriting value creation. Learn how smart teams turn algorithms into profit engines, reshaping entire industries.

From my seat in product leadership, I evaluate AI bets through three lenses: durable value (moat and differentiation), measurable outcomes (clear ROI), and unit economics (gross margins under real-world load). With that frame, here are ten AI business models I see performing now—and how I decide when to invest.

1) API-first Model-as-a-Service. I monetize foundation or specialized models via an API, priced by tokens, requests, or time-in-context. Success hinges on latency, accuracy, and “context window management” that balances quality with cost. This is where “consumption SaaS pricing” shines and where disciplined rate-limiting, observability, and SLAs build trust.
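
The trade-off between context size and cost is easy to see with a little arithmetic. This sketch assumes a flat price per request and a serving cost that scales with tokens; the prices are made-up numbers chosen only to show the shape of the curve.

```python
def gross_margin(price_per_request: float, context_tokens: int,
                 output_tokens: int, cost_per_1k_tokens: float) -> float:
    """Gross margin for one request when serving cost scales with total tokens."""
    cost = (context_tokens + output_tokens) / 1000 * cost_per_1k_tokens
    return (price_per_request - cost) / price_per_request


# Hypothetical numbers: $0.10 per request, $0.01 serving cost per 1,000 tokens.
lean = gross_margin(0.10, context_tokens=2000, output_tokens=500, cost_per_1k_tokens=0.01)
heavy = gross_margin(0.10, context_tokens=8000, output_tokens=500, cost_per_1k_tokens=0.01)
print(f"lean context margin:  {lean:.2f}")
print(f"heavy context margin: {heavy:.2f}")
```

Under these assumed prices, quadrupling the context drops the margin from about 75% to about 15%, which is why context window management is a pricing decision as much as a quality one.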

2) Vertical AI copilots. I package domain-specific expertise (legal, healthcare, finance, field service) into workflow-native assistants that surface next-best actions. Because these copilots live where work happens, I price on outcomes—time saved, revenue recovered, or risk reduced—aligning value with customer metrics and accelerating product adoption.

3) Agentic AI automation. When autonomous agents handle multi-step tasks across tools, I lean toward per-outcome or per-job pricing. Reliability is the moat, so I invest early in eval-driven development, robust guardrails, and human-in-the-loop QA. This model compounds fast once agents can execute end-to-end workflows with transparent audit trails.

4) Copilot add-ons inside existing SaaS. I’ve seen “AI Assist” tiers deliver immediate ARPU lift and retention gains. The playbook: start with high-frequency, high-friction jobs (drafts, summaries, enrichment), then expand to proactive suggestions. This aligns tightly with product strategy and lets me stage value without overhauling the core experience.

5) Insights-as-a-Service via data network effects. I transform exhaust data into benchmarking, predictions, and prescriptive recommendations—while honoring privacy-by-design and data governance. The more customers I onboard, the stronger the patterns, and the higher the switching costs. Pricing ties to seats plus an outcomes or value metric.

6) Retrieval-first pipeline for enterprise knowledge. I land with high-accuracy answers over customer data (search, summarize, cite), then expand into workflow automations. This “retrieval-first pipeline” reduces hallucinations, boosts trust, and creates defensibility through connectors, semantic indexing, and continuous relevance tuning, which makes it an ideal fit for product managers who prioritize reliability.

7) Open source monetization. When I bet on openness, I monetize hosting, support, enterprise controls, and compliance features. The advantage is developer love and rapid iteration; the moat is operational excellence at scale, plus integrations customers rely on. This model converts community momentum into predictable revenue.

8) Marketplaces for prompts, skills, and agents. I create a platform for third-party extensions and charge a take rate on usage. The flywheel spins when developers see distribution, customers see breadth, and I enforce strong quality bars. The roadmap focuses on governance, discovery, and safe execution policies.

9) Solutions with forward deployed engineers. For complex rollouts, I pair product with specialized implementation to guarantee outcomes. Revenue blends software plus services, accelerating time-to-value and informing the roadmap with real-world constraints. Over time, learnings fold back into scalable, self-serve capabilities.

10) AI risk, security, and compliance tooling. As AI scales, so does the need for policy enforcement, monitoring, and auditability. I monetize via platform subscriptions that address model provenance, data leakage prevention, red teaming, and reporting. Strong “AI risk management” is now a purchasing requirement, not a nice-to-have.

How do I choose among these models? I start with the customer’s biggest workflow pain, map it to the fastest path to measurable outcomes, and align pricing with value creation. Then I build defensibility through data advantage, distribution, and governance. If a model deepens trust, improves margins, and compounds learning, it earns a place on the roadmap.

Inspired by this post on Product School.

December 24, 2025
What It Takes to Build AI-Powered Products: A Senior Engineer’s Playbook and Mindset

I spend my days partnering with technical leaders who bridge invention and impact. The role of a Senior Software Engineer at Amplitude working on AI-powered products epitomizes how engineering and product fuse to ship customer value with speed, safety, and conviction. In my world, that fusion isn’t accidental—it’s designed, measured, and relentlessly improved.

When I form product trios—engineering, product, and design—we clarify the problem, the target users, and the measurable outcomes before a single line of code ships. This is how empowered product teams operate: we trade feature wish-lists for hypotheses, align on success metrics, and commit to learning loops that turn ambiguity into progress.

On the technical front, modern AI systems demand a retrieval-first pipeline, robust data contracts, and a thoughtful orchestration layer for LLMs. I expect eval-driven development to be first-class: offline unit-style evals for prompts and policies, and online evals that track behavior changes and quality at scale. This rigor gives us confidence to ship, learn, and iterate without burning cycles on guesswork.

Velocity matters, and so does reliability. I look for CI/CD that makes small, safe, frequent releases the default, and for DORA metrics to shine a light on delivery health. Pair that with platform scalability, clear SLOs, and pragmatic SRE practices, and teams earn the right to move fast without breaking trust.

Responsible AI is non-negotiable. We operationalize AI risk management with guardrails, input/output filters, red-teaming, and human-in-the-loop review where stakes are high. Data governance and privacy-by-design ensure that our creativity never outruns our compliance—because durable products are built on durable trust.

Impact comes from evidence. I advocate for disciplined A/B testing, careful minimum detectable effect (MDE) planning, and retention analysis that ties feature work to real business outcomes. Clear analytics pipelines and transparent dashboards keep stakeholders aligned and make good decisions repeatable.
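
MDE planning comes down to a sample-size calculation. The sketch below uses the standard normal-approximation formula for comparing two conversion rates with a two-sided test; the baseline rate, the lift, and the function name are assumptions for illustration, and a real plan should confirm the numbers with the team's own experimentation tooling.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm to detect an absolute lift of `mde`."""
    p1, p2 = baseline, baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


# Hypothetical scenario: 10% baseline conversion, 2-point absolute lift.
print(sample_size_per_arm(baseline=0.10, mde=0.02))
# Halving the detectable effect roughly quadruples the required sample.
print(sample_size_per_arm(baseline=0.10, mde=0.01))
```

The second call is the useful planning lesson: a smaller MDE is far more expensive in traffic, so the MDE should be set by what effect would actually change a decision.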

Ultimately, the Senior Software Engineer I want to collaborate with is a builder who balances systems thinking with customer empathy: someone who can design reliable architectures, instrument the work with meaningful evals, and co-lead discovery to de-risk the roadmap. When we combine that mindset with crisp execution, AI-powered products stop being demos—and start becoming indispensable.

Inspired by this post on Amplitude – Perspectives.

December 18, 2025
Master AI as a Product Manager in 12 Months: My 2026 Roadmap to Ship Smarter, Faster

AI isn’t a side quest for product managers anymore—it’s the skill stack that will define how we discover problems, prototype solutions, and ship value in 2026. Over the last few cycles, I’ve watched teams that embrace AI Strategy outperform on speed, signal, and stakeholder confidence. This roadmap is the approach I use to build capability in a structured, outcome-driven way—so we ship smarter, faster, and more impact-driven products.

"AI for PMs in 2026: why it matters, what to learn, and a 12-month AI roadmap to master product skills and ship smarter, faster, impact-driven products."

Here’s how I frame what to learn and why: focus on enduring capabilities first (problem discovery, experimentation, ethics), then layer the AI product toolbox (LLMs for product managers, retrieval-first pipeline patterns, AI workflows), and finally operationalize with outcomes vs output OKRs. The goal isn’t to sprinkle gen AI on everything; it’s to make better decisions, reduce cycle time, and unlock product-led growth in measurable ways.

Months 1–3: Foundations. I build literacy around model behavior and constraints, context window management, and prompting patterns. I pair this with data governance and privacy-by-design basics so we avoid rework later. Practically, I assemble an AI product toolbox (evaluation checklists, prompt libraries, retrieval-first pipeline templates) and apply them to product discovery—summarizing research, clustering feedback, and sharpening value propositions without losing critical nuance.

Months 4–6: Prototyping and evaluation. This is where ideas become testable artifacts. I use gen AI for product prototyping to rapidly create UX mocks, PRDs, and in-app guides, then validate with eval-driven development. I run lean experiments (A/B testing with a clear minimum detectable effect), wire up analytics to Amplitude, and track activation and retention signals. The mantra: instrument early, measure causally, and iterate based on evidence.

Months 7–9: Shipping AI-enabled workflows. I partner with product trios to integrate AI into real user journeys; customer support AI, CRM integration, and guided onboarding are common wins. We explore agentic AI for complex multi-step tasks, add safeguards for AI risk management, and pressure-test systems with threat detection and response playbooks. As features reach production, we monitor deployment frequency and tighten feedback loops to protect quality while accelerating learning.

Months 10–12: Scale and governance. I operationalize what works with product roadmapping and sprint planning aligned to outcomes vs output OKRs. We codify playbooks for continuous discovery, define eval gates for new AI features, and unify analytics so teams can compare lift apples-to-apples. Stakeholder management matures into clear narratives: what shipped, what moved, what’s next—so leadership sees compounding value, not just activity.

Throughout the year, I keep the focus on real users and real metrics: fewer hops from insight to iteration, tighter loops between problem and prototype, and crisper communication around trade-offs. The result is a team that can translate AI capabilities into differentiated product experiences, reliably and responsibly. If you follow this path, you’ll finish 2026 with the confidence to lead, the systems to scale, and the evidence to prove it.

Inspired by this post on Product School.

December 17, 2025
Stop Tuning Prompts: How Context Engineering 10x’d Accuracy and Adoption in Our AI Platform

"The best AI products improve more through context engineering than prompt tinkering." I’ve seen this play out repeatedly in high-stakes, enterprise use cases: substantive gains come from how we curate, structure, and deliver context to models—not from wordsmithing. When we started treating context as a product surface, performance climbed, hallucinations dropped, and teams shipped with more confidence.

Here are four key decisions we made to improve our AI context.

First, we moved to a retrieval-first pipeline. We unified trusted sources—CRM records, support knowledge bases, product telemetry, and governance-approved docs—behind hybrid retrieval (semantic + keyword) with strong metadata ranking. This let us constrain generations to verifiable facts, apply privacy-by-design rules at the edge, and practice disciplined context window management so every token carried its weight. Freshness policies, source-level confidence scores, and lightweight schemas kept the system precise and auditable.
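
For readers who want to see how semantic and keyword results can be combined, here is a small sketch using reciprocal rank fusion. This is one common fusion technique chosen for illustration; it is an assumption, not necessarily what our pipeline or any particular vendor uses, and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs; a higher fused score ranks first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks near the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a semantic index and a keyword index.
semantic = ["kb-12", "crm-7", "doc-3"]
keyword = ["crm-7", "doc-9", "kb-12"]
print(reciprocal_rank_fusion([semantic, keyword]))
```

Documents that both retrievers agree on rise to the top, which is the behavior you want before spending context-window tokens on them. Metadata signals such as freshness or source confidence could be added as further ranked lists.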

Second, we made eval-driven development non-negotiable. Every change to context assembly goes through offline evals and online A/B testing with clear acceptance thresholds (e.g., task success, groundedness, time-to-first-answer, and deflection rate). We sized tests with minimum detectable effect (MDE) and tied them to outcomes vs output OKRs so we weren’t just shipping more prompts—we were shipping measurable improvements that mattered to customers.

Third, we personalized context based on intent and role. We built AI workflows that detect user intent, segment by persona, and dynamically assemble context: recent account activity for customer success, policy-safe excerpts for finance, and fine-grained reasoning chains for product teams. For conversational and voice AI agent experiences, we combined short-term conversation memory with scoped, long-term account memory to preserve relevance without bloating the prompt. This agentic AI pattern ensured faster, safer, and more helpful responses.
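
The combination of short-term conversation memory and scoped long-term memory can be sketched in a few lines. The `ContextAssembler` class, its method names, and the sample facts below are hypothetical; the point is only to show a bounded turn buffer sitting next to role-scoped account facts.

```python
from collections import deque


class ContextAssembler:
    """Hypothetical assembler: bounded recent turns plus role-scoped account facts."""

    def __init__(self, max_turns: int = 4):
        self.turns = deque(maxlen=max_turns)  # short-term memory; oldest turn drops first
        self.account_facts = {}               # long-term memory, keyed by role

    def remember_turn(self, speaker: str, text: str) -> None:
        self.turns.append(f"{speaker}: {text}")

    def add_fact(self, role: str, fact: str) -> None:
        self.account_facts.setdefault(role, []).append(fact)

    def build(self, role: str) -> str:
        """Assemble context using only the facts scoped to the requesting role."""
        facts = self.account_facts.get(role, [])
        return "\n".join(["# Account facts", *facts, "# Recent turns", *self.turns])


assembler = ContextAssembler(max_turns=2)
assembler.add_fact("finance", "Invoice terms: net 30")
assembler.add_fact("success", "Renewal due in March")
for i in range(3):
    assembler.remember_turn("user", f"message {i}")
print(assembler.build("finance"))
```

The bounded buffer keeps the prompt from growing with conversation length, and the role key keeps one persona's facts out of another persona's context.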

Fourth, we operationalized context as a first-class platform capability. We invested in data governance (ownership, lineage, and redaction), instrumentation (Amplitude analytics for usage, retrieval hit rates, and failure modes), and CI/CD guardrails for context updates. Product trios partnered with SRE to monitor drift, while side-by-side comparisons and human-in-the-loop reviews turned frontline feedback into structured improvements. The result: a durable system that improves continuously instead of relying on one-off prompt tweaks.

Context engineering isn’t glamorous, but it compounds. By prioritizing retrieval-first design, rigorous evaluation, intent-aware assembly, and operational excellence, we transformed our AI features into dependable, enterprise-ready capabilities. If you’re serious about using LLMs in product work and about a sustainable AI Strategy, shift your energy from clever prompts to robust context, and watch adoption and trust follow.

Inspired by this post on Amplitude – Perspectives.

December 16, 2025
Inside a Staff AI Engineer’s Impact: How Cross-Functional AI Initiatives Drive Product Wins

When I think about the roles that truly move the needle on AI Strategy and product outcomes, the Staff AI Engineer stands out. This is the person who can translate research into repeatable AI workflows, partner with product to solve real user problems, and operationalize models in a way that scales. It’s where innovation meets accountability—and where product management leadership meets hands-on engineering craft.

Ram Soma is a Staff AI Engineer at Amplitude, leading various AI initiatives across the company. He has a background in data science and machine learning engineering.

What does that look like in practice from my seat? It starts with precise problem framing and measurable success criteria. I align with a Staff AI Engineer on eval-driven development and instrumentation so we can track impact from prototype to production. With Amplitude analytics operating as a unified analytics platform, we can quantify user activation, retention analysis, and feature adoption, then iterate through continuous discovery with tight feedback loops.

Execution quality hinges on robust experimentation. Together, we design A/B testing plans with minimum detectable effect (MDE) targets, isolate confounding variables, and build evaluation harnesses that reflect real-world UX constraints. We also agree on rollout strategies—staged deployments, guardrails, and observability—so we can learn safely while preserving customer trust and performance SLAs.

On the technical approach, I look for pragmatic architectures that balance speed and reliability: a retrieval-first pipeline for grounding, judicious use of LLMs with instrumented prompts and policies, and agentic AI patterns only when task decomposition truly reduces complexity. Just as important are privacy-by-design and data governance practices from day one, because responsible innovation beats retrofitting controls after the fact.

Finally, the magic happens in empowered product teams and product trios. When product, design, and Staff AI Engineering operate with shared context and clear constraints, we compress decision cycles and ship value faster. That’s how AI initiatives evolve from demos to durable capabilities—and how we enable product-led growth with measurable results that customers feel, not just features they see.

Inspired by this post on Amplitude – Perspectives.

December 16, 2025
From Concierge to AI Marketing Engine: Inside Mowie’s Document Hierarchy Playbook

I’m constantly asked by SMB owners: What if your small business could have a full marketing team—automated content calendars, customer segmentation, and channel-specific posts—without the headcount? That question is no longer hypothetical; it’s precisely the promise behind Mowie, and the way they got there is a masterclass in practical AI product development.

I recently listened to Chris O'Connor (CEO) and Jessica Valenzuela (Co-Founder) of Mowie, an AI marketing platform built for small and medium-sized businesses in restaurants, retail, and e-commerce. Their story starts with a concierge marketing service—doing the work by hand for overwhelmed owners—and evolves into a fully automated AI product.

They walk through their "document hierarchy" approach: how Mowie crawls the web to build a "dossier" about each business, infers customer segments and marketing pillars, and generates quarterly content calendars with channel-specific posts. As a product leader, this is the kind of retrieval-first pipeline that consistently outperforms naive prompt chaining because it builds durable context before generation.

They also unpack the technical challenges of structuring unstructured data and the evolution from rigid schemas to loosely structured markdown. In my experience using LLMs as a product manager, markdown becomes a flexible intermediate representation that’s easy to diff, trace, and feed back into models without brittle parsing.

Equally important, they use customer feedback—from calendar approvals to regeneration requests—as their primary evaluation signal. That’s eval-driven development in practice: close the loop with lightweight evals that reflect genuine user intent, not proxy metrics.

The planning model is elegant: three mini-calendars (public events, business-specific events, and recommended campaigns) roll up into a coherent plan that eliminates the blank-page problem and enables steady, predictable execution.
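
The roll-up itself is simple to picture in code. This is my own minimal sketch, not Mowie's implementation; the calendar names, dates, and event titles are invented for the example.

```python
from datetime import date

# Hypothetical mini-calendars; each entry is (date, title).
CALENDARS = {
    "public": [(date(2026, 2, 14), "Valentine's Day")],
    "business": [(date(2026, 1, 20), "Store anniversary")],
    "campaign": [
        (date(2026, 2, 1), "Loyalty sign-up push"),
        (date(2026, 3, 10), "Spring menu launch"),
    ],
}


def roll_up(calendars: dict) -> list:
    """Merge mini-calendars into one chronological plan of (date, source, title)."""
    plan = [
        (day, source, title)
        for source, events in calendars.items()
        for day, title in events
    ]
    return sorted(plan)


for day, source, title in roll_up(CALENDARS):
    print(day.isoformat(), f"[{source}]", title)
```

Keeping the source label on every entry preserves where each item came from, which fits the traceability goal described in this post.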

Crucially, they’re building traceability so customers can see which context documents influenced their content. This kind of transparency increases trust, accelerates edits, and supports governance in regulated categories where auditability matters.

Onboarding and data collection stay pragmatic: let the system crawl first, ask humans only for deltas, and progressively profile over time. It’s a pattern I advocate in continuous discovery and AI workflows—keep humans in the loop without overwhelming them, and make the right action the easy action.

Early on, they used Simon Sinek's Golden Circle framework to validate demand and sharpen messaging. Framing the "why" before the "what" helps teams maintain a crisp value proposition and tighten their go-to-market strategy.

Performance measurement goes beyond vanity metrics by connecting marketing performance back to point-of-sale data for attribution. The ability to tie campaigns to revenue events is the bridge from clever content to accountable outcomes.

What’s next is equally compelling: deeper attribution, omnichannel expansion, and digital out-of-home displays. For SMBs, that points to a unified analytics platform spanning email, social, and in-store touchpoints—exactly where modern marketing is headed.

My takeaways for builders: invest in a retrieval-first pipeline with a resilient document hierarchy; prefer loosely structured markdown over rigid JSON when dealing with messy inputs; design human-in-the-loop controls that double as evals; and always connect activity to business outcomes. That’s how you turn an idea into a repeatable system that scales.

If you want to explore further, start here: Mowie AI — AI marketing platform for SMBs. For early validation and storytelling, revisit Simon Sinek's Golden Circle.

Inspired by this post on Product Talk.

December 11, 2025
6 AI Strategies to Accelerate Business Growth: Unlock Revenue, Cut Costs, Scale Faster

I’ve spent the last few years weaving AI into core product workflows, and the pattern is clear: when we pair disciplined product thinking with pragmatic AI Strategy, growth compounds. The question I hear most isn’t if AI can help, but where to begin and how to de-risk the journey while moving fast.

AI for business growth starts with one of these six strategies. See how companies use AI to unlock revenue, cut costs, and scale smarter and faster.

1) Revenue acceleration with unified customer intelligence. I start by connecting behavioral analytics and CRM integration to a unified analytics platform, then layer a retrieval-first pipeline so large language models can surface high-intent accounts, churn signals, and next-best actions. With Amplitude analytics and A/B testing, we validate AI-driven playbooks for upsell, cross-sell, and win-back—turning insights into measurable lift rather than novelty.

2) Cost reduction through targeted automation. Not all automation yields the same outcome. I look for repetitive, high-volume processes where quality is easy to verify: customer support with AI-assisted deflection, accounts payable automation, and security workflows like threat detection and response. Combining agentic AI with clear guardrails reduces handle time, frees teams for higher-value work, and keeps error rates within acceptable thresholds.

3) Faster time-to-market via eval-driven development. Speed without signal is noise. I lean on eval-driven development to instrument models, measure drift, and tighten CI/CD loops. We track DORA metrics like deployment frequency while using gen AI for product prototyping to compress discovery and delivery. Frameworks and tools such as Claude Code help engineers iterate safely behind feature flags so we can ship learning, not just code.

4) Personalization that drives activation and retention. Growth sticks when onboarding is contextual. I use in-app guides, product tours, and thoughtful tooltip design, all powered by LLMs, to tailor the first-run experience. With retention analysis and outcomes vs output OKRs, we align personalization with the moments that matter: activation, habit formation, and expansion.

5) Trust-by-design to scale responsibly. AI risk management, privacy-by-design, and data governance are not afterthoughts; they are growth enablers. By defining policy, red-teaming prompts, and practicing context window management, we reduce rework, limit incident management, and maintain compliance across markets. Clear review gates make it easier to say yes to more AI use cases without compromising customer trust.

6) Voice and agent experiences that feel like product, not add-ons. When prompt engineering for voice and voice AI agent patterns are integrated into the core journey—guided onboarding, smart handoffs, proactive notifications—engagement rises. Agent Analytics turns conversations into product signals we can act on in roadmapping and sprint planning, closing the loop between user intent and product improvement.

My playbook for getting started is simple: pick one revenue and one efficiency use case, define success upfront, and ship a narrowly scoped MVP with robust analytics. Use continuous discovery with product trios to refine prompts, data sources, and experience design. Then scale what works, retire what doesn’t, and let evidence—not hype—set the roadmap.

If you’re evaluating where to apply gen AI next, these six lanes offer fast paths to impact without sacrificing governance or customer trust. The companies I’ve seen win treat AI as a capability within the product, not a separate project, and they measure it with the same rigor they use for any critical feature.

Inspired by this post on Product School.

December 10, 2025
Make Every Answer the Last: Building a Self-Improving AI Support Engine for 2026

Once I’ve defined the right roles on my team, the next move is to design an operating model that makes progress a habit. My goal is simple: every interaction should strengthen the system so the AI Agent keeps improving over time.

I anchor the team on a mantra that has never failed me: “The first time you answer a question should be the last.” That single statement reframes support as a compounding system rather than a one-off activity.

The ambition is to ensure every resolution makes the next one faster and more accurate, so fewer issues repeat, quality compounds, and support scales naturally. That doesn’t happen by accident—it requires intentional design.

In practice, this comes down to four essentials: clear ownership of performance, guardrails that make iteration fast and safe, feedback loops that turn learning into routine upgrades, and a culture that celebrates the work of improvement—not just the outcomes. Here’s how I put that into play.

First, I start with clear ownership. Ambiguity is one of the most common reasons AI performance plateaus. When no one truly owns how the AI Agent performs, feedback gets lost, issues linger, and improvements stall.

On high-performing teams, I assign a single owner—often an AI ops lead—responsible for making the AI Agent better. They review resolution trends to spot underperformance, make targeted updates to content, configuration, and behavior, coordinate with product and engineering on systemic blockers, and set improvement priorities, targets, and timelines. The title matters less than the mandate; what matters is clear authority to drive change across teams.

Real-world example: At Dotdigital, AI performance plateaued after a strong start—resolving around 2,800 conversations per month for three consecutive months. To drive resolution rates up, the team created a dedicated support operations specialist role, filled by an experienced agent with deep product knowledge. This person will focus on refining snippets, improving content, and enhancing the AI’s resolution capabilities.

Second, I make iteration fast and safe. As the AI Agent takes on more volume and complexity, change can start to feel risky—so teams hesitate, and performance stalls. Lightweight governance fixes that by making the path from insight to action predictable.

I keep the rules simple and explicit: which changes need review (and which don’t), who the decision-makers are, how we test updates before they go live, where feedback flows so it’s seen and acted on, and when progress gets reviewed on a steady cadence. Governance isn’t bureaucracy—it’s what keeps improvement routine and safe.

Real-world example: Anthropic ran a focused “Fin hackathon” sprint to improve their AI Agent’s resolution rate. The team audited unresolved queries, identified underperforming topics, and created or updated content to close gaps. They converted frequently used macros into AI-usable snippets, monitored Fin’s performance during live support, and continuously refined content based on real interactions. This structured approach enabled rapid improvement while maintaining quality standards.

Third, I build a system that learns by default. AI performance isn’t static, but many organizations treat it like a one-time implementation. The most successful teams operationalize learning: they analyze where the AI Agent struggles and feed those insights directly into structured improvements.

The signals are straightforward: review common handoffs to humans, track unresolved queries by topic or intent, measure resolution rate trends over time, and use those inputs to prioritize fixes and content upgrades. Whether you follow a formal loop like the Fin Flywheel framework or something lighter, the goal is the same—make improvement inevitable.
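
Tracking unresolved queries by topic needs very little machinery to get started. The sketch below assumes each conversation has already been labeled with a topic and a resolved flag; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def resolution_by_topic(conversations) -> dict:
    """Return {topic: resolution rate}, ordered from weakest topic to strongest."""
    totals = defaultdict(lambda: [0, 0])  # topic -> [resolved count, total count]
    for topic, resolved in conversations:
        totals[topic][0] += int(resolved)
        totals[topic][1] += 1
    rates = {topic: done / total for topic, (done, total) in totals.items()}
    return dict(sorted(rates.items(), key=lambda item: item[1]))


# Hypothetical log of (topic, was the conversation resolved by the AI Agent?).
log = [
    ("billing", True), ("billing", False), ("billing", False),
    ("login", True), ("login", True),
]
for topic, rate in resolution_by_topic(log).items():
    print(f"{topic}: {rate:.0%}")
```

Sorting weakest first turns the report into a prioritized fix list: the topics at the top are where new or updated content is likely to pay off soonest.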

Fourth, I treat content as competitive infrastructure. Your AI Agent is only as good as what it knows. As George Dilthey, Head of Support at Clay, put it: “That’s when we realized: AI doesn’t just come up with information out of nowhere, you have to feed it. We were spending all our time evaluating tools when we should’ve been focused on content.”

I operationalize knowledge like infrastructure: every topic has a clear owner, content is structured, versioned, and ingestion-ready, new products ship with source-of-truth content by default, and changes ship on a schedule—not when someone finds time. This is the backbone that differentiates teams who scale confidently from those who stall out.

In my organization, we’ve evolved our New Product Introduction (NPI) process by aligning early with R&D on a single, canonical source of truth that becomes the foundation for all downstream content—including what the AI Agent uses to resolve queries. By embedding content creation into launch readiness, not as an afterthought, we’ve consistently hit 50%+ resolution rates on new features from day one.

Finally, I make belief visible. Even the best system will stagnate if people stop believing in it. Belief can fade quietly unless you reinforce it on purpose. I keep it strong by sharing specific wins regularly, highlighting improvements with metrics, and recognizing the people behind the gains—then giving them space to lead. This isn’t just about morale; it keeps everyone aligned on the bigger play.

When you put it all together—clear ownership, safe iteration, a learning system by default, and content as infrastructure—AI performance compounds. As the AI Agent gets better, the entire support model becomes faster, more reliable, and truly scalable. That’s the foundation of a modern, AI-first support organization.

Next, I’ll take this a level deeper and share how capacity planning changes when AI handles the majority of inbound volume and your team shifts into higher-value roles. If scaling with confidence is the goal, this is where the operating model pays off.

Inspired by this post on The Intercom Blog.

December 9, 2025
Beyond Accuracy: The Trust-First Evaluation Metrics I Use to Scale High-Impact AI Products

When I assess whether an AI product is ready for prime time, I start with trust—not model accuracy. Accuracy is table stakes; trust is what earns adoption, drives retention, and unlocks durable product-led growth.

Evaluation metrics in AI products go beyond accuracy. Learn how product teams use trust-driven metrics to build reliable, growth-driving AI systems.

In practice, I organize trust-driven metrics into four layers: model quality and safety, user and business outcomes, operational reliability and cost, and governance and compliance. This layered approach keeps product trios aligned on what matters now, what must be gated in CI/CD, and which signals we’ll use to prove progress against outcome-based rather than output-based OKRs.

On model quality and safety, I care about precision, recall, F1, calibration, and abstention behavior, but also the hard-to-fake signals: hallucination rate, grounding and faithfulness, citation coverage, toxicity, bias, and fairness. For generative systems, I instrument refusal correctness (did the system decline unsafe requests?) and evidence adequacy (did the answer rely on retrieved, trustworthy sources?).
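
For readers who want the arithmetic behind a few of these metrics, here is a minimal sketch. It assumes you already have labeled evaluation results: counts of true positives, false positives, and false negatives, plus per-answer confidence scores and correctness flags. The calibration measure shown is expected calibration error with equal-width bins, which is one common choice among several.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy per bin."""
    total = len(confidences)
    if total == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / total * abs(accuracy - avg_conf)
    return ece

def flag_rate(flags):
    """Share of answers flagged (e.g. by reviewers as hallucinated)."""
    return sum(flags) / len(flags) if flags else 0.0
```

A model that is 90% confident but right only half the time shows a large calibration gap, which is the kind of signal that accuracy alone hides.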

User and business outcomes must be explicit. I track adoption, activation, task success rate, time to first value, win rate uplift in assisted workflows, CSAT and NPS deltas, and retention analysis by cohort exposed to AI features. For customer support scenarios, deflection rate, average handle time change, and first-contact resolution are core; for sales or ops copilots, I monitor cycle-time reduction and error-rate reduction in critical tasks.

Experimentation is non-negotiable. I design A/B testing with a clear minimum detectable effect (MDE), pre-registered guardrails for safety and quality, and sequential tests that stop early if harm outpaces benefit. Online metrics are always paired with offline evals so we can iterate quickly without exposing users to regressions.
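
To make the MDE concrete, here is a rough sample-size sketch for a conversion-style metric. It assumes a two-sided test on two proportions with the usual normal approximation; the baseline rate, significance level, and power are illustrative inputs, not recommendations. Sequential designs need their own corrections and are not covered here.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(baseline, mde, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift of `mde`.

    Uses the normal approximation for comparing two proportions.
    """
    p1 = baseline
    p2 = baseline + mde
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde == 0:
        raise ValueError("baseline and baseline + mde must be in (0, 1), mde != 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Example: 10% baseline task success, looking for a 2-point absolute lift.
print(sample_size_per_arm(baseline=0.10, mde=0.02))
```

Halving the MDE roughly quadruples the required sample, which is why I settle the MDE before the test starts rather than after.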

Operationally, trust shows up as speed, stability, and cost predictability. I track latency end-to-end, time to first token, throughput, rate of 5xx and timeouts, cost per request, and caching effectiveness. We also trend safety incidents per 10,000 interactions and mean time to mitigation to keep reliability visible alongside performance.
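
A few of these operational numbers are simple enough to compute directly from request logs. The sketch below assumes you have a list of latencies in milliseconds, a list of HTTP status codes, and incident and interaction counts; the nearest-rank percentile is one of several valid definitions.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def server_error_rate(status_codes):
    """Share of requests that returned a 5xx status."""
    if not status_codes:
        return 0.0
    return sum(1 for s in status_codes if 500 <= s <= 599) / len(status_codes)

def incidents_per_10k(incidents, interactions):
    """Safety incidents normalized per 10,000 interactions."""
    if interactions <= 0:
        raise ValueError("interactions must be positive")
    return incidents / interactions * 10_000

# Made-up values for illustration.
latencies_ms = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print("p50:", percentile(latencies_ms, 50), "p95:", percentile(latencies_ms, 95))
```

Tail percentiles matter more than averages here, because a slow tail is what users remember.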

Governance and compliance are part of the product, not an afterthought. Data governance and privacy-by-design metrics include PII exposure rate, data lineage coverage, access-control correctness, audit pass rate against internal policies, and model and prompt change traceability. This is the backbone of our AI risk management posture and accelerates regulatory compliance reviews instead of slowing them down.

The delivery engine for all of this is eval-driven development. We maintain golden datasets and scenario-based test suites that mirror real user intents, gate releases in CI/CD with minimum thresholds, and run canary rollouts to validate offline–online alignment. Every model or prompt update gets a comparable scorecard so product, engineering, and design can trade off quality, speed, and cost with shared facts.
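
A release gate of this kind can be very small. The sketch below assumes an offline eval run has already produced a dictionary of scores; the metric names and threshold values are hypothetical placeholders, and real ones would come from your own golden datasets and risk appetite.

```python
def gate_release(scores, thresholds):
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for metric, (direction, limit) in thresholds.items():
        value = scores.get(metric)
        if value is None:
            failures.append(f"{metric}: no score reported")
        elif direction == "min" and value < limit:
            failures.append(f"{metric}: {value} is below minimum {limit}")
        elif direction == "max" and value > limit:
            failures.append(f"{metric}: {value} is above maximum {limit}")
    return failures

# Hypothetical thresholds: "min" means higher is better, "max" means lower is better.
THRESHOLDS = {
    "task_success_rate": ("min", 0.85),
    "hallucination_rate": ("max", 0.02),
    "p95_latency_ms": ("max", 2000),
}

# Hypothetical scorecard for a candidate prompt or model update.
candidate = {
    "task_success_rate": 0.88,
    "hallucination_rate": 0.035,
    "p95_latency_ms": 1400,
}

failures = gate_release(candidate, THRESHOLDS)
for message in failures:
    print("GATE FAILED:", message)
```

In a CI job, a non-empty failure list would typically map to a non-zero exit code so the pipeline stops before rollout. Treating a missing score as a failure is deliberate: a metric that silently drops out of the scorecard should block the release, not pass it.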

For LLM-heavy features, retrieval-first pipeline metrics are mandatory. I monitor retrieval hit rate, recall at K, mean reciprocal rank, context contamination, and citation correctness. With large prompts, context window management matters: we track context utilization, truncation rate, and the contribution of each context block to final answers to avoid silently losing critical evidence.
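
Three of the retrieval metrics above have standard definitions that fit in a few lines. This sketch assumes each test query comes with a ranked list of retrieved document IDs and a labeled set of relevant IDs; the IDs below are made up.

```python
def recall_at_k(retrieved, relevant, k):
    """Share of relevant documents that appear in the top k results."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)

def hit_rate(queries, k):
    """Share of queries with at least one relevant document in the top k."""
    if not queries:
        return 0.0
    hits = sum(1 for retrieved, relevant in queries
               if set(retrieved[:k]) & set(relevant))
    return hits / len(queries)

def mean_reciprocal_rank(queries):
    """Average of 1/rank of the first relevant result (0 if none is found)."""
    if not queries:
        return 0.0
    total = 0.0
    for retrieved, relevant in queries:
        relevant_set = set(relevant)
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant_set:
                total += 1 / rank
                break
    return total / len(queries)

# Each entry: (ranked retrieved IDs, labeled relevant IDs).
eval_queries = [
    (["a", "b", "c"], ["b"]),
    (["x", "y", "z"], ["x", "q"]),
]
print(mean_reciprocal_rank(eval_queries), hit_rate(eval_queries, k=1))
```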

Finally, trust must be legible. I package these metrics into an executive scorecard that maps to business outcomes, risk appetite, and OKRs, with clear thresholds for ship, improve, or roll back. When teams can articulate trade-offs—say, a 20% latency reduction at a small cost increase, or a lower hallucination rate at the expense of higher abstention—they build credibility with stakeholders and confidence with customers.

Trust is not a single number; it’s a system of evidence. By instrumenting these layers and operationalizing AI Strategy with rigorous, transparent metrics, we can ship faster, reduce surprises, and earn the right to scale AI features across the product portfolio.

Inspired by this post on Product School.

December 8, 2025
Crack the AI Search Code: How Startups Win Recommendations in ChatGPT and Perplexity

AI search is reshaping how customers discover emerging products, and I’ve seen firsthand how this shift rewards startups that speak clearly to both humans and machines. Learn how LLMs like ChatGPT and Perplexity decide which startups to recommend and what signals help a brand get discovered in AI search.

In practice, AI search behaves less like a list of blue links and more like a synthesis engine. These models look for credible, consensus-backed, well-structured sources they can cite with confidence. That means your brand’s discoverability hinges on technical clarity (schema, structure, speed), topical authority (depth, citations, expert bylines), and evidence of real-world adoption (reviews, case studies, third-party validation).

I start by mapping buyer intent across the entire journey—category exploration, problem framing, solution fit, integration needs, ROI, and competitive comparisons. Then I design a page system that answers each intent with precision: clear “About” and “Use Cases” pages, integration-specific pages, objective "X vs Y" comparisons, transparent pricing, and a living FAQ that mirrors the exact questions users ask in conversational queries.

Structure matters. I add JSON-LD schema for Organization, Product, FAQPage, HowTo, and Article where appropriate; keep canonical URLs consistent; and ensure titles, meta descriptions, and Open Graph data reinforce the same story. Clean sitemaps, a sensible robots.txt, and fast, mobile-first performance reduce friction for crawlers and increase the odds that LLMs extract accurate snippets.
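
As an illustration of the FAQPage markup, here is a small sketch that builds the JSON-LD from question and answer pairs and wraps it in a script tag. The `@type` and property names follow schema.org; the sample question and answer are placeholders.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

def script_tag(data):
    """Serialize JSON-LD for embedding in an HTML page."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so text inside the JSON cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Placeholder content for illustration.
faq = faq_jsonld([
    ("Does the product integrate with Slack?", "Yes, through a native integration."),
])
print(script_tag(faq))
```

Generating the markup from the same source as the visible FAQ keeps the two from drifting apart, which matters because mismatched structured data undermines the consistency this section is about.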

Authority is earned off-site as much as on-site. I prioritize third-party signals—G2/Capterra reviews, analyst mentions, reputable press, open-source repos with README clarity, academic or industry citations, and credible partner integrations. LLMs heavily weight these external proofs when recommending solutions, especially for B2B and regulated categories.

On your site, demonstrate expertise. I include expert bylines with real credentials, cite primary sources, showcase customer outcomes with verifiable metrics, and make methodologies transparent. Shallow, keyword-stuffed posts don’t help; comprehensive, up-to-date explainers with references do.

Make your content retrieval-friendly. LLMs favor text they can segment, anchor, and quote. I structure pages with descriptive headings, short paragraphs, and linkable anchors; offer HTML-first documentation (not just PDFs); and provide copyable code or configuration steps when relevant. This also sets you up for a retrieval-first pipeline in your own product experiences.
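
To show what "segment, anchor, and quote" can mean in practice, here is a minimal sketch that splits a document into heading-level sections and derives a linkable anchor for each. It assumes headings are marked with a `## ` prefix, which is purely an assumption for the example; adapt the marker to whatever your docs source uses.

```python
import re

def slugify(heading):
    """Turn a heading into a lowercase, hyphenated anchor."""
    slug = re.sub(r"[^a-z0-9]+", "-", heading.lower())
    return slug.strip("-")

def split_sections(text, marker="## "):
    """Split text into sections, one per heading line starting with `marker`."""
    sections = []
    current = None
    for line in text.splitlines():
        if line.startswith(marker):
            heading = line[len(marker):].strip()
            current = {"heading": heading, "anchor": slugify(heading), "body": []}
            sections.append(current)
        elif current is not None and line.strip():
            current["body"].append(line.strip())
    # Join body lines so each section is a single quotable passage.
    return [{**s, "body": " ".join(s["body"])} for s in sections]

# Placeholder document for illustration.
doc = """## Pricing
Flat monthly fee.

## Setup & Install
Run the installer.
Restart the service.
"""

for section in split_sections(doc):
    print(f"#{section['anchor']}: {section['body']}")
```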

From a product and platform angle, I expose trustworthy documentation and a clear trust center—security, compliance, data governance, and privacy-by-design content. When a user asks an LLM whether they can safely deploy your solution, these pages often get pulled into the answer.

Evaluation closes the loop. I run an eval-driven development process for content: a stable prompt set that mirrors real queries, regular tests in both Perplexity and ChatGPT, and analytics to track referrals from AI-driven sources. I iterate headlines, schema, and on-page structure, then tie changes back to engagement and pipeline using A/B testing where it’s appropriate.

Don’t neglect comparison and alternatives pages. Fair, well-cited pages that address trade-offs and points of parity build trust—and they give LLMs succinct, quotable language for recommendation contexts. Clarity beats hype every time.

Finally, keep your corpus fresh. I schedule quarterly content reviews, retire outdated claims, and highlight release notes and integration updates. Freshness signals help models favor your content when they resolve time-sensitive queries.

If you treat AI search as a product surface—one that rewards precision, provenance, and performance—you’ll dramatically increase your odds of being recommended where it matters. That’s how I operationalize AI discovery for startups: intent mapping, structured content, external authority, a retrieval-friendly corpus, and a rigorous eval loop.

Inspired by this post on Amplitude – Perspectives.

December 3, 2025
Why Pristine Data Wins: Accelerate AI Success with Governance, Structure, and Discipline

Every successful AI initiative I’ve led or advised has shared the same foundation: we treat data as a product. Models will improve, infrastructure will evolve, and use cases will expand—but only high-quality, well-governed, and well-structured data compounds value over time.

“Companies that prioritize data quality, governance, and structure will accelerate their AI initiatives the fastest.” That line has become a non-negotiable principle in my playbook because it consistently separates prototypes that stall from platforms that scale.

When I say data quality, I mean trustworthy signals: clear definitions, deduplication, lineage, and timely freshness. Governance adds accountability and safety: ownership, access controls, auditability, and privacy-by-design aligned with regulatory compliance. Structure makes it all usable: consistent schemas, event taxonomies, and feature stores that let product teams ship faster without reinventing pipelines.

In practice, this looks like aligning an AI Strategy with a unified analytics platform so every team works from the same truth. It means instrumenting feedback loops, labeling outcomes, and building a retrieval-first pipeline that brings the right context to LLMs at the right time. It also means thoughtful context window management so models remain grounded, relevant, and cost-efficient.
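
One simple form of context window management is a greedy packer that fills a token budget with the highest-scoring retrieved chunks and records what was left out. The sketch below assumes each chunk has a relevance `score` and approximates token counts by splitting on whitespace; a real system would use the tokenizer of the model being called.

```python
def approx_tokens(text):
    """Rough token estimate (assumption: one token per whitespace-separated word)."""
    return len(text.split())

def pack_context(chunks, budget_tokens, count_tokens=approx_tokens):
    """Greedily select the highest-scoring chunks that fit within the budget.

    Returns (selected, tokens_used, dropped) so that dropped evidence
    stays visible instead of being silently truncated.
    """
    selected, dropped, used = [], [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = count_tokens(chunk["text"])
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
        else:
            dropped.append(chunk)
    return selected, used, dropped

# Made-up retrieved chunks for illustration.
retrieved = [
    {"id": "a", "score": 0.9, "text": "refund policy summary"},
    {"id": "b", "score": 0.8, "text": "full refund policy text"},
    {"id": "c", "score": 0.5, "text": "contact details"},
]

selected, used, dropped = pack_context(retrieved, budget_tokens=5)
print([c["id"] for c in selected], used, [c["id"] for c in dropped])
```

Logging the dropped list gives you the raw data for tracking how often relevant context fails to fit.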

I’ve seen the difference firsthand. Early generative AI prototypes built on messy, conflicting data looked promising in demos but failed in the wild: hallucinations spiked, confidence scores dipped, and user trust eroded. Once we tightened governance, standardized schemas, and implemented human-in-the-loop evaluation, accuracy climbed, risk dropped, and feature velocity increased without sacrificing safety.

For product managers, the mandate is clear: treat data work as core product work. Define quality SLAs, make data contracts explicit, and give empowered product teams the tools to observe, debug, and improve signals continuously. Pair AI risk management with measurable product outcomes, and you’ll turn experimentation into a durable advantage.
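
An explicit data contract can start as a small, checkable definition. The sketch below assumes a hypothetical event record with `user_id`, `event`, and `ts` fields and a 24-hour freshness SLA; the field names and the SLA value are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract: required fields, their types, and a freshness SLA.
CONTRACT = {
    "fields": {"user_id": str, "event": str, "ts": datetime},
    "max_staleness": timedelta(hours=24),
}

def check_record(record, contract, now):
    """Return a list of contract violations for one record (empty if clean)."""
    problems = []
    for name, expected_type in contract["fields"].items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    ts = record.get("ts")
    if isinstance(ts, datetime) and now - ts > contract["max_staleness"]:
        problems.append("record is older than the freshness SLA")
    return problems

now = datetime(2025, 12, 2, 12, 0, tzinfo=timezone.utc)
fresh = {"user_id": "u1", "event": "signup", "ts": now - timedelta(hours=1)}
stale = {"user_id": "u2", "event": "login", "ts": now - timedelta(days=3)}
broken = {"user_id": 42, "ts": now}

for record in (fresh, stale, broken):
    print(check_record(record, CONTRACT, now))
```

Running a check like this at ingestion time turns the SLA into something a team can observe and debug instead of a line in a document.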

The payoff is more than model performance; it’s organizational clarity and speed. With the right data foundation, LLM-powered features become easier for product managers to deploy, customer experiences feel coherent, and roadmaps shift from firefighting to compounding wins. Invest in data quality, governance, and structure now, and your AI initiatives won’t just move faster; they’ll sustain momentum.

Inspired by this post on Amplitude – Best Practices.

December 2, 2025