Tag: A/B testing

  • Why MCP Is Transforming Product Management: Field-Tested Lessons from Miro, Atlassian & More

    MCP is the acronym I keep hearing in every product conversation—and for good reason. When teams like Miro and Atlassian lean in, it signals a real shift in how we design, ship, and scale value. From my vantage point leading product at HighLevel, I see MCP less as a feature and more as an operating advantage: a way to align strategy, execution, and governance so product teams move faster with higher confidence.

    When I evaluate a platform like MCP, I start with three questions. First, does it advance our product strategy and sharpen competitive differentiation? Second, does it strengthen product-led growth by improving activation, onboarding, and retention? Third, does it help us drive outcomes vs output OKRs so we consistently measure what matters, not just what ships?

    Execution discipline makes or breaks any MCP investment. I design measurement upfront: instrument A/B testing, define activation milestones, and monitor retention cohorts. In parallel, I use Pendo for in-app guides and product tours to accelerate adoption and reduce time-to-value, then connect this data back to roadmap decisions so each release compounds learning instead of creating noise.

    On the operating model, I apply a rigorous build vs buy lens and stress-test platform scalability, reliability, and integration surfaces. Stakeholder management is critical—security, SRE, and solutions engineering must be partners from day one. I anchor teams in product trios and continuous discovery so we learn with customers in the loop, not after the fact.

    At Pendomonium 2026, Pendo CPO Rahul Jain brought together four product leaders who are building with MCP. Read or watch their conversation to learn more.

    My practical playbook for MCP: choose one high-signal use case, define clear success metrics, and run a tightly scoped pilot with visible executive sponsorship. Treat governance and data hygiene as first-class requirements. Close the loop weekly with qualitative insights from customer interviews and quantitative telemetry from experiments. Only then scale to adjacent workflows, keeping a steady focus on measurable customer value and repeatable delivery.

    Whether you’re an emerging startup or an established enterprise, the opportunity is the same: turn MCP curiosity into durable capability. With disciplined measurement, thoughtful stakeholder alignment, and a relentless outcomes mindset, MCP can become a lever for product management leadership—not just another acronym in the stack.


    Inspired by this post on Pendo – Best Practices.


    Book a consult png image
  • Cracking the Hardest Percentages: Turn Complex Support into Scalable, Trust-Building Automation

    Cracking the Hardest Percentages: Turn Complex Support into Scalable, Trust-Building Automation

    I’ve learned that the smallest slice of your support queue often dictates the majority of your operating cost, customer memory, and automation ceiling. In product reviews and CX ops deep-dives, I see the same pattern: the “easy” tickets pad your resolution counts, but the complex, multi-step queries quietly own your handle time and your brand trust. If you care about compounding impact, your customer support AI strategy has to target that hardest percentage first.

    Complex queries are a small percentage of your queue, but they consume a disproportionate share of your team’s time.

    Take a typical queue: password resets outnumber refund disputes ten to one, but a reset takes five minutes and a dispute takes thirty. The “rare” query accounts for over a third of total handling time. The same pattern holds for account investigations, subscription changes, and billing disputes.

    How you handle complex queries is also what customers actually remember about their support experience. When someone is dealing with a damaged order or a billing dispute, the stakes are higher, and a fast, good resolution is what separates a forgettable interaction from one that builds lasting trust.

    Most AI Agents automate the easy, informational queries well. The question for your automation rate is whether they can handle the hard ones. That’s where agentic AI and robust AI workflows make or break your outcomes.

    We’ve gotten really good at informational queries – the hard part is what comes next. I’ve seen teams invest deeply here, and for good reason: it lifts containment quickly and cheaply. But to break through the plateau, you have to execute actions across systems, not just answer with text.

    We’ve invested deeply in informational Q&A. We built Apex, a specialized customer service model trained on billions of support interactions, as Fin’s core answering engine. Beneath that sits a custom retrieval model, a purpose-built reranker, and a unified RAG pipeline, all trained specifically for customer service. Fin resolves issues at a higher rate than general-purpose frontier models, with fewer hallucinations and at lower cost.

    But informational Q&A only covers queries where text is the answer. Most Agents can handle that. Far fewer let you configure complex, multi-step actions without a forward-deployed engineer setting it up for you, which creates a gap.

    Every query your team handles falls into one of three categories:

    Informational: “Can you ship transatlantic by priority next day?” Answered with text from your knowledge base.

    Personalized: “Where is my order?” Requires data unique to that user.

    Action-led: “My order arrived damaged, I need a refund.” Requires doing something: checking a return window, cross-referencing transaction data, making a judgment call – reading from multiple systems and acting across them.

    Dark-themed line chart of percentages from Jan 2026 to Apr 2026. An orange line with circular markers climbs steadily, pauses briefly mid‑period, then spikes sharply to a new high near the end of the timeline.
    From Jan to Apr 2026, the trend moves steadily upward, pausing briefly before a sharp late surge. A clear snapshot of momentum for customer service KPIs, finance results, and the impact of new procedures.

    These complex queries, the ones that require multi-step processes across systems, aren’t edge cases; they’re the reason your support team exists. This is the gap Fin Procedures was built to close.

    It works in practice, and the trajectory matters for product strategy and ops planning.

    Procedures is live, it’s scaling, and the results are clear. Since launching in managed availability, Procedures has handled over 1.5 million conversations, and volume is doubling month over month across hundreds of apps in fintech, e-commerce, gaming, healthcare, and SaaS.

    When customers hit complex, multi-step queries, the experience is dramatically better when Fin can do the work end-to-end. We tested this with a randomized 5% holdout – conversations where Procedures would normally run, but didn’t. CSAT was 28.93% higher when Procedures ran, a statistically significant result.

    A product, not a services engagement. I’ve sat through too many “automation” projects that were really solutions engineering gigs: workshops, custom scripts, then a queue of change requests when policies shift. It’s fragile and slow.

    The B2B AI industry has a consultingware problem. It’s not databases being forked anymore, it’s prompts. The economics of maintaining bespoke setups per customer don’t work. Either the application falls behind new models, or the vendor changes the model and quality degrades invisibly.

    In my view, an agentic AI platform should be a product your team owns end to end: a natural language editor – literally paste your existing SOPs – branching logic, data connectors, and AI-powered simulations for testing. Your CX ops team configures this, iterates on it, owns it. If you need help, a forward-deployed team can assist, but they’re optional, not a dependency. You always have control.

    And because it’s a unified product, improvement compounds. When the vendor optimizes a prompt, every customer’s Procedures get better. When they upgrade the model, they can A/B test across the entire customer base and know it’s better before rolling out. You can’t do that when every customer has a bespoke prompt. The consulting model isn’t just expensive, it’s structurally unable to compound.

    Today, Fin Procedures is available to every Intercom customer – no waitlist or managed rollout, ready for all 8,000+ customers.

    We’re iterating fast based on real customer feedback. Here’s what’s landed since the last major update, and why it matters for reliability and governance:

    AI-powered Procedure review: Flags broken logic, missing references, and unreachable conditions before you deploy.

    Promotional banner reading "Get started with the #1 Agent today" over a dark, aurora-like gradient background, featuring a white button labeled "Start a free trial"; marketing graphic for an AI support agent.
    Kick off your journey with the #1 Agent—an AI partner designed to turn resolutions into real outcomes. Tap “Start a free trial” to explore faster, smarter customer service and see how Fin delivers value from day one.

    Procedure failure reporting: A new reporting dimension that lets you drill into conversations where Procedures failed, so you can diagnose and fix.

    Version history with rollback: Track every change, compare versions, roll back if needed.

    Data connector health monitoring: See at a glance if your integrations are healthy, degraded, or failing.

    Optional data connector parameters: Fin only asks customers for information when it’s actually needed, instead of prompting for every field.

    Email Simulation support: Test how your Procedures behave across chat and email before going live.

    Agent in the Loop (Beta) unlocks the next tranche of automation. Even with Procedures, two things hold teams back from automating their most complex queries: missing integrations and policies that require a human sign-off on sensitive decisions.

    “Agent in the Loop” is built for both. Need Fin to check your internal admin tools but haven’t built a data connector yet? Put a human checkpoint at that step. Fin handles the conversation, gathers context, and pauses, surfacing a structured summary for a human agent to verify or act, then resumes. You get automation on the 80% that doesn’t need the integration.

    For compliance – identity verification, high-value refunds – Fin does the legwork, a human makes the final call and then hands it back to Fin. This works natively in the Intercom Inbox and via Slack. Some competitors don’t have an inbox-native variant at all, meaning humans need to leave their primary workspace to review AI actions.

    Procedures are also built to let you collaborate with all your teammates – both human agents and AI Agents. Fin can work with them directly inside a Procedure, using APIs and webhooks to loop in another teammate mid-flow, hand off context, and pick back up once they’re done.

    Making it easier, faster. Procedures is already self-serve, but the next step is making Procedure creation, testing, and maintenance significantly more streamlined and easy to do, with less manual editing and more AI-assisted building and debugging. There’s a lot coming in this space over the next few months – and it aligns perfectly with a retrieval-first pipeline and stronger governance at scale.

    The hardest percentages matter the most. The biggest unlock for your automation rate won’t be answering more FAQs, it will be handling the complex, multi-step queries that consume your team’s time and define what customers remember about their experience with you.

    That means working with an Agent that goes beyond answering questions and executes processes. A product your team owns and configures, not a service you buy and hope gets maintained. And a platform where every improvement compounds across every customer. That’s Procedures. Available now, for everyone.


    Inspired by this post on The Intercom Blog.


    Book a consult png image
  • Stop Drowning in Tasks: How AI Marketing Agents Restore Focus and Maximize Impact

    Stop Drowning in Tasks: How AI Marketing Agents Restore Focus and Maximize Impact

    Every week I meet marketers who are working harder than ever—more campaigns, more content, more dashboards—yet seeing less movement on metrics that matter. The surge of AI tooling has amplified activity, not necessarily impact. That’s the focus problem: we confuse motion with momentum, and our backlogs look great while our outcomes stall.

    Learn how AI agents for marketing can help you prioritize impact so you can do important work, instead of just more work.

    In my role leading product and growth teams, I’ve learned that AI only compounds value when it is pointed squarely at outcomes. If we don’t define what “good” looks like, agentic AI will simply scale busywork. The antidote is a disciplined operating model that connects strategy to execution and instruments agents with clear success criteria.

    First, anchor your program with outcomes vs output OKRs. Choose one or two measurable business outcomes—such as qualified pipeline, conversion rate, or activation—and make everything else subordinate. This provides the compass agents need to make effective trade-offs when speed and volume tempt you to do “one more thing.”

    Second, map a driver tree from the target outcome down to the controllable levers: audience segments, offers, channels, messaging, and experience friction. This traceability shows where agents can move the needle fastest—whether that’s accelerating research, sharpening positioning, or eliminating handoffs that slow experimentation.

    Third, design a small, agentic AI workforce aligned to those levers. For example: a Research Agent that synthesizes market insights and past performance; a Copy Agent that generates on-brief, on-brand variants; a Distribution Agent that adapts content to each channel and schedules posts; and an Analytics Agent that runs A/B tests, summarizes results, and flags anomalies. Keep human oversight where judgment matters most—strategy, brand voice, and high-stakes decisions.

    Fourth, instrument rigor from day one with Agent Analytics and eval-driven development. Define offline evals for brand consistency, factuality, safety, and response time; pair them with online experiments that quantify lift on your target outcomes. Set a minimum detectable effect (MDE) so you stop shipping changes that cannot plausibly move the metric.

    Fifth, operationalize your AI workflows. Standardize prompts, inputs, and handoffs; templatize briefs and acceptance criteria; and keep a change log so improvements compound rather than reset. Use short, frequent feedback loops to prune low-impact work and double down on what demonstrably advances your objectives.

    I’ve seen teams reclaim focus and momentum when they treat agents as teammates, not toys. The magic isn’t in producing more assets—it’s in consistently choosing the next best action in service of a clear outcome. When you combine outcome clarity, a driver tree, targeted agents, and tight evals, AI becomes a force multiplier for marketing impact.

    If you’re feeling overwhelmed by AI’s possibilities, start small: commit to one outcome, one driver you believe is material, and one agent designed for that job. Prove lift, codify the workflow, then scale. Velocity is only valuable when it’s pointed in the right direction.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Stop Forcing AI to Prove ROI: A Product Leader’s Playbook to Measure Real Business Value

    Stop Forcing AI to Prove ROI: A Product Leader’s Playbook to Measure Real Business Value

    Every planning cycle, I feel the drumbeat: “Show me the AI ROI—this quarter.” The pressure is real, especially when boards and CFOs expect immediate payback. Yet when I review stalled initiatives across teams and peers, the pattern is consistent: most companies treat AI like a feature to ship, not a system to manage. That mindset almost guarantees we measure the wrong things, declare victory (or failure) too early, and miss the durable value AI can create.

    Here’s the core problem I see: we leap to solution and skip the counterfactual. Without a baseline, a clear control, or a defined “what would have happened otherwise,” we’re guessing. We also fixate on lagging, financial KPIs that move slowly (revenue, cost, risk), then use outputs—not outcomes—as OKRs. If we don’t align on outcomes vs output OKRs upfront, the best team in the world can still optimize for activity over impact.

    My AI Strategy starts from a simple truth: value shows up along three vectors—revenue, cost, and risk—on different timelines. In the near term, we must validate leading indicators (adoption, engagement, activation) that ladder to those vectors through a transparent driver tree. Over time, those drivers compound into the lagging KPIs finance cares about. When we make the driver tree explicit, everyone can see how model precision, response time, and workflow integration roll up to conversion lift, case deflection, time-to-resolution, or reduced exposure.

    To make this rigorous, I run a five-step playbook. First, define the decision and business outcome in plain terms. Second, instrument the baseline with behavioral analytics on a unified analytics platform—tools like Amplitude analytics or Pendo help expose friction points we’ll later target. Third, create a counterfactual using A/B testing and specify a minimum detectable effect (MDE) so we know how long to run and how much traffic we need. Fourth, quantify costs (training, inference, integration, change management) and include AI risk management, privacy-by-design, and data governance up front. Fifth, lock a measurement plan that connects leading indicators to lagging ROI through the driver tree.

    Most AI initiatives don’t fail on model quality—they fail on adoption. If the workflow isn’t smoother, trust isn’t earned, or value isn’t obvious, users revert. That’s why I invest early in onboarding, in-app guides, product tours, and thoughtful tooltip design to reduce the time-to-first-value. Then I watch user activation, retention analysis, and task completion to ensure the assistive experience is not just novel—it’s habit-forming.

    For generative use cases, eval-driven development is non-negotiable. I maintain offline evaluations for accuracy and safety, and online evaluations for business impact. Retrieval-first pipeline health, context window management, and prompt engineering affect reliability; so do latency and grounding quality. We ship behind feature flags, measure guardrail effectiveness, and tighten feedback loops from human-in-the-loop reviews into model updates—continuously.

    On the business side, I avoid “AI theater” by structuring benefits like a CFO. Revenue: increased conversion or expansion driven by better recommendations, faster sales cycles, or higher trial activation. Cost: case deflection, agent time saved, fewer escalations, and lower rework. Risk: reduced exposure via automated checks, anomaly detection, and consistent policy application. If any claim can’t be tied to measured deltas—via A/B testing or strong quasi-experiments—it doesn’t go in the deck.

    Build vs buy deserves the same discipline. I map platform scalability, governance requirements, and total cost of ownership against time-to-impact. Teams often underestimate integration and maintenance drag; a pragmatic mix of bought components with thin custom layers can accelerate outcomes while keeping options open. The goal isn’t to own every layer—it’s to own the learning loop and the differentiated experience.

    I also remind teams that tooling should serve the strategy, not replace it. I’ve seen concise, effective messaging that captures the point: “Increase revenue, cut costs, and reduce risk with Pendo’s Software Experience Management platform. Optimize the entire software experience to drive adoption and improve engagement.” The words are compelling because they reflect the three-vector value model and the adoption imperative. The same standard should apply to any AI initiative we propose.

    If you’re under pressure to prove ROI, shift the conversation: lead with the driver tree, specify your counterfactual, and anchor on leading indicators you can move in weeks—not quarters. Then connect those to the lagging KPIs finance expects over time. When we manage AI like a product—grounded in evidence, experimentation, and user-centered adoption—we don’t have to force ROI. We compound it.


    Inspired by this post on Pendo – Perspectives.


    Book a consult png image
  • Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    Stop Misleading A/B Tests: Master Sample Size Assumptions for Reliable Results

    I’ve learned the hard way that sample size calculators can be both empowering and deceptive. They feel wonderfully precise, but they’re only as trustworthy as the assumptions you feed them. When I lead A/B testing at scale, I treat the calculator as a planning tool, not a verdict—then I systematically validate the assumptions behind it so our decisions stay rigorous and our roadmap stays credible.

    At a minimum, most calculators assume you know your baseline rate, your “minimum detectable effect (MDE),” your desired statistical power, and your significance level. They also quietly assume independent observations, clean randomization, stable traffic quality, and a fixed test horizon with no peeking. If any of those break, the “right” sample size can be wildly wrong—and the test conclusions can nudge teams toward the wrong product or go-to-market bet.

    Baseline and variance come first for me. I estimate the baseline conversion (and volatility) from recent behavior using behavioral analytics, sanity-check it across key segments, and look for seasonality. Tools like Amplitude analytics help me spot anomalies, bots, or instrumentation drift. If baseline is unstable or highly skewed, I either stabilize it with longer lookbacks or narrow the target segment to reduce noise.

    Setting the “minimum detectable effect (MDE)” is where product strategy meets statistics. I work backward from an outcome that actually matters: the revenue, retention, or activation uplift that justifies the opportunity cost of building and running the experiment. If that effect size is implausible given historic lift and variance, I rethink the scope or stack changes into a sequenced set of learning experiments rather than overpromising a single moonshot.

    For power and alpha, I default to 80–90% power and a 5% significance level unless the downside risk of a false positive is unusually high, in which case I tighten alpha. I choose one-tailed tests only when we would not act on a negative result and we’ve explicitly pre-registered that decision; otherwise, two-tailed is safer for real-world ambiguity.

    Randomization and independence are where many tests quietly fail. I randomize at the user level (not session or pageview), guard against cross-device contamination, and ensure consistent exposure via feature flags. If there’s shared context—say, team-based usage or geographic clustering—I account for it via cluster randomization or acknowledge the inflated variance it can introduce.

    Traffic allocation integrity is non-negotiable. I monitor for sample ratio mismatch by comparing observed group splits to the intended allocation and immediately pause if they drift. When SRM appears, the root cause is often instrumentation gaps, eligibility filters applied asymmetrically, or caching layers. Fixing that early preserves trust in every test that follows.

    Fixed-horizon math assumes no peeking. If stakeholders need continuous reads, I use sequential testing methods with alpha spending or always-valid approaches designed for ongoing monitoring. If we commit to a fixed horizon, we stay disciplined: no early looks, no midstream metric swaps, no retrofitted hypotheses.

    Multiple comparisons can quietly inflate false positives. I predeclare one primary metric to decide, define guardrail metrics to protect experience and revenue, and apply appropriate corrections (for example, controlling the false discovery rate) when testing many variants or slicing results by numerous segments.

    Duration and seasonality matter more than most roadmaps admit. I run through full business cycles (at least one complete week for daily patterns, longer for B2B buying rhythms), plan for novelty effects, and watch for behavior settling after initial exposure. If the intervention changes long-run behavior, I extend the measurement window or add a post-test holdout to capture durable impact.

    Not all metrics are binomial. For revenue, time-on-task, or heavy-tailed distributions, I confirm variance assumptions, use robust estimators or bootstrapping, and consider variance reduction methods like CUPED to improve power without overextending duration. The calculator’s simplicity should not mask the data’s complexity.

    Finally, I connect experimentation to product outcomes. I map hypotheses to a driver tree, ensure each test ladders to activation, retention, or monetization, and document assumptions up front so we learn even when results are null. The result is a culture that respects the math and moves faster precisely because we trust our reads.

    Here’s the practical checklist I use before pressing “Start”: validate baseline and variance from recent behavior; set an MDE tied to meaningful business impact; choose power and alpha explicitly; confirm user-level randomization and stable exposure; watch for sample ratio mismatch; align on fixed-horizon vs sequential testing; predeclare a single primary metric and guardrails; run long enough to cover seasonality; use robust methods for non-binomial metrics; and write a brief pre-read so the whole team commits to the plan.

    When we honor these assumptions, sample size calculators become sharp instruments rather than blunt ones. You’ll ship fewer misleading wins, avoid costly false negatives, and build a repeatable experimentation engine that compounds learning—and results—over time.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Inside Amplitude’s ML Playbook: Practical Strategies for Smarter A/B Tests and Growth

    Inside Amplitude’s ML Playbook: Practical Strategies for Smarter A/B Tests and Growth

    I’m continually asked how machine learning can make product analytics more actionable. Drawing from Amplitude analytics in real-world settings, I’ve distilled what matters most for product teams that want faster, smarter decisions without sacrificing rigor.

    When I design experiments, I start with minimum detectable effect (MDE) to size samples correctly and avoid costly, inconclusive tests. I pair that with disciplined A/B testing hygiene—clear hypotheses, thoughtful stop rules, and guardrails for key metrics—so results translate into credible product strategy choices instead of noisy dashboards.

    For growth and retention, I map behavioral analytics to activation and long-term value. Driver trees help me connect feature adoption to revenue or retention, and anomaly detection keeps me from overreacting to outliers when seasonality or data quality shift.

    I segment cohorts by user intent and lifecycle stage, measure user activation with crisp event definitions, and monitor leading indicators across a unified analytics platform. This keeps cross-functional conversations grounded, accelerates product-led growth, and reduces the risk of optimizing for vanity metrics.

    Operationally, that means building self-serve views that flag MDE-ready experiments, surface retention analysis by cohort, and trigger anomaly detection alerts only when the signal outpaces noise. The payoff is fewer meetings debating data quality and more time shipping value.

    If you’re leveling up your analytics stack, start by tightening experimentation basics, instrumenting activation and retention with behavioral analytics, and wiring in anomaly detection as a safety net. You won’t just move faster—you’ll learn faster, and with the confidence to bet big when the data earns your trust.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Unlock Confident Decisions with Bayesian Statistics: Smarter A/B Tests from Small Samples

    Unlock Confident Decisions with Bayesian Statistics: Smarter A/B Tests from Small Samples

    Shipping great products is a game of making high‑quality decisions under uncertainty. In my role leading product management, I’ve seen teams stall when classic methods demand huge sample sizes before we can say anything useful. Bayesian statistics has become my go‑to approach for turning sparse data into clear, decision‑ready insights—especially when traffic is limited or experimentation windows are tight.

    Understand Bayesian statistics vs. frequentist methods and learn how Bayesian approaches improve experiment insights with small sample sizes.

    Here’s why I rely on it in A/B testing: frequentist methods focus on p‑values and long‑run error rates, which are tough to translate into action. With a Bayesian lens, I can express outcomes as intuitive probabilities—“Variant B has a 92% chance to outperform A”—and use credible intervals to communicate likely ranges of impact. That clarity reduces decision friction and helps the team move faster with confidence.

    Bayesian methods shine when sample sizes are small and the minimum detectable effect (MDE) of a frequentist test would be impractically large. I incorporate prior knowledge—historical conversion trends, seasonality, and learnings from related experiments—to stabilize noisy early data. Done thoughtfully, priors improve estimate quality without overfitting; I always run sensitivity checks to ensure the posterior is driven by the data we’re observing, not wishful thinking.

    In practice, my workflow is straightforward. I set a prior from historical performance in Amplitude analytics, run the experiment, and update the posterior daily. I track the probability of superiority, expected lift, and a credible interval that the CRO role can rally around. When the probability of a meaningful win crosses a pre‑agreed threshold, we ship. When it doesn’t, we bank the learning and move on—no prolonged debates about p‑values that few stakeholders truly understand.

    This approach also strengthens product discovery. By using behavioral analytics and retention analysis as informative priors, I can evaluate early signals from narrower cohorts—new geographies, niche segments, or enterprise accounts—where traffic is scarce. The result is faster iteration in product‑led growth environments, even when a full‑funnel test would take weeks to reach frequentist significance.

    Operationally, I treat Bayesian experimentation as part of a unified analytics platform strategy. The same posterior machinery that powers A/B testing can support anomaly detection during releases, quantify risk in phased rollouts, and estimate lift from in‑app guides or product tours. Because results are framed in plain language probabilities, cross‑functional teams make better, faster decisions aligned to outcomes rather than outputs.

    A few guardrails keep me honest. I preregister decision rules (stop/go thresholds, guardrail metrics), run prior sensitivity analyses, and document assumptions alongside results. That discipline prevents overconfidence, improves reproducibility, and builds trust with leadership.

    If your experiments are bottlenecked by low traffic or you’re tired of waiting weeks for a binary “significant/not significant,” consider a Bayesian upgrade. You’ll get earlier readouts, clearer stakeholder communication, and a repeatable path to compounding learning—without sacrificing rigor.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Unlocking Impact: What Amplitude’s MCP server and experimentation platform teach product leaders

    Unlocking Impact: What Amplitude’s MCP server and experimentation platform teach product leaders

    In my role leading product management at HighLevel, I study the architectures and operating models behind high-velocity learning. I often reference "Amplitude's MCP server and its experimentation platform" as a benchmark for how to operationalize scale, reliability, and speed of insight across complex product ecosystems. That lens informs how I design processes, data flows, and decision loops that turn ambiguity into measurable outcomes.

    Experimentation is the heartbeat of eval-driven development. In practice, that means running disciplined A/B testing, deploying targeted feature flags to de-risk rollouts, and sizing experiments with a clear minimum detectable effect (MDE) so we avoid vanity wins. When teams internalize these habits, we shift from opinion-led debates to evidence-led decisions—and that’s where product-led growth compounds.

    I'm an AI enthusiast, so I think a lot about how experimentation accelerates AI roadmaps. The same rigor that validates UI changes should govern prompts, retrieval strategies, and policy settings for LLM-backed features. By treating AI behaviors as first-class experiment surfaces—and tying them to user activation, retention analysis, and value proposition metrics—we move faster without compromising safety, privacy-by-design, or customer trust.

    Making this work in production demands clean instrumentation and a unified analytics platform. I look for stacks that combine Amplitude analytics with robust observability and CI/CD to ensure we can ship, measure, and iterate continuously. When platform scalability and data governance are baked in from the start, product trios can focus on product discovery rather than firefighting pipelines or reconciling metrics.

    My playbook is straightforward: define decision-worthy questions, map them to crisp success metrics, run right-sized experiments with feature flags, and use consistent analytics to close the loop. Do this well, and you create a durable advantage—faster learning cycles, sharper product positioning, and a culture that lives by outcomes over output. That’s the real lesson I take from platforms that execute experimentation at scale: process and technology are table stakes; what wins is the discipline to learn relentlessly.


    Inspired by this post on Amplitude – Perspectives.


    Book a consult png image
  • Inside My Product Marketing Playbook: Amplitude Analytics Tactics That Drive PLG Wins

    Inside My Product Marketing Playbook: Amplitude Analytics Tactics That Drive PLG Wins

    I’ve curated a focused set of product marketing insights that zero in on what actually moves the needle—turning data into decisions. You’ll find a special emphasis on Amplitude Analytics, because its behavioral analytics foundation makes it easier to translate product usage into clear messaging, sharper positioning, and measurable growth.

    In my day-to-day as a product leader, I’m constantly bridging the gap between product discovery and go-to-market strategy. The best outcomes come when we connect quantitative signals to narrative: using behavioral analytics to inform the value proposition, refining product positioning with cohort trends, and driving product-led growth with activation and retention insights.

    Here’s how I put this into practice. I start with user activation and retention analysis to identify the few behaviors that predict long-term value. Then I run tightly scoped A/B testing to validate messaging and in-product prompts that nudge those behaviors. When the numbers move, I translate wins into a consistent story—one that sales, success, and marketing can all rally around.

    One pattern keeps repeating: clarity beats complexity. Instead of piling on more features, I focus on the minimum, verifiable set of behaviors that correlate with outcomes. That discipline makes it easier to craft a crisp value proposition, streamline go-to-market strategy, and accelerate feedback loops between product, design, and marketing.

    As you explore this collection, expect practical playbooks over platitudes. You’ll see how to apply Amplitude Analytics to uncover hidden friction, validate hypotheses faster, and operationalize product-led growth motions that compound over time. My goal is to help you move from interesting dashboards to decisive actions that strengthen your roadmap and your revenue.

    If you care about building empowered product teams that learn continuously, you’ll feel at home here. Dive in, borrow what works, and adapt the rest to your context—then measure it, iterate, and share the wins with your team.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Ship Smarter with Amplitude + Lovable: See Behavior, Fix Friction, Iterate Faster

    Ship Smarter with Amplitude + Lovable: See Behavior, Fix Friction, Iterate Faster

    I build products with a simple mantra: launch, learn, repeat. Shipping fast is necessary, but shipping smart is what compounds. To do that, I keep analytics close to the work—inside the builder—so every decision is tied to real user behavior, not assumptions.

    Connect Amplitude MCP to Lovable to understand user behavior, spot frictions, and ship better updates without leaving your builder.

    In practice, this integration lets me bring Amplitude analytics and behavioral analytics directly into the creative flow. I can explore funnels, cohorts, and drop‑offs the moment I’m crafting an experience, then translate those insights into concrete changes without context switching. The result is tighter feedback loops and more confident iteration.

    My typical loop looks like this: identify a friction point from funnel analysis, design two or three variants in the builder, and run A/B testing to validate the improvement. I focus on user activation and retention analysis as leading signals, because sustained engagement is the clearest indicator that we’ve solved a real problem. When the data confirms it, we promote the winning experience and move to the next opportunity.

    Keeping the work inside the builder also supports continuous discovery. I can pair quantitative insights with qualitative observations, refine journey mapping, and document learnings while the context is fresh. That makes prioritization and product discovery more reliable, and it turns each iteration into a teachable moment for the team.

    Strategically, this builder‑first approach enables product-led growth. With fewer handoffs and a unified analytics platform, we compress time from insight to impact. It helps me defend roadmap decisions with evidence, communicate trade‑offs clearly, and keep the team focused on outcomes that matter to customers and the business.

    If your goal is to iterate with speed and precision, bring analytics to where you build. Keep the loop tight, measure what moves the needle, and let the data guide your next best update.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Unlock High-Impact Mobile Engagement: Amplitude Guides & Surveys for iOS, Android, React Native

    Unlock High-Impact Mobile Engagement: Amplitude Guides & Surveys for iOS, Android, React Native

    Mobile engagement is most effective when it’s timely, contextual, and grounded in real user behavior. In my experience leading product teams, the fastest path to activation and retention comes from meeting users in the moment with relevant in-app guides and lightweight surveys that reduce friction and illuminate intent.

    Deploy behavioral-driven mobile engagement with Amplitude Guides and Surveys for iOS, Android, and React Native platforms.

    What excites me about this approach is how naturally it supports product-led growth. In-app guides and product tours streamline onboarding, while targeted micro-surveys surface the “why” behind user actions. The result: clearer journey mapping, fewer blind spots in the funnel, and a smoother path to user activation—all without adding engineering heavy-lift for each iteration.

    To optimize continuously, I pair behavioral analytics with A/B testing and retention analysis. This lets my team validate hypotheses quickly, localize friction by segment or stage, and tune messaging for different cohorts. With Amplitude analytics at the core, we can connect engagement nudges to downstream outcomes, not just clicks—so we’re improving time-to-value, not just surface metrics.

    My recommended starting point is simple: define a single activation moment, instrument the critical behaviors around it, and launch a focused guide plus one survey to test the narrative. Use journey mapping to identify the key decision points, then iterate weekly based on observed behavior, not opinions. This cadence keeps learning velocity high and ensures every change moves us closer to clear outcomes.

    From a leadership perspective, I coach product trios to own an activation or retention KPI, run small controlled experiments, and document learning with crisp before/after evidence. Cross-platform support across iOS, Android, and React Native means we can scale wins quickly, standardize patterns, and create a repeatable playbook for new features and markets—all while keeping the user experience coherent and respectful.


    Inspired by this post on Amplitude – Best Practices.


    Book a consult png image
  • Ship MVPs in Days, Not Months: My Proven Prompt Prototyping Playbook for Product Teams

    Ship MVPs in Days, Not Months: My Proven Prompt Prototyping Playbook for Product Teams

    Most MVPs take too long, cost too much, and still miss the mark. Over the past year, I’ve shifted my team to a prototyping prompts approach that lets us validate problem-solution fit in days, not months. The result is faster learning loops, clearer tradeoffs, and a dramatically higher hit rate on features that actually move the needle.

    When I say prototyping prompts, I mean structured, layered instructions that guide gen ai systems to produce the right artifacts at the right fidelity. Instead of jumping straight to code, we generate concise problem briefs, user stories, interaction flows, low-fidelity UI descriptions, and test plans. Each pass is constrained by acceptance criteria and business outcomes, which keeps the work grounded in value rather than output.

    Here’s the playbook my product trios use to go from idea to a testable MVP in 48–72 hours. First, we anchor on outcomes vs output OKRs and clarify the customer job-to-be-done using evidence from customer interviews and support data. This is classic continuous discovery, but we compress it by focusing on the single riskiest assumption to de-risk this week.

    Second, we build a prompt scaffold. We specify the role, constraints, target users, success metrics, and the exact output format we expect. We also define evaluation upfront, borrowing from eval-driven development. For example, before any generation, we list the acceptance tests that a good solution must pass, including edge cases and compliance considerations. This discipline keeps hallucinations in check and improves repeatability.

    Third, we spin up multiple prototypes in parallel. One prompt generates a lean product brief; another outlines user flows; a third proposes UI states and error handling. If we’re exploring voice, we add prompt engineering for voice to script dialogs and repair strategies. For data-heavy features, we call out retrieval-first pipeline patterns so the model references source-of-truth data rather than guessing.

    Fourth, we validate with real users using the lightest-weight experiment possible. Fake-door tests, concierge workflows, and guided click-throughs let us measure intent before we invest. Where we can, we run quick A/B testing and size the effort using minimum detectable effect (MDE) so we don’t over- or under-sample. The point isn’t perfection; it’s fast, directional signal to inform the next iteration.

    Fifth, we instrument and ship behind feature flags. We track activation, task completion, and time-to-value from day one. On the delivery side, we watch DORA metrics and deployment frequency to ensure we’re learning continuously rather than batching big bets. This bridges discovery and delivery so roadmaps reflect real-world feedback, not assumptions.

    One recent example: we needed to evaluate a voice AI agent for appointment scheduling. In 72 hours, prompts produced the problem brief, dialog flows, error recovery strategies, and a sandbox to simulate inbound requests across three user personas. We exposed a thin slice to a pilot cohort, captured call outcomes, and iterated the repair prompts twice before writing any production code. The pilot converted at a higher rate than our control flow and gave us the confidence to invest in full integration.

    This approach only works if we treat governance as a first-class concern. We bake in privacy-by-design, clear data governance boundaries, and AI risk management from the start. Prompts include guardrails on personally identifiable information, explicit constraints on data use, and links to approved sources. We also maintain a prompt repository with versioning and automated evaluations so changes are observable and reversible.

    Practically, strong prompt scaffolds share three traits. They’re specific about context and constraints, they define success in measurable terms, and they separate concerns by artifact type. I’ll often ask for three variants with different tradeoffs, then run a quick synthesis prompt that highlights points of parity and differentiation. This gives the team structured options rather than a single, brittle path.

    If you’re starting from zero, begin with one high-leverage workflow. Write a crisp outcome statement, draft your acceptance tests, and create a prompt that outputs a one-page brief, three user flows, and the top five risks with mitigations. Validate with five users in 48 hours, then decide: double down, pivot, or park. Rinse and repeat, and your product roadmapping and sprint planning will shift from speculation to evidence.

    The bottom line is simple. Prototyping prompts won’t replace product judgment, but they will accelerate it. By turning ideas into testable artifacts in hours, you minimize waste, maximize learning, and ship better MVPs—fast.


    Inspired by this post on Product School.


    Book a consult png image