Goal-Setting for AI Products: How I Plan, Prioritize, and Confidently Ship in a Nonlinear GenAI World

I build and ship AI products in an environment where the frontier changes weekly, so my planning system has to be adaptive, evidence-driven, and unapologetically outcome-focused. In this piece, I share the frameworks I use to set goals for generative AI, balance research with product execution, and scale responsibly — drawing sharp lessons from one of the most influential applied AI companies operating today.

Consider Runway, an applied AI research company shaping the next era of art, entertainment, and human creativity. Runway has raised $237m and was one of Time Magazine’s “100 most influential companies” in 2023. Runway has been a persistent viral sensation in recent years, and is behind many of the most famous AI demos online.

The earliest stages of an AI company often begin with research breakthroughs, scrappy prototypes, and clever distribution. In practice, that means leveraging containerization (https://aws.amazon.com/what-is/containerization/) and Docker (https://www.docker.com/) to package models reproducibly, showcasing work where practitioners already gather — Hugging Face (https://huggingface.co/), Hugging Face Spaces (https://huggingface.co/spaces), and Hugging Face Model Hub (https://huggingface.co/docs/hub/models-the-hub) — and tapping infrastructure like Replicate (https://replicate.com/) to get demos into people’s hands. Early, magical use cases — like the Green screen tool by Runway (https://runwayml.com/green-screen/) — teach us which problems are both technically feasible and viscerally valuable.

I’ve learned to respect the limitations of being “customer-driven” when building in AI. Traditional product discovery assumes needs are legible and solutions are relatively deterministic. In generative AI, user desire often follows model capability, not the other way around. The job is to triangulate: run tight user loops to validate perceived value, instrument objective model quality, and explore novel interaction patterns that customers can’t yet articulate. I treat this as a portfolio of discovery bets: some customer-led, some capability-led, all evaluated against clear outcome thresholds.

Balancing research with product development requires organizational design that limits the context-switching tax while preserving velocity. I pair research pods with product pods, supported by forward-deployed engineers and domain PMs who translate evaluation metrics into user-visible milestones. Safety and content moderation sit on the critical path, not as afterthoughts: think policy definition, classifier tooling, abuse red teaming, and clear escalation playbooks. This balance is how you move from a great demo to a dependable product without losing momentum.

Goal-setting amidst constant change in AI starts with choosing outcome OKRs over output OKRs. I write OKRs in terms of user impact and model performance thresholds (for example, target ranges for latency, quality scores against a golden dataset, or creator retention), then let teams choose the highest-leverage outputs (data pipelines, fine-tuning, UX improvements) to get there. This is also why I don’t plan very far ahead: I treat the annual view as a vision and bet map, the quarterly view as a constrained slate of outcomes, and the 6–8 week cycle as the execution heartbeat. AI roadmaps are hypotheses; evaluation harnesses and launch gates are the truth.
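To make the idea of a launch gate concrete, here is a minimal Python sketch of outcome thresholds checked against measured values. Everything in it is an assumption for illustration: the metric names, the target numbers, and the `Threshold` and `launch_gate` helpers are hypothetical, not a description of any particular team's tooling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Threshold:
    """One key result, expressed as an acceptable range for a single metric."""
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def passes(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


def launch_gate(measured: Dict[str, float], thresholds: List[Threshold]) -> List[str]:
    """Return the metrics that block launch; an empty list means the gate is open."""
    blockers = []
    for threshold in thresholds:
        value = measured.get(threshold.metric)
        # A metric that was never measured blocks launch just like a failing one.
        if value is None or not threshold.passes(value):
            blockers.append(threshold.metric)
    return blockers


# Hypothetical targets, for illustration only.
OKR_THRESHOLDS = [
    Threshold("p95_latency_seconds", maximum=8.0),
    Threshold("golden_set_quality", minimum=0.80),
    Threshold("creator_retention_30d", minimum=0.35),
]

candidate = {
    "p95_latency_seconds": 6.2,
    "golden_set_quality": 0.77,
    "creator_retention_30d": 0.41,
}
blockers = launch_gate(candidate, OKR_THRESHOLDS)
print(blockers)  # ['golden_set_quality']
```

The gate only encodes the outcome. How a team closes the quality gap (more data, fine-tuning, a UX change) stays its own call, which is the point of writing the OKR this way.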

Community is a force multiplier. Forming and fostering a vocal community requires real access and real listening: early release cohorts, office hours, and transparent changelogs. How early-release users are picked matters: look for diversity of use cases, sophistication of workflows, and willingness to give crisp feedback. Expanding past the first 100 users of Gen-2 demands readiness: evaluation parity across modalities, scalable infra, and safety coverage. Done well, this motion compounds learning while building authentic advocacy.
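One way to put the cohort criteria into practice is a round-robin pick across use cases, taking the strongest feedback-givers first. The Python sketch below is a simplification under stated assumptions: the `Applicant` fields, the 0 to 1 `feedback_score`, and the `pick_cohort` helper are all hypothetical, and workflow sophistication is left out to keep the example short.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Applicant(NamedTuple):
    name: str
    use_case: str          # e.g. "music video", "ad creative"
    feedback_score: float  # hypothetical 0-1 rating of how crisp their feedback is


def pick_cohort(applicants: List[Applicant], size: int) -> List[Applicant]:
    """Round-robin across use cases, taking the best feedback-giver from each in turn."""
    by_use_case: Dict[str, List[Applicant]] = defaultdict(list)
    ranked = sorted(applicants, key=lambda a: a.feedback_score, reverse=True)
    for applicant in ranked:
        by_use_case[applicant.use_case].append(applicant)

    cohort: List[Applicant] = []
    queues = list(by_use_case.values())
    # Keep cycling through use cases until the cohort is full or everyone is picked.
    while len(cohort) < size and any(queues):
        for queue in queues:
            if queue and len(cohort) < size:
                cohort.append(queue.pop(0))
    return cohort


applicants = [
    Applicant("ana", "music video", 0.9),
    Applicant("bo", "music video", 0.8),
    Applicant("cy", "ad creative", 0.6),
    Applicant("di", "film previs", 0.7),
]
cohort = pick_cohort(applicants, size=3)
print([a.name for a in cohort])  # ['ana', 'di', 'cy']
```

Note that "bo" scores higher than "di" or "cy" but is skipped in the first pass, because the cohort already has a music video creator. That is the trade being made: breadth of use cases over raw score.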

For founders, my advice echoes the core lessons above. Start with a narrow, high-intent wedge and prove durable value fast; let founder-led GTM compress the feedback loop; instrument everything from day one; and resist the urge to over-plan features before you’ve nailed outcomes. Product-market fit lessons in AI often arrive via small, fast experiments — not grand, long-range plans. Ship thin slices that demonstrate unmistakable value, then iterate toward a system, not a single feature. When in doubt, shorten the loop and improve the evaluation harness.

People often ask: Will AI replace video editors? My view is that AI will replace zero editors who master these tools — and many who don’t. The winners blend taste, storytelling, and generative leverage. The products we build should honor this reality: design for control, iteration, and co-creation, not just automation.

If you’re mapping the progression of tech and use cases, a few public references are instructive. Runway Gen-1 (https://research.runwayml.com/gen1) and Runway Gen-2 (https://research.runwayml.com/gen2) show how capability unlocks new workflows and demand. Runway’s 30 AI Magic Tools (https://runwayml.com/ai-magic-tools/) illustrates portfolio thinking: a suite of composable powers rather than a monolith.

For builders taking generative AI from product prototyping through production: keep your demo muscle strong, your evaluation stronger, and your outcomes strongest. Invest in community, treat safety as a feature, and let your OKRs steer what ships, not the other way around.


What is the main focus of the author's planning system for AI products?

The author emphasizes adaptive, evidence-driven, and outcome-focused planning in a world where the frontier changes weekly. They share frameworks to set goals for generative AI, balance research with product execution, and scale responsibly.

How does the author balance research and product development?

They pair research pods with product pods, supported by forward-deployed engineers and domain PMs who translate evaluation metrics into user-visible milestones. Safety and content moderation sit on the critical path, with policy definition, classifier tooling, abuse red teaming, and clear escalation playbooks.

What is the difference between outcome OKRs and output OKRs?

Outcome OKRs are written in terms of user impact and model performance thresholds, such as latency and quality scores; outputs such as data pipelines, fine-tuning, and UX improvements are the work teams choose to reach them. The author views the annual horizon as a vision and bet map and the quarterly horizon as a constrained slate of outcomes, with a 6–8 week execution heartbeat. AI roadmaps are hypotheses, tested by evaluation harnesses and launch gates.

What role does community play?

Community is a force multiplier that requires real access and listening, such as early release cohorts, office hours, and transparent changelogs. It matters that initial users bring diverse use cases, sophisticated workflows, and a willingness to give crisp feedback. Expanding beyond the first 100 users requires readiness: evaluation parity across modalities, scalable infra, and safety coverage.

What advice does the author give to founders?

Start with a narrow, high-intent wedge and prove durable value fast. Let founder-led GTM compress the feedback loop and instrument everything from day one. Avoid over-planning features before outcomes are nailed.

Will AI replace video editors?

The author believes AI will replace zero editors who master these tools, and many who don’t. Winners will blend taste, storytelling, and generative leverage.
