Avoiding Death on the Yellow Brick Road

a16z · Joe Schmidt IV

The application layer splits into a 'Yellow Brick Road' the labs will walk and a 'rest of Oz' where defensible value comes from owning the system of work in a specific vertical, with model capability as a fungible input underneath.

The stakes are which application companies survive as the labs expand: anyone building a thin tool on top of a system the customer already runs is in the labs' path, while anyone building the system itself — owning data capture, governance, workflow, and the record of what was done in a specific industry — is not. The labs' structural constraint is that they have to serve everyone, so depth on one customer set, with its edge cases and regulations, is the durable wedge. The practical test is simple: if a lab shipped a direct competitor tomorrow, would the customer still need you?

claim

The labs really are coming for a huge swath of the application surface, but "the application layer" isn't monolithic. The right framing is whether you're on the Yellow Brick Road the labs are walking, or somewhere else in Oz.

central 1.00 · novel 1.00

claim

Outside the Yellow Brick Road sit complex, often vertical problems where value comes from the scaffolding that makes output trustworthy, compliant, and operational inside a specific industry — not from the underlying model's raw capability.

central 0.95 · novel 0.21

claim

Every off-road advantage comes back to focus on one customer set — its workflows, edge cases, regulations. The labs have to be everywhere for everyone, which is exactly how they built the Yellow Brick Road. You can be everywhere at once, or great at one thing. Not both.

central 0.90 · novel 0.21

claim

Ask whether you're building a system the customer runs their work through, or a tool that sits on top of a system they already have. Systems own data capture, governance, and the record of what was done; tools just add intelligence on top. If the customer would still need you after a lab ships a competitor, you're a system. If not, you're a tool — even at high ACV.

central 0.85 · novel 0.26

implication

Rest-of-Oz winners own the system of work — the surface where the company's work actually executes and the data flows from it. As new models ship, the company becomes the integration layer that delivers them to the customer. The model is fungible underneath; the system of work is not.

central 0.90 · novel 0.17

Open

· What happens to system-of-work incumbents when labs partner with or acquire vertical players rather than build directly?
· How defensible is vertical scaffolding once models can generate compliance and integration logic on demand?

Pipeline

source kind: url
generated by: anthropic+voyage
candidates: 27 (selected 5)
embeddings: voyage-3.5

Coverage

100% covered

[Coverage map not shown: one block per source paragraph, shaded darker where the decomposition captures it well and lighter where the paragraph was left out of the summary.]

Considered candidates (22)

Below top-k · 21

mechanism · The Yellow Brick Road is where raw model capability is the product · c 0.80
Code generation, writing, and image creation improve directly with raw model capability, so every dollar the labs spend on pre- and post-training improves product quality. That makes the labs structurally best-suited for these problems.
claim · Off-road businesses are vertical by default · c 0.80
The companies with a real path forward build agentic experiences woven through complex tools, automations, and integrations — multi-step, multi-player work with sub-agents that horizontal platforms can't reach. This pushes them to be vertical by default.
mechanism · Two stacked data flywheels live inside customer workflows · c 0.80
Unwritten industry norms and tribal knowledge aren't on the public web and can't be substituted for by training compute. Vertical app companies stack an across-customer flywheel (patterns across variants) on top of a within-customer flywheel (the why behind specific decisions and exceptions).
claim · Off-road performance is judged against the customer's P&L, not benchmarks · c 0.80
Customers don't care about SWE-Bench or MMLU scores — they care whether the agent closed the deal, redlined the contract correctly, or bound the right policy. The best agent businesses execute like hedge funds, winning on alpha measured in customer P&L.
caveat · Why the obvious playbook is the most dangerous one · c 0.75
The tempting startup move — take a strong model, plug in standard connectors like Drive/Slack/Salesforce, and ship an agentic orchestration layer — is exactly what the labs are doing with Cowork and Codex. They own the model, the architectural choices, the margins, the distribution, and the brand halo.
mechanism · Off-road companies can route across the entire model market · c 0.75
The labs route internally between their own models, but they can't pick a competitor's model for a sub-task or use an open-source fine-tune where it's best. A rest-of-Oz company picks the right model from the whole market and absorbs the painful migration work on every upgrade.
mechanism · Becoming the governance control plane is a vertical-only moat · c 0.75
There's enormous value in being the control plane for permissions, auditing, what the agent can do, and what it actually did — built from use-case-specific guardrails that look completely different across industries. Horizontal players can't credibly absorb HIPAA, FRCP, FINRA, and state insurance regulation simultaneously.
implication · The day-one workflow is not the moat; the production loop is · c 0.75
Day one, the system automates manual work. Over time, every escalation, exception, and human correction feeds back, and the workflow becomes the carrier's operating memory. That understanding only comes from running the workflow in production many thousands of times.
evidence · OpenAI and Anthropic's forward-deployed JVs admit a generic coworker isn't enough · c 0.70
Both labs have announced massive forward-deployed joint ventures to build whole companies that configure and customize their models for enterprises. You don't pour billions into that if you believe the next model release solves the problem.
mechanism · Pattern recognition compounds even when customer data can't cross customers · c 0.70
A company that has run agents through a hundred legal redlines or ten thousand SDR campaigns has internalized the shape of the problem in a way a fresh entrant can't replicate. Eval sets, labeled outputs, and edge-case taxonomies compound into a vertical-specific data flywheel.
mechanism · Cost discipline comes from knowing which sub-task needs which model · c 0.70
Running everything through a frontier model is the fastest path to negative gross margins. Off-road companies route across tiers (frontier for the hardest tasks, mid-tier for the bulk, fine-tuned small models where they've earned the right), paying the lowest dollar cost for the specific intelligence the workflow actually requires.
implication · Half of any real workflow is non-agentic and carries no lab advantage · c 0.70
Roughly half of a real workflow is deterministic software the labs are no better at writing than you are. The agentic half still requires tuning, training, and constraining models against the specific result you want, with domain knowledge fed in at the right moment.
example · In insurance, the intelligence lives in the workflow, not the model · c 0.70
Two carriers run submissions through the same nominal path, but everything that distinguishes them — which risks escalate, which loss signals matter, which appetite rule wins — lives across SOPs, manager reviews, and underwriting philosophy. None of it sits in a clean rules engine a model can read.
claim · Guardrails aren't a constraint; they're the product · c 0.65
Guardrails are severely underestimated. A regulated financial customer demands different guarantees than a mid-market SaaS one, and those guarantees roll down into what the agent can write, who it can contact, what it can say on a call, and how every decision gets logged. That work sits squarely with the application company.
mechanism · Agentic workflows split repeatability, variability, and judgment · c 0.65
Pure agents reasoning from scratch break, and rigid workflows break the moment reality gets messy. Agentic workflows split the load: the workflow provides repeatability and auditability, the agent handles variability, and the human stays in the loop where accountability matters.
mechanism · UX is what makes vertical learning possible, and horizontal tools can't shape it · c 0.60
Capturing tribal knowledge depends entirely on the workflow surfaces you give the user, and vertical players can shape those surfaces around exactly what the workflow needs. Horizontal tools structurally can't.
example · 11x starts from a specific outcome and decomposes from there · c 0.60
11x began from a concrete customer outcome — generate more pipeline — and worked backward: which activities drive it, which tasks are agentic, which require domain insight. Half of a real workflow turns out to be deterministic software the labs hold no edge on.
mechanism · The ability to evolve workflows is itself the moat · c 0.60
Domain skills go stale constantly: AI-written emails that are detectable today weren't detectable yesterday, and the bar moves every few months. The compounding ability to evolve workflows and context to match market dynamics is where the moat actually builds.
claim · The tools-and-steps test · c 0.60
Count the steps and the complexity of the tools the work requires. A one-step horizontal search across Drive with a forgiving outcome is nothing like a multi-step legal redline against three years of firm precedent that has to clear partner review. Only one of those requires the deep software a focused team takes years to build.
example · "Don't email existing customers" is secretly a hard engineering problem · c 0.50
A rule as trivial-sounding as "don't reach out to a contact at a current customer" collapses against subsidiaries, stale CRM fields, and parent-vs-subsidiary domains. Real-world data is messy, humans struggle with it, and models don't magically clear the bar — purpose-built agents do.
context · Founders keep asking whether the labs will eat the entire AI app layer · c 0.40
Founders and prospective employees keep asking the same question: is there any AI application layer left to build, or will OpenAI and Anthropic kill everything downstream?
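
The tier routing described under "Cost discipline comes from knowing which sub-task needs which model" can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the source: the 1-to-3 capability scale, the per-task dollar costs, and the names `ModelTier`, `route`, and `workflow_cost`. The sketch picks, for each sub-task, the cheapest tier that is capable enough, then totals the workflow cost.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelTier:
    name: str
    capability: int       # hypothetical scale: 1 = narrow, 3 = hardest tasks
    cost_per_task: float  # hypothetical dollars per sub-task


# Illustrative tiers only; the costs are made up for the example.
TIERS: List[ModelTier] = [
    ModelTier("fine-tuned-small", 1, 0.002),
    ModelTier("mid-tier", 2, 0.02),
    ModelTier("frontier", 3, 0.20),
]


def route(required_capability: int, tiers: List[ModelTier] = TIERS) -> ModelTier:
    """Return the cheapest tier whose capability meets the requirement."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier can handle capability {required_capability}")
    return min(eligible, key=lambda t: t.cost_per_task)


def workflow_cost(subtasks: Dict[str, int]) -> float:
    """Sum the routed cost over sub-tasks, given each one's required capability."""
    return sum(route(level).cost_per_task for level in subtasks.values())


# Hypothetical workflow: sub-task name -> required capability level.
subtasks = {"classify": 1, "draft": 2, "redline": 3}
routed = workflow_cost(subtasks)
all_frontier = len(subtasks) * route(3).cost_per_task
print(f"routed: ${routed:.3f}  all-frontier: ${all_frontier:.3f}")
```

In practice the hard part is the one the source points to: knowing each sub-task's real requirement, which the sketch simply takes as a given input.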

Redundant with selected · 1

implication · The next generation of enterprise software gets built off the road · c 0.70 · sim 0.84
Both the labs and rest-of-Oz companies will produce massive winners, but the next generation of enterprise software is going to be built off the Yellow Brick Road — by companies that own the system of work in a specific vertical or function.
overlapped with: In the rest of Oz, value comes from scaffolding, not raw capability

Janitor

Non-content spans (acknowledgements, references, footnotes, headers, boilerplate) are dropped before the decomposition runs.

total spans: 51
kept: 47
dropped: 4
outliers: 3

content · 47
noise · 2
header_footer · 1
boilerplate · 1