DeepSeek's 10 trillion USD grand strategy

X · GDP (@bookwormengr) · 2026-05-22

DeepSeek is engineering its models around extreme KV cache compression and hardware-portable kernels so Chinese chipmakers can route around HBM and CUDA, with the payoff being a $10T domestic AI hardware ecosystem and a $1T valuation for DeepSeek itself.

The strategy is industrial policy disguised as model research. By cutting KV cache to the point where 1M-token context fits in 5.48GB of HBM (about 9% of the 60GB GLM5 needs and 6% of the 89GB Qwen3-235B-A22B needs), DeepSeek makes long-horizon agents cheap to serve and makes the compressed cache small enough to park on SSDs, sidestepping HBM, the memory Chinese fabs find hardest to produce. TileLang then breaks the CUDA lock-in by letting a single kernel target multiple backends, which helps domestic silicon and incidentally helps AMD too. Coding subscriptions and apps are not the point; enabling a sovereign hardware stack is.

claim

DeepSeek is not chasing near-term revenue from coding subscriptions, multimodal products, or applications. Its long game is to enable a 10T USD Chinese AI hardware ecosystem and ride that to a 1T USD valuation for itself.

central 1.00 · novel 1.00

claim

Slashing KV cache without quality loss is what makes long-horizon agents economically viable and unlocks the next wave of use cases. It is also what lets DeepSeek price cached hits at under 3% of Sonnet 4.6's rate and hold cache for hours.

central 0.85 · novel 0.27

evidence

On a 1M-token context benchmark, DeepSeek V4 needs only 5.48GB of HBM versus 60GB for GLM5 and 89GB for Qwen3-235B-A22B — and DeepSeek is the largest of the three at 1.6T parameters.

central 0.80 · novel 0.24
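
The 1M-token evidence above can be turned into per-token figures with simple arithmetic. The sketch below uses only the three HBM numbers quoted in that card; it assumes decimal gigabytes and treats the whole quoted footprint as context-dependent memory, neither of which the source states.

```python
# HBM needed at 1M-token context, in GB, as quoted in the evidence card.
HBM_GB_AT_1M = {
    "DeepSeek V4": 5.48,
    "GLM5": 60.0,
    "Qwen3-235B-A22B": 89.0,
}
CONTEXT_TOKENS = 1_000_000
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def kv_bytes_per_token(hbm_gb: float, tokens: int = CONTEXT_TOKENS) -> float:
    """Average HBM footprint per token of context, in bytes."""
    return hbm_gb * BYTES_PER_GB / tokens


def ratio_to(baseline: str, model: str = "DeepSeek V4") -> float:
    """Fraction of the baseline model's HBM that `model` needs."""
    return HBM_GB_AT_1M[model] / HBM_GB_AT_1M[baseline]


if __name__ == "__main__":
    for name, gb in HBM_GB_AT_1M.items():
        print(f"{name}: {kv_bytes_per_token(gb):,.0f} bytes/token")
    for baseline in ("GLM5", "Qwen3-235B-A22B"):
        print(f"DeepSeek V4 vs {baseline}: {ratio_to(baseline):.1%}")
```

Under those assumptions DeepSeek V4 works out to roughly 5.5 KB of HBM per token of context, against 60 KB and 89 KB for the other two models.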

mechanism

TileLang lets a kernel be written once and run across multiple hardware backends, freeing Chinese chip makers from CUDA lock-in and also opening the door for AMD and other Western challengers. Other Chinese labs are expected to join.

central 0.75 · novel 0.29

mechanism

Because the compressed cache is small, it can be parked on SSDs and reloaded cheaply, avoiding recomputation and sidestepping HBM — the hardest memory for the Chinese AI hardware industry to produce.

central 0.80 · novel 0.20
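
The park-and-reload pattern described above can be illustrated with a toy two-tier store. This is a minimal sketch, not DeepSeek's serving code: the `TieredKVStore` class, its LRU eviction policy, and the file-per-session layout are all hypothetical, with an in-memory dict standing in for HBM and a directory standing in for the SSD.

```python
import tempfile
from collections import OrderedDict
from pathlib import Path
from typing import Optional


class TieredKVStore:
    """Toy two-tier cache: a small fast tier (stand-in for HBM) that spills
    least-recently-used entries to files (stand-in for SSD)."""

    def __init__(self, fast_capacity: int, spill_dir: Path) -> None:
        self.fast_capacity = fast_capacity
        self.spill_dir = spill_dir
        self.fast: "OrderedDict[str, bytes]" = OrderedDict()

    def _path(self, session_id: str) -> Path:
        # Assumes session ids are safe to use as file names.
        return self.spill_dir / f"{session_id}.kv"

    def put(self, session_id: str, blob: bytes) -> None:
        self.fast[session_id] = blob
        self.fast.move_to_end(session_id)
        while len(self.fast) > self.fast_capacity:
            victim, data = self.fast.popitem(last=False)
            self._path(victim).write_bytes(data)  # spill instead of discarding

    def get(self, session_id: str) -> Optional[bytes]:
        if session_id in self.fast:
            self.fast.move_to_end(session_id)
            return self.fast[session_id]
        path = self._path(session_id)
        if path.exists():
            blob = path.read_bytes()
            path.unlink()
            self.put(session_id, blob)  # promote back into the fast tier
            return blob
        return None  # true miss: the caller would have to recompute


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = TieredKVStore(fast_capacity=1, spill_dir=Path(d))
        store.put("a", b"cache-a")
        store.put("b", b"cache-b")  # pushes "a" out to disk
        print(store.get("a"))       # reloaded from disk, not recomputed
```

The point of the sketch is the last branch of `get`: only a miss in both tiers forces recomputation, and the smaller the compressed cache, the cheaper each spill and reload becomes.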

Open

· Will other Chinese labs actually adopt TileLang and converge on a shared non-CUDA stack?
· Can Chinese chipmakers deliver hardware that exploits these software advantages at scale?
· Does SSD-offloaded KV cache hold up in production latency terms, not just benchmarks?

Pipeline

source kind: url
generated by: anthropic+voyage
candidates: 25 (selected 5)
embeddings: voyage-3.5

Coverage

100% covered

Considered candidates (20)

Below top-k · 16

implication · NAND offload creates a market for YMTC and SSD vendors · c 0.70
By making SSD-based KV offload the standard pattern, DeepSeek hands a large new market to Chinese 3D NAND players like YMTC, and to SSD vendors globally.
mechanism · Engram trades cheap memory for expensive compute · c 0.70
Engram replaces parts of transformer computation with O(1) hash-based N-gram lookups into a large embeddings table held in LPDDR. A memory lookup is dramatically cheaper per bit retrieved than a forward pass, so the trade scales favorably.
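
A hashed N-gram lookup of the kind described above can be sketched in a few lines. This is an illustration of the general idea only: the table size, embedding width, trigram window, and polynomial hash are hypothetical choices, and the source does not say how Engram hashes N-grams or how the retrieved vector is combined with the transformer's activations.

```python
import random
from typing import List, Sequence

TABLE_ROWS = 1 << 12  # hypothetical; a real table would be far larger
EMBED_DIM = 8         # hypothetical embedding width
NGRAM = 3             # hypothetical window: look up the trailing trigram

_rng = random.Random(0)
# Stand-in for a learned embeddings table held in cheap memory.
TABLE: List[List[float]] = [
    [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for _ in range(TABLE_ROWS)
]


def ngram_slot(token_ids: Sequence[int]) -> int:
    """Map an n-gram of token ids to a table row with a polynomial hash."""
    h = 0
    for t in token_ids:
        h = (h * 1_000_003 + t) & 0xFFFFFFFFFFFFFFFF
    return h % TABLE_ROWS


def lookup(context: Sequence[int]) -> List[float]:
    """Fetch the embedding for the trailing n-gram: one hash, one row read."""
    return TABLE[ngram_slot(tuple(context[-NGRAM:]))]


if __name__ == "__main__":
    print(lookup([17, 4, 256, 9, 31]))
```

For a fixed N the cost of `lookup` does not depend on how long the context is, which is the sense in which the lookup is O(1).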
claim · AGI-for-everyone and capitalist self-interest are the same plan · c 0.70
Liang Wenfeng can pursue "AGI for everyone" and make enormous money at the same time by enabling an alternative hardware stack rather than fighting for application-layer margins. The two goals are not in tension; they are the same strategy.
caveat · Chinese GPUs will keep lagging on raw FLOPs · c 0.65
Without EUV and with weaker packaging, Chinese GPUs and ASICs will trail Western chips in transistor density and raw FLOPs for the foreseeable future. Memory-for-compute trades are worth making precisely because the FLOPs gap is structural.
implication · Cheaper compute unlocks large-scale RL and recursive self-improvement · c 0.65
More hardware options plus lower compute demand per token mean DeepSeek can afford the trillion-token trajectory generation needed for serious RL post-training and for automated AI research (RSI) — both of which are prerequisites to AGI work.
evidence · CXMT is close enough on LPDDR to matter · c 0.60
CXMT trails leading LPDDR makers by only half a generation on speed and one on density, so abundant Chinese LPDDR is realistically near. That makes the weight-streaming scheme a credible escape route, not a hypothetical.
example · OpenAI's warrants with AMD and Cerebras as the template · c 0.60
OpenAI received warrants to buy AMD and Cerebras stock at low strike prices tied to consumption milestones, aligning OpenAI with the success of those chip vendors. This is the deal structure DeepSeek can replicate.
context · DeepSeek's conspicuous absences look like strategic choices, not gaps · c 0.55
DeepSeek has no coding plans, no multimodal/audio/video models, and only recently started building a harness, while remaining committed to open source. These omissions are puzzling for a lab about to raise 10B USD unless the strategy is elsewhere.
evidence · DeepSeek's tricks are already the industry default · c 0.55
MoE, MLA, and DSA from DeepSeek have been adopted by labs around the world; GLM uses MLA and DSA, and Moonshot's Kimi openly bases its architecture on DeepSeek's. In return DeepSeek adopted Kimi's Muon optimizer.
context · A pattern of going against the grain on architecture · c 0.50
While others built dense models, DeepSeek committed early to hard-to-train Mixture-of-Experts architectures and worked from first principles on training algorithms. This contrarian streak is the through-line for understanding their later moves.
evidence · mHC delivers large benchmark gains at near-zero FLOP cost · c 0.50
At 27B parameters, mHC adds only 6.7% wall-clock training overhead but lifts BIG-Bench Hard by 7.2 points, DROP by 3.2, GSM8K by 2.8 and MMLU by 1.4. It buys real intelligence-per-parameter almost for free.
mechanism · Zero-bubble pipelines and Wide Expert Parallel serving · c 0.45
DeepSeek perfected zero-bubble pipelines and published an Expert Load Balancer with a Wide Expert Parallel strategy, so MoE models can be served economically at large batch sizes on constrained GPU fleets.
mechanism · mHC stabilizes training at scale via doubly-stochastic mixing · c 0.45
Manifold-Constrained Hyper-Connections expand the residual stream into parallel highways and constrain mixing matrices to be doubly stochastic, preserving signal magnitude at depth. This tames the 3000× amplification that collapsed prior Hyper-Connections at 27B.
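
To see why a doubly stochastic mixing matrix preserves signal magnitude, the sketch below builds one with Sinkhorn-Knopp normalization and applies it to a few residual streams. Sinkhorn-Knopp is a standard way to obtain such a matrix and is an assumption here; the summary above does not say how mHC enforces the constraint, and the streams are scalars purely for brevity.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(logits: Matrix, iters: int = 50) -> Matrix:
    """Alternately normalize rows and columns of exp(logits) so the result
    is (approximately) doubly stochastic."""
    m = [[math.exp(x) for x in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        for row in m:  # make every row sum to 1
            s = sum(row)
            for j in range(n):
                row[j] /= s
        for j in range(n):  # make every column sum to 1
            s = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= s
    return m


def mix(streams: List[float], weights: Matrix) -> List[float]:
    """Mix parallel residual streams with the given weight matrix."""
    return [sum(w * x for w, x in zip(row, streams)) for row in weights]


if __name__ == "__main__":
    W = sinkhorn([[0.0, 1.0, -1.0], [0.5, 0.0, 0.5], [1.0, -0.5, 0.0]])
    streams = [3.0, -1.0, 2.0]
    mixed = mix(streams, W)
    print(sum(streams), sum(mixed))          # total is preserved
    print(max(abs(v) for v in mixed) <= 3.0)  # no stream is amplified
```

Because every row sums to 1, each output is a weighted average of the inputs and cannot exceed the largest input in magnitude; because every column sums to 1, the total across streams is unchanged. Stacking many such layers therefore cannot compound into the kind of amplification the card describes.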
mechanism · Replacing PPO with GRPO to cut RL cost · c 0.35
DeepSeek invented GRPO as a cheaper alternative to PPO for reinforcement learning, lowering the cost of post-training at scale.
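
Much of GRPO's saving over PPO comes from dropping the learned value network: each sampled response is scored against the other responses to the same prompt. The sketch below shows only that group-relative advantage step, with a hypothetical helper name and the population standard deviation as an assumed normalizer; the full objective (clipped policy ratio, KL penalty) is omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own group's mean and spread.

    `rewards` holds the scalar rewards of several responses sampled for the
    same prompt. `eps` guards against division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one prompt; two were judged correct.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Responses that beat their group's average get a positive advantage and the rest a negative one, so no separate critic model has to be trained or served during post-training.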
mechanism · RLVR as the route to reasoning · c 0.30
DeepSeek identified Reinforcement Learning with Verifiable Rewards (RLVR) as a central technique for improving model reasoning ability.
mechanism · Multi-Token Prediction densifies the training signal · c 0.30
DeepSeek's Multi-Token Prediction approach to speculative decoding doubles as a way to make each training step carry more learning signal.

Redundant with selected · 4

implication · Equity-for-anchor-tenant deals with Chinese hardware vendors · c 0.85 · sim 0.87
DeepSeek is forecast to take equity stakes in Chinese memory, ASIC, CPU and networking firms in exchange for making their stacks viable for frontier AI workloads. The combined Western AI stack is worth well over 10T USD, leaving room for DeepSeek to mint a 1T USD valuation by midwifing the Chinese counterpart.
overlapped with: DeepSeek's real prize is a Chinese AI hardware ecosystem, not coding plans
mechanism · Streaming weights from LPDDR to relieve HBM · c 0.70 · sim 0.86
MoE architectures with many experts and 4-bit weights make it natural to hold weights in LPDDR and stream them just-in-time into HBM. Combined with compressed KV cache, this drops HBM demand sharply.
overlapped with: Offloading KV cache to NAND/SSD instead of HBM
mechanism · DSA keeps long-context compute roughly flat · c 0.50 · sim 0.83
DeepSeek Sparse Attention, introduced in V3.2-Exp, keeps per-step compute and processing time roughly flat as context grows and reduces HBM bandwidth pressure.
overlapped with: DeepSeek V4 needs 5.48GB of HBM for 1M context
implication · Western open source and new entrants also win · c 0.50 · sim 0.84
Lower resource demands and open-source releases of these techniques don't only help Chinese hardware — they also make life easier for Western open-source labs and for emerging GPU, ASIC and networking-chip makers outside the incumbent duopoly.
overlapped with: TileLang as an end-run around the CUDA moat

Janitor

Non-content spans (acknowledgements, references, footnotes, headers, boilerplate) are dropped before the decomposition runs.

total spans: 79
kept: 77
dropped: 2
outliers: 4

content · 77
acknowledgements · 1
metadata · 1