Building a Datacenter Part II

Crucible Capital · Kelly Greer, Meltem Demirors

AI rack densities are climbing from 100kW toward 1MW so fast that Nvidia is forcing the datacenter industry onto 800V DC distribution, and the entire power and cooling stack has to mature simultaneously by 2027.

Rack power, not chip performance, is now the binding constraint on AI scaling, and the legacy AC plant that converts power three or four times between the grid and the GPU can't survive at megawatt densities. Nvidia's answer, announced at the 2025 OCP Summit and targeted at Rubin and Rubin Ultra, is to rectify grid AC once at the site perimeter and distribute 800V DC the rest of the way, borrowing from EV and HVDC practice. The catch is that this only works if solid-state transformers, supercapacitors, liquid cooling, and 800V busways all arrive together on Nvidia's 2027 timeline, dragging the entire supply chain with them.


claim

With Blackwell at 1.35kW per GPU and Rubin headed to 3.6kW, racks are jumping from 10-20kW to 100kW-1MW+, demanding 200kg copper busbars and exacerbating losses. The inefficiencies of AC at this density are now unmanageable.

central 1.00
claim

Nvidia's 2025 OCP Summit announcement formalized 800V DC for Vera Rubin, drawing inspiration from EV 800V batteries and HVDC transmission. Grid AC will be rectified once at the perimeter to 800V DC, distributed via busways, and stepped down at the rack to 54V/12V.

central 1.00
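
The case for higher distribution voltage follows from I = P / V and from conduction loss scaling with I². A minimal sketch, assuming a 1 MW rack and pure DC delivery; the function names are hypothetical, and the comparison holds conductor resistance fixed, which real busway designs would not:

```python
def bus_current_amps(power_w: float, voltage_v: float) -> float:
    """Current needed to deliver power_w at voltage_v (DC, I = P / V)."""
    return power_w / voltage_v


def relative_conduction_loss(voltage_v: float, reference_v: float) -> float:
    """I^2 * R loss at voltage_v relative to reference_v.

    Assumes the same delivered power and the same conductor resistance,
    so the ratio reduces to (reference_v / voltage_v) ** 2.
    """
    return (reference_v / voltage_v) ** 2


RACK_POWER_W = 1_000_000  # 1 MW rack, the Rubin Ultra-class figure cited above

for volts in (54, 800):
    amps = bus_current_amps(RACK_POWER_W, volts)
    print(f"{volts:>4} V DC -> {amps:,.0f} A")

ratio = relative_conduction_loss(800, reference_v=54)
print(f"conduction loss at 800 V vs 54 V (same conductor): {ratio:.4f}x")
```

Delivering a megawatt at 54V would take roughly 18,500 A, against 1,250 A at 800V, which is why the low-voltage step-down is pushed as close to the rack as possible.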
claim

Rubin Ultra's 1 MW peak rack density requires solid-state transformers, supercapacitors, 800 VDC distribution, and liquid cooling to all mature simultaneously for the 2027 revenue ramp. Nvidia is effectively dragging the rest of the supply chain along with it.

central 1.00
claim

Dense AI racks already push hundreds of kW and Rubin-era racks will draw 900 kW, but electrical and cooling backbones cannot scale as fast as GPU power draw. That means rack density — not chip performance — sets the practical ceiling on how far a powered site can go.

central 0.95
mechanism

Modern GPUs and storage inherently operate on DC, but AC plants rectify to DC for UPS batteries, invert back to AC for facility distribution, then rectify to DC again at the rack. Each conversion bleeds power, with typical total losses of 10-20%.

central 0.90
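
Losses compound across conversions in series, so end-to-end efficiency is the product of the stage efficiencies. A minimal sketch of that arithmetic; every per-stage figure below is a hypothetical value chosen for illustration, except the 98% DC-to-DC figure, which the source cites for SiC/GaN converters:

```python
from math import prod


def chain_efficiency(stage_efficiencies) -> float:
    """End-to-end efficiency of power conversions applied in series."""
    return prod(stage_efficiencies)


# Hypothetical stage efficiencies for a legacy AC plant (illustrative only).
legacy_ac = {
    "rectify_for_ups": 0.96,
    "invert_to_ac": 0.96,
    "rack_psu_rectify": 0.94,
}

# 800V DC path: one perimeter rectification (assumed 0.98) and one
# rack-level DC-DC step-down (0.98, per the source's "over 98%").
dc_800v = {
    "perimeter_rectify": 0.98,
    "rack_dc_dc": 0.98,
}

legacy_loss = 1 - chain_efficiency(legacy_ac.values())
dc_loss = 1 - chain_efficiency(dc_800v.values())
print(f"legacy AC chain loss: {legacy_loss:.1%}")
print(f"800V DC chain loss:   {dc_loss:.1%}")
```

With these assumed values the legacy chain lands inside the 10-20% loss band described above, while the two-stage DC chain loses a few percent.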

Open

  • Can solid-state transformers, supercapacitors, and 800V busway hardware actually be productionized in time for the 2027 Rubin Ultra ramp?
  • How will existing AC-based datacenter sites be retrofitted, or will they be stranded?

Pipeline

source kind · url
generated by · anthropic
candidates · 66 (selected 5)
embeddings

Sections

Candidate pool grouped by section. Selected candidates are bolded.

Considered candidates (61)

Below top-k · 61

  • mechanism · Solid state transformers collapse multiple conversion stages into one · c 0.90

    SSTs use high-frequency switching via SiC or GaN transistors to replace passive magnetic transformers, enabling bidirectional flow, voltage regulation, and fault isolation in 30-50% smaller form factors. They deliver DC directly and allow 150% more power through existing conductors, eliminating ~200kg of copper busbar per rack.

  • claim · 800V DC lets racks skip rack-level AC-to-DC conversion entirely · c 0.90

    Moving to 800V DC means racks take high-voltage DC directly and use SiC or GaN DC-to-DC converters at over 98% efficiency, eliminating the old rack-level PSU step and its losses.

  • claim · Nvidia's reference architecture is the de facto standard the rest of the stack adapts to · c 0.85

    Nvidia sets the output, kW per rack, and cooling regime of modern compute systems, and site developers, OEMs, and competitors adapt to meet its specs each generation. When Nvidia jumps, everyone else asks how high — including on the new shift to high-voltage DC power.

  • claim · Air cooling is physically impossible at modern rack densities · c 0.85

    Liquids carry far more heat per unit volume and per degree than air, keeping GPUs in the mid-40s to low-50s °C under load. At current power densities, physics simply does not permit air cooling.

  • caveat · The AI power problem is not just "more power" · c 0.80

    Framing the AI buildout as a generation-capacity problem solvable by SMRs or solar is naive. It is jointly a more-power, better-power-systems, and better-heat-management problem, and ignoring any of the three misses where the real bottlenecks bind.

  • claim · Traditional VRLA UPS can't handle AI's millisecond power spikes · c 0.80

    Model training can spike power draw to 3x nominal within a single millisecond, but VRLA-based UPS systems respond in 10-20ms and last only 200-500 cycles. The legacy UPS architecture is fundamentally mismatched to AI workloads.

  • implication · 800V architecture mandates 100% liquid cooling at 45°C inlets · c 0.80

    The incoming 800V stack requires fully liquid-cooled racks with 45°C inlets, row-based CDUs integrated into the rack, and liquid-cooled busbars. Eliminating PSU fans and conversion losses cuts cooling energy by 20-30%.

  • mechanism · Open Compute Project turns Nvidia's designs into industry-wide reference standards · c 0.75

    Nvidia uses OCP to codify its rack, power, cooling, and networking designs as open reference architectures. OEMs like Vertiv and SuperMicro then build compliant ecosystems around them, so the entire value chain follows the playbook instead of re-inventing the wheel.

  • implication · Reference architectures cut deployment friction but deepen Nvidia's entrenchment · c 0.75

    Standardized blueprints give operators validated designs and supply-chain flexibility through multi-vendor compliance. The same standardization also reinforces Nvidia's lock-in as the "Arrakis of compute," a trade-off becoming contested as other chipmakers push more heterogeneous standards.

  • implication · Integrated stacks accelerate adoption but deepen vendor moats · c 0.75

    The same integration that makes Rubin-era infrastructure deployable also locks customers into the integrators' ecosystems, entrenching the moats of incumbents like SuperMicro and Nvidia.

  • mechanism · Computing is exothermic and air cooling tops out around 15-20 kW per rack · c 0.70

    GPUs convert nearly all consumed electrical power into heat, and traditional air-cooled designs hit practical limits around 15-20 kW per rack. Liquid cooling raises that ceiling but per-rack heat limits still force operators to spread GPUs across more racks, constraining cluster scale and networking topology.

  • evidence · Power and cooling are a third of capex and the majority of opex · c 0.70

    Power and cooling systems make up roughly 35% of datacenter capex and the majority of operating expenditure, making them the dominant economic lever in datacenter design.

  • context · GPU per-rack power has climbed from tens of kW toward 1 MW per rack · c 0.70

    Each Nvidia generation has raised peak power draw per device, with rack configurations moving from tens of kW to hundreds and approaching 1,000 kW (1 MW) for upcoming designs like Rubin. This is forcing structurally stronger racks with new materials, geometries, and thermal integration.

  • context · The power chain from utility to chip has many discrete stages · c 0.70

    Power enters via high-voltage transmission, gets stepped down by a substation transformer to medium voltage, then again to low voltage near the data hall, before flowing through UPS, PDUs, and finally PSUs and VRMs at the chip. Each stage is a place where loss and complexity accumulate.

  • implication · 800V DC forces a full reconfiguration of the datacenter · c 0.70

    Adopting 800V means centralizing high-voltage rectification at the facility perimeter, distributing DC via busways to rows, and performing local DC-DC step-downs at racks. This touches nearly every component of the power system.

  • mechanism · Supercapacitors smooth sub-second spikes that batteries cannot · c 0.70

    Supercapacitors discharge in microseconds to milliseconds without chemical reactions, offering 10kW/kg power density and over a million cycles. They stabilize voltage under the 1.35kW+ per-GPU draws that Blackwell and successors require.

  • implication · Killing the rack PSU frees up to 60% more space for compute · c 0.70

    Redesigning racks without integrated PSUs reclaims up to 60% of rack volume for compute and removes failure points, while 800V busways cut copper use by about 45% via N+N redundancy.

  • caveat · Immersion cooling breaks GPUs as fungible collateral · c 0.70

    Submerged GPUs are hard to service, complicate warranty and liability with OEMs, and have an illiquid secondary market because value is tied to an integrated system. That weakens their profile as collateral for the leverage that compute deals depend on.

  • implication · Telemetry could settle the GPU useful-life debate empirically · c 0.70

    GPU longevity is not Michael Burry vs Coreweave — it's a question of how GPUs are actually used. Tools like Aravolta feed real wear-and-tear telemetry to GPU-backed lenders, turning the depreciation debate into something measurable.

  • implication · Integrated building blocks turn a multi-vendor nightmare into a packaged product · c 0.70

    DCBBS-style integrated stacks collapse what would otherwise be a five-vendor coordination problem into a single procurable bundle, accelerating adoption by chip distributors.

  • evidence · Elkhorn pilot hit COP 11.8 and dropped PUE from 1.18 to 1.09 · c 0.65

    At KRAMBU's Newport site, Elkhorn logged a live COP of ~11.8 and cut cooling power from ~170 kW/MW to ~85 kW/MW of IT load, improving facility PUE from 1.18 to 1.09 while eliminating high-GWP refrigerants.

  • claim · Capex narrative obscures the coming dominance of opex · c 0.65

    The current discourse fixates on the massive upfront capex of AI datacenters, but as deployed capex grows, operational expenditure for monitoring and maintenance takes an increasing share of total cost.

  • context · 90x rack density increase makes monitoring existential · c 0.65

    A 1 MW B300 cluster costs around $40M in compute alone, and rack density is set to rise 90x over a datacenter's life. Owners and creditors need a single pane of glass across countless hardware vendors to protect that asset base.

  • caveat · Solid-state transformers are still new, expensive, and grid-sensitive · c 0.65

    SSTs are faster and more flexible than traditional transformers, but they remain immature, costly, and require careful monitoring of their interactions with the grid. They are not a drop-in replacement.

  • implication · AI could claim a double-digit share of new generation capacity · c 0.60

    Macro projections suggest AI-driven data centers could absorb a double-digit share of new electricity generation capacity in the coming years, tightly coupling AI compute growth to system-wide rises in power consumption.

  • context · The report focuses on Nvidia's shift to 800V DC power · c 0.60

    Subsequent sections center on Nvidia's announced move to 800V direct current power systems in upcoming reference architecture and the second-order implications across cooling, rack design, and site infrastructure.

  • mechanism · BESS handles longer ramps and unlocks new revenue streams · c 0.60

    Lithium-iron-phosphate BESS provides MWh-scale storage with 95%+ round-trip efficiency and 6,000+ cycles, covering seconds-to-hours ramps. Beyond backup, BESS enables peak shaving, demand response participation, and native DC storage of renewables.

  • caveat · Supercapacitors are 10-50x more expensive per kWh than batteries · c 0.60

    Supercaps are not a UPS replacement but an expensive addition to the power stack, costing 10-50x more per kWh than lithium-ion for a 1MW, 15-second backup. They're justified only because they solve filtering, load shaping, and utility-relationship problems batteries can't.

  • context · Computing is fundamentally an exothermic process · c 0.60

    The primary physical output of a semiconductor is heat. Despite experimental alternatives like photonic or adiabatic chips, the transistor regime will dominate for the next decade, so the real lever is downstream heat mitigation rather than upstream emission reduction.

  • caveat · Every liquid-cooled deployment still feels custom and fragile · c 0.60

    Liquid cooling is already standard for H200/B200/B300, but operators report each install feels bespoke, with chillers, pumps, tanks, and miles of piping where mistakes cost tens of millions. Supply chains for these systems are also strained.

  • evidence · Chillers are 15-20% of datacenter capex, and Nvidia is engineering them out · c 0.60

    Chillers cost roughly $2M per MW and represent 15-20% of capex. Nvidia's 45°C single-phase D2C design eliminates them entirely, though competitors like AMD still require colder coolant.

  • claim · Modern DCIM must be real-time, GPU-native, and API-driven · c 0.60

    Legacy datacenter management was built for slower, more homogeneous infrastructure. Modern DCIM needs sub-second telemetry, first-class understanding of GPU workloads, and an Asset API that normalizes data across every vendor and protocol.

  • mechanism · Monitoring is only useful if it produces a queue of concrete jobs · c 0.60

    Telemetry, manuals, SOPs, and incident history get stitched together so the system can say what is wrong, why, and when to fix it. That feeds the ticketing systems operators already use, replacing alert fatigue with a prioritized maintenance graph that links work done to failure rates and power use.

  • context · 2027 is effectively tomorrow for multi-billion-dollar OEMs · c 0.60

    Rubin Ultra GPUs and systems begin shipping in 2027, which on the design and qualification timescales of large OEMs leaves almost no slack. The deadline reframes what looks like a distant date as an immediate engineering crisis.

  • evidence · SuperMicro is building SSTs, supercaps, and 1.1 MW Kyber racks for Rubin Ultra · c 0.60

    Through its Data Center Building Block Solutions (DCBBS) ecosystem, SuperMicro is actively developing solid-state transformers, supercapacitors, and 1.1 MW Kyber racks aligned to Rubin Ultra's launch window.

  • caveat · High-density power delivery itself introduces fault risk · c 0.55

    Supplying 100 kW or more to a single rack requires multiple high-capacity circuits or higher voltage distribution, increasing complexity, cost, and fault risk at the rack and row level — a safety dimension distinct from cooling.

  • mechanism · Liquid cooling is coupled heat-exchanger loops with real overhead · c 0.55

    Liquid cooling moves entropy from chip junction to ambient air through coupled loops of pumps, heat exchangers, chillers, and towers. That infrastructure itself consumes substantial power and adds significant operational complexity.

  • example · Direct-to-chip cooling has won for AI workloads · c 0.55

    D2C delivers coolant via cold plates bolted to the hottest components — CPUs, GPUs, and memory — and is the dominant liquid cooling approach for AI racks today, in either two-phase refrigerant or single-phase water-glycol variants.

  • example · Elkhorn's water-vacuum chiller deletes the compressor · c 0.55

    Elkhorn's Hydrovaporization architecture runs a sealed refrigeration cycle on pure water under vacuum, using phase change instead of mechanical vapor compression. This removes oil management, high-speed compressors, and synthetic refrigerants from the service burden.

  • example · Early liquid cooling failed at seals, pipes, and power · c 0.55

    When liquid cooling first rolled out, fluid seals commonly failed, pipes eroded, and power outages cascaded into system failures. It's a concrete precedent for the kinds of teething problems the next power transition will face.

  • evidence · US datacenter demand could nearly triple by 2030 · c 0.50

    Goldman Sachs estimates US datacenter capacity at 70 GW in 2025 and forecasts 122-146 GW of demand by 2030, driven by the post-ChatGPT compute supply-demand inflection.

  • context · Token consumption growth maps directly onto datacenter load · c 0.50

    Exponential growth in token consumption translates straight into demand for parallel GPU clusters, taking datacenter loads from tens of megawatts in the early 2010s to tens of gigawatts by the mid-2020s.

  • claim · AC won the grid in the 19th century and that legacy still shapes datacenters · c 0.50

    AC beat DC at the 1893 Chicago World's Fair and the 1896 Niagara Falls project because transformable voltages allowed efficient long-distance transmission. That victory locked in AC as the foundation of datacenter power systems from the 1960s onward.

  • context · Gallium is becoming a strategic material for next-gen power electronics · c 0.50

    Gallium-based materials enable faster, more efficient chips and power infrastructure as silicon hits its limits, showing up both as a semiconductor and in liquid-metal interconnects. Critical mineral supply is dominated by US adversaries, mirroring the nuclear fuel chokepoints.

  • implication · Higher voltage redesigns the rack itself · c 0.50

    Traditional AC racks run at 415V or 480V and handle 10-50kW with PSUs converting AC to DC at the rack. Higher-voltage DC systems waste less energy converting power and let racks support far higher density, but the electrical and physical design must be rebuilt to handle the voltage safely.

  • evidence · Cooling failures already drive 13% of datacenter outages · c 0.50

    The Uptime Institute attributes 13% of 2024 datacenter failures to cooling, with recent incidents at CyrusOne, Azure Western Europe, and others underscoring how brittle the current cooling stack is.

  • example · Immersion cooling boosts reliability and unlocks waste-heat reuse · c 0.50

    Dunking chips in fluid gives uniform contact, reduces hotspots and thermal cycling, and delivers waste heat at temperatures usable for district heating. It improves thermal headroom and cuts the energy spent moving air.

  • evidence · Schneider Electric has committed to 800 VDC compatible PSUs · c 0.50

    Schneider's public commitment to support 800 VDC via compatible power supply units signals that a top-tier infrastructure vendor is aligning with Nvidia's roadmap.

  • caveat · New infrastructure deployments are historically messy · c 0.50

    First-generation rollouts of novel datacenter tech rarely go cleanly, and the 2027 stack is no exception. Optimism about vendor readiness should be tempered by the track record of new deployments.

  • mechanism · SSTs depend on digital control, which introduces new failure modes · c 0.50

    The flexibility of solid-state transformers comes from digital control loops, but that dependence brings software and cybersecurity risks that traditional iron-and-copper transformers never had.

  • context · Compute demand has scaled from petaFLOPs to trillions of teraFLOPs in a generation · c 0.40

    Global compute capacity grew from an estimated 10-50 million MFLOPs in the 1990s PC era to 1-10 trillion TFLOPs today, with datacenter capacity doubling from 24 GW in 2006 to roughly 48 GW by 2022 before ChatGPT accelerated it again.

  • example · A new wave of GPU marketplaces is emerging · c 0.40

    Fluidstack, San Francisco Compute Company, and Andromeda each grew from GPU-hour marketplaces or portfolio compute vehicles into significant neoclouds, opening opportunities in stablecoin payments, billing automation, and settlement financing for the compute economy.

  • context · Early DC efforts started with Open Compute in 2011 · c 0.40

    As datacenter electricity use climbed to 1-2% of global consumption, the Open Compute Project introduced 380V-400V DC systems that reduced conversions to a single grid-to-DC step and improved efficiency by 5-10%.

  • evidence · SST market is small but scaling quickly · c 0.40

    The global SST market is $115M in 2025 and projected to reach $375M by 2033 at 16% CAGR, with Amperesand and SolarEdge/Infineon piloting MW-scale prototypes for 800V AI infrastructure.

  • evidence · BESS is the fastest way to add grid capacity · c 0.40

    The EIA projects a record 18 GW of battery storage installations in 2025. Batteries are far quicker to build than solar (months to years) or fossil-fuel plants (years).

  • example · Shatterdome optimizes battery dispatch for both arbitrage and uptime · c 0.40

    Shatterdome uses physics-based network modeling and AI capacity trading to push battery returns from $25-30/kW/year Texas rates toward $150-200/kW/year. The same optimization lets batteries buffer gas turbines and cover ramp shortfalls so datacenters get firm, responsive power.

  • example · Rear-door heat exchangers fit retrofits but cap out on density · c 0.40

    RDHx replaces a rack's rear door with a liquid-cooled exchanger that cools server exhaust, avoiding direct liquid contact with IT gear. It suits gradual upgrades of legacy air-cooled sites but doesn't scale to extreme densities.

  • example · The Boston Computer Exchange as a template for today's GPU marketplaces · c 0.30

    BoCoEx launched in 1982 as the world's first e-commerce company, running a bulletin-board marketplace for used computers with escrow, a price index, and ~150 worldwide affiliates. The authors see it as a seminal case study for the resurgent wave of GPU-hour marketplaces.

  • context · Hyperscale substations alone cost $500k to $50M · c 0.30

    Hyperscale sites sit near high-voltage transmission lines and require on-site substations with transformers and switchgear, either self-built or sourced from the utility. These substations range from $500k to $50M depending on size and voltage class.

  • context · Backup generators and ATS bridge outages until UPS hands off · c 0.30

    A diesel generator sits alongside the second transformer and an Auto Transfer Switch flips power over during outages, while the UPS battery covers the 5-10 minute gap until the generator comes online.

  • example · Supra recovers gallium from mining waste using supramolecular receptors · c 0.30

    Supra embeds ion-specific supramolecular receptors into porous polymers to refine gallium and scandium from tailings that solvent extraction and ion exchange can't economically handle. The modular process targets a potential multibillion-dollar strategic reserve of critical minerals in the Western hemisphere.
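
The Elkhorn pilot figures in the candidate list above (cooling power falling from ~170 to ~85 kW per MW of IT load, PUE moving from 1.18 to 1.09) can be cross-checked with the standard PUE definition, total facility power divided by IT power. A minimal sketch, assuming the non-cooling overhead stays constant before and after; that overhead is backed out from the reported numbers and is not stated in the source:

```python
def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


IT_KW = 1000.0  # 1 MW of IT load, the basis used in the reported figures

# Assumption: whatever is left of the "before" PUE after cooling is
# non-cooling overhead (distribution losses, lighting, and so on).
other_kw = 1.18 * IT_KW - IT_KW - 170.0

before = pue(IT_KW, 170.0, other_kw)
after = pue(IT_KW, 85.0, other_kw)

print(f"implied non-cooling overhead: {other_kw:.0f} kW per MW")
print(f"PUE before: {before:.3f}, after: {after:.3f}")
```

Under that assumption the post-retrofit PUE comes out near 1.095, consistent with the reported 1.09 given that the cooling figures are approximate.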

Janitor

Non-content spans (acknowledgements, references, footnotes, headers, boilerplate) are dropped before the decomposition runs.

total spans · 122
kept · 115
dropped · 7
  • content · 115
  • noise · 4
  • boilerplate · 2
  • metadata · 1