Reserved and Spot GPU Mix Strategy

The cheapest GPU strategy is almost never all of one pricing model. Reserve capacity for everything and you pay for idle GPUs whenever demand dips below the level you reserved. Run everything on spot and you save aggressively but expose critical work to interruption and capacity gaps. The better answer is a deliberate blend that layers reserved, on-demand, and spot capacity against the actual shape of your demand. This is portfolio thinking applied to compute, and for workloads whose demand varies it usually beats any single pricing model.

The Three Capacity Types

Each pricing model trades cost against commitment and reliability differently. Understanding the tradeoffs is the prerequisite for blending them well.

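Capacity type	Cost	Reliability	Best for
Reserved or committed	Lowest effective rate when fully used; billed whether used or not	Not interrupted; capacity is held only if it is a capacity reservation	Predictable baseload
On-demand	Highest per hour, no commitment	Not interrupted once running, though launches can fail when GPUs are scarce	Unpredictable bursts, deadlines
Spot or preemptible	Deepest per-hour discount; price and availability vary	Can be reclaimed at short notice	Fault-tolerant, restartable work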

Map Your Demand Curve First

Blending well starts with knowing your demand profile. Plot GPU usage over a representative period and you will usually see a stable floor that is almost always in use, a variable middle that flexes with activity, and occasional spikes. Those three bands map cleanly onto the three capacity types.

The floor is your baseload, the capacity you use nearly all the time. This is what reserved or committed-use discounts are made for.
The middle is flexible, fault-tolerant work that can ride interruptible spot capacity for deep savings.
The spikes are unpredictable or deadline-bound bursts where on-demand reliability is worth the premium.
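One way to turn a usage plot into these three bands is to read percentiles off hourly samples. The sketch below assumes you already export GPUs-in-use per hour. The helper names (`percentile`, `demand_bands`) are hypothetical, and the 5th and 95th percentile cut-offs are illustrative choices rather than a standard. Under those assumptions, the floor is the level usage stays at or above about 95% of the time, and anything above the 95th percentile counts as a spike.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer in 1..100."""
    ordered = sorted(values)
    # pct * n is an integer, so this division avoids float rounding surprises
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def demand_bands(hourly_gpus, floor_pct=5, spike_pct=95):
    """Split observed GPU usage into floor, middle and spike bands.

    floor_pct and spike_pct are assumed cut-offs, not industry standards.
    """
    floor = percentile(hourly_gpus, floor_pct)
    spike_threshold = percentile(hourly_gpus, spike_pct)
    return {
        "floor": floor,                                         # reserved candidate
        "middle": spike_threshold - floor,                      # spot candidate
        "spike_threshold": spike_threshold,
        "spike_headroom": max(hourly_gpus) - spike_threshold,   # on-demand buffer
    }


# Hypothetical hourly samples, compressed to 20 points for brevity
usage = [8] * 10 + [12] * 6 + [20] * 3 + [40]
print(demand_bands(usage))
```

For this sample the floor comes out at 8 GPUs, the middle band at 12, and the spike threshold at 20, with another 20 GPUs of headroom up to the peak. A longer, more representative window gives more trustworthy numbers than any single week.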

Building the Layered Model

With the demand curve mapped, the blend follows naturally. The art is in sizing each layer conservatively so you capture savings without overcommitting. The instinct to maximize the reserved layer for its low rate is exactly the instinct to resist, because every reserved GPU you do not use is a sunk cost that quietly erodes the discount you committed to capture.

Size the reserved layer to your reliable floor, not your average and certainly not your peak. Commit only to capacity you are confident you will use almost continuously.
Push fault-tolerant work onto spot. Training with checkpointing, batch inference, and preprocessing belong here. The savings on this layer are typically the largest.
Keep on-demand for the unpredictable and the urgent. Use it as the flexible buffer that absorbs spikes and protects deadline-critical runs that cannot tolerate preemption.
Let spot and on-demand trade off dynamically. When spot is plentiful, lean on it. When spot capacity is scarce or a deadline looms, fall back to on-demand.
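The sunk-cost point is easy to quantify. A reservation bills every GPU-hour whether or not it is used, so idle hours raise the effective price of the hours that do useful work. The sketch below is a minimal illustration under that assumption. `reservation_report` and the 2.0-per-GPU-hour rate are hypothetical, not any provider's pricing.

```python
def reservation_report(hourly_demand, reserved_gpus, reserved_rate):
    """Show how idle reserved GPUs raise the effective cost of the used ones.

    hourly_demand: GPUs in use each hour.
    reserved_rate: assumed price per reserved GPU-hour, billed whether busy or idle.
    """
    paid_hours = reserved_gpus * len(hourly_demand)
    used_hours = sum(min(d, reserved_gpus) for d in hourly_demand)
    cost = paid_hours * reserved_rate
    return {
        "paid_gpu_hours": paid_hours,
        "used_gpu_hours": used_hours,
        "idle_gpu_hours": paid_hours - used_hours,
        # total reservation cost spread over the hours that did useful work
        "effective_rate": cost / used_hours if used_hours else float("inf"),
    }


demand = [4, 4, 6, 8, 5, 4]  # hypothetical hourly GPU usage
for size in (4, 8):          # reserve at the floor vs. at the peak
    print(size, reservation_report(demand, size, reserved_rate=2.0))
```

Sized at the floor of 4, every paid hour is used and the effective rate equals the list rate. Sized at the peak of 8, 17 of 48 paid GPU-hours sit idle, and each useful hour costs about 1.55 times the reserved rate. That gap is the discount leaking away, and the band above the floor is better left to spot and on-demand.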

Sizing the Commitment Carefully

The biggest risk in this strategy is over-committing the reserved layer. Reservations lock in savings only if you actually use the capacity. A few guardrails keep the commitment safe:

Commit below your floor, not at it. Leave a margin so a quiet week does not turn your discount into waste.
Prefer shorter or more flexible commitments when your demand is still evolving, accepting a smaller discount for less lock-in risk.
Ladder commitments by staggering their start and end dates, so you can adjust the floor as workloads change instead of facing one large renewal cliff.
Revisit the floor regularly, because baseload grows and shrinks as projects come and go.
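The first and third guardrails can be combined in a small planning helper: commit a margin below the measured floor, then split that commitment into tranches with staggered start dates. This is a sketch only. `ladder_plan`, `committed_in_month` and `expiring_in_month` are hypothetical names. The 15% margin, three tranches and 12-month term are assumed defaults rather than provider terms.

```python
def ladder_plan(floor_gpus, margin_pct=15, tranches=3, term_months=12):
    """Split a commitment set below the floor into staggered tranches.

    margin_pct, tranches and term_months are illustrative defaults.
    Integer arithmetic keeps GPU counts whole.
    """
    target = floor_gpus * (100 - margin_pct) // 100   # commit below the floor
    step = term_months // tranches                     # months between start dates
    base, extra = divmod(target, tranches)
    plan = []
    for i in range(tranches):
        start = i * step
        plan.append({
            "start": start,
            "end": start + term_months,
            "gpus": base + (1 if i < extra else 0),    # spread the remainder
        })
    return target, plan


def committed_in_month(plan, month):
    """GPUs under commitment during a given month."""
    return sum(t["gpus"] for t in plan if t["start"] <= month < t["end"])


def expiring_in_month(plan, month):
    """GPUs whose commitment ends, and can be resized, at a given month."""
    return sum(t["gpus"] for t in plan if t["end"] == month)


target, plan = ladder_plan(floor_gpus=40)
print(target, plan)
print(committed_in_month(plan, 10), expiring_in_month(plan, 12))
```

With a 40-GPU floor, the sketch commits 34 GPUs in tranches of 12, 11 and 11 starting at months 0, 4 and 8. No single renewal touches more than about a third of the commitment. The trade-off is a ramp-up period before every tranche is active, which the spot and on-demand layers absorb.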

Making Spot Reliable Enough to Lean On

Leaning on spot works only if the layer is genuinely usable, and that depends on resilience engineering. Checkpoint training jobs so interruptions cost little. Diversify spot requests across instance types and regions so you are not preempted everywhere at once. And design an automatic fallback to on-demand when spot capacity dries up, so a capacity gap degrades gracefully instead of stalling the pipeline. With those pieces in place, you can route a large share of work through the cheapest tier with confidence.
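Checkpointing is what keeps an interruption cheap. The toy loop below resumes from its last saved step, so a reclaim costs at most `ckpt_every` steps of rework. It is a sketch under stated assumptions. `train` and `Preempted` are hypothetical, a counter stands in for model state, and `preempt_at` simulates the provider reclaiming the instance. A real job would save model and optimizer state through its framework and react to the provider's interruption notice.

```python
import json
import os
import tempfile


class Preempted(Exception):
    """Simulated spot reclamation (hypothetical stand-in for a provider notice)."""


def train(total_steps, ckpt_path, ckpt_every, preempt_at=None):
    """Toy training loop that resumes from its last checkpoint.

    Returns (final_step, state, steps_run_this_attempt).
    """
    step, state = 0, 0.0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            saved = json.load(f)
        step, state = saved["step"], saved["state"]
    ran = 0
    while step < total_steps:
        if step == preempt_at:
            raise Preempted(f"reclaimed at step {step}")
        state += 1.0          # stand-in for one optimizer step
        step += 1
        ran += 1
        if step % ckpt_every == 0:
            # Write to a temp file, then rename over the old checkpoint,
            # so a reclaim mid-write cannot leave a half-written file in place.
            tmp = ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step, "state": state}, f)
            os.replace(tmp, ckpt_path)
    return step, state, ran


path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    train(100, path, ckpt_every=10, preempt_at=57)
except Preempted as exc:
    print(exc)
print(train(100, path, ckpt_every=10))  # resumes from the step-50 checkpoint
```

The first attempt is reclaimed at step 57. The rerun resumes from step 50, so the interruption costs 7 steps of rework. A shorter `ckpt_every` bounds that loss further, at the price of more time spent writing checkpoints.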

A Worked Example of the Layers

To make the blend concrete, picture a team whose GPU usage rarely drops below a steady floor, flexes through a wide middle band during active project work, and occasionally spikes for big experiments. The layered model maps onto that shape directly. The reserved layer covers the floor at the lowest rate. The spot layer absorbs most of the flexible middle for fault-tolerant training and batch work. The on-demand layer sits ready to catch spikes and protect any run that cannot tolerate interruption.

Demand band	Capacity layer	Why
Always-on floor	Reserved	Lowest rate for guaranteed steady use
Flexible middle	Spot	Deep discount for restartable work
Occasional spikes	On-demand	No preemption, no commitment, deadline-safe

The exact proportions differ for every team, which is the whole point: you size each layer to your own demand curve rather than copying a generic ratio.
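The table can be applied hour by hour. Each hour is filled from the reserved floor first, then from spot up to the size of the middle band and whatever spot the provider can supply, and the rest goes to on-demand. The sketch below assumes you know, or can estimate, hourly spot availability. `allocate_hour` and `layer_gpu_hours` are hypothetical names, and the numbers are made up.

```python
def allocate_hour(demand, reserved, spot_band, spot_available):
    """Assign one hour of GPU demand: reserved first, then spot, then on-demand.

    spot_band: size of the flexible middle band (assumed input).
    spot_available: spot GPUs the provider could supply that hour (assumed input).
    """
    on_reserved = min(demand, reserved)
    remaining = demand - on_reserved
    on_spot = min(remaining, spot_band, spot_available)
    on_demand = remaining - on_spot   # spikes plus any spot shortfall
    return on_reserved, on_spot, on_demand


def layer_gpu_hours(hourly_demand, reserved, spot_band, hourly_spot_available):
    """Total GPU-hours served by each layer over a demand series."""
    if len(hourly_demand) != len(hourly_spot_available):
        raise ValueError("demand and spot availability must cover the same hours")
    totals = {"reserved": 0, "spot": 0, "on_demand": 0, "reserved_idle": 0}
    for demand, spot in zip(hourly_demand, hourly_spot_available):
        r, s, od = allocate_hour(demand, reserved, spot_band, spot)
        totals["reserved"] += r
        totals["spot"] += s
        totals["on_demand"] += od
        totals["reserved_idle"] += reserved - r
    return totals


demand = [10, 14, 22, 9]  # hypothetical hourly GPU usage
print(layer_gpu_hours(demand, reserved=8, spot_band=8, hourly_spot_available=[8, 2, 8, 0]))
print(layer_gpu_hours(demand, reserved=8, spot_band=8, hourly_spot_available=[8] * 4))
```

When spot is scarce in two of the four hours, the split is 32 reserved, 12 spot and 11 on-demand GPU-hours. With spot fully available, on-demand shrinks to the 6 GPU-hours of genuine spike. This is the dynamic trade-off between spot and on-demand in miniature.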

Routing Workloads to the Right Layer

The blend only delivers if individual jobs land on the right capacity type, which is a scheduling and policy decision as much as a purchasing one. A useful default is to classify each workload by its tolerance for interruption and its urgency, then route accordingly.

Interruptible and not urgent goes to spot first, falling back to on-demand only when spot is scarce.
Steady and continuous runs on the reserved floor you already pay for.
Urgent or non-restartable goes to on-demand, where freedom from preemption is worth the premium.
Latency-sensitive production inference generally avoids spot entirely, living on reserved or on-demand for reliability.
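The rules of thumb above translate into a small routing function that a scheduler or admission policy could call. This is an illustrative policy, not a specific scheduler's API. The `Workload` fields, `route`, the layer labels and the job names are all hypothetical. Latency sensitivity is checked first so production inference never lands on spot.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    interruptible: bool              # can checkpoint and restart cheaply
    urgent: bool                     # deadline-bound
    steady: bool                     # runs continuously, part of the floor
    latency_sensitive: bool = False  # serves production traffic


def route(w: Workload) -> str:
    """Pick a capacity layer using the routing rules of thumb (illustrative)."""
    if w.latency_sensitive:
        # production inference stays off spot entirely
        return "reserved" if w.steady else "on_demand"
    if w.steady:
        return "reserved"            # already paid for, so use it
    if w.urgent or not w.interruptible:
        return "on_demand"
    return "spot_then_on_demand"     # fall back only when spot is scarce


jobs = [
    Workload("nightly-batch-inference", interruptible=True, urgent=False, steady=False),
    Workload("release-eval", interruptible=False, urgent=True, steady=False),
    Workload("chat-serving", interruptible=False, urgent=True, steady=True,
             latency_sensitive=True),
]
for job in jobs:
    print(job.name, route(job))
```

Keeping the policy in one function makes it easy to audit and to change as the blend is rebalanced.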

Review and Rebalance

A capacity blend is not set-and-forget. Demand shifts, spot prices and availability fluctuate, and new hardware changes the price-performance picture. Schedule a recurring review that checks reserved utilization, spot interruption rates, and the share of spend in each layer, then rebalance. Catch an under-used reservation early and you can let it lapse rather than renew. Notice spot serving most of your bursts reliably and you can shrink the on-demand buffer.
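The recurring review can start as a short script over billing and scheduler exports. The sketch below computes the three signals named above and turns them into suggested actions. `capacity_review` is a hypothetical helper, and both thresholds (85% reserved utilization, 5% spot interruption rate) are assumed starting points to tune. It also treats a low interruption rate as a rough proxy for spot reliably serving bursts.

```python
def capacity_review(reserved_used_hours, reserved_paid_hours, spend_by_layer,
                    spot_interruptions, spot_jobs,
                    min_reserved_util=0.85, max_spot_interrupt_rate=0.05):
    """Summarize a periodic capacity review and suggest rebalancing actions.

    Thresholds are hypothetical defaults, not recommendations from any provider.
    """
    utilization = reserved_used_hours / reserved_paid_hours
    total_spend = sum(spend_by_layer.values())
    shares = {layer: spend / total_spend for layer, spend in spend_by_layer.items()}
    interrupt_rate = spot_interruptions / spot_jobs if spot_jobs else 0.0

    actions = []
    if utilization < min_reserved_util:
        actions.append("reserved under-used: let the next expiring tranche lapse")
    if spot_jobs and interrupt_rate <= max_spot_interrupt_rate:
        actions.append("spot reliable: consider shrinking the on-demand buffer")
    elif spot_jobs:
        actions.append("spot interruptions high: diversify pools or add on-demand fallback")
    return {"reserved_utilization": utilization, "spend_share": shares,
            "spot_interrupt_rate": interrupt_rate, "actions": actions}


report = capacity_review(
    reserved_used_hours=700, reserved_paid_hours=1000,
    spend_by_layer={"reserved": 5000, "spot": 3000, "on_demand": 2000},
    spot_interruptions=2, spot_jobs=100,
)
print(report)
```

With these made-up inputs, reserved utilization is 70%, which flags the reservation before its renewal. A 2% interruption rate suggests the on-demand buffer can shrink.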

Done well, a blend of reserved, spot, and on-demand capacity gives you the strengths of each model. You get the rock-bottom rate of commitment for the work you always run, the deep discounts of interruptible capacity for the work that can tolerate it, and the no-commitment flexibility of on-demand for the moments that need it. Match the layers to your real demand curve, size the commitment with discipline, and rebalance on a schedule, and your GPU bill will track your actual needs far more tightly than any single pricing model could.
