Spot vs on-demand vs reserved GPUs
A practical guide to spot, on-demand, and reserved GPU pricing - the discounts, the risks, and how to match each mode to your workload.
The same GPU can cost you three very different amounts depending on how you buy it. On-demand, spot, and reserved are not just price tiers - they are different bargains about flexibility, risk, and commitment. Picking the right one for each workload is one of the highest-leverage cost decisions you can make in the cloud, often cutting a GPU bill by half or more without changing what your code computes. Here is how the three modes work and when each wins.
On-demand: maximum flexibility, maximum hourly cost
On-demand is the default. You launch an instance, you pay by the second or hour, and you stop it whenever you like. There is no commitment and, crucially, the provider will not pull the instance out from under you. That guarantee is exactly what you pay a premium for.
On-demand is the right call when:
- You are developing, debugging, or running a short experiment.
- Your job cannot tolerate interruption and you cannot checkpoint it cleanly.
- You need a GPU right now for an unpredictable, one-off task.
The trap is leaving on-demand instances running for steady, predictable workloads - that is where you overpay the most.
Spot: deep discounts in exchange for interruption risk
Spot (also called interruptible, preemptible, or community capacity) is spare GPU inventory sold cheap. Discounts of 40-70% off on-demand are common, sometimes more. The catch: the provider can reclaim the instance with little or no warning when it needs the capacity back or the market price moves.
Spot shines for fault-tolerant work:
- Training runs that checkpoint regularly and can resume.
- Batch inference, data preprocessing, and hyperparameter sweeps where losing one worker is harmless.
- Embarrassingly parallel jobs that can shed and re-add workers freely.
Browse which GPUs currently offer spot tiers in the GPU catalog, and remember that spot price floors move with demand - the cheapest spot GPU this week may not be cheapest next week.
Reserved: commit for predictable savings
Reserved (or committed-use) pricing trades flexibility for a guaranteed discount. You commit to a term - weeks, months, or a year - and pay 30-60% less than on-demand in return, while keeping the no-interruption guarantee. It is the opposite trade from spot: you give up flexibility instead of accepting risk.
Reserved is ideal for:
- Production inference fleets with steady, predictable load.
- Long training programs where you know you will need N GPUs for months.
- Teams that want budget certainty rather than a fluctuating spot bill.
Side-by-side comparison
| Dimension | On-demand | Spot | Reserved |
|---|---|---|---|
| Typical discount vs on-demand | 0% (baseline) | 40-70% off | 30-60% off |
| Interruption risk | None | High | None |
| Commitment | None | None | Weeks to a year |
| Best for | Dev, short jobs, unpredictable bursts | Checkpointed training, batch, sweeps | Steady production, long runs |
| Main downside | Highest hourly rate | Can vanish mid-run | Locked in even if needs change |
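To make the discount ranges in the table concrete, the sketch below converts them into hourly rates. The $2.00 on-demand rate is a made-up figure for illustration, not a quote from any provider, and the function name is hypothetical.

```python
# Discount ranges copied from the comparison table above.
DISCOUNT_RANGES = {
    "on_demand": (0.0, 0.0),
    "spot": (0.40, 0.70),
    "reserved": (0.30, 0.60),
}


def hourly_rate_range(on_demand_rate, mode):
    """Return (lowest, highest) hourly rate for a mode, given the on-demand rate."""
    low_discount, high_discount = DISCOUNT_RANGES[mode]
    # The deepest discount yields the lowest price, and vice versa.
    lowest = round(on_demand_rate * (1 - high_discount), 2)
    highest = round(on_demand_rate * (1 - low_discount), 2)
    return lowest, highest


if __name__ == "__main__":
    base_rate = 2.00  # hypothetical on-demand price per GPU-hour
    for mode in DISCOUNT_RANGES:
        low, high = hourly_rate_range(base_rate, mode)
        print(f"{mode:>9}: ${low:.2f} to ${high:.2f} per GPU-hour")
```

Real discounts vary by provider, GPU model, and region, so treat the ranges as a starting point rather than a price list.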
Surviving spot interruptions: checkpointing
The entire spot strategy lives or dies on checkpointing. If you can save and resume state cheaply, interruptions become an inconvenience rather than lost work.
- Checkpoint on a cadence tied to your interruption risk. If you save every 10-15 minutes, an interruption costs you at most that window of compute, plus the time it takes to restart.
- Write checkpoints to durable, fast storage. Push to object storage or a persistent block volume that survives the instance, not just local disk.
- Make resume automatic. On launch, detect the latest checkpoint and continue from it so a replacement worker self-heals.
- Handle the termination signal. Many providers send a short warning before reclaiming; use it to flush a final checkpoint.
- Mind egress. Frequent checkpoint reads and writes can add up; check the egress comparison if you move data across networks.
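The checkpointing habits above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. It makes several assumptions:

- The provider delivers its warning as a SIGTERM. Some providers expose a metadata endpoint instead.
- The state is a small JSON-serializable dictionary.
- CHECKPOINT_PATH points at storage that outlives the instance. Here it is just a temp-directory placeholder.

All names are hypothetical.

```python
import json
import os
import signal
import tempfile

# Placeholder path; in practice this should live on a persistent volume
# or be synced to object storage so it survives the instance.
CHECKPOINT_PATH = os.path.join(tempfile.gettempdir(), "demo_checkpoint.json")

stop_requested = False


def _on_terminate(signum, frame):
    """Signal handler: only set a flag; the training loop does the saving."""
    global stop_requested
    stop_requested = True


def save_checkpoint(state, path):
    """Write to a temp file, then rename, so a reclaim mid-write cannot corrupt the checkpoint."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def load_checkpoint(path):
    """Return the latest saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def train(total_steps, path, checkpoint_every=100,
          should_stop=lambda: stop_requested):
    state = load_checkpoint(path)  # automatic resume
    while state["step"] < total_steps:
        state["step"] += 1  # stand-in for one real training step
        if should_stop():
            save_checkpoint(state, path)  # flush a final checkpoint
            return state
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, _on_terminate)
    final_state = train(total_steps=1000, path=CHECKPOINT_PATH)
    print("stopped at step", final_state["step"])
```

Running the script again after an interruption picks up from the last saved step, which is what lets a replacement worker self-heal. A real training job would save model and optimizer state through its framework's own serialization rather than JSON.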
A simple decision framework
Map each workload to a mode by asking three questions in order:
- Is the load steady and long-term? Yes - reserved. The commitment pays for itself.
- Can it tolerate interruption (is it checkpointed or parallel)? Yes - spot. Capture the biggest discount.
- Neither? On-demand. Pay for the flexibility you genuinely need, but turn instances off when idle.
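The three questions can be written as a small function, evaluated in the same order as the list. The function name and the example workloads are hypothetical, chosen only to illustrate the mapping.

```python
def choose_mode(steady_long_term, tolerates_interruption):
    """Map a workload to a pricing mode using the three questions in order."""
    if steady_long_term:
        return "reserved"
    if tolerates_interruption:
        return "spot"
    return "on-demand"


if __name__ == "__main__":
    # Illustrative workloads: (name, steady and long-term?, tolerates interruption?)
    workloads = [
        ("production inference fleet", True, False),
        ("checkpointed training run", False, True),
        ("interactive debugging session", False, False),
    ]
    for name, steady, tolerant in workloads:
        print(f"{name}: {choose_mode(steady, tolerant)}")
```

Because the steady-load question comes first, a workload that is both long-running and interruption-tolerant lands on reserved here. If budget certainty matters less to you than the deepest discount, swapping the first two checks is a reasonable variation.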
Many teams blend all three: reserved for the baseline production fleet, spot for training and batch, and on-demand for spiky development. Before committing, model the total cost of each mix with the GPU training cost calculator - factor in expected interruptions for spot, since a job that restarts often can erode the discount.
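As a rough illustration of how interruptions erode the spot discount, the sketch below uses a first-order model. It assumes:

- Each interruption lands, on average, halfway between two checkpoints.
- Each interruption also costs a fixed restart time, which is billed.
- Interruptions during redone work are ignored.

The rates, interruption frequency, and function names are hypothetical inputs, not measurements.

```python
def expected_spot_cost(compute_hours, spot_rate, interruptions_per_hour,
                       checkpoint_interval_hours, restart_hours):
    """Estimate the spot bill for a job, including work redone after interruptions."""
    # Average work lost per interruption: half a checkpoint interval plus restart time.
    lost_per_interruption = checkpoint_interval_hours / 2 + restart_hours
    expected_interruptions = compute_hours * interruptions_per_hour
    billed_hours = compute_hours + expected_interruptions * lost_per_interruption
    return billed_hours * spot_rate


def on_demand_cost(compute_hours, on_demand_rate):
    """Uninterrupted baseline: pay the on-demand rate for exactly the hours needed."""
    return compute_hours * on_demand_rate


if __name__ == "__main__":
    # Hypothetical job: 100 GPU-hours, $2.00 on-demand, $0.80 spot (60% off),
    # one interruption per 10 hours, 15-minute checkpoints, 15-minute restarts.
    spot = expected_spot_cost(100, 0.80, 0.1, 0.25, 0.25)
    baseline = on_demand_cost(100, 2.00)
    print(f"spot: ${spot:.2f}  on-demand: ${baseline:.2f}")
```

With these illustrative numbers the interruptions add under 4% to the billed hours, so spot still comes in far below on-demand. Lengthening the checkpoint interval or raising the interruption rate in the call shows how quickly that margin can shrink.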
Common mistakes
- Running on-demand for steady production. This is the most expensive habit in GPU cloud computing; move predictable load to reserved.
- Putting non-checkpointed training on spot. One reclaim and you restart from zero, wiping out the savings.
- Over-committing to reserved. Locking in capacity you stop needing leaves you paying for idle GPUs.
- Ignoring utilization. A cheap GPU at 30% utilization costs more per unit of work than a pricier one at 90%, unless its hourly rate is less than a third of the pricier GPU's.
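The utilization point comes down to simple arithmetic: divide the hourly rate by the fraction of time the GPU does useful work. The sketch below uses made-up prices, and it assumes both GPUs deliver the same throughput when busy. Otherwise you would also need to normalize by speed.

```python
def cost_per_useful_hour(hourly_rate, utilization):
    """Effective price of one hour of actual work, given utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_rate / utilization


if __name__ == "__main__":
    # Hypothetical prices: a $1.00/hour GPU kept 30% busy
    # versus a $2.50/hour GPU kept 90% busy.
    cheap = cost_per_useful_hour(1.00, 0.30)
    pricey = cost_per_useful_hour(2.50, 0.90)
    print(f"cheap GPU:  ${cheap:.2f} per useful hour")
    print(f"pricey GPU: ${pricey:.2f} per useful hour")
```

In this example the nominally cheaper GPU works out to about $3.33 per useful hour against about $2.78 for the pricier one. The cheap GPU only wins when its price advantage exceeds the utilization gap, here a factor of three.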
Takeaway
On-demand buys flexibility, spot buys the deepest discount in exchange for interruption risk, and reserved buys predictability at a guaranteed discount. The winning strategy is rarely one mode - it is reserved for your steady baseline, spot for fault-tolerant bursts, and on-demand for the unpredictable middle. Compare the live tiers in the GPU catalog and validate your blend with the training cost calculator before you commit a dollar.