DeployCue

GPU Cloud Pricing Models Compared: On-Demand, Spot, Reserved, Committed

Jun 20, 2026

An overview of the four main GPU cloud pricing models and how on-demand, spot, reserved, and committed use pricing trade flexibility for savings.

Cloud GPU capacity can be purchased in several distinct ways, and the model you choose often matters more to your bill than which provider you pick. The same H100 in the same region can cost dramatically different amounts depending on whether you buy it on-demand, grab it on the spot market, lock it into a reservation, or wrap it in a committed use agreement. This guide lays out the four main pricing models, the tradeoffs each one carries, and how to assemble a sensible mix for a real workload.

The Four Core Models

Almost every GPU cloud pricing scheme is a variation on four archetypes. Understanding the archetypes lets you decode any provider's offerings quickly.

| Model | Commitment | Typical savings | Interruption risk |
|---|---|---|---|
| On-demand | None | None (baseline) | None |
| Spot | None | Largest | High |
| Reserved | 1 to 3 years | Significant | None |
| Committed use | Spend or usage pledge | Significant | None |

On-Demand

On-demand is the default. You pay a published hourly rate, you can start and stop whenever you like, and you commit to nothing. This flexibility is its whole value. It is ideal for unpredictable bursts, early experiments, and any workload whose future you cannot forecast. The downside is price: on-demand is the most expensive way to run steady capacity, and a workload left running on-demand for months is usually overpaying.

Spot

Spot capacity, also called preemptible or interruptible capacity, is the provider's spare GPUs sold at a steep discount. The catch is that the provider can reclaim the instance with little warning when it needs the hardware back. Spot is excellent for fault-tolerant, checkpointable work such as batch training, hyperparameter sweeps, rendering, and offline inference. It is poorly suited to anything that must stay up, such as a user-facing inference endpoint, unless you engineer a careful fallback.
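A rough way to judge whether spot actually saves money is to charge it for the work it loses. The sketch below assumes a hypothetical `wasted_fraction`, the share of billed spot hours spent redoing work lost to interruptions, and uses illustrative prices rather than any provider's rates. Useful work then costs `spot_rate / (1 - wasted_fraction)`, and spot stops winning once that exceeds the on-demand rate.

```python
# Illustrative prices only; not quotes from any provider.
ON_DEMAND_RATE = 4.00  # $ per GPU-hour
SPOT_RATE = 1.60       # $ per GPU-hour (a hypothetical 60% discount)


def effective_spot_rate(spot_rate: float, wasted_fraction: float) -> float:
    """Cost per *useful* GPU-hour when part of the billed spot time is redone work.

    wasted_fraction is a hypothetical input: the share of billed spot hours
    lost to interruptions (work since the last checkpoint plus restart time).
    """
    if not 0.0 <= wasted_fraction < 1.0:
        raise ValueError("wasted_fraction must be in [0, 1)")
    return spot_rate / (1.0 - wasted_fraction)


def break_even_waste(spot_rate: float, on_demand_rate: float) -> float:
    """Wasted fraction at which spot costs the same per useful hour as on-demand."""
    return 1.0 - spot_rate / on_demand_rate


if __name__ == "__main__":
    for wasted in (0.0, 0.1, 0.3, 0.5):
        rate = effective_spot_rate(SPOT_RATE, wasted)
        print(f"wasted {wasted:.0%}: ${rate:.2f} per useful GPU-hour")
    print(f"break-even waste: {break_even_waste(SPOT_RATE, ON_DEMAND_RATE):.0%}")
```

With these made-up numbers, spot keeps its edge until more than 60% of billed hours are wasted, which is why checkpoint frequency matters as much as the sticker discount.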

Reserved

Reserved capacity trades flexibility for a discount. You commit to a term, typically one to three years, and receive a reduced rate on matching usage. There is no interruption risk, which makes reserved the natural home for stable, always-on baselines. The risk lies in forecasting: the reservation is billed for the whole term whether or not you use it, so if your usage shifts or the hardware generation turns over, you may keep paying for capacity that no longer fits.
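Because a reservation is billed for every hour of its term, the key question is how busy the GPU will actually be. A minimal sketch, assuming a flat hourly reserved rate with no upfront payment and hypothetical prices: the reservation beats on-demand only when utilization exceeds `reserved_rate / on_demand_rate`.

```python
HOURS_PER_YEAR = 8760


def reserved_cost(reserved_rate: float, years: float) -> float:
    # Billed for every hour of the term, whether or not the GPU is busy.
    return reserved_rate * HOURS_PER_YEAR * years


def on_demand_cost(on_demand_rate: float, years: float, utilization: float) -> float:
    # Billed only for the hours actually used.
    return on_demand_rate * HOURS_PER_YEAR * years * utilization


def break_even_utilization(on_demand_rate: float, reserved_rate: float) -> float:
    """Utilization above which the reservation is cheaper than on-demand."""
    return reserved_rate / on_demand_rate


if __name__ == "__main__":
    od, rsv = 4.00, 2.40  # hypothetical $/GPU-hour: a 40% reservation discount
    print(f"break-even utilization: {break_even_utilization(od, rsv):.0%}")
    for util in (0.3, 0.6, 0.9):
        print(
            f"utilization {util:.0%}: on-demand ${on_demand_cost(od, 1, util):,.0f}"
            f" vs reserved ${reserved_cost(rsv, 1):,.0f} per year"
        )
```

A 40% discount sounds large, but under these assumptions it only pays off if the GPU is busy more than 60% of the term.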

Committed Use

Committed use agreements resemble reservations but pledge a level of spend or usage rather than a specific instance. They are often more flexible about which instance types the discount applies to, which helps teams whose exact hardware mix changes over time. Like reservations, they reward predictable demand and punish over-commitment.
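Spend-based commitments are easiest to understand by simulating one billing hour. The sketch below loosely models one common structure, as an assumption rather than any provider's actual terms: you pledge a fixed hourly spend measured at discounted rates, usage up to the pledge is billed at the discount, anything beyond it is billed at on-demand rates, and an unused pledge is still owed. Because the input is a dollar amount rather than an instance type, any mix of eligible instances draws down the same commitment.

```python
def hourly_bill(usage_at_on_demand: float, commitment: float, discount: float) -> float:
    """Bill for one hour under a hypothetical spend-based commitment.

    usage_at_on_demand: what this hour's eligible usage would cost on-demand,
        summed across whatever instance types were running.
    commitment: pledged spend per hour, expressed at discounted rates.
    discount: fractional discount on usage covered by the commitment.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    discounted_usage = usage_at_on_demand * (1.0 - discount)
    if discounted_usage <= commitment:
        # Over-committed: the full pledge is owed even though less was used.
        return commitment
    # The pledge covers this much usage, measured at on-demand prices...
    covered_at_on_demand = commitment / (1.0 - discount)
    # ...and the remainder spills over to on-demand billing.
    return commitment + (usage_at_on_demand - covered_at_on_demand)


if __name__ == "__main__":
    pledge, disc = 30.0, 0.30  # hypothetical: $30/hour pledge, 30% discount
    for usage in (20.0, 100.0):
        bill = hourly_bill(usage, pledge, disc)
        print(f"usage ${usage:.0f} at on-demand prices -> billed ${bill:.2f}")
```

The first case shows the over-commitment penalty: $20 worth of usage still costs the full $30 pledge.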

Flexibility Versus Savings

Every model sits on a single spectrum that trades flexibility for price. On-demand gives you total flexibility at the highest cost. Spot gives you the lowest price at the cost of reliability. Reserved and committed sit in between, offering strong savings in exchange for a promise about the future. There is no universally best model, only the model that matches a given slice of your workload.

Building the Right Mix

Mature GPU operations rarely pick one model. They layer several, mapping each to the part of demand it suits best.

  • Reserved or committed for the baseline: the steady, always-on capacity you are confident you will keep running.
  • On-demand for the variable layer: peaks above the baseline that still require reliability.
  • Spot for the elastic, fault-tolerant layer: training jobs, batch pipelines, and other work that can absorb interruptions.

This layered approach captures deep discounts on the part of your usage that is predictable while keeping the flexibility you need for everything else. The classic mistake is to run an entire workload on a single model, either overpaying with all on-demand or over-committing with reservations that outlast the workload.
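To see how layering plays out, the sketch below prices one day of hypothetical demand three ways: all on-demand; a reserved floor sized to the quietest hour, with on-demand for peaks and spot for fault-tolerant work; and a reservation sized to the peak. The demand profile, the rates, and sizing the floor at the minimum are all assumptions for illustration; a low percentile of demand would be another reasonable choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rates:
    on_demand: float  # $ per GPU-hour
    reserved: float   # $ per GPU-hour, billed for every hour of the term
    spot: float       # $ per GPU-hour


def all_on_demand(reliable: List[int], flexible_gpu_hours: float, r: Rates) -> float:
    return (sum(reliable) + flexible_gpu_hours) * r.on_demand


def layered(reliable: List[int], flexible_gpu_hours: float, baseline: int, r: Rates) -> float:
    # The reserved floor is paid every hour, used or not.
    reserved = baseline * len(reliable) * r.reserved
    # Demand above the floor that must stay up runs on-demand.
    peaks = sum(max(0, d - baseline) for d in reliable) * r.on_demand
    # Interruptible work (training, batch) goes to spot.
    spot = flexible_gpu_hours * r.spot
    return reserved + peaks + spot


if __name__ == "__main__":
    rates = Rates(on_demand=4.00, reserved=2.40, spot=1.60)  # hypothetical prices
    # Hypothetical 24-hour profile of GPUs that must stay up (e.g. inference).
    reliable = [8] * 8 + [12] * 8 + [20] * 8
    flexible = 100.0  # GPU-hours of checkpointed training or batch work
    print(f"all on-demand:           ${all_on_demand(reliable, flexible, rates):,.2f}")
    print(f"layered (floor = min):   ${layered(reliable, flexible, min(reliable), rates):,.2f}")
    print(f"reserve peak, plus spot: ${layered(reliable, flexible, max(reliable), rates):,.2f}")
```

With these made-up numbers the layered mix comes out cheapest; reserving for the peak beats all on-demand but loses to the layered mix because much of the reserved capacity sits idle outside the busiest hours.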

Designing for Spot Without Pain

Spot capacity offers the deepest discount, but it only pays off if your system can survive an interruption gracefully. The engineering pattern that makes spot safe is checkpointing: save progress frequently so that when an instance is reclaimed, a fresh one can resume from the last checkpoint rather than starting over. For training jobs this means writing model state to durable storage at regular intervals. For batch processing it means designing idempotent units of work that can be retried without corrupting results.
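The checkpoint-and-resume pattern can be sketched with the Python standard library. This is a toy, not a training framework: the "model state" is a step counter and a running sum, `Preempted` is a hypothetical stand-in for the instance being reclaimed, and a local JSON file stands in for durable storage. The detail worth copying is the atomic write: the checkpoint goes to a temporary file that then replaces the old one, so a reclaim mid-write never leaves a corrupt checkpoint.

```python
import json
import os
import tempfile
from typing import Optional


class Preempted(Exception):
    """Hypothetical stand-in for the provider reclaiming the instance."""


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temp file in the same directory, then atomically replace the
    # old checkpoint, so an interruption mid-write cannot corrupt it.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if not os.path.exists(path):
        return {"step": 0, "loss_sum": 0.0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, every: int, preempt_at: Optional[int] = None):
    state = load_checkpoint(path)
    steps_run = 0
    while state["step"] < total_steps:
        if state["step"] == preempt_at:
            raise Preempted(f"reclaimed at step {state['step']}")
        state["loss_sum"] += 1.0 / (state["step"] + 1)  # placeholder for real work
        state["step"] += 1
        steps_run += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state, steps_run


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "ckpt.json")
        try:
            train(ckpt, total_steps=100, every=10, preempt_at=37)
        except Preempted as exc:
            print(exc)
        state, steps_run = train(ckpt, total_steps=100, every=10)
        print(f"finished at step {state['step']} after re-running {steps_run} steps")
```

Here the reclaim at step 37 costs only the 7 steps since the checkpoint at step 30; checkpointing more often shrinks that loss at the price of more writes.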

A second pattern is graceful fallback. Many providers give a short warning before reclaiming a spot instance; AWS EC2 Spot, for example, issues a two-minute interruption notice. Software that listens for this signal can finish in-flight work, persist state, and request replacement capacity, falling back to on-demand if necessary until spot capacity returns. With these patterns in place, spot stops being risky and becomes a routine source of savings on a large share of compute-heavy work. Without them, spot interruptions turn into lost work and frustrated engineers, which is why so many teams wrongly conclude that spot is not worth it.
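A graceful-shutdown loop can be sketched with Python's `signal` module. Providers deliver the reclaim warning in different ways (AWS, for example, exposes it through the instance metadata service), so this sketch assumes that something, such as a metadata poller or an orchestrator, turns the warning into a SIGTERM for the worker process. `process`, `persist`, and `request_replacement` are hypothetical hooks; in a real system the last one might ask for new spot capacity or fall back to on-demand.

```python
import signal
import threading
from typing import Callable, List


class InterruptWatcher:
    """Records that SIGTERM arrived; the work loop polls it between tasks."""

    def __init__(self) -> None:
        self._event = threading.Event()
        # signal.signal must be called from the main thread.
        self._previous = signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        # Keep the handler tiny: just note the interruption.
        self._event.set()

    @property
    def interrupted(self) -> bool:
        return self._event.is_set()

    def close(self) -> None:
        # Restore whatever handler was installed before this watcher.
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, previous)


def run_worker(
    tasks: List[int],
    process: Callable[[int], int],
    persist: Callable[[List[int]], None],
    request_replacement: Callable[[List[int]], None],
    watcher: InterruptWatcher,
):
    done: List[int] = []
    for task in tasks:
        if watcher.interrupted:
            break  # stop taking new work once the warning arrives
        done.append(process(task))  # the in-flight task is allowed to finish
    persist(done)  # save finished results before the instance disappears
    remaining = tasks[len(done):]
    if remaining:
        request_replacement(remaining)  # hand unfinished work to new capacity
    return done, remaining


if __name__ == "__main__":
    watcher = InterruptWatcher()

    def process(task: int) -> int:
        if task == 3:
            # Simulate the reclaim warning arriving mid-run.
            signal.raise_signal(signal.SIGTERM)
        return task * task

    saved: List[int] = []
    handed_off: List[int] = []
    try:
        run_worker(list(range(10)), process, saved.extend, handed_off.extend, watcher)
    finally:
        watcher.close()
    print("saved:", saved)
    print("handed off:", handed_off)
```

The loop only checks the flag between tasks, so each unit of work either completes and is persisted or is handed off untouched, which pairs naturally with the idempotent units of work described above.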

Watching the Hardware Cycle

GPU pricing is especially sensitive to the hardware refresh cycle. New accelerator generations can shift price-to-performance sharply, which has direct consequences for commitment-based models. A three-year reservation signed just before a major new GPU arrives can lock you into last-generation hardware at a rate that no longer looks like a bargain. This is why many GPU teams favor shorter commitments or spend-based plans that can flex toward newer instances. Keep an eye on the roadmap and let it temper how far into the future you are willing to commit.
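The refresh risk is easier to reason about per unit of work than per hour. The sketch below compares a three-year lock on the current generation with a one-year commitment that moves to a newer generation afterward. Every number is hypothetical: the rates, the new GPU's 2.5x throughput, its arrival at month 12, and the simplification that steady work exactly fills whatever is reserved (so a faster GPU needs proportionally fewer reserved hours).

```python
from typing import List, Tuple

HOURS_PER_MONTH = 730  # roughly one GPU running all month


def cost_per_unit_work(hourly_rate: float, relative_throughput: float) -> float:
    """Dollars per current-generation-equivalent GPU-hour of work."""
    return hourly_rate / relative_throughput


def plan_cost(segments: List[Tuple[int, float, float]], work_per_month: float) -> float:
    """Total cost of a plan given as (months, hourly_rate, relative_throughput) segments."""
    return sum(
        months * work_per_month * cost_per_unit_work(rate, throughput)
        for months, rate, throughput in segments
    )


if __name__ == "__main__":
    work = HOURS_PER_MONTH  # hypothetical: one current-gen GPU's worth of steady work
    # Three-year lock on the current generation at $2.40/h.
    locked = plan_cost([(36, 2.40, 1.0)], work)
    # One-year term at a smaller discount ($3.00/h), then a newer generation
    # reserved at $3.60/h that does 2.5x the work per hour.
    flexible = plan_cost([(12, 3.00, 1.0), (24, 3.60, 2.5)], work)
    print(f"3-year lock:               ${locked:,.0f}")
    print(f"1-year term, then upgrade: ${flexible:,.0f}")
```

Under these assumptions the lock has the lower hourly rate but the higher total, because for two of its three years it buys work at $2.40 per unit while the newer hardware delivers it at $1.44.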

A Decision Checklist

  1. Can the workload tolerate sudden interruption? If yes, spot is on the table.
  2. Is the demand stable and long-lived? If yes, reserved or committed pricing fits the baseline.
  3. Is the demand unpredictable or short-lived? On-demand protects you from bad commitments.
  4. Is the hardware generation likely to turn over during the term? If yes, favor shorter commitments.
  5. Does your instance mix change often? Committed use may be more forgiving than instance-specific reservations.
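The checklist can be encoded as a small function, mainly to make the decision order explicit. The field names and output strings are hypothetical, and a real decision would weigh cost data rather than booleans.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    tolerates_interruption: bool    # question 1
    stable_and_long_lived: bool     # questions 2 and 3
    hardware_turnover_likely: bool  # question 4
    instance_mix_changes: bool      # question 5


def recommend(w: Workload) -> List[str]:
    models: List[str] = []
    if w.tolerates_interruption:
        models.append("spot")
    if w.stable_and_long_lived:
        # Spend-based commitments are more forgiving of a changing instance mix.
        kind = "committed use" if w.instance_mix_changes else "reserved"
        term = "shorter term" if w.hardware_turnover_likely else "longer term"
        models.append(f"{kind}, {term}")
    else:
        # Unpredictable or short-lived demand: avoid commitments.
        models.append("on-demand")
    return models


if __name__ == "__main__":
    print(recommend(Workload(True, False, False, False)))
    print(recommend(Workload(False, True, True, True)))
```

A workload that answers yes to several questions gets several models, which mirrors the layered mix rather than a single winner.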

Conclusion

GPU cloud pricing is less about finding a single cheap provider and more about matching each pricing model to the right portion of your demand. Reserve or commit the stable floor, run the unpredictable middle on-demand, and push fault-tolerant work onto spot. Revisit the mix as your workload matures, because the optimal split shifts as your usage becomes more predictable. Master this taxonomy and you will consistently pay closer to the true cost of your compute instead of the convenience premium.