GPU Cloud Billing Units: Per-Second, Per-Minute, and Per-Hour Compared
A comparison of per-second, per-minute, and per-hour GPU cloud billing, showing how granularity affects cost for bursty, batch, and steady workloads.
Two providers can advertise the same GPU at the same hourly rate and still bill you very differently, because the unit they meter in matters as much as the rate itself. Per-second, per-minute, and per-hour billing each round your usage in a different way, and for certain workloads the granularity alone can swing the bill by a wide margin. This guide compares the three billing units, shows where each helps or hurts, and gives you a simple way to factor granularity into provider selection.
What billing granularity actually means
Billing granularity is the smallest increment of time a provider charges for. With per-hour billing, a job that runs for ten minutes is often billed as a full hour, because usage rounds up to the next whole unit. With per-second billing, you pay for roughly the seconds you used, often after a small minimum. Per-minute sits in between. The headline rate is usually quoted per hour regardless, so the granularity hides in the terms rather than the price.
Rounding is the crux. The finer the unit, the less you overpay for the unused tail at the end of a job. The coarser the unit, the more that tail costs you, and the effect compounds across many short jobs.
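The rounding rule can be sketched in a few lines of Python. This is a minimal illustration rather than any provider's actual metering logic: the function name `billed_seconds`, its parameters, and the one-minute minimum on the per-second example are assumptions chosen for the example.

```python
import math

def billed_seconds(actual_seconds: float, unit_seconds: int, minimum_seconds: int = 0) -> int:
    """Apply an assumed minimum charge, then round up to the next whole billing unit."""
    chargeable = max(actual_seconds, minimum_seconds)
    return math.ceil(chargeable / unit_seconds) * unit_seconds

job = 10 * 60  # a ten-minute job, in seconds

per_hour = billed_seconds(job, 3600)                     # 3600: a full hour
per_minute = billed_seconds(job, 60)                     # 600: exactly ten minutes
per_second = billed_seconds(job, 1, minimum_seconds=60)  # 600: minimum already exceeded

print(per_hour, per_minute, per_second)
```

For this ten-minute job the two finer units bill the same amount, and only the hourly unit adds a tail of 50 unused minutes.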
Per-hour billing
Per-hour billing is the simplest and historically the most common. You are charged for each hour or partial hour a resource is allocated. For long-running, steady workloads that occupy a GPU for many hours, the rounding is negligible because the wasted minutes at the end are tiny relative to total runtime.
The problem appears with short or bursty jobs. A five-minute inference burst billed as a full hour means you pay roughly twelve times the compute you used. Teams that spin instances up and down frequently can quietly lose a large fraction of their budget to rounding under per-hour billing.
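To make the contrast between long and short jobs concrete, the sketch below computes the ratio of billed to used time under per-hour billing. The helper name `per_hour_overhead` and the 1,250-minute training run are illustrative assumptions; only the five-minute burst comes from the text above.

```python
def per_hour_overhead(actual_minutes: int) -> float:
    """Ratio of billed minutes to used minutes when usage rounds up to whole hours."""
    billed_minutes = -(-actual_minutes // 60) * 60  # integer ceiling to the next hour
    return billed_minutes / actual_minutes

burst = per_hour_overhead(5)        # 60 billed / 5 used = 12.0
training = per_hour_overhead(1250)  # 1260 billed / 1250 used = 1.008

print(f"5-minute burst: {burst:.1f}x, 1250-minute run: {training:.3f}x")
```

The same ten-minute rounding tail that is invisible on a run of roughly 21 hours dominates the bill for the burst.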
Per-minute billing
Per-minute billing rounds usage up to the next whole minute, often with a minimum charge of a minute or a few minutes. It dramatically reduces the rounding penalty compared with per-hour, making it a comfortable middle ground for many workloads. A job lasting several minutes pays close to its true usage, and once any minimum is met, a job wastes less than one minute to rounding.
For interactive development, notebooks, and medium-length batch jobs, per-minute billing usually captures most of the savings that finer granularity offers, without requiring you to think hard about seconds.
Per-second billing
Per-second billing meters usage to the second, typically after a one-minute minimum. It is the most forgiving for very short or highly variable jobs, and it shines for high-volume, intermittent inference where each request occupies a GPU briefly. When you launch and terminate instances constantly, per-second billing keeps billed time close to the time each instance actually ran.
The catch is that the minimum charge still applies, so spinning up an instance for a two-second task may bill a full minute. Per-second billing helps most when individual jobs run for seconds to minutes and recur often.
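The difference between per-minute and per-second billing, and the way a minimum charge flattens it for very short tasks, can be seen in a small comparison. This sketch assumes both schemes apply the same one-minute minimum; the function names and the sample durations are hypothetical.

```python
import math

MINIMUM_SECONDS = 60  # assumed one-minute minimum charge for both schemes

def per_minute_billed(seconds: float) -> int:
    """Round up to whole minutes, never below the assumed minimum."""
    return max(MINIMUM_SECONDS, math.ceil(seconds / 60) * 60)

def per_second_billed(seconds: float) -> int:
    """Round up to whole seconds, never below the assumed minimum."""
    return max(MINIMUM_SECONDS, math.ceil(seconds))

# Job duration in seconds -> (per-minute billed, per-second billed)
comparison = {d: (per_minute_billed(d), per_second_billed(d)) for d in [2, 61, 90, 600]}

for duration, (by_minute, by_second) in comparison.items():
    print(f"{duration:>4}s job: per-minute bills {by_minute}s, per-second bills {by_second}s")
```

Under these assumptions the two-second task bills 60 seconds either way, while the 61-second job bills 120 seconds per-minute and 61 seconds per-second.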
Comparing the three at a glance
| Billing unit | Best for | Rounding waste |
|---|---|---|
| Per-hour | Long steady jobs, training runs | High for short jobs |
| Per-minute | Dev sessions, medium batches | Low |
| Per-second | Bursty inference, frequent restarts | Lowest, subject to minimums |
How to match granularity to your workload
The right unit depends entirely on how your jobs behave. Profile your typical run length and frequency before you choose.
- Steady, multi-hour workloads: granularity barely matters. Pick on rate, region, and availability instead.
- Frequent short bursts: finer granularity matters enormously. Favor per-second or per-minute providers.
- Unpredictable, spiky traffic: per-second billing limits what you pay for the tail after each spike, provided you actually release instances when load drops.
- Mixed fleets: you may route long jobs to a cheaper per-hour provider and bursty jobs to a per-second one.
A quick way to quantify the difference
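- Estimate your typical job length in minutes. If lengths vary widely, repeat the steps for a few representative lengths, because rounding is applied per job and a single average can misstate the overhead.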
- For per-hour billing, round that length up to the next 60-minute boundary to find billed time.
- For per-minute billing, round up to the next minute. For per-second billing, take the longer of the actual duration and the minimum charge, rounded up to the second.
- Divide billed time by actual time to get your rounding overhead factor.
- Multiply each provider's hourly rate by its overhead factor for your workload to compare true effective cost.
This calculation often reverses a naive comparison. A provider with a slightly higher hourly rate but per-second billing can beat a cheaper per-hour competitor for bursty work once rounding is included.
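The steps above can be sketched as a short script. Everything specific here is an assumption for illustration: the `Provider` class, the two provider names, their rates of 2.00 and 2.40 per GPU-hour, and the five-minute job are hypothetical and do not describe real offerings.

```python
import math
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    hourly_rate: float        # advertised price per GPU-hour (hypothetical)
    unit_seconds: int         # billing increment in seconds
    minimum_seconds: int = 0  # assumed minimum charge, if any

    def overhead_factor(self, job_seconds: float) -> float:
        """Billed time divided by actual time for one job."""
        chargeable = max(job_seconds, self.minimum_seconds)
        billed = math.ceil(chargeable / self.unit_seconds) * self.unit_seconds
        return billed / job_seconds

    def effective_hourly_rate(self, job_seconds: float) -> float:
        """Advertised rate scaled by the rounding overhead."""
        return self.hourly_rate * self.overhead_factor(job_seconds)

providers = [
    Provider("hourly-cheaper", 2.00, unit_seconds=3600),
    Provider("per-second-pricier", 2.40, unit_seconds=1, minimum_seconds=60),
]

job_seconds = 5 * 60  # typical job length: five minutes
ranked = sorted(providers, key=lambda p: p.effective_hourly_rate(job_seconds))

for p in ranked:
    print(f"{p.name}: ${p.effective_hourly_rate(job_seconds):.2f} per useful GPU-hour")
```

With these made-up numbers the per-second provider comes out at 2.40 per useful GPU-hour against 24.00 for the hourly one, even though its advertised rate is higher. Changing `job_seconds` to several hours brings the two back toward their advertised rates.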
Beyond granularity
Billing unit is one factor among several. Minimum charges, startup billing that begins before your workload is ready, and shutdown lag that keeps billing after your job finishes all interact with granularity. Read the terms to learn exactly when the meter starts and stops, because per-second billing loses its edge if you are charged for a long provisioning period on every launch.
How granularity interacts with autoscaling
Modern GPU workloads rarely run on a single static instance. They scale up and down with demand, spinning up new instances during traffic spikes and tearing them down when load falls. Under coarse per-hour billing, this autoscaling pattern can backfire: an instance launched to handle a brief spike, then terminated minutes later, still bills a full hour. The more aggressively you scale, the more rounding waste you accumulate. Fine-grained billing aligns far better with elastic infrastructure, because scaling decisions no longer carry a hidden rounding tax. If your platform scales frequently, billing granularity should weigh heavily in provider choice, since it directly shapes how efficient your autoscaling can be.
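A rough way to see the rounding tax on autoscaling is to total billed time across a set of short-lived instances. The lifetimes below are invented for illustration, and the one-minute minimum on the per-second scheme is an assumption.

```python
# Hypothetical lifetimes, in minutes, of instances launched by an autoscaler in one day.
lifetimes_min = [7, 12, 4, 25, 9]

def ceil_to_unit(seconds: int, unit: int) -> int:
    """Round an integer number of seconds up to a whole multiple of `unit`."""
    return -(-seconds // unit) * unit

used = sum(m * 60 for m in lifetimes_min)                              # 3420 s of real usage
hourly_billed = sum(ceil_to_unit(m * 60, 3600) for m in lifetimes_min)  # 18000 s: five full hours
second_billed = sum(max(60, m * 60) for m in lifetimes_min)             # 3420 s with a 60 s minimum

waste_share = 1 - used / hourly_billed  # share of the hourly bill that bought no compute

print(f"used {used}s, per-hour bills {hourly_billed}s, per-second bills {second_billed}s")
print(f"rounding waste under per-hour billing: {waste_share:.0%}")
```

In this example each of the five instances bills a separate hour, so 57 minutes of use costs five hours, and about 81% of the hourly bill is rounding.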
Reading the fine print on the meter
Two providers with the same nominal granularity can still bill differently depending on exactly when their meter starts and stops. Some begin charging the moment an instance is requested, including provisioning and boot time during which your workload cannot yet run. Others start the meter only when the instance is ready. Likewise, some keep billing through a shutdown or deallocation window. For short jobs, provisioning overhead can rival the useful runtime, so a provider that bills boot time erodes the benefit of fine granularity. Always confirm the meter boundaries, not just the increment, before trusting a per-second claim to save you money.
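The effect of meter boundaries can be sketched by adding boot and shutdown time to the billed duration when a provider charges for them. The function `metered_seconds`, its flags, and the 45-second boot, 60-second job, and 15-second shutdown are all hypothetical values picked for the example.

```python
def metered_seconds(runtime: int, boot: int = 0, shutdown: int = 0,
                    bills_boot: bool = False, bills_shutdown: bool = False) -> int:
    """Seconds on the meter under per-second billing, given assumed meter boundaries."""
    total = runtime
    if bills_boot:
        total += boot      # meter starts at request time, not at readiness
    if bills_shutdown:
        total += shutdown  # meter keeps running through deallocation
    return total

runtime, boot, shutdown = 60, 45, 15  # hypothetical short job, in seconds

meter_from_ready = metered_seconds(runtime, boot, shutdown)
meter_from_request = metered_seconds(runtime, boot, shutdown,
                                     bills_boot=True, bills_shutdown=True)

print(meter_from_ready / runtime)    # 1.0: billed time equals useful time
print(meter_from_request / runtime)  # 2.0: half the bill is boot and shutdown
```

Both hypothetical providers bill per second here, yet one charges twice as much for the same useful minute, which is why the meter boundaries matter as much as the increment.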
Granularity is a quiet but real lever on GPU cost. For long training runs it scarcely registers, but for the bursty, intermittent workloads that increasingly define AI inference, the choice between per-second, per-minute, and per-hour billing can reshape your bill. Profile your jobs, compute the rounding overhead, and weigh billing unit alongside rate so you compare providers on the cost you will actually pay.