Benchmark H100 vs A100 Yourself: A Reproducible Test Guide
A reproducible guide to benchmarking H100 against A100 on your own workload, covering setup, fair comparison, and cost per result.
The H100 and A100 are two of the most rented data center GPUs, and the question of which to pick comes up constantly. Published benchmarks help, but they rarely match your model, your precision, or your batch size. The honest way to decide is to run a reproducible benchmark on your own workload. This guide shows how to compare H100 against A100 fairly, so the decision rests on numbers from your code rather than from a vendor slide.
Understand the architectural differences
Before measuring, it helps to know where the H100 tends to pull ahead. The H100 (Hopper architecture) is a newer generation than the A100 (Ampere), with more memory bandwidth, a faster GPU-to-GPU interconnect, and improved tensor cores, including support for lower-precision formats such as FP8 that the A100's tensor cores do not offer. The A100 remains a capable and widely available GPU, often at a lower hourly rate.
The practical consequence is that the H100 advantage varies by workload. Tasks that lean on its newer low-precision support or its higher memory bandwidth can see large gains. Tasks that do not exercise those features may show a smaller gap, and a small enough gap can tip the cost comparison toward the A100.
Control your variables
A fair benchmark holds everything constant except the GPU. If you change the precision, batch size, or software version between runs, you are no longer comparing the hardware. Lock these down before you start.
- The exact model and its weights.
- The precision or quantization format.
- The batch size and sequence or input length.
- The framework and library versions, ideally inside the same container image.
- The same warmup procedure for both GPUs.
Running both benchmarks from one pinned container image is the cleanest way to guarantee the software stack matches. Provision an H100 instance and an A100 instance, pull the same image onto each, and run the identical script.
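One way to confirm that both runs really used the same settings is to record them on each instance and compare a hash. The sketch below is a minimal illustration, not a prescribed tool: the field names, the model name, and the image reference are hypothetical placeholders to replace with your own values.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hash of the controlled benchmark settings."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def differing_keys(a: dict, b: dict) -> list:
    """List the settings whose values differ between two runs."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Hypothetical settings recorded on each instance; every value is a placeholder.
a100_run = {
    "model": "example-model",
    "precision": "bf16",
    "batch_size": 8,
    "sequence_length": 2048,
    "image": "registry.example.com/bench@sha256:<digest>",
    "warmup_iterations": 10,
}
h100_run = dict(a100_run)  # should be identical apart from the GPU itself

if config_fingerprint(a100_run) != config_fingerprint(h100_run):
    raise SystemExit(f"Settings differ: {differing_keys(a100_run, h100_run)}")
print("Controlled settings match:", config_fingerprint(a100_run)[:12])
```

Storing the fingerprint next to each result makes it easy to spot, later on, a comparison that accidentally mixed two configurations.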
Pick the right metric for your workload
The metric should reflect what you actually care about. Choose deliberately.
| Workload | Primary metric | Why |
|---|---|---|
| LLM inference | Tokens per second | Directly tracks serving capacity |
| Training | Samples or steps per second | Tracks time to finish a run |
| Latency-sensitive serving | Time to first token and tail latency | Tracks user experience |
| Batch processing | Total job wall-clock time | Tracks throughput end to end |
For most readers comparing GPUs for inference, tokens per second under realistic concurrency is the right headline. For training, steps per second or time to complete a fixed number of steps is more honest.
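The metrics in the table reduce to simple arithmetic once you have raw measurements. The following sketch assumes you have already collected a token count, an elapsed time, and a list of per-request latencies. The helper names are illustrative, the sample latencies are made-up placeholders, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math


def tokens_per_second(total_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: tokens generated divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_tokens / elapsed_seconds


def percentile(values, pct: int):
    """Nearest-rank percentile, for example pct=99 for tail latency."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Placeholder per-request latencies in seconds, not real measurements.
latencies = [0.21, 0.19, 0.25, 0.20, 0.22, 0.48, 0.20, 0.23, 0.21, 0.19]
print("median latency:", percentile(latencies, 50))
print("p99 latency:", percentile(latencies, 99))
```

Reporting a high percentile alongside the median matters for serving workloads, because a single slow request can be hidden entirely by an average.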
Run the benchmark
The procedure mirrors any good benchmark: warm up, then measure many iterations. Concretely:
- Provision an A100 instance and an H100 instance with the same image.
- Run a warmup pass on each to load weights and trigger any compilation.
- Run the measured workload many times and record the metric.
- Capture not just the average but the variation, so you know how stable the result is.
- Record the exact instance type and hourly price for each GPU.
Keep the runs back to back where possible, since instance availability and pricing shift over time. Save the raw logs so the result is auditable later.
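The warmup-then-measure procedure above can be sketched as a small timing harness. This is a minimal illustration using only the Python standard library. `run_benchmark` and `placeholder_workload` are hypothetical names, and the placeholder does CPU work only so that the example runs anywhere. For real GPU work, the callable must block until the device has finished (in PyTorch, for instance, by calling `torch.cuda.synchronize()`), because kernels are launched asynchronously and the timer would otherwise stop too early.

```python
import statistics
import time


def run_benchmark(workload, warmup_iterations=5, measured_iterations=30):
    """Time `workload` after a warmup phase and summarize the spread."""
    if measured_iterations < 2:
        raise ValueError("need at least two measured iterations for a stdev")

    # Warmup: load weights, fill caches, trigger compilation. Not timed.
    for _ in range(warmup_iterations):
        workload()

    durations = []
    for _ in range(measured_iterations):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)

    return {
        "iterations": measured_iterations,
        "mean_seconds": statistics.mean(durations),
        "stdev_seconds": statistics.stdev(durations),
        "min_seconds": min(durations),
        "max_seconds": max(durations),
    }


def placeholder_workload():
    # Stand-in for one inference request or one training step.
    sum(i * i for i in range(10_000))


result = run_benchmark(placeholder_workload, warmup_iterations=3, measured_iterations=10)
print(result)
```

Writing the returned dictionary to a log file, together with the instance type and hourly price, gives you the auditable record described above.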
Convert speed into cost per result
Raw speed is only half the answer. The H100 is usually faster but also costs more per hour, so the deciding question is cost per unit of work. Take the metric, convert it to work per hour, and divide the hourly rate by it.
- For inference, compute cost per million tokens on each GPU.
- For training, compute cost to complete a fixed number of steps on each GPU.
Often the faster GPU wins on cost per result even at a higher hourly rate, because it finishes the same work in much less time. But not always. If your workload does not exploit the H100's strengths, the A100 can be the cheaper path to the same output. Only your own numbers settle it.
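The conversion from speed to cost is short enough to write down directly. In the sketch below the function names are illustrative, and the hourly rates and throughputs in the example are made-up placeholders chosen for easy arithmetic. They are not measurements or real prices, so substitute your own benchmark results and quoted rates.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Inference: hourly rate divided by tokens per hour, scaled to one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


def cost_for_steps(hourly_rate_usd: float, steps_per_second: float, total_steps: int) -> float:
    """Training: hourly rate multiplied by the hours needed to finish total_steps."""
    hours = total_steps / steps_per_second / 3600
    return hourly_rate_usd * hours


# Placeholder inputs only: (hourly rate in USD, measured tokens per second).
candidates = {
    "gpu_a": (2.00, 1000.0),
    "gpu_b": (3.00, 2000.0),
}
for name, (rate, tps) in candidates.items():
    print(name, round(cost_per_million_tokens(rate, tps), 4), "USD per million tokens")
```

With these placeholder inputs the GPU with the higher hourly rate is still cheaper per token, because its throughput doubles while its price rises by only half. Shrink the throughput gap and the ordering flips, which is why the measured number matters.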
Document and reuse the benchmark
A benchmark you can rerun is worth far more than a one-off. Keep the script, the container image reference, and the recorded conditions in version control. When prices change, or a new GPU generation appears, you can rerun the same test in minutes and refresh your decision. On DeployCue you can pair these self-measured results with current hourly pricing across providers to find the best-value instance for your specific job.
Factor in availability and interconnect
Raw speed and price are not the only inputs to a real decision. Availability matters, because the cheapest GPU you cannot rent when you need it has an effective price of infinity. The A100 has been on the market longer and is often easier to find on demand at short notice, while the H100 can be scarce or carry a premium during high-demand periods. If your workload runs on a schedule or must scale quickly, weigh how reliably each GPU is actually obtainable from your providers.
Interconnect matters too once you scale beyond a single GPU. The H100 generation offers faster links between GPUs, which helps multi-GPU training where devices exchange large amounts of data. For a single-GPU job that difference is irrelevant, but for distributed training it can widen the H100 advantage well beyond what a single-device benchmark suggests. Match your benchmark to your real topology rather than testing one GPU and extrapolating to a cluster.
Avoid benchmark pitfalls
Self-run benchmarks go wrong in predictable ways, and a few checks keep your comparison honest. Watch for these mistakes:
- Skipping warmup, which penalizes whichever GPU you happened to test first.
- Letting thermal throttling creep in on a long run, which understates the performance of a GPU that runs hot.
- Comparing different software versions or precisions between the two GPUs.
- Measuring a single run instead of many, so noise masquerades as a real difference.
- Forgetting to record the exact hourly price, which makes the cost comparison guesswork.
Treat the benchmark like an experiment with controls. Change one thing, the GPU, and hold everything else fixed. That rigor is what lets you defend the conclusion when someone questions it.
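To guard against noise masquerading as a real difference, one rough check is to compare the gap between the two means with the combined standard error of the runs. The sketch below is a heuristic under simple assumptions (independent, roughly normal samples), not a substitute for a proper statistical test. The function name, the threshold of 2.0, and the sample values are all illustrative placeholders.

```python
import math
import statistics


def gap_exceeds_noise(samples_a, samples_b, threshold=2.0):
    """Return True if the difference in means is large relative to run-to-run noise."""
    mean_a = statistics.mean(samples_a)
    mean_b = statistics.mean(samples_b)
    # Standard error of each mean: sample stdev divided by sqrt(n).
    se_a = statistics.stdev(samples_a) / math.sqrt(len(samples_a))
    se_b = statistics.stdev(samples_b) / math.sqrt(len(samples_b))
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined == 0:
        # No measured variation at all: any gap is at least consistent.
        return mean_a != mean_b
    return abs(mean_a - mean_b) / combined > threshold


# Placeholder throughput samples, not real measurements.
clear_gap = gap_exceeds_noise([100, 101, 99, 100, 100], [150, 151, 149, 150, 150])
noisy_gap = gap_exceeds_noise([100, 110, 90, 105, 95], [101, 111, 91, 106, 96])
print("clear gap:", clear_gap, "| noisy gap:", noisy_gap)
```

If the check fails, the usual remedies are more measured iterations or a longer warmup before drawing any conclusion from the comparison.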
Conclusion
Comparing H100 against A100 well is about discipline, not opinion. Hold every variable constant except the GPU, run the same pinned image on both, measure the metric that matches your workload, and convert speed into cost per result. The faster GPU frequently wins on cost despite its premium, but only a reproducible test on your own workload tells you for sure, and that test is yours to keep and rerun whenever prices move.