NVIDIA H100 cloud price comparison | DeployCue

NVIDIA H100 cloud pricing

Vendor: NVIDIA
VRAM: 80 GB
Architecture: Hopper
FP16 (Tensor Core, dense, SXM): 989 TFLOPS
Launched: 2022
Lowest price: $1.19 /GPU-hr
Median price: $2.00 /GPU-hr
Highest price: $3.50 /GPU-hr

27 results (all prices marked Verified)

| Provider | Plan | Pricing | GPUs | VRAM | CPUs | RAM | Price (per GPU-hr) | Regions |
|---|---|---|---|---|---|---|---|---|
| Hyperstack | H100 SXM (per GPU) | On-demand | 1 | 80 GB | 28 | 180 GB | $1.95 | 2 countries |
| Hyperstack | H100 SXM (per GPU) | Reserved | 1 | 80 GB | 28 | 180 GB | $1.27 | 2 countries |
| Vast.ai | H100 SXM (marketplace) | On-demand | 1 | 80 GB | 16 | 128 GB | $1.99 | 3 countries |
| Vast.ai | H100 SXM (marketplace) | Spot | 1 | 80 GB | 16 | 128 GB | $1.19 | 3 countries |
| Vast.ai | H100 SXM (marketplace) | Reserved | 1 | 80 GB | 16 | 128 GB | $1.29 | 3 countries |
| Nebius | H100 SXM (per GPU) | On-demand | 1 | 80 GB | 20 | 200 GB | $2.00 | 2 countries |
| Nebius | H100 SXM (per GPU) | Reserved | 1 | 80 GB | 20 | 200 GB | $1.30 | 2 countries |
| CoreWeave | HGX H100 (per GPU) | On-demand | 1 | 80 GB | 22 | 256 GB | $2.23 | 2 countries |
| CoreWeave | HGX H100 (per GPU) | Reserved | 1 | 80 GB | 22 | 256 GB | $1.45 | 2 countries |
| Paperspace | H100 machine | On-demand | 1 | 80 GB | 20 | 250 GB | $2.24 | 2 countries |
| Paperspace | H100 machine | Reserved | 1 | 80 GB | 20 | 250 GB | $1.46 | 2 countries |
| RunPod | H100 PCIe (Secure Cloud) | On-demand | 1 | 80 GB | 16 | 188 GB | $2.39 | 3 countries |
| Together AI | H100 SXM cluster (per GPU) | On-demand | 1 | 80 GB | 20 | 200 GB | $2.39 | 1 country |
| RunPod | H100 PCIe (Secure Cloud) | Reserved | 1 | 80 GB | 16 | 188 GB | $1.55 | 3 countries |
| Together AI | H100 SXM cluster (per GPU) | Reserved | 1 | 80 GB | 20 | 200 GB | $1.55 | 1 country |
| Crusoe | H100 SXM (per GPU) | On-demand | 1 | 80 GB | 24 | 240 GB | $2.45 | 1 country |
| Crusoe | H100 SXM (per GPU) | Reserved | 1 | 80 GB | 24 | 240 GB | $1.59 | 1 country |
| Oracle Cloud Infrastructure | BM.GPU.H100.8 | On-demand | 8 | 640 GB | 112 | 2,048 GB | $2.90 | 2 countries |
| Oracle Cloud Infrastructure | BM.GPU.H100.8 | Reserved | 8 | 640 GB | 112 | 2,048 GB | $1.89 | 2 countries |
| Lambda | H100 SXM 1x | On-demand | 1 | 80 GB | 26 | 225 GB | $2.99 | 1 country |
| Lambda | H100 SXM 1x | Reserved | 1 | 80 GB | 26 | 225 GB | $1.94 | 1 country |
| Google Cloud | A3 (8x H100) | On-demand | 8 | 640 GB | 208 | 1,872 GB | $3.25 | 2 countries |
| Google Cloud | A3 (8x H100) | Reserved | 8 | 640 GB | 208 | 1,872 GB | $2.11 | 2 countries |
| Amazon Web Services | P5 (8x H100) | On-demand | 8 | 640 GB | 192 | 2,048 GB | $3.40 | 1 country |
| Amazon Web Services | P5 (8x H100) | Reserved | 8 | 640 GB | 192 | 2,048 GB | $2.21 | 1 country |
| Microsoft Azure | ND H100 v5 (8x H100) | On-demand | 8 | 640 GB | 96 | 1,900 GB | $3.50 | 2 countries |
| Microsoft Azure | ND H100 v5 (8x H100) | Reserved | 8 | 640 GB | 96 | 1,900 GB | $2.27 | 2 countries |
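
The Lowest, Median, and Highest figures in the summary card can be reproduced from the table. The Python sketch below hard-codes the 27 per-GPU-hour prices exactly as listed (a snapshot, so it will drift as providers change rates) and also picks the cheapest row for each pricing mode. The dictionary layout and the function names `summarize` and `cheapest_by_mode` are illustrative assumptions, not part of any DeployCue API.

```python
from statistics import median

# Per-GPU-hour prices in USD, copied from the 27 table rows (a snapshot).
# Keys are (provider, pricing mode); this layout is a hypothetical choice.
PRICES = {
    ("Hyperstack", "On-demand"): 1.95,
    ("Hyperstack", "Reserved"): 1.27,
    ("Vast.ai", "On-demand"): 1.99,
    ("Vast.ai", "Spot"): 1.19,
    ("Vast.ai", "Reserved"): 1.29,
    ("Nebius", "On-demand"): 2.00,
    ("Nebius", "Reserved"): 1.30,
    ("CoreWeave", "On-demand"): 2.23,
    ("CoreWeave", "Reserved"): 1.45,
    ("Paperspace", "On-demand"): 2.24,
    ("Paperspace", "Reserved"): 1.46,
    ("RunPod", "On-demand"): 2.39,
    ("Together AI", "On-demand"): 2.39,
    ("RunPod", "Reserved"): 1.55,
    ("Together AI", "Reserved"): 1.55,
    ("Crusoe", "On-demand"): 2.45,
    ("Crusoe", "Reserved"): 1.59,
    ("Oracle Cloud Infrastructure", "On-demand"): 2.90,
    ("Oracle Cloud Infrastructure", "Reserved"): 1.89,
    ("Lambda", "On-demand"): 2.99,
    ("Lambda", "Reserved"): 1.94,
    ("Google Cloud", "On-demand"): 3.25,
    ("Google Cloud", "Reserved"): 2.11,
    ("Amazon Web Services", "On-demand"): 3.40,
    ("Amazon Web Services", "Reserved"): 2.21,
    ("Microsoft Azure", "On-demand"): 3.50,
    ("Microsoft Azure", "Reserved"): 2.27,
}

def summarize(prices: dict) -> tuple:
    """Return (lowest, median, highest) across all listed rates."""
    values = list(prices.values())
    return min(values), median(values), max(values)

def cheapest_by_mode(prices: dict) -> dict:
    """Map each pricing mode to its cheapest (provider, price) pair."""
    best = {}
    for (provider, mode), price in prices.items():
        if mode not in best or price < best[mode][1]:
            best[mode] = (provider, price)
    return best

if __name__ == "__main__":
    low, mid, high = summarize(PRICES)
    print(f"Lowest ${low:.2f}  Median ${mid:.2f}  Highest ${high:.2f}")
    for mode, (provider, price) in cheapest_by_mode(PRICES).items():
        print(f"Cheapest {mode}: {provider} at ${price:.2f}/GPU-hr")
```

With this snapshot it reproduces the $1.19 / $2.00 / $3.50 summary, and it makes explicit that the overall low is a spot rate, while the cheapest on-demand row (Hyperstack, $1.95) is noticeably higher.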

Providers offering this GPU

Amazon Web Services is the world's largest cloud provider, with 200+ services across compute, storage, databases, ML, and networking. It dominates the enterprise market with the broadest global region footprint and the deepest service catalog, but complex pricing and egress fees add up at scale.

Google Cloud Platform combines world-class data analytics, AI infrastructure (TPUs, Vertex AI), and the original managed Kubernetes service (GKE). Its global fiber backbone and Preemptible VMs offer compelling price-performance for data-heavy and containerized workloads.

Lambda Labs is purpose-built for ML teams, offering simple, transparent per-hour rates on H100, H200, B200, and GB200 instances with no hidden fees. Known for responsive support and direct hardware access, it is a top choice for training runs that need predictable pricing without cloud-platform complexity.

CoreWeave is a specialized GPU cloud operator with massive fleets of HGX H100, H200, GB200 NVL72, and B200 systems interconnected with high-speed InfiniBand networking. Purpose-built for large-scale AI training and inference at enterprise-grade reliability, it has become a preferred alternative to hyperscalers for GPU-intensive workloads.

Microsoft Azure is the enterprise cloud tightly woven into the Microsoft ecosystem: Active Directory, Windows Server, Visual Studio, and Microsoft 365. A deep AI partnership with OpenAI and a massive compliance portfolio make it the default choice for Fortune 500 hybrid deployments.

Modal is a serverless compute platform purpose-built for AI workloads, offering sub-second cold starts, per-second GPU billing, and a Python-native developer experience. Scale-to-zero semantics on H100, A100, and L40S accelerators eliminate idle costs entirely, making it exceptionally cost-efficient for bursty inference, fine-tuning jobs, and scheduled pipelines.

Baseten is a production model-serving platform with built-in autoscaling, per-minute GPU billing, and SOC 2/HIPAA compliance. Designed for teams deploying LLMs and diffusion models at scale, it handles cold starts, traffic spikes, and infrastructure tuning so engineers can focus on model quality rather than platform reliability.
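
Billing granularity (per second for Modal and RunPod, per minute for Baseten) matters most for short, bursty jobs. The sketch below is a rough model, assuming each job is billed independently and rounded up to the next whole increment with no minimum charge; real providers may add minimums or round differently. It uses the table's $2.00/GPU-hr median as a hypothetical rate, and `billed_cost` is an illustrative name.

```python
import math

def billed_cost(runtime_s: float, rate_per_hour: float, increment_s: int) -> float:
    """USD cost of one job whose runtime is rounded up to whole billing increments.

    Assumption: no minimum charge and no overhead beyond the rounding itself.
    """
    billed_s = math.ceil(runtime_s / increment_s) * increment_s
    return billed_s * rate_per_hour / 3600

if __name__ == "__main__":
    # A hypothetical 90-second inference burst at $2.00 per GPU-hour.
    for label, increment in [("per-second", 1), ("per-minute", 60), ("per-hour", 3600)]:
        print(f"{label:>10}: ${billed_cost(90, 2.00, increment):.4f}")
```

Under these assumptions the burst costs $0.05 with per-second billing, about $0.07 with per-minute billing (two full minutes), and a full $2.00 if it were rounded up to an hour.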

RunPod operates a dual-tier marketplace: community GPUs at ultra-low spot prices and a SOC 2-compliant Secure Cloud for production inference. Per-second billing, instant provisioning, and a broad catalog spanning H100, A100, RTX, and even MI300X accelerators make it flexible for projects of any scale.

Nebius is a European AI cloud provider spun out of Yandex with large clusters of H100 and H200 GPUs in EU data centers. Competitive on-demand pricing, ISO 27001 compliance, and EU data residency make it a compelling choice for European AI startups and enterprises that need sovereignty over their training infrastructure.

Replicate makes it trivially easy to run thousands of open-source AI models via a simple API, billing per second of GPU time with no cold-start penalties. It abstracts away all infrastructure concerns so developers can integrate image generation, video models, speech synthesis, and LLMs with a single line of code.

Together AI provides fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.

Crusoe is a climate-aligned GPU cloud that runs H100, H200, and MI300X workloads on stranded or flare-captured energy, drastically reducing the carbon footprint of AI compute. For teams that care about sustainability without sacrificing performance, it offers enterprise-grade infrastructure with genuine environmental accountability.

Frequently asked questions

How much does an NVIDIA H100 cost per hour in the cloud?
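The cheapest on-demand NVIDIA H100 rate we track is $1.95 per GPU-hour (Hyperstack). Spot and reserved rates are usually lower: the lowest price overall is $1.19 per GPU-hour (Vast.ai spot), and reserved rates start at $1.27. Sort the table above by price to see the current rate from every provider.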
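
Because every rate in the table is quoted per GPU-hour, a job budget is roughly rate × GPU count × hours, and an 8-GPU instance such as AWS P5 costs eight times its per-GPU rate for every hour it runs. A minimal sketch, assuming compute is the only cost; storage, egress, and taxes are deliberately ignored and can be significant on hyperscalers. `job_cost` is an illustrative name.

```python
def job_cost(rate_per_gpu_hour: float, gpus: int, hours: float) -> float:
    """Estimated compute cost in USD; excludes storage, egress, and taxes."""
    return rate_per_gpu_hour * gpus * hours

if __name__ == "__main__":
    # One full AWS P5 instance (8x H100) on demand for 24 hours at $3.40/GPU-hr.
    print(f"${job_cost(3.40, 8, 24):,.2f}")  # $652.80
    # A single Vast.ai spot H100 for the same 24 hours at $1.19/GPU-hr.
    print(f"${job_cost(1.19, 1, 24):,.2f}")  # $28.56
```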
What is the cheapest NVIDIA H100 cloud provider?
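Sort the table by price (low to high) to see the cheapest NVIDIA H100 provider right now; in the current listings, that is Vast.ai's marketplace spot rate at $1.19 per GPU-hour. Marketplace and spot providers often undercut hyperscalers by a wide margin for the same GPU: Azure's on-demand ND H100 v5 rate of $3.50 per GPU-hour is nearly three times higher.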
Which cloud providers offer NVIDIA H100 GPUs?
Every provider with published NVIDIA H100 availability is listed above, with per-hour pricing, the number of GPUs per instance, region coverage, and on-demand, spot, and reserved rates.
Is spot NVIDIA H100 cheaper than on-demand?
Yes. Spot (preemptible) capacity is typically 40-70% cheaper than on-demand but can be reclaimed at short notice. Use the pricing-mode filter to compare on-demand, spot, and reserved rows side by side.
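
The saving is 1 - (discounted rate / on-demand rate). Applying it to providers in the table that list more than one pricing mode gives a quick sanity check on the quoted range; `discount` is an illustrative name, and the rates are the table snapshot.

```python
def discount(on_demand: float, discounted: float) -> float:
    """Fractional saving of a spot or reserved rate relative to on-demand."""
    return 1 - discounted / on_demand

if __name__ == "__main__":
    # (on-demand, discounted) per-GPU-hour rates taken from the table above.
    rows = {
        "Vast.ai spot": (1.99, 1.19),
        "Hyperstack reserved": (1.95, 1.27),
        "AWS P5 reserved": (3.40, 2.21),
    }
    for name, (od, rate) in rows.items():
        print(f"{name}: {discount(od, rate):.1%} below on-demand")
```

In this snapshot the single spot listing (Vast.ai, about 40% off) sits at the low end of the 40-70% range, and the reserved rows all run roughly 35% below their on-demand rates.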
How much VRAM does the NVIDIA H100 have?
The NVIDIA H100 variants listed here ship with 80 GB of high-bandwidth memory (HBM3 on SXM, HBM2e on PCIe). More VRAM lets you fit larger models and batch sizes without sharding them across GPUs.
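
A common first check is whether a model's weights alone fit: parameter count × bytes per parameter. The sketch below applies that to the Llama 3.1 sizes mentioned in the provider list, assuming 80 GB per GPU and 1 GB = 1e9 bytes. It counts weights only; KV cache, activations, and (for training) gradients and optimizer state need substantial extra memory, so treat the result as a lower bound. The byte-per-parameter table and function names are illustrative.

```python
import math

# Approximate storage per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "fp8": 1.0, "int4": 0.5}

def weights_gb(params_billion: float, dtype: str) -> float:
    """Weight memory in GB: billions of parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[dtype]

def min_gpus_for_weights(params_billion: float, dtype: str, vram_gb: float = 80.0) -> int:
    """Lower bound on GPUs needed just to hold the weights (no KV cache or activations)."""
    return math.ceil(weights_gb(params_billion, dtype) / vram_gb)

if __name__ == "__main__":
    for size in (8, 70, 405):
        for dtype in ("bf16", "fp8"):
            print(f"Llama 3.1 {size}B {dtype}: {weights_gb(size, dtype):.0f} GB "
                  f"-> at least {min_gpus_for_weights(size, dtype)} GPU(s)")
```

A 70B model in bf16 (140 GB) already exceeds a single H100, and 405B in bf16 (810 GB) exceeds even the 640 GB of an 8-GPU instance, which is why larger models are sharded across nodes or quantized.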
Is the NVIDIA H100 good for AI training and inference?
The NVIDIA H100 is widely used for both LLM training and inference. Match its VRAM and throughput (shown above) to your model size. To cut costs, run fault-tolerant training (jobs that checkpoint regularly and can resume after preemption) on spot capacity.
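
Fault-tolerant training on spot capacity usually means checkpointing at a fixed interval and resuming from the last checkpoint after preemption, so at most one interval of work is lost. This is a minimal, framework-free sketch under stated assumptions: the training step is a hypothetical stand-in, state is saved as JSON rather than real model weights (a real job would save model and optimizer state, for example with PyTorch's `torch.save`), and the `stop_after` parameter is an illustrative way to simulate the instance being reclaimed.

```python
import json
import os
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so a preemption mid-write leaves the previous file intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem

def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}

def train(path: str, total_steps: int, every: int = 10, stop_after=None):
    """Hypothetical training loop that checkpoints every `every` steps.

    `stop_after` simulates a spot preemption at that step; returns None if preempted.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        if stop_after is not None and step == stop_after:
            return None  # instance reclaimed: progress since the last checkpoint is lost
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}  # stand-in for a real update
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state

if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    train(ckpt, total_steps=50, stop_after=25)   # preempted at step 25
    print(load_checkpoint(ckpt)["step"])         # 20: steps 21-25 must be redone
    print(train(ckpt, total_steps=50)["step"])   # 50: resumed from step 20
```

The checkpoint interval trades write overhead against lost work: checkpointing every N steps bounds the redo cost after a preemption to at most N steps.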