NVIDIA B200 cloud pricing

Vendor: NVIDIA
VRAM: 192 GB
Architecture: Blackwell
FP16: 2250 TFLOPS
Launched: 2025

Lowest: $2.27
Median: $3.37
Highest: $4.99

8 results

Provider	Plan	Pricing	GPUs	VRAM	vCPUs	RAM	Price (per GPU-hr)	Verified	Regions
Hyperstack	B200 (per GPU)	On-demand	1	180 GB	28	283 GB	$3.50	Jun 20, 2026	1 country
Hyperstack	B200 (per GPU)	Reserved	1	180 GB	28	283 GB	$2.27	Jun 20, 2026	1 country
Together AI	B200 cluster (per GPU)	On-demand	1	180 GB	28	283 GB	$3.99	Jun 20, 2026	1 country
Together AI	B200 cluster (per GPU)	Reserved	1	180 GB	28	283 GB	$2.59	Jun 20, 2026	1 country
CoreWeave	HGX B200 (per GPU)	On-demand	1	180 GB	28	384 GB	$4.25	Jun 20, 2026	1 country
CoreWeave	HGX B200 (per GPU)	Reserved	1	180 GB	28	384 GB	$2.76	Jun 20, 2026	1 country
Lambda	B200 1x	On-demand	1	180 GB	28	283 GB	$4.99	Jun 20, 2026	1 country
Lambda	B200 1x	Reserved	1	180 GB	28	283 GB	$3.24	Jun 20, 2026	1 country
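
The Lowest, Median, and Highest figures at the top of the page can be reproduced from the eight table rows. The sketch below is only an illustration: the `rows` list is copied by hand from the table, and the variable names are made up for this example rather than taken from any site API.

```python
from statistics import median

# (provider, pricing mode, USD per GPU-hour), copied by hand from the table above
rows = [
    ("Hyperstack", "On-demand", 3.50),
    ("Hyperstack", "Reserved", 2.27),
    ("Together AI", "On-demand", 3.99),
    ("Together AI", "Reserved", 2.59),
    ("CoreWeave", "On-demand", 4.25),
    ("CoreWeave", "Reserved", 2.76),
    ("Lambda", "On-demand", 4.99),
    ("Lambda", "Reserved", 3.24),
]

prices = [price for _, _, price in rows]
lowest = min(prices)
highest = max(prices)
# With an even number of rows, the median is the mean of the two middle prices.
mid = round(median(prices), 2)

# Cheapest row when only on-demand pricing is considered.
cheapest_on_demand = min(
    (r for r in rows if r[1] == "On-demand"), key=lambda r: r[2]
)

# Reserved discount relative to on-demand, per provider.
by_mode = {(provider, mode): price for provider, mode, price in rows}
reserved_discount = {
    provider: 1 - by_mode[(provider, "Reserved")] / by_mode[(provider, "On-demand")]
    for provider, mode, _ in rows
    if mode == "On-demand"
}

print(f"lowest=${lowest:.2f} median=${mid:.2f} highest=${highest:.2f}")
print("cheapest on-demand:", cheapest_on_demand)
for provider, discount in reserved_discount.items():
    print(f"{provider}: reserved is {discount:.1%} below on-demand")
```

On these rows the lowest price overall ($2.27) is a reserved rate, the cheapest on-demand row is Hyperstack at $3.50, and each provider's reserved rate comes out roughly 35% below its on-demand rate.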

Providers offering this GPU

Lambda

Lambda Labs is purpose-built for ML teams - simple, transparent per-hour rates on H100, H200, B200, and GB200 instances with zero hidden fees. Known for responsive support and direct hardware access, it is a top choice for training runs that need predictable pricing without cloud-platform complexity.

CoreWeave

CoreWeave is a specialized GPU cloud operator with massive fleets of HGX H100, H200, GB200 NVL72, and B200 systems interconnected with high-speed InfiniBand networking. Purpose-built for large-scale AI training and inference at enterprise-grade reliability, it has become a preferred alternative to hyperscalers for GPU-intensive workloads.

Together AI

Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral - all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.

Hyperstack

Hyperstack is a next-generation GPU cloud platform offering H100, A100, B200, L40S, and RTX-class accelerators at aggressive on-demand and reserved rates. With data centers in London and Oslo, Terraform support, and fast API-driven provisioning, it targets teams that want hyperscaler-grade GPU availability without the lock-in.

Frequently asked questions

How much does an NVIDIA B200 cost per hour in the cloud?

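The lowest NVIDIA B200 price we track is $2.27 per GPU-hour, which is Hyperstack's reserved rate; the lowest on-demand price is $3.50 per GPU-hour, also from Hyperstack. Spot and reserved rates are usually lower than on-demand; sort the table above by price to see the current rate from every provider.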

What is the cheapest NVIDIA B200 cloud provider?

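Hyperstack is the cheapest NVIDIA B200 provider we currently track, at $2.27 per GPU-hour reserved and $3.50 per GPU-hour on-demand. Sort the table by price (low to high) to see the current ranking. Marketplace and spot providers often undercut hyperscalers by a wide margin for the same GPU.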

Which cloud providers offer NVIDIA B200 GPUs?

Hyperstack, Together AI, CoreWeave, and Lambda currently publish NVIDIA B200 availability. Each is listed above with per-hour pricing, the number of GPUs per instance, region coverage, and on-demand and reserved rates; spot rates appear when a provider offers them.

Is spot NVIDIA B200 cheaper than on-demand?

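Yes, where it is offered. Spot (preemptible) capacity is typically 40-70% cheaper than on-demand but can be reclaimed at short notice. The table above currently lists only on-demand and reserved rows; use the pricing-mode filter to compare on-demand, spot, and reserved rows side by side when spot rows are available.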
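
To see what a 40-70% spot discount would mean in dollars, the arithmetic can be sketched as below. The function name `spot_price_range` is hypothetical, and the result is an estimate derived from the typical discount range quoted above, not a price published by any provider.

```python
def spot_price_range(on_demand: float, low: float = 0.40, high: float = 0.70):
    """Return (cheapest, dearest) estimated spot price in USD per GPU-hour.

    `low` and `high` are the typical discount bounds (40% and 70%) quoted
    in the FAQ answer; they are rules of thumb, not provider quotes.
    """
    cheapest = round(on_demand * (1 - high), 2)
    dearest = round(on_demand * (1 - low), 2)
    return cheapest, dearest


# Example: the $3.50 on-demand rate from the table.
cheapest, dearest = spot_price_range(3.50)
print(f"estimated spot range: ${cheapest:.2f} to ${dearest:.2f} per GPU-hour")
```

For a $3.50 on-demand rate this gives an estimated range of $1.05 to $2.10 per GPU-hour, before accounting for work lost when an instance is reclaimed.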

How much VRAM does the NVIDIA B200 have?

The NVIDIA B200 is specified with 192 GB of VRAM, although every provider row in the table above lists 180 GB per GPU. Larger VRAM lets you fit bigger models and batch sizes without sharding.
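
A rough way to check whether a model's weights fit on one GPU is to multiply the parameter count by the bytes per parameter. The sketch below assumes that simplification: it counts weights only and ignores the KV cache, activations, and optimizer state, so real headroom is smaller. The function names and the 180 GB default (the per-GPU figure shown in the table) are choices made for this example.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    One billion parameters at one byte each is about 1 GB, so the
    product of the two arguments is already in GB.
    """
    return params_billion * bytes_per_param


def fits_on_one_gpu(params_billion: float, bytes_per_param: float,
                    vram_gb: float = 180.0) -> bool:
    """True if the weights alone fit in `vram_gb` (weights-only estimate)."""
    return weights_gb(params_billion, bytes_per_param) <= vram_gb


# 2 bytes per parameter for FP16/BF16, 1 byte for FP8/INT8.
print(weights_gb(70, 2), fits_on_one_gpu(70, 2))    # 70B model in FP16
print(weights_gb(405, 1), fits_on_one_gpu(405, 1))  # 405B model in FP8
```

Under these assumptions a 70B-parameter model in FP16 needs about 140 GB for weights and fits on a single B200, while a 405B-parameter model needs about 405 GB even at one byte per parameter and must be sharded across several GPUs.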

Is the NVIDIA B200 good for AI training and inference?

The NVIDIA B200 is used for both LLM training and inference. Match its VRAM and throughput (shown above) to your model size, and use spot capacity for fault-tolerant training to cut costs.