NVIDIA L40S cloud pricing

Vendor: NVIDIA
VRAM: 48 GB
Architecture: Ada Lovelace
FP16: 362 TFLOPS
Launched: 2023

Lowest: $0.650 /GPU-hr
Median: $0.906 /GPU-hr
Highest: $1.25 /GPU-hr
Plans tracked: 6

Provider	Pricing	GPUs	VRAM	vCPUs	RAM	Price	Verified	Regions
Hyperstack	On-demand	1	48 GB	16	90 GB	$1.00 /GPU-hr	Jun 20, 2026	1 country
Nebius	On-demand	1	48 GB	16	96 GB	$1.00 /GPU-hr	Jun 20, 2026	2 countries
Hyperstack	Reserved	1	48 GB	16	90 GB	$0.650 /GPU-hr	Jun 20, 2026	1 country
Nebius	Reserved	1	48 GB	16	96 GB	$0.650 /GPU-hr	Jun 20, 2026	2 countries
CoreWeave	On-demand	1	48 GB	16	128 GB	$1.25 /GPU-hr	Jun 20, 2026	2 countries
CoreWeave	Reserved	1	48 GB	16	128 GB	$0.812 /GPU-hr	Jun 20, 2026	2 countries
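The lowest, median, and highest figures for this GPU follow directly from the six priced plans. With an even number of rows, the median is the mean of the two middle values: (0.812 + 1.00) / 2 = 0.906. A minimal Python sketch that reproduces them, with the rows hand-copied into a hypothetical `RATES` list:

```python
from statistics import median

# (provider, pricing mode, USD per GPU-hour), hand-copied from the pricing table
RATES = [
    ("Hyperstack", "On-demand", 1.00),
    ("Nebius", "On-demand", 1.00),
    ("Hyperstack", "Reserved", 0.650),
    ("Nebius", "Reserved", 0.650),
    ("CoreWeave", "On-demand", 1.25),
    ("CoreWeave", "Reserved", 0.812),
]


def summarize(rates):
    """Return (lowest, median, highest) price across all rows."""
    prices = [price for _, _, price in rates]
    return min(prices), median(prices), max(prices)


low, mid, high = summarize(RATES)
print(f"Lowest ${low:.3f}  Median ${mid:.3f}  Highest ${high:.2f}")
# Lowest $0.650  Median $0.906  Highest $1.25
```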

Providers offering this GPU

CoreWeave

CoreWeave is a specialized GPU cloud operator with massive fleets of HGX H100, H200, GB200 NVL72, and B200 systems interconnected with high-speed InfiniBand networking. Purpose-built for large-scale AI training and inference at enterprise-grade reliability, it has become a preferred alternative to hyperscalers for GPU-intensive workloads.

Modal

Modal is a serverless compute platform purpose-built for AI workloads, offering sub-second cold starts, per-second GPU billing, and a Python-native developer experience. Scale-to-zero semantics on H100, A100, and L40S accelerators eliminate idle costs entirely, making it exceptionally cost-efficient for bursty inference, fine-tuning jobs, and scheduled pipelines.

RunPod

RunPod operates a dual-tier marketplace: community GPUs at ultra-low spot prices and a SOC 2-compliant Secure Cloud for production inference. Per-second billing, instant provisioning, and a broad catalog spanning H100, A100, RTX, and even MI300X accelerators make it flexible for projects of any scale.

Nebius

Nebius is a European AI cloud provider spun out of Yandex with large clusters of H100 and H200 GPUs in EU data centers. Competitive on-demand pricing, ISO 27001 compliance, and EU data residency make it a compelling choice for European AI startups and enterprises that need sovereignty over their training infrastructure.

Replicate

Replicate makes it trivially easy to run thousands of open-source AI models via a simple API, billing per second of GPU time with no cold-start penalties. It abstracts away all infrastructure concerns so developers can integrate image generation, video models, speech synthesis, and LLMs with a single line of code.

Hyperstack

Hyperstack is a next-generation GPU cloud platform offering H100, A100, B200, L40S, and RTX-class accelerators at aggressive on-demand and reserved rates. With data centers in London and Oslo, Terraform support, and fast API-driven provisioning, it targets teams that want hyperscaler-grade GPU availability without the lock-in.

Frequently asked questions

How much does an NVIDIA L40S cost per hour in the cloud?

The lowest on-demand NVIDIA L40S price we track is $1.00 per GPU-hour (Hyperstack and Nebius); CoreWeave charges $1.25. Reserved rates are lower, starting at $0.650 per GPU-hour from Hyperstack and Nebius. No spot rates are currently tracked for the L40S.
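To turn an hourly rate into a budget, multiply it by GPUs and hours of use. A minimal sketch, assuming a 730-hour month (365 days × 24 hours / 12) and using rates from the pricing table; `monthly_cost` is a hypothetical helper, not part of any provider's API:

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months


def monthly_cost(rate_per_gpu_hr, gpus=1, hours=HOURS_PER_MONTH):
    """USD for `gpus` L40S GPUs billed for `hours` at a per-GPU-hour rate."""
    return rate_per_gpu_hr * gpus * hours


# Rates from the pricing table (USD per GPU-hour)
for label, rate in [
    ("Cheapest on-demand (Hyperstack, Nebius)", 1.00),
    ("Cheapest reserved (Hyperstack, Nebius)", 0.650),
    ("CoreWeave on-demand", 1.25),
]:
    print(f"{label}: ${monthly_cost(rate):,.2f} per GPU-month")
```

For several GPUs, pass `gpus`: for example, `monthly_cost(0.812, gpus=4)` estimates four CoreWeave reserved GPUs running for a full month.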

What is the cheapest NVIDIA L40S cloud provider?

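Hyperstack and Nebius are tied for the cheapest NVIDIA L40S plans we track: $1.00 per GPU-hour on-demand and $0.650 reserved. CoreWeave is the most expensive at $1.25 on-demand and $0.812 reserved. Marketplace and spot offerings can undercut these rates, though none are currently tracked for the L40S.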
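Finding the cheapest provider amounts to filtering the rows by pricing mode and taking the minimum, keeping ties. A small sketch with the rows hand-copied from the pricing table; the `cheapest` helper is illustrative only:

```python
# (provider, pricing mode, USD per GPU-hour), hand-copied from the pricing table
RATES = [
    ("Hyperstack", "On-demand", 1.00),
    ("Nebius", "On-demand", 1.00),
    ("Hyperstack", "Reserved", 0.650),
    ("Nebius", "Reserved", 0.650),
    ("CoreWeave", "On-demand", 1.25),
    ("CoreWeave", "Reserved", 0.812),
]


def cheapest(rates, mode):
    """Return (providers tied for the lowest price, that price) for one pricing mode."""
    matching = [(provider, price) for provider, m, price in rates if m == mode]
    best = min(price for _, price in matching)
    return sorted(provider for provider, price in matching if price == best), best


print(cheapest(RATES, "On-demand"))  # (['Hyperstack', 'Nebius'], 1.0)
print(cheapest(RATES, "Reserved"))   # (['Hyperstack', 'Nebius'], 0.65)
```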

Which cloud providers offer NVIDIA L40S GPUs?

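Six providers are listed as offering the NVIDIA L40S: CoreWeave, Modal, RunPod, Nebius, Replicate, and Hyperstack. The pricing table currently covers three of them (CoreWeave, Hyperstack, and Nebius), with on-demand and reserved per-GPU-hour rates, instance vCPUs and RAM, and region coverage.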

Is spot NVIDIA L40S cheaper than on-demand?

Usually, though no spot L40S rates are tracked yet. Spot (preemptible) capacity is typically 40-70% cheaper than on-demand but can be reclaimed at short notice. For comparison, the tracked reserved rates are about 35% below on-demand at all three providers.
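Reserved capacity trades a lower rate for a commitment. Assuming a reservation is billed for every hour of its term while on-demand is billed only for the hours used (check each provider's terms), a reservation pays off once expected utilization exceeds reserved / on-demand. A sketch using the tracked rates; the helper names are hypothetical:

```python
# (on-demand, reserved) USD per GPU-hour, hand-copied from the pricing table
PAIRS = {
    "Hyperstack": (1.00, 0.650),
    "Nebius": (1.00, 0.650),
    "CoreWeave": (1.25, 0.812),
}


def reserved_discount(on_demand, reserved):
    """Fraction saved per hour by the reserved rate versus on-demand."""
    return 1 - reserved / on_demand


def break_even_utilization(on_demand, reserved):
    """Utilization above which a reservation beats paying on-demand.

    Assumes the reservation is billed for every hour of its term and
    on-demand only for hours actually used, so the two cost the same
    when utilization * on_demand == reserved.
    """
    return reserved / on_demand


for provider, (od, rs) in PAIRS.items():
    print(
        f"{provider}: {reserved_discount(od, rs):.0%} cheaper reserved, "
        f"break-even at {break_even_utilization(od, rs):.0%} utilization"
    )
```

At these rates, a GPU expected to be busy for less than about 65% of the reservation term is cheaper on-demand, or on spot if the workload tolerates preemption.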

How much VRAM does the NVIDIA L40S have?

The NVIDIA L40S ships with 48 GB of GDDR6 memory. More VRAM lets you fit larger models and batch sizes on a single GPU without sharding.
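A quick way to check whether a model fits is to multiply its parameter count by the bytes per parameter of its weight format. The sketch below assumes a 1.2× overhead factor for activations, KV cache, and framework buffers; that factor is a placeholder, and real overhead grows with context length and batch size:

```python
L40S_VRAM_GB = 48

# Bytes per parameter for common weight formats
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weights_gb(params_billion, dtype="fp16"):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits_for_inference(params_billion, dtype="fp16", overhead=1.2, vram_gb=L40S_VRAM_GB):
    """Rough single-GPU check: weights scaled by an assumed overhead factor
    (placeholder for activations, KV cache, and framework buffers)."""
    return weights_gb(params_billion, dtype) * overhead <= vram_gb


for params, dtype in [(13, "fp16"), (34, "fp16"), (70, "int8"), (70, "int4")]:
    print(
        f"{params}B {dtype}: {weights_gb(params, dtype):.0f} GB of weights, "
        f"fits={fits_for_inference(params, dtype)}"
    )
```

Full fine-tuning needs far more memory: mixed-precision training with Adam is often estimated at about 16 bytes per parameter before activations, which puts even a 3B-parameter model (about 48 GB) at the limit of a single L40S.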

Is the NVIDIA L40S good for AI training and inference?

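The NVIDIA L40S handles both LLM training and inference, and suits inference and fine-tuning of models that fit in its 48 GB. It has no NVLink, so multi-GPU jobs that shard a model communicate over PCIe. Match its VRAM and FP16 throughput (listed above) to your model size, and use spot capacity, where available, for checkpointed, fault-tolerant training to cut costs.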
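To match throughput to a training job, a rough estimate uses the common 6 · N · D rule of thumb for dense transformers (N parameters, D training tokens, forward plus backward pass). The sketch assumes the 362 TFLOPS FP16 figure is the dense peak and that 40% of it is sustained (model FLOPs utilization, `mfu`); both are assumptions, and real utilization varies widely with model, batch size, and framework:

```python
L40S_FP16_TFLOPS = 362  # FP16 figure from the spec list; assumed dense peak


def training_gpu_hours(params, tokens, mfu=0.4, peak_tflops=L40S_FP16_TFLOPS):
    """Estimate GPU-hours with the 6 * N * D FLOPs rule of thumb.

    params: model parameters (N); tokens: training tokens (D).
    mfu: assumed fraction of peak throughput actually sustained;
    0.4 is a placeholder, not a measurement.
    """
    total_flops = 6 * params * tokens
    sustained_flops_per_s = peak_tflops * 1e12 * mfu
    return total_flops / sustained_flops_per_s / 3600


def training_cost(params, tokens, rate_per_gpu_hr, **kwargs):
    """USD at a per-GPU-hour rate, assuming billing scales linearly with GPU-hours."""
    return training_gpu_hours(params, tokens, **kwargs) * rate_per_gpu_hr


# Hypothetical job: 1B-parameter model trained on 10B tokens
hours = training_gpu_hours(1e9, 10e9)
print(f"~{hours:.0f} GPU-hours")  # ~115 GPU-hours
for rate in (0.650, 1.00, 1.25):  # lowest reserved, lowest and highest on-demand
    print(f"${rate:.3f}/GPU-hr -> ${training_cost(1e9, 10e9, rate):,.0f}")
```

GPU-hours are the same whether they run on one GPU or many; spreading the job across GPUs shortens wall-clock time but, under this estimate, not the bill.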