Serverless GPU pricing comparison

14 results

Provider	Plan	GPU	VRAM	Price	Verified	Regions
Baseten	L4 (autoscaling)	NVIDIA L4	24 GB	$1.06 /hr	Jun 20, 2026	1 country
RunPod	L40S flex worker	NVIDIA L40S	48 GB	$1.69 /hr	Jun 20, 2026	3 countries
RunPod	Serverless L40S	NVIDIA L40S	48 GB	$1.90 /hr	Jun 20, 2026	3 countries
Modal	L40S (serverless)	NVIDIA L40S	48 GB	$1.95 /hr	Jun 20, 2026	1 country
Modal	A100 80GB (serverless)	NVIDIA A100 80GB	80 GB	$2.78 /hr	Jun 20, 2026	2 countries
RunPod	A100 80GB flex worker	NVIDIA A100 80GB	80 GB	$2.88 /hr	Jun 20, 2026	3 countries
Baseten	A100 80GB (autoscaling)	NVIDIA A100 80GB	80 GB	$3.18 /hr	Jun 20, 2026	1 country
Replicate	L40S (per second)	NVIDIA L40S	48 GB	$3.51 /hr	Jun 20, 2026	1 country
Modal	H100 (serverless)	NVIDIA H100	80 GB	$3.95 /hr	Jun 20, 2026	2 countries
RunPod	H100 flex worker	NVIDIA H100	80 GB	$4.18 /hr	Jun 20, 2026	3 countries
Baseten	H100 (autoscaling)	NVIDIA H100	80 GB	$4.32 /hr	Jun 20, 2026	1 country
RunPod	Serverless H100	NVIDIA H100	80 GB	$4.32 /hr	Jun 20, 2026	3 countries
Replicate	A100 80GB (per second)	NVIDIA A100 80GB	80 GB	$5.04 /hr	Jun 20, 2026	1 country
Replicate	H100 (per second)	NVIDIA H100	80 GB	$5.04 /hr	Jun 20, 2026	1 country

Frequently asked questions

How is serverless GPU priced?

You are billed per second (or per request) only while the GPU runs your job, so idle time is free and the GPU scales to zero. The table shows each plan's GPU type and its price as an hourly rate, including plans that bill per second.
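As a rough illustration of per-second billing, the sketch below converts an hourly rate from the table into the cost of a given amount of busy GPU time. It assumes billing is exact to the second with no minimum charge or per-request fee, which individual providers may not match. The request volume and duration are hypothetical.

```python
def job_cost(hourly_rate_usd: float, busy_seconds: float) -> float:
    """Cost of GPU time billed per second at a rate quoted per hour."""
    return hourly_rate_usd / 3600 * busy_seconds


# Modal H100 (serverless) is listed at $3.95 /hr in the table.
# Hypothetical workload: 2,000 requests a day, 1.5 s of GPU time each.
daily_busy_seconds = 2000 * 1.5
daily_cost = job_cost(3.95, daily_busy_seconds)
print(f"${daily_cost:.2f} per day")  # prints "$3.29 per day"
```

Under these assumptions the GPU is busy for 50 minutes a day, so the daily bill is well below one hour at the listed rate.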

When should I use serverless GPU instead of a dedicated GPU?

Serverless GPU suits spiky or low-volume inference, where a dedicated GPU would sit idle most of the time. For steady, high-utilization workloads, a dedicated GPU (see our GPU cloud page) is usually cheaper per hour.
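One way to frame the choice is a break-even utilization: the fraction of each hour the GPU must be busy before an always-on GPU costs less than paying the serverless rate for busy time only. The sketch below is a simplification that ignores cold starts and minimum charges. The dedicated price is a hypothetical placeholder, not a figure from this page.

```python
def break_even_utilization(serverless_rate: float, dedicated_rate: float) -> float:
    """Busy fraction above which a dedicated GPU is cheaper (capped at 1.0)."""
    return min(1.0, dedicated_rate / serverless_rate)


serverless_h100 = 4.18  # RunPod H100 flex worker, from the table
dedicated_h100 = 2.50   # HYPOTHETICAL dedicated hourly price, for illustration only

u = break_even_utilization(serverless_h100, dedicated_h100)
print(f"Dedicated wins above about {u:.0%} utilization "
      f"({u * 24:.1f} busy hours per day)")
```

If the result is 1.0, the dedicated price is at or above the serverless rate and serverless is never more expensive under this model.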

What is the cheapest serverless GPU provider?

Compare the hourly rate in the Price column for the GPU you need. Because billing is usage-based, the cheapest choice also depends on how bursty your traffic is.
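As a rough sketch of picking the lowest listed rate for each GPU, the snippet below uses the prices from the table above. It compares listed hourly rates only and does not account for traffic shape, cold starts, or regions. The names `PLANS` and `cheapest_per_gpu` are illustrative.

```python
# (provider, plan, gpu, hourly rate in USD), copied from the table
PLANS = [
    ("Baseten", "L4 (autoscaling)", "NVIDIA L4", 1.06),
    ("RunPod", "L40S flex worker", "NVIDIA L40S", 1.69),
    ("RunPod", "Serverless L40S", "NVIDIA L40S", 1.90),
    ("Modal", "L40S (serverless)", "NVIDIA L40S", 1.95),
    ("Modal", "A100 80GB (serverless)", "NVIDIA A100 80GB", 2.78),
    ("RunPod", "A100 80GB flex worker", "NVIDIA A100 80GB", 2.88),
    ("Baseten", "A100 80GB (autoscaling)", "NVIDIA A100 80GB", 3.18),
    ("Replicate", "L40S (per second)", "NVIDIA L40S", 3.51),
    ("Modal", "H100 (serverless)", "NVIDIA H100", 3.95),
    ("RunPod", "H100 flex worker", "NVIDIA H100", 4.18),
    ("Baseten", "H100 (autoscaling)", "NVIDIA H100", 4.32),
    ("RunPod", "Serverless H100", "NVIDIA H100", 4.32),
    ("Replicate", "A100 80GB (per second)", "NVIDIA A100 80GB", 5.04),
    ("Replicate", "H100 (per second)", "NVIDIA H100", 5.04),
]


def cheapest_per_gpu(plans):
    """Map each GPU to the (provider, plan, rate) with the lowest hourly rate."""
    best = {}
    for provider, plan, gpu, rate in plans:
        if gpu not in best or rate < best[gpu][2]:
            best[gpu] = (provider, plan, rate)
    return best


for gpu, (provider, plan, rate) in cheapest_per_gpu(PLANS).items():
    print(f"{gpu}: {provider} {plan} at ${rate:.2f} /hr")
```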

What are cold starts on serverless GPU?

A cold start is the time to load the model onto a fresh GPU after an idle period, which can take several seconds. Where a provider publishes a cold-start time, it is shown; this matters for latency-sensitive inference.
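To see how much cold starts matter for a given workload, a simple model is to add the cold-start time to the share of requests that land on a fresh GPU. The sketch below does this. Every number in it is hypothetical, and real cold-start times vary by provider and model size.

```python
def mean_latency(inference_s: float, cold_start_s: float, cold_fraction: float) -> float:
    """Average request latency when a fraction of requests pay a cold start."""
    return inference_s + cold_fraction * cold_start_s


def worst_case_latency(inference_s: float, cold_start_s: float) -> float:
    """Latency of a request that arrives while no GPU is warm."""
    return inference_s + cold_start_s


# HYPOTHETICAL figures: 0.8 s inference, 6 s cold start, 5% of requests cold.
print(f"mean:  {mean_latency(0.8, 6.0, 0.05):.2f} s")
print(f"worst: {worst_case_latency(0.8, 6.0):.2f} s")
```

Even when the mean barely moves, the worst case is what a latency-sensitive user sees after an idle period.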

Which GPUs are available serverless?

The GPU column lists the accelerator behind each serverless endpoint and its VRAM. Match the VRAM to your model size just as you would for a dedicated GPU.
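A common rule of thumb for matching VRAM to model size is parameter count times bytes per parameter, plus headroom for activations and cache. The sketch below applies that rule to the VRAM figures in the table. The 1.2 overhead factor is an assumption rather than a measured value, and actual memory use depends on batch size, context length, and the serving stack.

```python
# VRAM per GPU in GB, from the table
GPU_VRAM_GB = {
    "NVIDIA L4": 24,
    "NVIDIA L40S": 48,
    "NVIDIA A100 80GB": 80,
    "NVIDIA H100": 80,
}


def estimate_vram_gb(params_billion: float, bytes_per_param: float = 2.0,
                     overhead: float = 1.2) -> float:
    """Rough VRAM need: weights (2 bytes/param for fp16) times an ASSUMED overhead."""
    return params_billion * bytes_per_param * overhead


def gpus_that_fit(required_gb: float) -> list[str]:
    """GPUs from the table whose VRAM covers the estimate on a single device."""
    return [gpu for gpu, vram in GPU_VRAM_GB.items() if vram >= required_gb]


for size in (7, 13, 70):  # hypothetical model sizes, in billions of parameters
    need = estimate_vram_gb(size)
    print(f"{size}B fp16: ~{need:.1f} GB, fits on: {gpus_that_fit(need) or 'none listed'}")
```

An empty result means the model would need quantization or more than one GPU under this estimate.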

Does serverless GPU scale to zero?

Yes, that is the core benefit. Instances spin up on demand and shut down when idle, so you pay nothing between requests, unlike an always-on rented GPU.