Serverless GPU pricing - compare providers | DeployCue

Serverless GPU pricing comparison

14 results

Provider   Plan                     GPU               VRAM   Price     Status    Regions
Baseten    L4 (autoscaling)         NVIDIA L4         24 GB  $1.06/hr  Verified  1 country
RunPod     L40S flex worker         NVIDIA L40S       48 GB  $1.69/hr  Verified  3 countries
RunPod     Serverless L40S          NVIDIA L40S       48 GB  $1.90/hr  Verified  3 countries
Modal      L40S (serverless)        NVIDIA L40S       48 GB  $1.95/hr  Verified  1 country
Modal      A100 80GB (serverless)   NVIDIA A100 80GB  80 GB  $2.78/hr  Verified  2 countries
RunPod     A100 80GB flex worker    NVIDIA A100 80GB  80 GB  $2.88/hr  Verified  3 countries
Baseten    A100 80GB (autoscaling)  NVIDIA A100 80GB  80 GB  $3.18/hr  Verified  1 country
Replicate  L40S (per second)        NVIDIA L40S       48 GB  $3.51/hr  Verified  1 country
Modal      H100 (serverless)        NVIDIA H100       80 GB  $3.95/hr  Verified  2 countries
RunPod     H100 flex worker         NVIDIA H100       80 GB  $4.18/hr  Verified  3 countries
Baseten    H100 (autoscaling)       NVIDIA H100       80 GB  $4.32/hr  Verified  1 country
RunPod     Serverless H100          NVIDIA H100       80 GB  $4.32/hr  Verified  3 countries
Replicate  A100 80GB (per second)   NVIDIA A100 80GB  80 GB  $5.04/hr  Verified  1 country
Replicate  H100 (per second)        NVIDIA H100       80 GB  $5.04/hr  Verified  1 country

Frequently asked questions

How is serverless GPU priced?
You are billed per second (or per request) only while the GPU runs your job, so idle time is free and the GPU scales to zero. The table lists the GPU type, VRAM, and rate for each provider's plan, with rates shown per hour.
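As a rough illustration of per-second billing, the sketch below converts a listed hourly rate into the cost of a single job. It assumes the hourly figure is simply the per-second rate multiplied by 3600 and that there is no minimum billing increment; neither assumption is confirmed by the page, and the function name `job_cost` is hypothetical.

```python
def job_cost(hourly_rate_usd: float, busy_seconds: float) -> float:
    """Cost of a job billed per second, given a rate quoted per hour.

    Assumption: no minimum charge and no rounding up to a billing increment.
    """
    per_second_rate = hourly_rate_usd / 3600
    return per_second_rate * busy_seconds


# Modal H100 (serverless) is listed at $3.95/hr in the table.
# A job that keeps the GPU busy for 90 seconds would then cost roughly:
cost = job_cost(3.95, 90)
print(f"about ${cost:.3f} for a 90-second job")
```

Under these assumptions a short job costs cents rather than dollars, which is why idle-heavy workloads favor this billing model.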
When should I use serverless GPU instead of a dedicated GPU?
Serverless GPU suits spiky or low-volume inference where a dedicated GPU would sit idle most of the time. For steady, high-utilization workloads, a GPU rented by the hour (see our GPU cloud page) is usually cheaper.
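One way to reason about this trade-off is a break-even utilization: the fraction of each hour the GPU must be busy before a dedicated rental becomes cheaper. The sketch below is a simplification that ignores cold starts and minimum charges. The dedicated rate of $2.50/hr is a made-up figure for illustration and does not come from this page.

```python
def break_even_utilization(serverless_hourly: float, dedicated_hourly: float) -> float:
    """Busy fraction above which a dedicated GPU is cheaper.

    Serverless cost per wall-clock hour = utilization * serverless_hourly.
    Dedicated cost per wall-clock hour  = dedicated_hourly (paid even when idle).
    The two are equal when utilization = dedicated_hourly / serverless_hourly.
    """
    return min(1.0, dedicated_hourly / serverless_hourly)


serverless_rate = 3.95  # Modal H100 (serverless), from the table
dedicated_rate = 2.50   # hypothetical dedicated H100 rate, NOT from this page

threshold = break_even_utilization(serverless_rate, dedicated_rate)
print(f"Dedicated wins above about {threshold:.0%} utilization")
```

If the dedicated rate is equal to or higher than the serverless rate, the function returns 1.0, meaning serverless is never more expensive under this model.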
What is the cheapest serverless GPU provider?
Sort by the per-second or per-request rate for the GPU you need. Because billing is usage-based, the cheapest choice also depends on how bursty your traffic is.
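A minimal sketch of that comparison, using the rows from the table: it picks the lowest listed hourly rate for a given GPU. It deliberately ignores cold starts, minimum charges, and region availability, so treat the result as a starting point rather than a recommendation. The helper name `cheapest_for` is hypothetical.

```python
# (provider, plan, gpu, usd_per_hour) rows copied from the table.
PLANS = [
    ("Baseten", "L4 (autoscaling)", "NVIDIA L4", 1.06),
    ("RunPod", "L40S flex worker", "NVIDIA L40S", 1.69),
    ("RunPod", "Serverless L40S", "NVIDIA L40S", 1.90),
    ("Modal", "L40S (serverless)", "NVIDIA L40S", 1.95),
    ("Modal", "A100 80GB (serverless)", "NVIDIA A100 80GB", 2.78),
    ("RunPod", "A100 80GB flex worker", "NVIDIA A100 80GB", 2.88),
    ("Baseten", "A100 80GB (autoscaling)", "NVIDIA A100 80GB", 3.18),
    ("Replicate", "L40S (per second)", "NVIDIA L40S", 3.51),
    ("Modal", "H100 (serverless)", "NVIDIA H100", 3.95),
    ("RunPod", "H100 flex worker", "NVIDIA H100", 4.18),
    ("Baseten", "H100 (autoscaling)", "NVIDIA H100", 4.32),
    ("RunPod", "Serverless H100", "NVIDIA H100", 4.32),
    ("Replicate", "A100 80GB (per second)", "NVIDIA A100 80GB", 5.04),
    ("Replicate", "H100 (per second)", "NVIDIA H100", 5.04),
]


def cheapest_for(gpu: str):
    """Return the lowest-priced listed plan for a GPU, or None if unlisted."""
    matches = [plan for plan in PLANS if plan[2] == gpu]
    return min(matches, key=lambda plan: plan[3]) if matches else None


for gpu in ("NVIDIA L40S", "NVIDIA A100 80GB", "NVIDIA H100"):
    provider, plan, _, rate = cheapest_for(gpu)
    print(f"{gpu}: {provider} {plan} at ${rate:.2f}/hr")
```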
What are cold starts on serverless GPU?
A cold start is the delay while the model loads onto a fresh GPU after an idle period; it can take several seconds. Where a provider publishes a cold-start time, it is shown. Cold starts matter most for latency-sensitive inference.
Which GPUs are available serverless?
The GPU column lists the accelerator behind each serverless endpoint and its VRAM. Match the VRAM to your model size just as you would for a dedicated GPU.
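To make the VRAM matching concrete, here is a rough sizing sketch. It estimates memory as parameter count times bytes per parameter, plus a headroom factor for activations and cache. The 2 bytes per parameter (16-bit weights) and the 1.2 headroom factor are illustrative assumptions, not figures from this page; real requirements vary with batch size, context length, and runtime.

```python
# VRAM per GPU type, taken from the table.
GPU_VRAM_GB = {
    "NVIDIA L4": 24,
    "NVIDIA L40S": 48,
    "NVIDIA A100 80GB": 80,
    "NVIDIA H100": 80,
}


def estimated_vram_gb(params_billion: float,
                      bytes_per_param: float = 2.0,  # assumption: 16-bit weights
                      headroom: float = 1.2) -> float:  # assumption: 20% overhead
    """Very rough VRAM estimate in GB for serving a model."""
    return params_billion * bytes_per_param * headroom


def gpus_that_fit(params_billion: float) -> list[str]:
    """List the GPU types from the table whose VRAM covers the estimate."""
    need = estimated_vram_gb(params_billion)
    return [gpu for gpu, vram in GPU_VRAM_GB.items() if vram >= need]


# A 13-billion-parameter model needs roughly 31 GB under these assumptions,
# so the 24 GB L4 is ruled out.
print(gpus_that_fit(13))
```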
Does serverless GPU scale to zero?
Yes, that is the core benefit. Instances spin up on demand and shut down when idle, so you pay nothing between requests, unlike an always-on rented GPU.