LLM inference pricing - compare token cost | DeployCue

LLM inference pricing

Compare token pricing for hosted LLM inference: output and input price per 1M tokens, context window, quality (Elo), and provider count.

17 models

GPT-5 Closed
OpenAI
$15.00
output / 1M tokens
Elo 1410
Context 400K
In $/1M $2.50
Providers 3

OpenAI's flagship frontier model.

Anthropic
$75.00
output / 1M tokens
Elo 1430
Context 1000K
In $/1M $15.00
Providers 4

Anthropic's most capable model for hard reasoning and coding.

Google
$10.00
output / 1M tokens
Elo 1390
Context 2000K
In $/1M $1.25
Providers 2

Google's flagship multimodal model with a 2M-token context.

Anthropic
$15.00
output / 1M tokens
Elo 1400
Context 1000K
In $/1M $3.00
Providers 4

Balanced Claude tier; strong coding at lower cost than Opus.

DeepSeek R1 Open weights
DeepSeek
$0.990
output / 1M tokens
Params 671B
Elo 1370
Context 128K
In $/1M $0.550
Providers 3

Open-weight reasoning model competitive with closed o-series.

GPT-5 mini Closed
OpenAI
$1.80
output / 1M tokens
Elo 1360
Context 400K
In $/1M $0.450
Providers 3

Smaller, cheaper GPT-5 tier for high-volume workloads.

DeepSeek V3 Open weights
DeepSeek
$0.890
output / 1M tokens
Params 671B
Elo 1350
Context 128K
In $/1M $0.490
Providers 3

Open-weight MoE model offering frontier quality at low cost.

Google
$2.50
output / 1M tokens
Elo 1345
Context 1000K
In $/1M $0.300
Providers 2

Cost-efficient Gemini tier tuned for speed and scale.

GPT-4.1 Closed
OpenAI
$8.00
output / 1M tokens
Elo 1335
Context 1000K
In $/1M $2.00
Providers 3

Long-context GPT-4 generation model.

Anthropic
$5.00
output / 1M tokens
Elo 1320
Context 200K
In $/1M $1.00
Providers 4

Fast, inexpensive Claude tier for latency-sensitive apps.

Llama 3.1 405B Open weights
Meta
$3.00
output / 1M tokens
Params 405B
Elo 1305
Context 128K
In $/1M $3.00
Providers 4

Meta's largest open-weight model, near-frontier quality.

Mistral Large 2 Open weights
Mistral AI
$6.00
output / 1M tokens
Params 123B
Elo 1295
Context 128K
In $/1M $2.00
Providers 4

Mistral's flagship 123B model with strong multilingual coverage.

Llama 3.3 70B Open weights
Meta
$0.400
output / 1M tokens
Params 70B
Elo 1290
Context 128K
In $/1M $0.230
Providers 5

Open-weight 70B model rivaling much larger closed models.

Qwen 2.5 72B Open weights
Alibaba
$0.400
output / 1M tokens
Params 72B
Elo 1285
Context 131K
In $/1M $0.130
Providers 4

Alibaba's strong open-weight 72B model with broad language support.

GPT-4.1 nano Closed
OpenAI
$0.400
output / 1M tokens
Elo 1255
Context 1000K
In $/1M $0.100
Providers 3

The cheapest, fastest GPT-4.1 tier.

Mistral Small 3 Open weights
Mistral AI
$0.600
output / 1M tokens
Params 24B
Elo 1240
Context 128K
In $/1M $0.200
Providers 3

Efficient 24B open-weight model for fast, cheap inference.

Llama 3.1 8B Open weights
Meta
$0.050
output / 1M tokens
Params 8B
Elo 1180
Context 128K
In $/1M $0.030
Providers 4

Compact open-weight model, the workhorse of cheap hosted inference.
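
Because input and output tokens are priced separately, the cost of a request depends on its token mix. The sketch below shows the arithmetic under stated assumptions: the `PRICES` table copies the input and output $/1M figures from a few of the named cards above, while the function name `request_cost` and the example workload (2,000 input tokens and 500 output tokens per request) are hypothetical and chosen only for illustration.

```python
# Per-1M-token prices in USD as (input, output), copied from the cards above.
PRICES = {
    "GPT-5": (2.50, 15.00),
    "DeepSeek R1": (0.550, 0.990),
    "Llama 3.3 70B": (0.230, 0.400),
    "Llama 3.1 8B": (0.030, 0.050),
}


def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Return the USD cost of one request, given prices per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens, prices=PRICES):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (model, request_cost(input_tokens, output_tokens, in_p, out_p))
        for model, (in_p, out_p) in prices.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input tokens and 500 output tokens.
    for model, cost in rank_by_cost(2_000, 500):
        print(f"{model:<15} ${cost:.6f} per request")
```

Output-heavy workloads shift the ranking toward models with a low output price, while long prompts with short answers are dominated by the input price, so it can help to rerun the comparison with the token mix of the actual workload.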