LLM inference pricing
Compare token pricing for hosted LLM inference: input and output price per 1M tokens, context window, quality (Elo), and number of hosting providers.
17 models
| Model | Developer | Type | Params | Quality (Elo) | Context (tokens) | Input $/1M tokens | Output $/1M tokens | Providers |
|---|---|---|---|---|---|---|---|---|
| GPT-5 | OpenAI | Closed | - | 1410 | 400K | $2.50 | $15.00 | 3 |
| Claude Opus 4.8 | Anthropic | Closed | - | 1430 | 1M | $15.00 | $75.00 | 4 |
| Gemini 2.5 Pro | Google | Closed | - | 1390 | 2M | $1.25 | $10.00 | 2 |
| Claude Sonnet 4.6 | Anthropic | Closed | - | 1400 | 1M | $3.00 | $15.00 | 4 |
| DeepSeek R1 | DeepSeek | Open weights | 671B | 1370 | 128K | $0.55 | $0.99 | 3 |
| GPT-5 mini | OpenAI | Closed | - | 1360 | 400K | $0.45 | $1.80 | 3 |
| DeepSeek V3 | DeepSeek | Open weights | 671B | 1350 | 128K | $0.49 | $0.89 | 3 |
| Gemini 2.5 Flash | Google | Closed | - | 1345 | 1M | $0.30 | $2.50 | 2 |
| GPT-4.1 | OpenAI | Closed | - | 1335 | 1M | $2.00 | $8.00 | 3 |
| Claude Haiku 4.5 | Anthropic | Closed | - | 1320 | 200K | $1.00 | $5.00 | 4 |
| Llama 3.1 405B | Meta | Open weights | 405B | 1305 | 128K | $3.00 | $3.00 | 4 |
| Mistral Large 2 | Mistral AI | Open weights | 123B | 1295 | 128K | $2.00 | $6.00 | 4 |
| Llama 3.3 70B | Meta | Open weights | 70B | 1290 | 128K | $0.23 | $0.40 | 5 |
| Qwen 2.5 72B | Alibaba | Open weights | 72B | 1285 | 131K | $0.13 | $0.40 | 4 |
| GPT-4.1 nano | OpenAI | Closed | - | 1255 | 1M | $0.10 | $0.40 | 3 |
| Mistral Small 3 | Mistral AI | Open weights | 24B | 1240 | 128K | $0.20 | $0.60 | 3 |
| Llama 3.1 8B | Meta | Open weights | 8B | 1180 | 128K | $0.03 | $0.05 | 4 |
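Because every price is quoted per million tokens, the cost of a single request is input tokens × input price / 1,000,000 plus output tokens × output price / 1,000,000. The sketch below applies that formula to a few rows copied from the table. `PRICES` and `request_cost` are hypothetical names, and the estimate ignores cached-input discounts, batch pricing, and any per-provider price differences.

```python
# Per-1M-token prices in USD (input, output), copied from a few rows of the table.
PRICES = {
    "GPT-5": (2.50, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V3": (0.49, 0.89),
    "Llama 3.1 8B": (0.03, 0.05),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request (no caching or batch discounts)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2,000 input tokens and 500 output tokens per request.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 2_000, 500):.6f} per request")
```

For that 2,000-in / 500-out workload, GPT-5 comes to $0.0125 per request versus $0.000085 for Llama 3.1 8B, roughly a 147× gap.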
- GPT-5: OpenAI's flagship frontier model.
- Claude Opus 4.8: Anthropic's most capable model for hard reasoning and coding.
- Gemini 2.5 Pro: Google's flagship multimodal model with a 2M-token context.
- Claude Sonnet 4.6: Balanced Claude tier; strong coding at lower cost than Opus.
- DeepSeek R1: Open-weight reasoning model competitive with OpenAI's closed o-series reasoning models.
- GPT-5 mini: Smaller, cheaper GPT-5 tier for high-volume workloads.
- DeepSeek V3: Open-weight mixture-of-experts (MoE) model offering frontier quality at low cost.
- Gemini 2.5 Flash: Cost-efficient Gemini tier tuned for speed and scale.
- GPT-4.1: GPT-4-generation model with a 1M-token context window.
- Claude Haiku 4.5: Fast, inexpensive Claude tier for latency-sensitive apps.
- Llama 3.1 405B: Meta's largest open-weight model, with near-frontier quality.
- Mistral Large 2: Mistral's flagship 123B model with strong multilingual coverage.
- Llama 3.3 70B: Open-weight 70B model rivaling much larger closed models.
- Qwen 2.5 72B: Alibaba's strong open-weight 72B model with broad language support.
- GPT-4.1 nano: The cheapest, fastest GPT-4.1 tier.
- Mistral Small 3: Efficient 24B open-weight model for fast, cheap inference.
- Llama 3.1 8B: Compact open-weight model, the workhorse of cheap hosted inference.
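The table's columns can be combined to shortlist models: filter by a minimum Elo and context window, then rank the survivors by price. The sketch below is one way to do that, not a method used by this page. It blends the two price columns assuming 3 input tokens per output token (an assumption; change `input_share` to match your workload) and stores each Context value in thousands of tokens (for example, 128K as 128). `Model`, `MODELS`, and `cheapest` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    elo: int
    context_k: int    # context window in thousands of tokens (128K -> 128)
    in_price: float   # USD per 1M input tokens
    out_price: float  # USD per 1M output tokens

    def blended_price(self, input_share: float = 0.75) -> float:
        """Weighted $/1M tokens; the default assumes a 3:1 input:output mix."""
        return input_share * self.in_price + (1 - input_share) * self.out_price


# Rows copied from the table above.
MODELS = [
    Model("GPT-5", 1410, 400, 2.50, 15.00),
    Model("Claude Opus 4.8", 1430, 1000, 15.00, 75.00),
    Model("Gemini 2.5 Pro", 1390, 2000, 1.25, 10.00),
    Model("Claude Sonnet 4.6", 1400, 1000, 3.00, 15.00),
    Model("DeepSeek R1", 1370, 128, 0.55, 0.99),
    Model("GPT-5 mini", 1360, 400, 0.45, 1.80),
    Model("DeepSeek V3", 1350, 128, 0.49, 0.89),
    Model("Gemini 2.5 Flash", 1345, 1000, 0.30, 2.50),
    Model("GPT-4.1", 1335, 1000, 2.00, 8.00),
    Model("Claude Haiku 4.5", 1320, 200, 1.00, 5.00),
    Model("Llama 3.1 405B", 1305, 128, 3.00, 3.00),
    Model("Mistral Large 2", 1295, 128, 2.00, 6.00),
    Model("Llama 3.3 70B", 1290, 128, 0.23, 0.40),
    Model("Qwen 2.5 72B", 1285, 131, 0.13, 0.40),
    Model("GPT-4.1 nano", 1255, 1000, 0.10, 0.40),
    Model("Mistral Small 3", 1240, 128, 0.20, 0.60),
    Model("Llama 3.1 8B", 1180, 128, 0.03, 0.05),
]


def cheapest(min_elo: int, min_context_k: int) -> Optional[Model]:
    """Lowest blended-price model meeting both thresholds, or None if none qualify."""
    eligible = [m for m in MODELS if m.elo >= min_elo and m.context_k >= min_context_k]
    return min(eligible, key=lambda m: m.blended_price(), default=None)
```

For example, `cheapest(1350, 200)` returns GPT-5 mini: DeepSeek R1 and DeepSeek V3 have lower blended prices but only a 128K context window.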