Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.
DeepSeek R1 inference pricing
- Developer: DeepSeek
- Quality rank: #9
- Elo: 1370
- Context: 128K
- Weights: Open
- Lowest output price: $0.990 per 1M tokens
3 results
| Provider | Plan | Output price (per 1M tokens) | Input price (per 1M tokens) | Blended price (per 1M tokens) | Context | Status | Regions |
|---|---|---|---|---|---|---|---|
| | DeepSeek R1 Distill 70B | $0.990 | $0.750 | $0.810 | 128K | Verified | Global |
| | DeepSeek R1 (routed) | $2.19 | $0.550 | $0.960 | 128K | Verified | Global |
| Fireworks AI | DeepSeek R1 | $8.00 | $3.00 | $4.25 | 128K | Verified | Global |
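The page does not say how the blended figure is derived. All three listed values are, however, consistent with a weighted average of three parts input price to one part output price. The sketch below reproduces them under that assumption; the 3:1 weighting and the `blended_price` helper are inferred for illustration, not taken from the listing.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens.

    The default 3:1 input-to-output weighting is an assumption inferred
    from the listed numbers, not a documented formula.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# (input price, output price) in USD per 1M tokens, copied from the listing.
listings = {
    "DeepSeek R1 Distill 70B": (0.750, 0.990),
    "DeepSeek R1 (routed)": (0.550, 2.19),
    "DeepSeek R1 (Fireworks AI)": (3.00, 8.00),
}

for plan, (input_price, output_price) in listings.items():
    print(f"{plan}: ${blended_price(input_price, output_price):.3f} per 1M tokens")
```

A workload with a different input-to-output ratio would shift the ranking, so it may be worth passing your own weights rather than relying on the default.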
Providers serving this model
Groq runs inference on custom LPU (Language Processing Unit) silicon rather than GPUs, delivering unmatched tokens-per-second throughput that can make even 70B models feel instant. With ultra-low pricing on Llama and DeepSeek models and a free tier for experimentation, it is the speed leader in the inference market.
OpenRouter acts as a unified gateway that routes API requests across dozens of inference providers - OpenAI, Anthropic, Google, Together, Groq, and more - through a single API key. It automatically selects the best available provider for each model, with transparent pricing and the ability to fall back to another provider if one endpoint goes down.
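The fallback behaviour described above can be illustrated with a minimal, generic sketch. It does not use OpenRouter's API or reflect its actual selection logic; `ProviderError`, `route_with_fallback`, and the two stand-in provider functions are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised when a provider endpoint is unavailable."""


def route_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in preference order; return (provider name, reply)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for demonstration; real ones would make network calls.
def flaky_provider(prompt: str) -> str:
    raise ProviderError("endpoint down")


def healthy_provider(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = route_with_fallback(
    "hello",
    [("primary", flaky_provider), ("backup", healthy_provider)],
)
print(name, reply)
```

Ordering the provider list by price or latency is one simple way to approximate "best available" selection; a production gateway would likely weigh more signals.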