Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.
DeepSeek V3 inference pricing
- Developer: DeepSeek
- Quality rank: #14
- Elo: 1350
- Context: 128K
- Weights: Open
- Lowest output: $0.890
3 results
| Provider | Plan | Output price (per 1M tokens) | Input price (per 1M tokens) | Blended | Context | Regions | Status |
|---|---|---|---|---|---|---|---|
| | DeepSeek V3 | $0.890 | $0.490 | $0.590 | 128K | Global | Verified |
| Fireworks AI | DeepSeek V3 | $0.900 | $0.900 | $0.900 | 128K | Global | Verified |
| | DeepSeek V3 | $1.25 | $1.25 | $1.25 | 128K | Global | Verified |
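
To compare these listings for a specific workload, it can help to recompute the blended price and the total cost from the input and output rates. The sketch below is illustrative only. The listing does not say how "Blended" is calculated; the $0.590 figure for the cheapest listing happens to match a 3:1 input-to-output weighting, so the code assumes that ratio. The `Offer` class, the `cheapest` helper, and the "listing 1" and "listing 3" labels (standing in for the rows whose provider name is not shown) are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One pricing row; rates are USD per 1M tokens, copied from the listing."""
    label: str
    input_per_m: float
    output_per_m: float

    def blended(self, input_parts: int = 3, output_parts: int = 1) -> float:
        # Weighted average of input and output rates.
        # ASSUMPTION: the 3:1 default is inferred from the $0.590 figure,
        # not stated anywhere in the listing.
        total_parts = input_parts + output_parts
        weighted = input_parts * self.input_per_m + output_parts * self.output_per_m
        return weighted / total_parts

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        # Total USD cost for a workload measured in raw token counts.
        usd_millionths = input_tokens * self.input_per_m + output_tokens * self.output_per_m
        return usd_millionths / 1_000_000


# Rates from the three listings; the labels for unnamed rows are placeholders.
OFFERS = [
    Offer("listing 1", input_per_m=0.49, output_per_m=0.89),
    Offer("Fireworks AI", input_per_m=0.90, output_per_m=0.90),
    Offer("listing 3", input_per_m=1.25, output_per_m=1.25),
]


def cheapest(offers: list[Offer], input_tokens: int, output_tokens: int) -> Offer:
    # Pick the offer with the lowest total cost for the given token mix.
    return min(offers, key=lambda o: o.cost(input_tokens, output_tokens))


if __name__ == "__main__":
    for offer in OFFERS:
        print(f"{offer.label}: blended ${offer.blended():.3f}/M tokens")
```

When input and output rates are equal, as in the second and third listings, the blended price equals the flat rate whatever ratio is assumed. The weighting only matters for the first listing, where output-heavy workloads will cost more than the blended figure suggests.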
Providers serving this model
Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.
DeepInfra offers rock-bottom pricing for hosted inference across a wide catalog of open-weight models, often undercutting competitors by 50-80%. With per-token billing as low as $0.03/M input tokens on small models and aggressive pricing on DeepSeek V3 and Llama 70B, it is the cost champion for high-volume, budget-sensitive inference workloads.