Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.
Qwen 2.5 72B inference pricing
- Developer: Alibaba
- Quality rank: #32
- Elo: 1285
- Context: 131K
- Weights: Open
- Lowest output: $0.400/M tokens
4 results
| Provider | Plan | Output (per 1M tokens) | Input (per 1M tokens) | Blended (per 1M tokens) | Context | Regions | Status |
|---|---|---|---|---|---|---|---|
| | Qwen 2.5 72B | $0.400 | $0.130 | $0.198 | 131K | Global | Verified |
| Fireworks AI | Qwen 2.5 72B | $0.900 | $0.900 | $0.900 | 131K | Global | Verified |
| | Qwen 2.5 72B (routed) | $0.900 | $0.900 | $0.900 | 131K | Global | Verified |
| | Qwen 2.5 72B Turbo | $0.900 | $0.900 | $0.900 | 131K | Global | Verified |
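The page does not say how the Blended figure is derived. The listed numbers ($0.130 input, $0.400 output, $0.198 blended) are consistent with a 3:1 input-to-output token weighting, so the sketch below assumes that ratio. The function names `blended_price` and `workload_cost` are illustrative, not part of any provider's API.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The default 3:1 input:output weighting is an assumption inferred
    from the listed prices, not a documented formula.
    """
    total_weight = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total_weight


def workload_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Dollar cost of a workload, given prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Cheapest listing in the table: $0.130 input, $0.400 output.
    print(f"{blended_price(0.130, 0.400):.4f}")
    # Flat-priced listings: blended equals the flat rate.
    print(f"{blended_price(0.900, 0.900):.4f}")
```

For the cheapest listing this gives 0.1975, which matches the $0.198 shown after rounding. When a workload's real input/output mix differs from 3:1, `workload_cost` with actual token counts is a better basis for comparison than the blended figure.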
Providers serving this model
Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.
OpenRouter acts as a unified gateway that routes API requests across dozens of inference providers (OpenAI, Anthropic, Google, Together, Groq, and more) through a single API key. It automatically selects the best available provider for each model, with transparent pricing and the ability to fall back to another provider if one endpoint goes down.
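The fallback idea can be illustrated with a minimal, generic sketch. This is not OpenRouter's implementation or API: `complete_with_fallback`, `AllProvidersFailed`, and the provider callables are hypothetical stand-ins for whatever client functions an application already has.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed (hypothetical name)."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Ordering the `providers` list by price would make the cheapest endpoint the default, with more expensive ones used only when it fails.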
DeepInfra offers rock-bottom priced hosted inference across a wide catalog of open-weight models, often undercutting competitors by 50-80%. With per-token billing as low as $0.03/M input on small models and aggressive pricing on DeepSeek V3 and Llama 70B, it is the cost champion for high-volume, budget-sensitive inference workloads.