Qwen 2.5 72B API pricing comparison | DeployCue

Qwen 2.5 72B inference pricing

Developer: Alibaba
Quality rank: #32
Elo: 1285
Context: 131K tokens
Weights: Open
Lowest output price: $0.400 per 1M tokens
Median output price: $0.900 per 1M tokens
Highest output price: $0.900 per 1M tokens

4 providers (prices in USD per 1M tokens)

Provider     | Plan                  | Output | Input  | Blended | Context | Status   | Regions
-------------|-----------------------|--------|--------|---------|---------|----------|--------
DeepInfra    | Qwen 2.5 72B          | $0.400 | $0.130 | $0.198  | 131K    | Verified | Global
Fireworks AI | Qwen 2.5 72B          | $0.900 | $0.900 | $0.900  | 131K    | Verified | Global
OpenRouter   | Qwen 2.5 72B (routed) | $0.900 | $0.900 | $0.900  | 131K    | Verified | Global
Together AI  | Qwen 2.5 72B Turbo    | $0.900 | $0.900 | $0.900  | 131K    | Verified | Global

Providers serving this model

Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, which it credits with token generation speeds that often exceed those of other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.

Together AI provides fast hosted inference for open-weight models, including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, typically priced well below comparable closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it a popular choice for teams that want open models without the overhead of self-hosting.

OpenRouter is a unified gateway that routes API requests across dozens of inference providers (OpenAI, Anthropic, Google, Together, Groq, and more) through a single API key. It automatically picks an available provider for each requested model, shows transparent pricing, and can fall back to another endpoint if one goes down.


DeepInfra offers low-cost hosted inference across a wide catalog of open-weight models, often undercutting competitors by 50-80%. On Qwen 2.5 72B, for example, its $0.400 output price is about 56% below the $0.900 charged by the other three providers, and its $0.198 blended price is about 78% below. With per-token billing as low as $0.03/M input on small models and aggressive pricing on DeepSeek V3 and Llama 70B, it is a strong fit for high-volume, budget-sensitive inference workloads.

Frequently asked questions

How much does Qwen 2.5 72B cost per million tokens?
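The lowest input price we track for Qwen 2.5 72B is $0.130 per million tokens, and the lowest output price is $0.400 per million tokens, both at DeepInfra. Fireworks AI, OpenRouter, and Together AI charge $0.900 per million tokens for both input and output. The table shows input, output, and blended pricing for every provider.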
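Per-million-token prices convert to a per-request cost by multiplying each token count by its price and dividing by one million. A minimal sketch using the DeepInfra and Fireworks AI prices from the table; the `request_cost` helper and the example request size (3,000 input and 1,000 output tokens) are hypothetical, chosen only for illustration.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD of one request, given prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Prices (USD per 1M tokens) come from the table; the request size is hypothetical.
deepinfra = request_cost(3_000, 1_000, input_price=0.130, output_price=0.400)
fireworks = request_cost(3_000, 1_000, input_price=0.900, output_price=0.900)
print(f"DeepInfra: ${deepinfra:.6f}, Fireworks AI: ${fireworks:.6f}")
```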
What is the cheapest Qwen 2.5 72B API provider?
DeepInfra is the cheapest Qwen 2.5 72B endpoint we track, on both output price ($0.400 per million tokens) and blended price ($0.198). Prices for the same model vary widely between providers: here the most expensive blended price ($0.900) is about 4.5 times the cheapest.
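Finding the cheapest endpoint amounts to sorting on one price column. A minimal sketch, assuming the prices are copied by hand from the table; the `providers` list and `rank_by` helper are illustrative names, not a DeployCue API.

```python
# Prices in USD per 1M tokens, copied from the table.
providers = [
    {"name": "DeepInfra", "output": 0.400, "blended": 0.198},
    {"name": "Fireworks AI", "output": 0.900, "blended": 0.900},
    {"name": "OpenRouter", "output": 0.900, "blended": 0.900},
    {"name": "Together AI", "output": 0.900, "blended": 0.900},
]


def rank_by(rows: list[dict], key: str) -> list[dict]:
    """Return rows sorted cheapest-first on the given price column."""
    return sorted(rows, key=lambda row: row[key])


by_blended = rank_by(providers, "blended")
cheapest, priciest = by_blended[0], by_blended[-1]
ratio = priciest["blended"] / cheapest["blended"]
print(f"Cheapest: {cheapest['name']} (${cheapest['blended']:.3f} blended)")
print(f"Most expensive is {ratio:.1f}x the cheapest")
```

Python's `sorted` is stable, so providers tied on price keep their listed order.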
Which providers serve the Qwen 2.5 72B API?
All four providers with a published Qwen 2.5 72B endpoint (DeepInfra, Fireworks AI, OpenRouter, and Together AI) appear in the table above, with input and output token pricing, blended price, context window, and serving regions.
What is Qwen 2.5 72B's context window?
Qwen 2.5 72B supports a 131K-token context window at every provider listed. A larger context window lets you pass more tokens (documents, code, conversation history) in a single request.
Is Qwen 2.5 72B open weight or closed source?
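Qwen 2.5 72B is an open-weight model, so you can self-host it on your own hardware or with any GPU provider. At sustained high volume this can cost less than managed API pricing, but the savings depend on what you pay for GPUs and how fully you keep them busy.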
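Whether self-hosting beats an API comes down to a break-even throughput: the hourly GPU bill divided by the tokens served per hour. A minimal sketch, assuming a flat hourly GPU cost and treating input and output tokens as equally costly to serve (a simplification). The $8/hour GPU cost and 50% utilization below are hypothetical placeholders, not measurements; only the $0.198 blended price comes from the table.

```python
def self_host_cost_per_million(gpu_cost_per_hour: float,
                               tokens_per_second: float,
                               utilization: float = 1.0) -> float:
    """USD per 1M tokens for a self-hosted deployment.

    gpu_cost_per_hour: total hourly cost of the GPUs serving the model.
    tokens_per_second: aggregate throughput across concurrent requests.
    utilization: fraction of each hour the deployment is actually busy.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / (tokens_per_hour / 1_000_000)


def break_even_tokens_per_second(gpu_cost_per_hour: float,
                                 api_price_per_million: float,
                                 utilization: float = 1.0) -> float:
    """Sustained throughput at which self-hosting matches the API price."""
    return gpu_cost_per_hour * 1_000_000 / (api_price_per_million * 3600 * utilization)


API_BLENDED = 0.198  # lowest blended price in the table (DeepInfra), USD per 1M tokens

# Hypothetical inputs: $8/hour of GPU time, busy 50% of the time.
needed = break_even_tokens_per_second(8.0, API_BLENDED, utilization=0.5)
print(f"Need about {needed:,.0f} tokens/s sustained to match ${API_BLENDED}/1M")
```

Self-hosting comes out ahead only when sustained throughput exceeds `needed`; halving utilization doubles the throughput required.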
What is blended LLM cost?
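Blended cost weights input and output token prices at a typical ratio of 3 input tokens to 1 output token, so you can rank providers by one number instead of comparing two prices separately: blended = (3 × input + output) / 4. For DeepInfra, (3 × $0.130 + $0.400) / 4 = $0.1975, shown as $0.198.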
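The blended figure is a weighted average of the two prices. A minimal sketch of that formula; the `blended_price` name and its weight parameters are illustrative, with defaults set to the 3:1 input-to-output ratio described above.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average of input and output prices (USD per 1M tokens).

    Defaults to 3 input tokens per output token, the ratio used in the table.
    """
    total = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total


# DeepInfra's listed prices; compare with its $0.198 blended figure in the table.
print(f"${blended_price(0.130, 0.400):.4f}")
```

For output-heavy workloads such as long-form generation, passing a larger `output_weight` gives a blended number closer to your actual mix.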