Llama 3.3 70B inference pricing

Developer: Meta
Quality rank: #30
Elo: 1290
Context: 128K
Weights: Open
Lowest output: $0.400 per 1M tokens
Median output: $0.790 per 1M tokens
Highest output: $0.900 per 1M tokens

5 results

Provider	Plan	Output $/1M	Input $/1M	Blended $/1M	Context	Regions	Verified
DeepInfra	Llama 3.3 70B	$0.400	$0.230	$0.273	128K	Global	Jun 20, 2026
Amazon Web Services	Bedrock Llama 3.3 70B	$0.720	$0.720	$0.720	128K	Global	Jun 20, 2026
Groq	Llama 3.3 70B Versatile	$0.790	$0.590	$0.640	128K	Global	Jun 20, 2026
Together AI	Llama 3.3 70B Turbo	$0.880	$0.880	$0.880	128K	Global	Jun 20, 2026
Fireworks AI	Llama 3.3 70B	$0.900	$0.900	$0.900	128K	Global	Jun 20, 2026

Providers serving this model

Amazon Web Services

Amazon Web Services is the world's largest cloud provider, with 200+ services across compute, storage, databases, ML, and networking. It dominates in enterprise with the broadest global region footprint and the deepest service catalog, but pricing complexity and egress fees add up at scale.

Fireworks AI

Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.

Together AI

Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.

Groq

Groq runs inference on custom LPU (Language Processing Unit) silicon rather than GPUs, delivering unmatched tokens-per-second throughput that can make even 70B models feel instant. With ultra-low pricing on Llama and DeepSeek models and a free tier for experimentation, it is the speed leader in the inference market.

DeepInfra

DeepInfra offers rock-bottom priced hosted inference across a wide catalog of open-weight models, often undercutting competitors by 50-80%. With per-token billing as low as $0.03/M input on small models and aggressive pricing on DeepSeek V3 and Llama 70B, it is the cost champion for high-volume, budget-sensitive inference workloads.

Frequently asked questions

How much does Llama 3.3 70B cost per million tokens?

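The lowest input price we track for Llama 3.3 70B is $0.230 per million tokens, and the lowest output price is $0.400. Depending on the provider, output tokens cost either the same as input tokens or more; the table shows input, output, and blended pricing for every inference provider.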
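As a rough sketch of how per-million-token prices turn into a bill, the snippet below estimates the cost of a workload from its token counts. The function name `request_cost` and the token counts are hypothetical, chosen only for illustration; the prices are DeepInfra's row from the table above.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimate cost in USD. Prices are USD per 1M tokens."""
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost

# Hypothetical workload: 2M input tokens and 500K output tokens
# at DeepInfra's listed prices ($0.230 input, $0.400 output).
cost = request_cost(2_000_000, 500_000, 0.230, 0.400)
print(f"${cost:.2f}")
```

For this made-up workload the estimate comes to about $0.66. Real bills may differ if a provider applies minimums, caching discounts, or other terms not shown in the table.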

What is the cheapest Llama 3.3 70B API provider?

Sort the table by output or blended price to find the cheapest Llama 3.3 70B endpoint. As of Jun 20, 2026, DeepInfra is the lowest on input, output, and blended price. Prices for the same model vary between providers: among the five listed here, the lowest output price ($0.400) is less than half the highest ($0.900).
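If you would rather rank providers in code than on the page, a minimal sketch looks like the following. The `PRICES` dictionary and the `rank_by` helper are hypothetical names; the numbers are copied from the table above and will go stale as providers change their pricing.

```python
# USD per 1M tokens, copied from the table (verified Jun 20, 2026).
PRICES = {
    "DeepInfra": {"input": 0.230, "output": 0.400},
    "Amazon Web Services": {"input": 0.720, "output": 0.720},
    "Groq": {"input": 0.590, "output": 0.790},
    "Together AI": {"input": 0.880, "output": 0.880},
    "Fireworks AI": {"input": 0.900, "output": 0.900},
}

def rank_by(prices, key="output"):
    """Return provider names sorted from cheapest to most expensive on `key`."""
    return sorted(prices, key=lambda name: prices[name][key])

by_output = rank_by(PRICES, "output")
by_input = rank_by(PRICES, "input")
print(by_output)
print(by_input)
```

The two rankings are not identical: Amazon Web Services is ahead of Groq on output price but behind it on input price, so the right sort key depends on whether your workload is dominated by prompts or by generated text.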

Which providers serve the Llama 3.3 70B API?

Every provider with a published Llama 3.3 70B endpoint appears above, with input and output token pricing, context window, and regions.

What is Llama 3.3 70B's context window?

Llama 3.3 70B supports a 128K context window. A larger context window lets you pass more tokens (documents, code, history) in a single request.

Is Llama 3.3 70B open weight or closed source?

Llama 3.3 70B is an open-weight model, so you can self-host it on any GPU provider. Self-hosting can beat managed API pricing at scale, provided the hardware stays well utilized.

What is blended LLM cost?

Blended cost weights input and output token prices at a typical 3:1 input-to-output ratio, that is (3 × input price + output price) / 4, so you can rank providers by one number instead of comparing two prices separately.
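The calculation can be sketched as below, assuming the 3:1 ratio means three input tokens for every output token; that reading is consistent with the blended figures in the table. The function name `blended_price` and its weight parameters are hypothetical.

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average of input and output prices (USD per 1M tokens)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight

# Prices from the table: DeepInfra ($0.230 in, $0.400 out), Groq ($0.590 in, $0.790 out).
deepinfra = blended_price(0.230, 0.400)
groq = blended_price(0.590, 0.790)
print(f"DeepInfra: {deepinfra:.4f}  Groq: {groq:.4f}")
```

These work out to roughly 0.2725 and 0.64, in line with the $0.273 and $0.640 shown in the table. If your traffic is closer to 1:1 or is output-heavy, pass different weights; the ranking between providers can shift when you do.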