Llama 3.1 405B inference pricing

Developer: Meta
Quality rank: #24
Elo: 1305
Context: 128K
Weights: Open
Lowest output: $3.00
Median output: $3.00
Highest output: $3.50

4 results

Provider	Plan	Output $/1M	Input $/1M	Blended $/1M	Context	Regions	Verified
Fireworks AI	Llama 3.1 405B	$3.00	$3.00	$3.00	128K	Global	Jun 20, 2026
Google Cloud	Vertex Llama 3.1 405B	$3.00	$3.00	$3.00	128K	Global	Jun 20, 2026
OpenRouter	Llama 3.1 405B (routed)	$3.00	$3.00	$3.00	128K	Global	Jun 20, 2026
Together AI	Llama 3.1 405B Turbo	$3.50	$3.50	$3.50	128K	Global	Jun 20, 2026

Providers serving this model

Google Cloud

Google Cloud Platform combines world-class data analytics, AI infrastructure (TPUs, Vertex AI), and the original managed Kubernetes. Its global fiber backbone and Preemptible VMs offer compelling price-performance for data-heavy and containerized workloads.

Fireworks AI

Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.

Together AI

Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.

OpenRouter

OpenRouter acts as a unified gateway that routes API requests across dozens of inference providers (OpenAI, Anthropic, Google, Together, Groq, and more) through a single API key. It automatically selects the best available provider for each model, with transparent pricing and the ability to fall back to another endpoint if one goes down.

Frequently asked questions

How much does Llama 3.1 405B cost per million tokens?

The lowest input price we track for Llama 3.1 405B is $3.00 per million tokens. Every provider listed charges the same rate for output tokens as for input tokens; the table shows input, output, and blended pricing for each inference provider.
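As a rough illustration of how per-million-token prices translate into the cost of a single request, here is a minimal Python sketch. The function name `request_cost` and the token counts are hypothetical; the $3.00 rates are the lowest input and output prices from the table.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the USD cost of one request, given prices in USD per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical request: 8,000 prompt tokens and 1,000 completion tokens,
# priced at $3.00 per 1M tokens for both input and output.
cost = request_cost(8_000, 1_000, 3.00, 3.00)
print(f"${cost:.4f}")
```

Because input and output are priced identically here, the cost depends only on the total token count; with providers that price them differently, the input/output split matters.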

What is the cheapest Llama 3.1 405B API provider?

Sort the table by output or blended price to find the cheapest Llama 3.1 405B endpoint. Prices for the same model can vary between providers; among the endpoints tracked here, three are tied at $3.00 per million output tokens and the most expensive is $3.50, about 17% higher.
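The same ranking can be done programmatically. The sketch below copies the output prices from the table into a dictionary (the variable names are illustrative, not part of any provider API) and computes the cheapest providers, the median, and the relative spread.

```python
from statistics import median

# Output prices in USD per 1M tokens, copied from the table above.
output_prices = {
    "Fireworks AI": 3.00,
    "Google Cloud": 3.00,
    "OpenRouter": 3.00,
    "Together AI": 3.50,
}

# Sort providers from cheapest to most expensive (ties keep table order).
ranked = sorted(output_prices.items(), key=lambda item: item[1])
lowest = ranked[0][1]
highest = ranked[-1][1]

# Several providers may share the lowest price, so collect all of them.
cheapest = [name for name, price in ranked if price == lowest]
median_price = median(output_prices.values())
spread_pct = (highest - lowest) / lowest * 100

print(cheapest)
print(f"median ${median_price:.2f}, spread {spread_pct:.1f}%")
```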

Which providers serve the Llama 3.1 405B API?

Every provider with a published Llama 3.1 405B endpoint that we track appears above, with input, output, and blended token pricing, context window, and regions.

What is Llama 3.1 405B's context window?

Llama 3.1 405B supports a 128K context window. A larger context window lets you pass more tokens (documents, code, history) in a single request.

Is Llama 3.1 405B open weight or closed source?

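Llama 3.1 405B is an open-weight model, so you can self-host it on any GPU provider with enough capacity for a model of this size. At high, sustained volume this can undercut managed API pricing, although the break-even point depends on your hardware costs and utilization.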

What is blended LLM cost?

Blended cost weights input and output token prices by a typical 3:1 ratio so you can rank providers by one number instead of comparing two prices separately.
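A minimal sketch of that calculation follows. It assumes the 3:1 ratio means three input tokens for every output token, which the text above does not state explicitly; the function name `blended_price` and its weight parameters are likewise illustrative.

```python
def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average of input and output prices (USD per 1M tokens).

    Assumption: the default 3:1 weighting is input:output.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# With the table's prices, input and output are equal, so the blend is unchanged.
same = blended_price(3.00, 3.00)

# Hypothetical provider with cheaper input than output, to show the weighting.
mixed = blended_price(1.00, 5.00)

print(f"${same:.2f}  ${mixed:.2f}")
```

When input and output prices are equal, as for every provider in the table, the blended price equals that shared price regardless of the ratio chosen.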