Google Cloud Platform combines large-scale data analytics, AI infrastructure (TPUs, Vertex AI), and the original managed Kubernetes service (GKE). Its global fiber backbone and Preemptible VMs (now offered as Spot VMs) give it strong price-performance for data-heavy and containerized workloads.
Llama 3.1 405B inference pricing
- Developer: Meta
- Quality rank: #24
- Elo: 1305
- Context: 128K
- Weights: Open
- Lowest output price: $3.00 per 1M tokens
| Provider | Plan | Input ($/1M tokens) | Output ($/1M tokens) | Blended ($/1M tokens) | Context | Regions | Verified |
|---|---|---|---|---|---|---|---|
| Fireworks AI | Llama 3.1 405B | $3.00 | $3.00 | $3.00 | 128K | Global | Yes |
| Google Cloud (Vertex AI) | Vertex Llama 3.1 405B | $3.00 | $3.00 | $3.00 | 128K | Global | Yes |
| OpenRouter | Llama 3.1 405B (routed) | $3.00 | $3.00 | $3.00 | 128K | Global | Yes |
| Together AI | Llama 3.1 405B Turbo | $3.50 | $3.50 | $3.50 | 128K | Global | Yes |
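Because every price above is quoted per 1M tokens, the cost of a workload is (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below applies that arithmetic to the four plans and picks the cheapest one for a given token mix. The page does not define "Blended"; the `blended_price` helper assumes a weighted average with a 3:1 input:output ratio, which is one common convention and only an assumption here. When input and output prices are equal, as in every row above, any weighting gives the same blended figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Offer:
    plan: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Plan names and prices taken from the pricing table (USD per 1M tokens).
OFFERS: List[Offer] = [
    Offer("Llama 3.1 405B", 3.00, 3.00),
    Offer("Vertex Llama 3.1 405B", 3.00, 3.00),
    Offer("Llama 3.1 405B (routed)", 3.00, 3.00),
    Offer("Llama 3.1 405B Turbo", 3.50, 3.50),
]


def job_cost(offer: Offer, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload: token counts times per-million prices, divided by 1e6."""
    return (input_tokens * offer.input_per_m
            + output_tokens * offer.output_per_m) / 1_000_000


def blended_price(offer: Offer, input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted-average price per 1M tokens. The default 3:1 weighting is an ASSUMPTION,
    not a definition taken from the pricing page."""
    total = input_weight + output_weight
    return (input_weight * offer.input_per_m
            + output_weight * offer.output_per_m) / total


def cheapest(offers: List[Offer], input_tokens: int, output_tokens: int) -> Offer:
    """Lowest-cost offer for this token mix; on ties, min() keeps the first listed offer."""
    return min(offers, key=lambda o: job_cost(o, input_tokens, output_tokens))


if __name__ == "__main__":
    # Example workload: 50M input tokens and 10M output tokens.
    for offer in OFFERS:
        print(f"{offer.plan}: ${job_cost(offer, 50_000_000, 10_000_000):.2f}")
    print("Cheapest:", cheapest(OFFERS, 50_000_000, 10_000_000).plan)
```

For 50M input and 10M output tokens, the $3.00 plans come to $180.00 and the Turbo plan to $210.00; with several plans tied at the lowest price, `cheapest` simply returns the first one listed.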
Providers serving this model
Fireworks AI specializes in high-throughput inference for open models, built on its custom FireAttention kernel, which the company says delivers faster token generation than other hosting platforms. With HIPAA compliance and a catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it targets latency-sensitive production applications at scale.
Together AI provides fast hosted inference for open-weight models, including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, at per-token prices well below most closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it a common choice for teams that want open models without the overhead of self-hosting.
OpenRouter is a unified gateway that routes API requests across dozens of inference providers (OpenAI, Anthropic, Google, Together, Groq, and more) through a single API key. For each model it automatically selects an available provider, publishes its pricing transparently, and can fall back to another provider if one endpoint goes down.
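OpenRouter performs its routing and fallback on the server side. The sketch below is only a client-side illustration of the same idea, not OpenRouter's API or algorithm: it orders endpoints by price and moves to the next one when a call fails. The endpoint names, prices, and `call` functions are hypothetical stand-ins for real HTTP requests.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical endpoint: (name, USD per 1M tokens, function that sends a prompt and returns text).
Endpoint = Tuple[str, float, Callable[[str], str]]


class AllEndpointsFailed(Exception):
    """Raised when every endpoint in the list raised an error."""


def route_with_fallback(endpoints: Sequence[Endpoint], prompt: str) -> Tuple[str, str]:
    """Try endpoints from cheapest to most expensive and return
    (endpoint name, response) from the first call that succeeds."""
    errors: List[str] = []
    # sorted() is stable, so equally priced endpoints keep their original order.
    for name, _price, call in sorted(endpoints, key=lambda e: e[1]):
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch only network/HTTP errors
            errors.append(f"{name}: {exc}")
    raise AllEndpointsFailed("; ".join(errors))


if __name__ == "__main__":
    def unavailable(prompt: str) -> str:
        raise ConnectionError("503 Service Unavailable")

    def healthy(prompt: str) -> str:
        return "ok: " + prompt

    # The cheaper endpoint is down, so the request falls back to the pricier one.
    endpoints = [("provider-b", 3.50, healthy), ("provider-a", 3.00, unavailable)]
    print(route_with_fallback(endpoints, "hello"))
```

Sorting by price first and falling back on error mirrors the trade-off a router makes: the cheapest healthy endpoint wins, and availability is preserved at the cost of occasionally paying a higher rate.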