Amazon Web Services is the world's largest cloud provider, with 200+ services across compute, storage, databases, ML, and networking. It dominates the enterprise market with the broadest global region footprint and the deepest service catalog, but pricing complexity and egress fees add up at scale.
Llama 3.3 70B inference pricing
- Developer: Meta
- Quality rank: #30
- Elo: 1290
- Context: 128K
- Weights: Open
- Lowest output price: $0.400/1M tokens
5 results
| Provider | Plan | Output ($/1M tokens) | Input ($/1M tokens) | Blended ($/1M tokens) | Context | Status | Regions |
|---|---|---|---|---|---|---|---|
| DeepInfra | Llama 3.3 70B | $0.400 | $0.230 | $0.273 | 128K | Verified | Global |
| Amazon Web Services | Bedrock Llama 3.3 70B | $0.720 | $0.720 | $0.720 | 128K | Verified | Global |
| Groq | Llama 3.3 70B Versatile | $0.790 | $0.590 | $0.640 | 128K | Verified | Global |
| Together AI | Llama 3.3 70B Turbo | $0.880 | $0.880 | $0.880 | 128K | Verified | Global |
| Fireworks AI | Llama 3.3 70B | $0.900 | $0.900 | $0.900 | 128K | Verified | Global |
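The Blended figures listed for each plan are consistent with a weighting of three input tokens per output token (75% input, 25% output). The page does not state its formula, so that ratio is an inference from the listed numbers, and the function name `blended_price` is hypothetical. A minimal Python sketch under that assumption:

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_share: float = 0.75) -> float:
    """Weighted average price per 1M tokens.

    input_share=0.75 (3 input tokens per output token) is an assumption
    inferred from the listed Blended values, not a documented formula.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_per_m + (1.0 - input_share) * output_per_m


# Listed (input, output) prices in $ per 1M tokens, copied from the listings.
listed = {
    "Llama 3.3 70B": (0.230, 0.400),
    "Llama 3.3 70B Versatile": (0.590, 0.790),
}

for plan, (inp, out) in listed.items():
    print(f"{plan}: ${blended_price(inp, out):.4f} per 1M tokens")
```

Under this assumption the two plans with different input and output prices come out at about $0.2725 and $0.64, in line with the listed $0.273 and $0.640. Plans that charge the same price for input and output blend to that same price regardless of the ratio.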
Providers serving this model
Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.
Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.
Groq runs inference on custom LPU (Language Processing Unit) silicon rather than GPUs, delivering unmatched tokens-per-second throughput that can make even 70B models feel instant. With ultra-low pricing on Llama and DeepSeek models and a free tier for experimentation, it is the speed leader in the inference market.
DeepInfra offers rock-bottom priced hosted inference across a wide catalog of open-weight models, often undercutting competitors by 50-80%. With per-token billing as low as $0.03/M input on small models and aggressive pricing on DeepSeek V3 and Llama 70B, it is the cost champion for high-volume, budget-sensitive inference workloads.
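The 50-80% undercutting claim can be checked against the blended prices listed for this model. The sketch below is illustrative only: the helper name `percent_cheaper` is hypothetical, and the plan labels and prices are copied from the listings on this page.

```python
def percent_cheaper(price: float, reference: float) -> float:
    """How much lower `price` is than `reference`, as a percentage."""
    if reference <= 0:
        raise ValueError("reference price must be positive")
    return (reference - price) / reference * 100.0


# Blended prices in $ per 1M tokens, as listed for Llama 3.3 70B.
blended = {
    "Llama 3.3 70B": 0.273,
    "Bedrock Llama 3.3 70B": 0.720,
    "Llama 3.3 70B Versatile": 0.640,
    "Llama 3.3 70B Turbo": 0.880,
    "Fireworks AI Llama 3.3 70B": 0.900,
}

# Compare the lowest-priced listing against every other listing.
lowest_plan = min(blended, key=blended.get)
for plan, price in blended.items():
    if plan != lowest_plan:
        saving = percent_cheaper(blended[lowest_plan], price)
        print(f"{lowest_plan} is {saving:.1f}% cheaper than {plan}")
```

For this model, the lowest listed blended price works out to roughly 57-70% below the other four listings, which falls inside the quoted range.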