Fireworks AI specializes in high-throughput open-model inference powered by its custom FireAttention kernel, delivering token generation speeds that routinely beat other hosting platforms. With HIPAA compliance and a broad catalog spanning Llama, DeepSeek, Qwen, and Mistral models, it is built for latency-sensitive production applications at scale.
Llama 3.1 8B inference pricing
- Developer: Meta
- Quality rank: #70
- Elo: 1180
- Context: 128K
- Weights: Open
- Lowest output price: $0.050 per 1M tokens
4 results
| Provider | Plan | Output price (per 1M tokens) | Input price (per 1M tokens) | Blended | Context | Status | Regions |
|---|---|---|---|---|---|---|---|
| | Llama 3.1 8B | $0.050 | $0.030 | $0.035 | 128K | Verified | Global |
| | Llama 3.1 8B Instant | $0.080 | $0.050 | $0.058 | 128K | Verified | Global |
| | Llama 3.1 8B Turbo | $0.180 | $0.180 | $0.180 | 128K | Verified | Global |
| Fireworks AI | Llama 3.1 8B | $0.200 | $0.200 | $0.200 | 128K | Verified | Global |
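The page does not say how the Blended figures are computed. The listed values are consistent with a weighted average of three parts input price to one part output price, so the sketch below assumes that 3:1 ratio. The function name `blended_price`, the helper `matches_listing`, and the rounding tolerance are illustrative choices, not anything defined by the listing.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices, in dollars per 1M tokens.

    The default 3:1 input-to-output weighting is an assumption inferred
    from the listed numbers, not a formula stated on the page.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# (plan, input $/1M, output $/1M, listed blended $/1M), copied from the listing above.
LISTED = [
    ("Llama 3.1 8B", 0.030, 0.050, 0.035),
    ("Llama 3.1 8B Instant", 0.050, 0.080, 0.058),
    ("Llama 3.1 8B Turbo", 0.180, 0.180, 0.180),
    ("Llama 3.1 8B (Fireworks AI)", 0.200, 0.200, 0.200),
]


def matches_listing(tolerance: float = 0.0006) -> bool:
    """Check that every computed blend agrees with the listed value.

    The tolerance allows for the listing rounding to three decimal places.
    """
    return all(
        abs(blended_price(inp, out) - listed) <= tolerance
        for _, inp, out, listed in LISTED
    )
```

Under this assumption, the second plan works out to (3 × 0.050 + 0.080) / 4 = 0.0575, which the listing shows rounded to $0.058. A workload with a different input-to-output mix can pass its own weights to get a more representative figure.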
Providers serving this model
Together AI provides blazing-fast hosted inference for open-weight models including Llama 3.1 (8B through 405B), DeepSeek V3, Qwen 2.5, and Mistral, all at prices far below closed-model APIs. Its optimized serving infrastructure and free tier for experimentation make it the go-to platform for teams that prefer open models without self-hosting overhead.
Groq runs inference on custom LPU (Language Processing Unit) silicon rather than GPUs, delivering unmatched tokens-per-second throughput that can make even 70B models feel instant. With ultra-low pricing on Llama and DeepSeek models and a free tier for experimentation, it is the speed leader in the inference market.
DeepInfra offers rock-bottom priced hosted inference across a wide catalog of open-weight models, often undercutting competitors by 50-80%. With per-token billing as low as $0.03/M input on small models and aggressive pricing on DeepSeek V3 and Llama 70B, it is the cost champion for high-volume, budget-sensitive inference workloads.