Llama-4-Maverick-17B-128E-Instruct-FP8
by meta-llama
402B params · image-text-to-text · 158 likes · 63.9k downloads
Llama-4-Maverick-17B-128E-Instruct-FP8 is a 402B-parameter model. At Q4 quantization its weights occupy roughly 201 GB, so it requires at least 201 GB of VRAM (typically spread across multiple GPUs).
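The 201 GB figure follows from a back-of-the-envelope weights-only estimate: parameter count times bits per parameter, divided by 8 bits per byte. A minimal sketch (assuming 1 GB = 10^9 bytes, and ignoring KV cache and activation overhead, which add to the real requirement):

```python
def vram_gb(num_params: float, bits_per_param: float) -> float:
    """Weights-only VRAM estimate in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# 402B params at 4-bit (Q4) quantization
print(round(vram_gb(402e9, 4)))  # 201
# For comparison, the native FP8 checkpoint (8 bits/param):
print(round(vram_gb(402e9, 8)))  # 402
```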
Inference providers
| Provider | $/1M in | $/1M out | Throughput |
|---|---|---|---|
| Novita | | | 89 tok/s |
| Together AI | | | 48 tok/s |