NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
by nvidia
335B params · text-generation · 202 likes · 331.3k downloads
NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 is a 335B-parameter model. At Q4 quantization it requires a GPU with at least 168GB of VRAM.
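The 168GB figure is consistent with a simple back-of-the-envelope estimate: parameter count times bytes per weight. The sketch below reproduces that arithmetic. It assumes exactly 4 bits per weight and 1 GB = 1e9 bytes, and it covers weights only, ignoring KV cache, activations, and any quantization scale overhead. The function name `estimate_weight_memory_gb` is hypothetical and chosen for illustration.

```python
import math


def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits,
    with no extra overhead for scales, KV cache, or activations.
    """
    bytes_per_weight = bits_per_weight / 8
    return params_billions * bytes_per_weight


# 335B parameters at 4 bits per weight (Q4)
weights_gb = estimate_weight_memory_gb(335, 4)
print(f"{weights_gb:.1f} GB")   # 167.5 GB
print(math.ceil(weights_gb))    # 168
```

In practice the real footprint is likely somewhat higher than this estimate, since serving also needs memory for the KV cache and runtime buffers.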
Inference providers
| Provider | $/1M in | $/1M out | Throughput |
|---|---|---|---|
| Together AI | | | 149 tok/s |
| Fireworks | | | 138 tok/s |