GLM-5.2-FP8
by zai-org
753B params · text-generation · 119 likes · 217.4k downloads
GLM-5.2-FP8 is a 753B parameter model. At Q4 quantization (about 0.5 bytes per parameter) its weights take roughly 377GB, so running it requires at least 377GB of total GPU VRAM.
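The 377GB figure matches a simple back-of-the-envelope estimate: parameter count times bits per parameter, divided by 8. The sketch below reproduces that arithmetic. The helper name `weight_memory_gb` is hypothetical, 1 GB is assumed to mean 10^9 bytes, and the estimate covers weights only (no KV cache, activations, or runtime overhead), so actual requirements will likely be higher. The 8-bit line assumes FP8 stores one byte per parameter.

```python
import math


def weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Estimate weight-only memory in GB (assuming 1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Q4: 4 bits per parameter (the figure quoted above)
q4_gb = weight_memory_gb(753, 4)
# FP8: assumed to be 8 bits per parameter
fp8_gb = weight_memory_gb(753, 8)

print(f"Q4 weights:  {q4_gb:.1f} GB (rounds up to {math.ceil(q4_gb)} GB)")
print(f"FP8 weights: {fp8_gb:.1f} GB")
```

Under these assumptions the Q4 estimate comes to 376.5GB, which rounds up to the 377GB quoted above.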
Inference providers
| Provider | $/1M in | $/1M out | Throughput |
|---|---|---|---|
| Z.ai | | | 37 tok/s |