API provider data is live · Hardware & cloud pricing curated 2026-02-23

H100 SXM5 64 GB

NVIDIA

64GB VRAM · 2020 GB/s bandwidth · 267.6 FP16 TFLOPS · 700W TDP

The H100 SXM5 64 GB has 64GB of VRAM with 2020 GB/s memory bandwidth and 267.6 TFLOPS of FP16 compute. At Q4 quantization it comfortably runs models such as Gemma 3 4B (727 tok/s), Qwen 2.5 7B (382 tok/s), and Llama 3.1 8B (362 tok/s). Models larger than ~109B parameters won't fit even at Q4. Running at the full 700W TDP around the clock costs approximately $76/month in electricity.
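The decode figures on this page are consistent with a simple bandwidth-bound model: generating one token streams the full set of quantized weights from VRAM once, so tok/s ≈ effective bandwidth ÷ weight size. A minimal sketch follows; the 0.5 GB per billion parameters for Q4 weights, the ~72% bandwidth-efficiency factor, and the $0.15/kWh electricity rate are assumptions chosen because they reproduce the published numbers, not values documented by vram.run.

```python
# Back-of-envelope model for the figures above. All three constants are
# assumptions inferred from the published numbers, not vram.run internals.

GB_PER_B_PARAMS_Q4 = 0.5    # assumed Q4 footprint: ~0.5 bytes/param incl. overhead
BANDWIDTH_EFF = 0.72        # assumed fraction of peak bandwidth achieved in decode
ELEC_RATE_USD_KWH = 0.15    # assumed electricity rate

def q4_weight_gb(params_b: float) -> float:
    """Approximate VRAM footprint of Q4-quantized weights, in GB."""
    return params_b * GB_PER_B_PARAMS_Q4

def decode_tok_s(params_b: float, bandwidth_gb_s: float) -> float:
    """Bandwidth-bound decode estimate: each token reads all weights once."""
    return bandwidth_gb_s * BANDWIDTH_EFF / q4_weight_gb(params_b)

def monthly_power_usd(tdp_w: float) -> float:
    """Electricity cost of running at TDP 24 hours a day for 30 days."""
    return tdp_w / 1000 * 24 * 30 * ELEC_RATE_USD_KWH

print(round(decode_tok_s(4.0, 2020)))    # ~727 tok/s (Gemma 3 4B)
print(round(decode_tok_s(70.6, 2020)))   # ~41 tok/s (Llama 3.3 70B)
print(round(monthly_power_usd(700)))     # ~$76/month at 700W TDP
```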

What LLMs can you run?

Model                Params   Q4 Weight   Fit           Decode
Gemma 3 4B           4.0B     2 GB        comfortable   727 tok/s
Qwen 2.5 7B          7.6B     4 GB        comfortable   382 tok/s
Llama 3.1 8B         8.0B     4 GB        comfortable   362 tok/s
Mistral Small 24B    24.0B    12 GB       comfortable   121 tok/s
Gemma 3 27B          27.4B    14 GB       comfortable   106 tok/s
Qwen 2.5 Coder 32B   32.5B    16 GB       comfortable   89 tok/s
Llama 3.3 70B        70.6B    35 GB       comfortable   41 tok/s
Qwen 2.5 72B         72.7B    36 GB       comfortable   40 tok/s
Llama 3.1 405B       405B     202 GB      won't fit     —
DeepSeek R1 671B     671B     336 GB      won't fit     —
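The fit column and the ~109B cutoff quoted above are consistent with reserving roughly 15% of VRAM as headroom (KV cache, activations, framework overhead). A hypothetical check under that assumption:

```python
# Fit check matching the "~109B at Q4" cutoff above. The 85% usable-VRAM
# fraction is an assumption inferred from that cutoff, not a vram.run rule.

USABLE_VRAM_FRACTION = 0.85

def fits(params_b: float, vram_gb: float) -> bool:
    weight_gb = params_b * 0.5                     # Q4 footprint, as above
    return weight_gb <= vram_gb * USABLE_VRAM_FRACTION

print(fits(70.6, 64))                    # True  -> Llama 3.3 70B fits
print(fits(405, 64))                     # False -> Llama 3.1 405B won't fit
print(64 * USABLE_VRAM_FRACTION / 0.5)   # ~108.8 -> the "~109B" cutoff
```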

Similar GPUs

GPU                     VRAM   Bandwidth   FP16 TFLOPS   TDP
Radeon Instinct MI200   64GB   1640 GB/s   181.0         300W
Radeon Instinct MI210   64GB   1640 GB/s   181.0         300W
M1 Ultra 64GB           64GB   800 GB/s    42.5          120W
M2 Ultra 64GB           64GB   800 GB/s    54.4          120W
M4 Max 64GB             64GB   546 GB/s    36.9          75W
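Because decode is bandwidth-bound, these GPUs can be compared head-to-head with the same estimate used above. A rough sketch for Llama 3.3 70B at Q4, reusing the assumed 0.72 efficiency factor (specs taken from the tables on this page):

```python
# Estimated Q4 decode speed for Llama 3.3 70B across the GPUs listed above.
# Same assumed bandwidth-efficiency factor as before; treat as a rough guide.

gpus = {
    "H100 SXM5 64 GB":       2020,  # peak memory bandwidth, GB/s
    "Radeon Instinct MI210": 1640,
    "M2 Ultra 64GB":         800,
    "M4 Max 64GB":           546,
}

weight_gb = 70.6 * 0.5  # Llama 3.3 70B at Q4, ~35 GB

for name, bw in gpus.items():
    print(f"{name:24s} ~{bw * 0.72 / weight_gb:5.1f} tok/s")
```

On this estimate the MI210 lands around 33 tok/s and the Apple parts in the 11–16 tok/s range, versus ~41 tok/s for the H100 SXM5 64 GB.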

