vram.run

API provider data is live · Hardware & cloud pricing curated 2026-02-23

M4 Max 128GB

Apple Silicon

128GB VRAM · 546 GB/s bandwidth · 36.9 FP16 TFLOPS · 75W TDP · $4,999 street price
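The decode speeds on this page track bandwidth more closely than TFLOPS, and a rough roofline calculation using the two figures from the spec line shows why. This sketch assumes about 2 FLOPs per weight per generated token and about 0.5 bytes per parameter at Q4 (the ratio the table's Q4 weight column implies). It ignores KV-cache reads and dequantization overhead, so treat the numbers as illustrative.

```python
# Simple roofline check for the M4 Max 128GB, using the spec-line figures.
FP16_TFLOPS = 36.9       # peak FP16 compute, TFLOP/s
BANDWIDTH_GBPS = 546.0   # memory bandwidth, GB/s


def ridge_point(tflops: float, bandwidth_gbps: float) -> float:
    """FLOPs per byte a workload needs before compute, not memory, is the limit."""
    return (tflops * 1e12) / (bandwidth_gbps * 1e9)


def decode_intensity(bytes_per_param: float) -> float:
    """Approximate FLOPs per byte for single-stream decode.

    ASSUMPTION: ~2 FLOPs (one multiply-add) per weight per token, with every
    weight read once per token; KV-cache traffic is ignored.
    """
    return 2.0 / bytes_per_param


ridge = ridge_point(FP16_TFLOPS, BANDWIDTH_GBPS)
q4 = decode_intensity(0.5)  # Q4 ~ 0.5 bytes per parameter
print(f"ridge point: {ridge:.1f} FLOP/byte, Q4 decode: {q4:.1f} FLOP/byte")
print("decode is bandwidth-bound" if q4 < ridge else "decode is compute-bound")
```

With single-stream Q4 decode needing only about 4 FLOPs per byte against a ridge point near 68, the chip is limited by how fast it can stream weights from memory, which is why bandwidth is the figure to compare for generation speed.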

The M4 Max 128GB has 128GB of unified memory shared between CPU and GPU, with 546 GB/s of memory bandwidth and 36.9 TFLOPS of FP16 compute. At Q4 quantization, it comfortably runs Gemma 3 4B (120 tok/s), Qwen 2.5 7B (63 tok/s), and Llama 3.1 8B (59 tok/s). Models larger than ~218B parameters (about 109 GB of Q4 weights) won't fit, even at Q4. Electricity cost is approximately $8/month at the 75W TDP.
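The page does not say how the ~$8/month figure is derived. A minimal sketch, assuming 24/7 operation at the full 75W TDP and a hypothetical electricity rate of $0.15/kWh (neither is stated on the page), lands close to it:

```python
# Monthly electricity cost for a device drawing a constant wattage.
HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours in an average month


def monthly_cost(watts: float, usd_per_kwh: float, duty_cycle: float = 1.0) -> float:
    """USD per month at `watts`, powered on for `duty_cycle` (0..1) of the time.

    `duty_cycle` is a hypothetical parameter for this sketch; 1.0 means 24/7.
    """
    kwh = watts * HOURS_PER_MONTH * duty_cycle / 1000.0
    return kwh * usd_per_kwh


# ASSUMPTIONS: 24/7 at the 75 W TDP, hypothetical rate of $0.15/kWh.
cost = monthly_cost(75, 0.15)
print(f"${cost:.2f}/month")
```

That is about 54.75 kWh per month, or roughly $8.21. Actual draw during inference varies, so adjust `watts` or `duty_cycle` to match real usage and local rates.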

What LLMs can you run?

Model	Params	Q4 weights	Fit	Decode (Q4)
Gemma 3 4B	4.0B	2 GB	comfortable	120 tok/s
Qwen 2.5 7B	7.6B	4 GB	comfortable	63 tok/s
Llama 3.1 8B	8.0B	4 GB	comfortable	59 tok/s
Mistral Small 24B	24.0B	12 GB	tight	20 tok/s
Gemma 3 27B	27.4B	14 GB	tight	17 tok/s
Qwen 2.5 Coder 32B	32.5B	16 GB	tight	14 tok/s
Llama 3.3 70B	70.6B	35 GB	tight	6 tok/s
Qwen 2.5 72B	72.7B	36 GB	tight	6 tok/s
Llama 3.1 405B	405B	202 GB	won't fit	n/a
DeepSeek R1 671B	671B	336 GB	won't fit	n/a
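The Q4 weight column works out to about 0.5 bytes per parameter, and the ~218B ceiling implies roughly 109 GB available for weights. The sketch below approximately reproduces the weight, fit, and decode columns from those two assumptions plus a hypothetical `EFFICIENCY` of 0.44: a fraction of peak bandwidth picked by hand so that bandwidth divided by weight size lands near the listed tok/s. It is not vram.run's actual method, and it does not model the comfortable/tight labels.

```python
from dataclasses import dataclass
from typing import Optional

BANDWIDTH_GBPS = 546.0    # M4 Max 128GB memory bandwidth from the spec line
Q4_BYTES_PER_PARAM = 0.5  # matches the table: Q4 weight in GB ~ params (B) / 2
USABLE_GB = 109.0         # ASSUMPTION: inferred from the ~218B ceiling (218 * 0.5)
EFFICIENCY = 0.44         # HYPOTHETICAL: fraction of peak bandwidth reached in decode


@dataclass
class Estimate:
    weight_gb: float
    fits: bool
    decode_tok_s: Optional[float]  # None when the model does not fit


def estimate(params_b: float) -> Estimate:
    """Rough Q4 estimate for a model with `params_b` billion parameters."""
    weight_gb = params_b * Q4_BYTES_PER_PARAM
    fits = weight_gb <= USABLE_GB
    # Bandwidth-bound decode: every weight is read once per generated token,
    # so tokens/s ~ effective bandwidth / bytes of weights.
    tok_s = EFFICIENCY * BANDWIDTH_GBPS / weight_gb if fits else None
    return Estimate(weight_gb, fits, tok_s)


for name, params in [("Llama 3.1 8B", 8.0), ("Llama 3.3 70B", 70.6), ("Llama 3.1 405B", 405.0)]:
    e = estimate(params)
    speed = f"{e.decode_tok_s:.0f} tok/s" if e.decode_tok_s is not None else "won't fit"
    print(f"{name}: {e.weight_gb:.0f} GB Q4, {speed}")
```

These estimates land within about 15% of the table (for example, 7 tok/s versus the listed 6 tok/s for Llama 3.3 70B), so treat them as ballpark figures rather than predictions.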

Similar GPUs

GPU	VRAM	Bandwidth	TFLOPS	TDP
M3 Max 128GB	128GB	400 GB/s	32.8	75W
M1 Ultra 128GB	128GB	800 GB/s	42.5	120W
M2 Ultra 128GB	128GB	800 GB/s	54.4	120W
M3 Ultra 128GB	128GB	800 GB/s	65.5	120W
GB10	128GB	273 GB/s	29.7	140W
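If decode is bandwidth-bound, a first-order way to compare the devices above is the ratio of their memory bandwidths. This sketch ignores compute, software stack, and real-world efficiency differences between chips (the GB10 in particular is not an Apple part), so the ratios are rough guides, not benchmarks.

```python
# Bandwidth figures (GB/s) taken from the Similar GPUs table.
BANDWIDTH = {
    "M4 Max 128GB": 546,
    "M3 Max 128GB": 400,
    "M1 Ultra 128GB": 800,
    "M2 Ultra 128GB": 800,
    "M3 Ultra 128GB": 800,
    "GB10": 273,
}


def relative_decode(device: str, baseline: str = "M4 Max 128GB") -> float:
    """Decode speed relative to `baseline`, ASSUMING decode is purely bandwidth-bound."""
    return BANDWIDTH[device] / BANDWIDTH[baseline]


for device in BANDWIDTH:
    print(f"{device}: {relative_decode(device):.2f}x")
```

By this measure, the 800 GB/s Ultra chips would decode about 1.47× faster than the M4 Max and the GB10 about half as fast. Prompt processing (prefill) is compute-heavy, so the TFLOPS column matters more there.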


MIT · API data: live · HW/Cloud data: curated 2026-02-23 · v0.6.0