
API provider data is live · Hardware & cloud pricing curated 2026-02-23

M3 Max 128GB

Apple Silicon

128GB VRAM · 400 GB/s bandwidth · 32.8 FP16 TFLOPS · 75W TDP · $4,499 street price

The M3 Max 128GB has 128GB of unified memory (listed here as VRAM), 400 GB/s of memory bandwidth, and 32.8 TFLOPS of FP16 compute. At Q4 quantization, it comfortably runs Gemma 3 4B (86 tok/s), Qwen 2.5 7B (45 tok/s), and Llama 3.1 8B (42 tok/s). Models larger than ~218B parameters won't fit even at Q4. Electricity costs approximately $8/month at the 75W TDP.
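The electricity figure can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the $0.15/kWh rate and the assumption that the machine draws its full 75W TDP around the clock for a 30-day month are not stated on this page, and the function name is hypothetical.

```python
def monthly_electricity_cost(watts: float,
                             usd_per_kwh: float = 0.15,   # assumed rate, not from the page
                             hours_per_day: float = 24.0,  # assumed continuous load
                             days: int = 30) -> float:
    """Return the cost in USD of running a device at a constant power draw."""
    kwh = watts / 1000.0 * hours_per_day * days
    return kwh * usd_per_kwh


cost = monthly_electricity_cost(75)  # 75W TDP from the spec line
print(f"${cost:.2f}/month")
```

Under those assumptions the result lands close to the quoted $8/month. Real draw is usually below TDP when the machine is idle, so this is better read as an upper estimate.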

What LLMs can you run?

Model	Params	Q4 Weight	Fit	Decode
Gemma 3 4B	4.0B	2 GB	comfortable	86 tok/s
Qwen 2.5 7B	7.6B	4 GB	comfortable	45 tok/s
Llama 3.1 8B	8.0B	4 GB	comfortable	42 tok/s
Mistral Small 24B	24.0B	12 GB	tight	14 tok/s
Gemma 3 27B	27.4B	14 GB	tight	12 tok/s
Qwen 2.5 Coder 32B	32.5B	16 GB	tight	10 tok/s
Llama 3.3 70B	70.6B	35 GB	tight	4 tok/s
Qwen 2.5 72B	72.7B	36 GB	tight	4 tok/s
Llama 3.1 405B	405B	202 GB	won't fit	n/a
DeepSeek R1 671B	671B	336 GB	won't fit	n/a
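The Q4 Weight column is consistent with roughly 0.5 bytes per parameter, and the ~218B limit quoted earlier is consistent with about 85% of the 128GB being usable for weights. Both are inferences from the numbers on this page, not a documented method, so the sketch below treats them as assumptions. It also computes a simple bandwidth-bound ceiling for decode speed (each generated token reads all weights once), which the listed tok/s figures sit well below.

```python
MEMORY_GB = 128            # from the spec line
BANDWIDTH_GB_S = 400       # from the spec line
BYTES_PER_PARAM_Q4 = 0.5   # assumption: 4 bits per weight
USABLE_FRACTION = 0.85     # assumption inferred from the ~218B limit


def q4_weight_gb(params_billions: float) -> float:
    """Approximate Q4 weight size in GB for a model with the given parameter count."""
    return params_billions * BYTES_PER_PARAM_Q4


def max_params_billions() -> float:
    """Largest model (in billions of parameters) whose Q4 weights fit in usable memory."""
    return MEMORY_GB * USABLE_FRACTION / BYTES_PER_PARAM_Q4


def decode_ceiling_tok_s(params_billions: float) -> float:
    """Upper bound on decode speed if every token streams all weights from memory once."""
    return BANDWIDTH_GB_S / q4_weight_gb(params_billions)


# Parameter counts (billions) and listed decode speeds taken from the table above.
listed = {
    "Gemma 3 4B": (4.0, 86),
    "Llama 3.1 8B": (8.0, 42),
    "Llama 3.3 70B": (70.6, 4),
    "Llama 3.1 405B": (405.0, None),
}

for name, (params, listed_tok_s) in listed.items():
    weight = q4_weight_gb(params)
    fits = params <= max_params_billions()
    ceiling = decode_ceiling_tok_s(params)
    print(f"{name}: ~{weight:.1f} GB, fits={fits}, "
          f"ceiling={ceiling:.1f} tok/s, listed={listed_tok_s}")
```

The gap between the ceiling and the listed speeds is expected: real decoding also reads the KV cache, pays compute and scheduling overhead, and rarely sustains peak bandwidth.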

Similar GPUs

GPU	VRAM	Bandwidth	TFLOPS	TDP
GB10	128GB	273 GB/s	29.7	140W
Jetson T5000	128GB	273 GB/s	51.7	40W
M4 Max 128GB	128GB	546 GB/s	36.9	75W
M1 Ultra 128GB	128GB	800 GB/s	42.5	120W
M2 Ultra 128GB	128GB	800 GB/s	54.4	120W


MIT · API data: live · HW/Cloud data: curated 2026-02-23 · v0.6.0