vram.run

API provider data is live · Hardware & cloud pricing curated 2026-02-23

M2 Ultra 128GB

Apple Silicon

128GB unified memory · 800 GB/s memory bandwidth · 54.4 FP16 TFLOPS · 120W TDP · $5,499 street price

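The M2 Ultra 128GB has 128GB of unified memory shared between CPU and GPU, 800 GB/s of memory bandwidth, and 54.4 TFLOPS of FP16 compute. At Q4 quantization it comfortably runs Gemma 3 4B (168 tok/s), Qwen 2.5 7B (88 tok/s), and Llama 3.1 8B (83 tok/s). Models larger than about 218B parameters won't fit even at Q4: at roughly 0.5 bytes per parameter, a 218B model already needs about 109 GB for its weights alone. Electricity costs approximately $13/month, assuming the chip runs around the clock at its 120W TDP (about 86 kWh per month).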
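The two derived figures in the paragraph above ($13/month and the ~218B ceiling) follow from simple arithmetic. The sketch below reproduces them under assumptions the page does not state: an electricity price of $0.15/kWh, always-on use over a 30-day month, and about 85% of the 128GB pool being usable for weights. These constants are illustrative guesses chosen because they match the quoted numbers, not the site's documented method.

```python
# Assumed constants (NOT stated on the page): always-on use over a 30-day month,
# $0.15/kWh, and ~85% of unified memory usable for model weights.
HOURS_PER_MONTH = 24 * 30
USD_PER_KWH = 0.15
USABLE_FRACTION = 0.85
Q4_BYTES_PER_PARAM = 0.5  # 4 bits per weight, ignoring quantization scale overhead


def monthly_energy_cost(watts: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Monthly cost of drawing `watts` continuously for HOURS_PER_MONTH hours."""
    kwh = watts * HOURS_PER_MONTH / 1000
    return kwh * usd_per_kwh


def max_params_billion(memory_gb: float,
                       usable_fraction: float = USABLE_FRACTION,
                       bytes_per_param: float = Q4_BYTES_PER_PARAM) -> float:
    """Largest model, in billions of parameters, whose Q4 weights fit in usable memory."""
    return memory_gb * usable_fraction / bytes_per_param


if __name__ == "__main__":
    # 120W for 720 hours is 86.4 kWh; at the assumed $0.15/kWh that is about $12.96.
    print(f"Electricity: ${monthly_energy_cost(120):.2f}/month")
    print(f"Q4 ceiling: ~{max_params_billion(128):.0f}B parameters")
```

Run as a script, this prints $12.96/month and ~218B parameters. Changing `USD_PER_KWH` or `USABLE_FRACTION` shows how sensitive both headline numbers are to these assumptions.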

What LLMs can you run?

Model	Params	Q4 weights	Fit	Decode
Gemma 3 4B	4.0B	2 GB	comfortable	168 tok/s
Qwen 2.5 7B	7.6B	4 GB	comfortable	88 tok/s
Llama 3.1 8B	8.0B	4 GB	comfortable	83 tok/s
Mistral Small 24B	24.0B	12 GB	tight	28 tok/s
Gemma 3 27B	27.4B	14 GB	tight	24 tok/s
Qwen 2.5 Coder 32B	32.5B	16 GB	tight	20 tok/s
Llama 3.3 70B	70.6B	35 GB	tight	9 tok/s
Qwen 2.5 72B	72.7B	36 GB	tight	9 tok/s
Llama 3.1 405B	405B	202 GB	won't fit	n/a
DeepSeek R1 671B	671B	336 GB	won't fit	n/a
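The Q4 weight column is roughly parameters × 0.5 bytes, and the decode column is consistent with a memory-bandwidth-bound estimate: generating each token streams every weight from memory once, so tokens/s ≈ efficiency × bandwidth ÷ weight size. A minimal sketch follows; the 0.41 efficiency factor is an assumption fitted by eye to this table, not the site's published method, and the estimate ignores KV-cache reads, which grow with context length.

```python
Q4_BYTES_PER_PARAM = 0.5  # ~4 bits per weight; real Q4 formats add a little for scales


def q4_weight_gb(params_billion: float) -> float:
    """Approximate Q4 weight footprint in GB."""
    return params_billion * Q4_BYTES_PER_PARAM


def decode_tok_s(params_billion: float, bandwidth_gb_s: float,
                 efficiency: float = 0.41) -> float:
    """Bandwidth-bound decode estimate: each generated token reads all weights once.

    `efficiency` is an assumed fraction of peak bandwidth actually achieved.
    """
    return efficiency * bandwidth_gb_s / q4_weight_gb(params_billion)


if __name__ == "__main__":
    M2_ULTRA_BW = 800  # GB/s, from the spec line above
    models = [("Gemma 3 4B", 4.0), ("Llama 3.1 8B", 8.0),
              ("Gemma 3 27B", 27.4), ("Llama 3.3 70B", 70.6)]
    for name, params in models:
        gb = q4_weight_gb(params)
        tps = decode_tok_s(params, M2_ULTRA_BW)
        print(f"{name:14s} {gb:6.1f} GB  ~{tps:4.0f} tok/s")
```

This lands within a few tok/s of the table, for example about 82 tok/s for Llama 3.1 8B versus the listed 83. The same function makes clear why a 70B model decodes roughly ten times slower than an 8B one on identical hardware: it has to move about nine times as many bytes per token.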

Similar GPUs

GPU	Memory	Bandwidth	FP16 TFLOPS	TDP
M1 Ultra 128GB	128GB	800 GB/s	42.5	120W
M3 Ultra 128GB	128GB	800 GB/s	65.5	120W
M4 Max 128GB	128GB	546 GB/s	36.9	75W
M3 Max 128GB	128GB	400 GB/s	32.8	75W
GB10	128GB	273 GB/s	29.7	140W
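Because single-stream decode is limited mainly by memory bandwidth rather than TFLOPS, one rough way to read this table is to scale the M2 Ultra decode figures by the bandwidth ratio. The sketch below assumes every chip achieves the same fraction of its peak bandwidth, which is a simplification: software stacks and memory configurations differ, especially between Apple Silicon and GB10. The dictionary is copied from the table; `scale_decode` is a hypothetical helper for illustration.

```python
# Peak memory bandwidth in GB/s, copied from the table above.
BANDWIDTH_GB_S = {
    "M2 Ultra 128GB": 800,
    "M1 Ultra 128GB": 800,
    "M3 Ultra 128GB": 800,
    "M4 Max 128GB": 546,
    "M3 Max 128GB": 400,
    "GB10": 273,
}


def scale_decode(tok_s: float, from_chip: str, to_chip: str) -> float:
    """Scale a decode rate by the bandwidth ratio, assuming equal efficiency on both chips."""
    return tok_s * BANDWIDTH_GB_S[to_chip] / BANDWIDTH_GB_S[from_chip]


if __name__ == "__main__":
    # Llama 3.3 70B at Q4 decodes at ~9 tok/s on the M2 Ultra (model table above).
    for chip in BANDWIDTH_GB_S:
        print(f"{chip:15s} ~{scale_decode(9, 'M2 Ultra 128GB', chip):.1f} tok/s")
```

By this estimate the M1 and M3 Ultra match the M2 Ultra on decode despite their different TFLOPS, while GB10 lands at roughly a third of the M2 Ultra rate. Compute matters more for prompt processing (prefill), which this sketch does not model.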


MIT license · v0.6.0