M3 Ultra 192GB

Apple Silicon

192GB VRAM · 800 GB/s bandwidth · 65.5 FP16 TFLOPS · 120W TDP · $6,999 street price

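The M3 Ultra 192GB has 192GB of unified memory (shared between CPU and GPU, so it serves as VRAM), 800 GB/s of memory bandwidth, and 65.5 TFLOPS of FP16 compute. At Q4 quantization it comfortably runs Gemma 3 4B (172 tok/s), Qwen 2.5 7B (90 tok/s), and Llama 3.1 8B (85 tok/s). Models larger than about 326B parameters won't fit even at Q4, which rules out Llama 3.1 405B and DeepSeek R1 671B. Drawing its full 120W TDP around the clock (about 86 kWh per month), it costs roughly $13/month in electricity, which corresponds to about $0.15/kWh.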
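The sizing figures above follow from simple arithmetic. The sketch below is a back-of-the-envelope reconstruction, not the site's actual formula. It assumes Q4 stores about 0.5 bytes per parameter (consistent with the weight column in the table), that roughly 85% of the 192GB is usable for weights (a hypothetical headroom figure chosen because it reproduces the ~326B cutoff), and that the monthly cost assumes 120W drawn continuously at a hypothetical $0.15/kWh.

```python
# Back-of-the-envelope sizing for the M3 Ultra 192GB.
# Assumptions (not stated by the source): Q4 ~ 0.5 bytes/param,
# ~85% of memory usable for weights, 120W drawn 24/7 at $0.15/kWh.

BYTES_PER_PARAM_Q4 = 0.5
MEMORY_GB = 192
USABLE_FRACTION = 0.85   # hypothetical headroom for OS, KV cache, activations
TDP_W = 120
PRICE_PER_KWH = 0.15     # hypothetical electricity price in USD
HOURS_PER_MONTH = 24 * 30


def q4_weight_gb(params_billion: float) -> float:
    """Approximate Q4 weight size in GB (1B params * 0.5 bytes = 0.5 GB)."""
    return params_billion * BYTES_PER_PARAM_Q4


def max_params_billion(memory_gb: float, usable: float = USABLE_FRACTION) -> float:
    """Largest model (billions of params) whose Q4 weights fit in usable memory."""
    return memory_gb * usable / BYTES_PER_PARAM_Q4


def monthly_cost_usd(watts: float, price: float = PRICE_PER_KWH) -> float:
    """Electricity cost of drawing `watts` continuously for a 30-day month."""
    kwh = watts / 1000 * HOURS_PER_MONTH
    return kwh * price


print(f"Llama 3.1 405B at Q4: {q4_weight_gb(405):.1f} GB")
print(f"Max params at Q4: ~{max_params_billion(MEMORY_GB):.0f}B")
print(f"Monthly electricity: ~${monthly_cost_usd(TDP_W):.2f}")
```

With these assumptions the script reports about 202.5 GB for Llama 3.1 405B, a cutoff near 326B parameters, and roughly $12.96 per month. Changing `USABLE_FRACTION` or `PRICE_PER_KWH` shows how sensitive the headline numbers are to those guesses.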

What LLMs can you run?

Model	Params	Q4 weights	Fit	Decode speed (Q4)
Gemma 3 4B	4.0B	2 GB	comfortable	172 tok/s
Qwen 2.5 7B	7.6B	4 GB	comfortable	90 tok/s
Llama 3.1 8B	8.0B	4 GB	comfortable	85 tok/s
Mistral Small 24B	24.0B	12 GB	tight	28 tok/s
Gemma 3 27B	27.4B	14 GB	tight	25 tok/s
Qwen 2.5 Coder 32B	32.5B	16 GB	tight	21 tok/s
Llama 3.3 70B	70.6B	35 GB	tight	9 tok/s
Qwen 2.5 72B	72.7B	36 GB	tight	9 tok/s
Llama 3.1 405B	405B	202 GB	won't fit	—
DeepSeek R1 671B	671B	336 GB	won't fit	—
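Single-stream decode on this kind of hardware is usually limited by memory bandwidth: each generated token reads every weight once, so tokens/s cannot exceed bandwidth divided by weight size. The decode column sits at roughly 43% of that ceiling (800 GB/s ÷ 4 GB = 200 tok/s versus 85 tok/s for Llama 3.1 8B). The sketch below uses that ratio as a hypothetical `EFFICIENCY` factor inferred from the table; it is not a published or measured constant, and real throughput also depends on the runtime, context length, and quantization format.

```python
# Bandwidth-bound estimate of single-stream decode speed.
# Assumption: each generated token streams all Q4 weights from memory once,
# so tokens/s is capped at bandwidth / weight size. EFFICIENCY is a
# hypothetical factor inferred from the table, not a measured constant.

BANDWIDTH_GBPS = 800.0
EFFICIENCY = 0.43
BYTES_PER_PARAM_Q4 = 0.5


def decode_ceiling(weight_gb: float, bandwidth_gbps: float = BANDWIDTH_GBPS) -> float:
    """Theoretical upper bound on tokens/s for a given weight size in GB."""
    return bandwidth_gbps / weight_gb


def decode_estimate(params_billion: float,
                    bandwidth_gbps: float = BANDWIDTH_GBPS,
                    efficiency: float = EFFICIENCY) -> float:
    """Ceiling scaled by an assumed efficiency factor."""
    weight_gb = params_billion * BYTES_PER_PARAM_Q4
    return decode_ceiling(weight_gb, bandwidth_gbps) * efficiency


for name, params in [("Gemma 3 4B", 4.0), ("Llama 3.1 8B", 8.0), ("Llama 3.3 70B", 70.6)]:
    print(f"{name}: ceiling {decode_ceiling(params * BYTES_PER_PARAM_Q4):.0f} tok/s, "
          f"estimate ~{decode_estimate(params):.0f} tok/s")
```

For the models that fit, this estimate lands within about one token per second of the table's figures, which suggests the listed decode speeds are bandwidth-derived estimates rather than benchmarks.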

Similar GPUs

GPU	VRAM	Bandwidth	FP16 TFLOPS	TDP
M2 Ultra 192GB	192GB	800 GB/s	54.4	120W
Radeon Instinct MI300A	192GB	10300 GB/s	653.7	750W
Radeon Instinct MI300X	192GB	10300 GB/s	653.7	750W
Radeon Instinct MI308X	192GB	10300 GB/s	653.7	750W
B300	144GB	4099 GB/s	1231.8	1400W
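One way to read the table above is to apply the same bandwidth-bound ceiling to each GPU for a single model. The sketch below copies the specs from the table and uses the 35 GB Q4 weight size of Llama 3.3 70B from the model table. It is an upper bound only: it ignores compute limits, multi-GPU interconnect, and software efficiency, and the tokens-per-watt figure assumes each device draws its full listed TDP.

```python
# Compare the listed GPUs by bandwidth-bound decode ceiling for one model.
# Specs are copied from the table; the ceiling ignores compute limits,
# interconnect, and runtime efficiency, and assumes full-TDP power draw.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Gpu:
    name: str
    vram_gb: int
    bandwidth_gbps: float
    tdp_w: int


GPUS = [
    Gpu("M3 Ultra 192GB", 192, 800, 120),
    Gpu("M2 Ultra 192GB", 192, 800, 120),
    Gpu("Radeon Instinct MI300A", 192, 10300, 750),
    Gpu("Radeon Instinct MI300X", 192, 10300, 750),
    Gpu("Radeon Instinct MI308X", 192, 10300, 750),
    Gpu("B300", 144, 4099, 1400),
]


def ceiling_tok_s(gpu: Gpu, weight_gb: float) -> Optional[float]:
    """Upper bound on decode tokens/s, or None if the weights don't fit."""
    if weight_gb > gpu.vram_gb:
        return None
    return gpu.bandwidth_gbps / weight_gb


WEIGHT_GB = 35.0  # Llama 3.3 70B at Q4, from the model table
for gpu in GPUS:
    tps = ceiling_tok_s(gpu, WEIGHT_GB)
    if tps is None:
        print(f"{gpu.name}: won't fit")
    else:
        print(f"{gpu.name}: <= {tps:.0f} tok/s, {tps / gpu.tdp_w:.3f} tok/s per W")
```

Swapping `WEIGHT_GB` for 202.0 (Llama 3.1 405B) shows the B300 entry returning "won't fit", since its 144GB is below the weight size.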
