GeForce RTX 4070 Ti SUPER

NVIDIA

16GB VRAM · 672 GB/s bandwidth · 44.1 FP16 TFLOPS · 285W TDP

The GeForce RTX 4070 Ti SUPER has 16GB of VRAM with 672 GB/s memory bandwidth and 44.1 TFLOPS FP16 compute. At Q4 quantization, it can comfortably run Gemma 3 4B (218 tok/s), Qwen 2.5 7B (114 tok/s), Llama 3.1 8B (108 tok/s), and Mistral Small 24B (36 tok/s). Models of roughly 27B parameters and larger won't fit, even at Q4. Electricity costs approximately $31/month at the 285W TDP.
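The $31/month figure is consistent with the card drawing its full 285W around the clock at about $0.15 per kWh. Both the duty cycle and the electricity price are assumptions inferred from the number, not values stated on this page, and the function name is hypothetical. A minimal sketch of the arithmetic:

```python
def monthly_electricity_cost(
    watts: float,
    usd_per_kwh: float = 0.15,   # assumption: not stated on the page
    hours_per_day: float = 24.0, # assumption: continuous full-load use
    days: int = 30,
) -> float:
    """Return the cost in USD of running a `watts` load for one month."""
    kwh = watts / 1000.0 * hours_per_day * days
    return kwh * usd_per_kwh


if __name__ == "__main__":
    # 285 W is the TDP quoted above; real draw during inference is often lower.
    print(f"${monthly_electricity_cost(285):.2f} per month")
```

Lowering `hours_per_day` to match actual usage, or substituting a local electricity rate, gives a more realistic estimate than the worst-case figure.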

What LLMs can you run?

Model	Params	Q4 weight size	Fit	Decode speed
Gemma 3 4B	4.0B	2 GB	comfortable	218 tok/s
Qwen 2.5 7B	7.6B	4 GB	comfortable	114 tok/s
Llama 3.1 8B	8.0B	4 GB	comfortable	108 tok/s
Mistral Small 24B	24.0B	12 GB	comfortable	36 tok/s
Gemma 3 27B	27.4B	14 GB	won't fit
Qwen 2.5 Coder 32B	32.5B	16 GB	won't fit
Llama 3.3 70B	70.6B	35 GB	won't fit
Qwen 2.5 72B	72.7B	36 GB	won't fit
Llama 3.1 405B	405B	202 GB	won't fit
DeepSeek R1 671B	671B	336 GB	won't fit
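The numbers in this table can be approximated with two rules of thumb. Q4 weights take about half a byte per parameter, and decode speed is limited by how fast the weights can be streamed from VRAM for each generated token. The 0.65 efficiency factor below is an assumption fitted to the table's decode column, not a published figure, and the function names are hypothetical. A rough sketch:

```python
BANDWIDTH_GB_S = 672.0      # memory bandwidth from the spec line above
BYTES_PER_PARAM_Q4 = 0.5    # assumption: 4 bits per weight, no format overhead
EFFICIENCY = 0.65           # assumption: fitted to the table, not a measured value


def q4_weight_gb(params_billion: float) -> float:
    """Approximate size of Q4 weights in GB for a model with this many parameters."""
    return params_billion * BYTES_PER_PARAM_Q4


def decode_tok_s(
    params_billion: float,
    bandwidth_gb_s: float = BANDWIDTH_GB_S,
    efficiency: float = EFFICIENCY,
) -> float:
    """Bandwidth-bound decode estimate: each token reads all weights once."""
    return efficiency * bandwidth_gb_s / q4_weight_gb(params_billion)


if __name__ == "__main__":
    models = [
        ("Gemma 3 4B", 4.0),
        ("Qwen 2.5 7B", 7.6),
        ("Llama 3.1 8B", 8.0),
        ("Mistral Small 24B", 24.0),
    ]
    for name, params in models:
        print(f"{name}: ~{q4_weight_gb(params):.1f} GB, ~{decode_tok_s(params):.0f} tok/s")
```

Under these assumptions the estimates land within about 2% of the decode speeds listed for the four models that fit. The estimate ignores KV cache and runtime overhead, which is presumably why a 14 GB model is listed as not fitting in 16GB of VRAM.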

Similar GPUs

GPU	VRAM	Bandwidth	FP16 TFLOPS	TDP
GeForce RTX 4070 Ti SUPER AD102	16GB	672 GB/s	44.1	285W
Radeon RX 9070	16GB	644 GB/s	72.2	220W
Radeon RX 9070 XT	16GB	644 GB/s	97.3	304W
GeForce RTX 4080	16GB	716 GB/s	48.7	320W
Radeon RX 7800 XT	16GB	624 GB/s	74.7	263W
