vram.run Models Hardware Providers Cloud State of Inference

API provider data is live · Hardware & cloud pricing curated 2026-02-23

RTX 2000 Ada Generation

NVIDIA

16GB VRAM · 256 GB/s bandwidth · 12.0 FP16 TFLOPS · 70W TDP

The RTX 2000 Ada Generation has 16GB of VRAM with 256 GB/s memory bandwidth and 12.0 TFLOPS FP16 compute. At Q4 quantization, it can comfortably run Gemma 3 4B (83 tok/s), Qwen 2.5 7B (43 tok/s), Llama 3.1 8B (41 tok/s). Models larger than ~27B parameters won't fit even at Q4. Electricity cost is approximately $8/month at 70W TDP.

What LLMs can you run?

Model	Params	Q4 Weight	Fit	Decode
Gemma 3 4B	4.0B	2 GB	comfortable	83 tok/s
Qwen 2.5 7B	7.6B	4 GB	comfortable	43 tok/s
Llama 3.1 8B	8.0B	4 GB	comfortable	41 tok/s
Mistral Small 24B	24.0B	12 GB	tight	13 tok/s
Gemma 3 27B	27.4B	14 GB	won't fit
Qwen 2.5 Coder 32B	32.5B	16 GB	won't fit
Llama 3.3 70B	70.6B	35 GB	won't fit
Qwen 2.5 72B	72.7B	36 GB	won't fit
Llama 3.1 405B	405B	202 GB	won't fit
DeepSeek R1 671B	671B	336 GB	won't fit

Similar GPUs

GPU	VRAM	BW	TFLOPS	TDP
Arc Pro B50	16GB	224 GB/s	21.3	70W
GeForce RTX 4060 Ti 16 GB	16GB	288 GB/s	22.1	165W
Radeon RX 7600 XT	16GB	288 GB/s	45.1	190W
Quadro P5000	16GB	288 GB/s	0.1	180W
A16 PCIe	16GB	200 GB/s	4.5	250W

Compare with another GPU

Select another GPU to compare specs and model performance side by side.

Install CLI [email protected] Raw data · MIT · API data: live · HW/Cloud data: curated 2026-02-23 · v0.6.0