Qwen2.5-1.5B-Instruct

by Qwen

1.5B params · text-generation · 630 likes · 7.9M downloads

Qwen2.5-1.5B-Instruct is a 1.5B-parameter model. At Q4 quantization it requires about 1 GB of VRAM. It runs comfortably on the GeForce RTX 4090 (850 tok/s), GeForce RTX 5090 (1275 tok/s), and M4 Max 128GB (311 tok/s), among others.
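The 1 GB figure is consistent with simple arithmetic. At 4 bits per weight, 1.5B parameters occupy about 0.75 GB, and the rest of the stated requirement presumably covers quantization metadata and runtime buffers. The sketch below assumes Q4 means exactly 4 bits per weight. Real Q4 formats store somewhat more because of per-block scales. The function name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate for a quantized model.
# Assumption: "Q4" is treated as exactly 4 bits per weight. Real Q4 formats
# (with per-block scales) store slightly more, and runtime buffers such as
# the KV cache add further overhead that is not modeled here.


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Return the approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


if __name__ == "__main__":
    q4 = weight_memory_gb(1.5e9, 4)     # raw 4-bit weights
    fp16 = weight_memory_gb(1.5e9, 16)  # unquantized 16-bit weights, for comparison
    print(f"Q4: {q4:.2f} GB, FP16: {fp16:.2f} GB")
```

Running the same estimate at 16 bits (3 GB) shows why quantization matters most on smaller cards. Even unquantized, though, this model fits comfortably in every GPU listed.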

Inference providers

Provider	$/1M input tokens	$/1M output tokens	Throughput
Featherless

GPU compatibility

GPU	VRAM	Q4 Decode	Verdict
GeForce RTX 4090	24GB	850 tok/s	comfortable
GeForce RTX 5090	32GB	1275 tok/s	comfortable
M4 Max 128GB	128GB	311 tok/s	comfortable
M4 Pro 48GB	48GB	155 tok/s	comfortable
M4 Pro 24GB	24GB	155 tok/s	comfortable
A100 PCIe 80 GB	80GB	1558 tok/s	comfortable
H100 SXM5 80 GB	80GB	3134 tok/s	comfortable
GeForce RTX 3090	24GB	752 tok/s	comfortable
Radeon RX 7900 XTX	24GB	621 tok/s	comfortable
GeForce RTX 4080	16GB	603 tok/s	comfortable
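To use the table, filter by the memory you have and pick the highest Q4 decode rate. The sketch below copies the rows from the table. `GpuEntry`, `fastest_gpu` and the `max_vram_gb` cap are hypothetical names for illustration, and the cap is only a rough stand-in for a hardware budget.

```python
# Pick the fastest GPU from the compatibility table that satisfies a VRAM range.
# Figures are copied from the table (VRAM in GB, Q4 decode throughput in tok/s).
from typing import List, NamedTuple, Optional


class GpuEntry(NamedTuple):
    name: str
    vram_gb: int
    q4_decode_tok_s: int


GPUS: List[GpuEntry] = [
    GpuEntry("GeForce RTX 4090", 24, 850),
    GpuEntry("GeForce RTX 5090", 32, 1275),
    GpuEntry("M4 Max 128GB", 128, 311),
    GpuEntry("M4 Pro 48GB", 48, 155),
    GpuEntry("M4 Pro 24GB", 24, 155),
    GpuEntry("A100 PCIe 80 GB", 80, 1558),
    GpuEntry("H100 SXM5 80 GB", 80, 3134),
    GpuEntry("GeForce RTX 3090", 24, 752),
    GpuEntry("Radeon RX 7900 XTX", 24, 621),
    GpuEntry("GeForce RTX 4080", 16, 603),
]

REQUIRED_VRAM_GB = 1  # Q4 requirement stated for Qwen2.5-1.5B-Instruct


def fastest_gpu(
    entries: List[GpuEntry],
    min_vram_gb: int,
    max_vram_gb: Optional[int] = None,
) -> Optional[GpuEntry]:
    """Return the entry with the highest Q4 decode rate whose VRAM lies in
    [min_vram_gb, max_vram_gb], or None if nothing qualifies."""
    candidates = [
        g for g in entries
        if g.vram_gb >= min_vram_gb
        and (max_vram_gb is None or g.vram_gb <= max_vram_gb)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda g: g.q4_decode_tok_s)


if __name__ == "__main__":
    best_any = fastest_gpu(GPUS, REQUIRED_VRAM_GB)
    best_24gb = fastest_gpu(GPUS, REQUIRED_VRAM_GB, max_vram_gb=24)
    print(best_any.name, best_any.q4_decode_tok_s)
    print(best_24gb.name, best_24gb.q4_decode_tok_s)
```

Throughput also translates directly into wait time. At 850 tok/s, a 500-token reply on the RTX 4090 takes roughly 0.6 s of decode (500 / 850). This ignores prompt processing.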
