
I've been largely using Qwen3.5-122b at 6-bit quant locally for some C++/Go/Python dev lately. It's quite capable as long as I give it fairly specific asks within the codebase, and it produces code that needs minimal massaging to fit into the project.

I do have a $20 Claude sub I can fall back to for anything Qwen struggles with, but with 3.5 I've been very pleased with the results.




How much VRAM do you need for that?

128GB on a Mac with unified memory. The model itself takes something like 110GB of that, which leaves ~16GB for a reasonably sized context and 2GB for the OS.
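The budget above checks out with some back-of-envelope arithmetic. A sketch (not from the thread; assumes ~6.5 effective bits per parameter for a Q6-style quant once scales and metadata are included, and real GGUF file sizes will vary):

```python
# Rough memory budget for a 122B-parameter model on a 128GB machine.
# Assumption: ~6.5 effective bits/param for a Q6-style quant (incl. overhead).
def model_size_gb(params_billions: float, bits_per_param: float = 6.5) -> float:
    """Approximate quantized weight size in GB."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

weights = model_size_gb(122)            # ~99 GB of weights
total, os_reserve = 128, 2              # figures from the comment above
context_budget = total - weights - os_reserve
print(f"weights ≈ {weights:.0f} GB, left for KV cache/context ≈ {context_budget:.0f} GB")
```

With runtime overhead on top of the raw weights, that lines up with the ~110GB the model actually occupies, and roughly 16GB left over for context.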

I do have a dedicated machine for it though because I can't run an IDE at the same time as that model.


Not OP, but I ran 122b successfully with normal RAM offloading. You don't need all that much VRAM, which is super expensive. I used 96GB RAM + a 16GB VRAM GPU. It's not very fast in that setup, maybe 15 tokens per second, but you can give it a task and come back later when it's done. (Disclaimer: I built that PC before stuff got expensive.)
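For a sense of what "come back later" means at that speed, a quick sketch (the ~15 tok/s figure is from the comment; the 4000-token response length is just an illustrative assumption):

```python
# How long a generation takes at a given decode throughput.
def gen_time_minutes(tokens: int, tok_per_s: float = 15.0) -> float:
    """Wall-clock minutes to generate `tokens` at `tok_per_s`."""
    return tokens / tok_per_s / 60

# e.g. a longish 4000-token coding response at 15 tok/s:
print(f"{gen_time_minutes(4000):.1f} min")  # ≈ 4.4 min
```

So even a fairly long response lands in minutes, not hours, which makes the fire-and-forget workflow practical.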

I squeeze Qwen3.5-122B-A10B at Q6 into 128GB. It's a great model.

Wow what kind of hardware do you have? Mac Studio, dgx spark, strix halo? How fast is it?



