I've been largely using Qwen3.5-122b at 6-bit quant locally for some C++/Go/Python dev lately. It's quite capable as long as I give it fairly specific asks within the codebase, and it produces code that needs minimal massaging to fit into the project.
I do have a $20 Claude sub I can fall back to for anything Qwen struggles with, but with 3.5 I've been very pleased with the results.
128GB on a Mac with unified memory. The model itself takes something like 110GB of that, which leaves ~16GB for a reasonably sized context and ~2GB for the OS.
I do have a dedicated machine for it though because I can't run an IDE at the same time as that model.
Not OP, but I ran the 122b successfully with normal RAM offloading. You don't need all that much VRAM, which is super expensive: I used 96GB RAM + a 16GB-VRAM GPU. It's not very fast in that setup, maybe 15 tokens per second, but you can still give it a task and come back later when it's done. (Disclaimer: I built that PC before stuff got expensive.)
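The RAM/VRAM split people are describing above is mostly simple memory arithmetic; here's a rough sketch of it (all numbers illustrative: a ~122B-parameter model, quantized bits per weight, ignoring KV cache and runtime overhead, which is why real-world figures like the ~110GB mentioned earlier come out higher):

```python
def model_size_gb(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of a quantized model in GB
    (parameter count in billions * bits per weight / 8 bits per byte)."""
    return params_b * bits_per_weight / 8

# 122B parameters at 6-bit quantization: raw weights alone.
total = model_size_gb(122, 6)   # 91.5 GB before cache/overhead

# With a 16 GB GPU, only a slice of the layers lives in VRAM;
# the rest gets offloaded to system RAM, which is what slows decoding.
vram_gb = 16
gpu_fraction = vram_gb / total
print(f"weights ~= {total:.1f} GB, GPU can hold ~= {gpu_fraction:.0%} of them")
```

Since most of each token's forward pass then streams through system RAM, throughput ends up bounded by RAM bandwidth rather than the GPU, which lines up with the ~15 tok/s figure above.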