
I'm guessing 3.5-27b would beat 3.6-35b; MoE is a bad idea here. For the same VRAM, the 27b leaves a lot more room for context, and the quality of the output depends directly on context size, not just the "B" number.
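To make the VRAM tradeoff concrete, here is a back-of-envelope sketch. All the numbers (quantization at ~4.5 bits/param, a 24 GB card, a hypothetical attention config) are illustrative assumptions, not measurements of any specific model:

```python
# Back-of-envelope: how much KV-cache (context) room is left after weights.
# Bits-per-param, GPU size, and the attention config are all assumptions.

GiB = 1024**3

def weight_bytes(params_b, bits_per_param=4.5):
    """Approximate VRAM for quantized weights (roughly Q4_K-class)."""
    return params_b * 1e9 * bits_per_param / 8

def kv_bytes_per_token(layers=48, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """KV-cache cost per token: K and V tensors for every layer (fp16)."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem

vram = 24 * GiB  # e.g. a single 24 GB GPU
for params in (27, 35):
    leftover = vram - weight_bytes(params)
    ctx_tokens = leftover / kv_bytes_per_token()
    print(f"{params}B: {leftover / GiB:.1f} GiB left -> ~{ctx_tokens:,.0f} tokens of KV cache")
```

Under these assumptions the 27B leaves roughly 10.6 GiB for context versus about 6.1 GiB for the 35B, which is the "more room" the parent comment is pointing at.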



MoE is not a bad idea for local inference if you have fast storage to offload experts to, and that is quickly becoming feasible with PCIe 5.0 interconnects.

MoE is excellent for unified-memory inference hardware like the DGX Spark, Apple's Mac Studio, etc. The large memory pool means you can fit quite a few B's, and the smaller active experts keep those tokens flowing fast.
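The "tokens flowing fast" point follows from decode being memory-bandwidth-bound: each token only has to read the *active* parameters, not the full model. A rough sketch, with purely illustrative bandwidth and parameter counts (the ~273 GB/s figure is an assumption for a DGX Spark-class machine):

```python
# Rough decode-speed estimate for memory-bandwidth-bound inference.
# Bandwidth, parameter counts, and bits-per-param are assumptions.

def tokens_per_sec(active_params_b, bw_gb_s, bits_per_param=4.5):
    """Each decoded token streams roughly all active weights from memory once."""
    active_bytes = active_params_b * 1e9 * bits_per_param / 8
    return bw_gb_s * 1e9 / active_bytes

bw = 273  # GB/s, assumed unified-memory bandwidth
print(f"dense, 70B active:      ~{tokens_per_sec(70, bw):.1f} tok/s")
print(f"MoE, 5B active per tok: ~{tokens_per_sec(5, bw):.1f} tok/s")
```

So a large-total, small-active MoE can decode an order of magnitude faster than a dense model of similar total size on the same memory bandwidth, which is why big unified memory plus MoE is such a good fit.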


