
Seeing a 20B model competing with o3's performance is mind-blowing. Just a year ago, most of us would've called this impossible - not just the intelligence leap, but getting this level of capability into such a compact size.

What excites me most is that we can train trillion-parameter giants and distill them down to just billions of parameters without losing the magic. Imagine coding with Claude 4 Opus-level intelligence packed into a 10B model running locally at 2000 tokens/sec - instant AI collaboration. That would fundamentally change how we develop software.



10B params * 1 byte/param (8-bit weights) * 2000 t/s = 20,000 GB/s of memory bandwidth. Apple hardware tops out around 1k GB/s.
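
A quick back-of-the-envelope in Python, assuming 8-bit weights (1 byte per parameter; fp16 would double these numbers): a dense decoder has to stream every weight once per generated token.

    # Bandwidth needed to run a dense 10B model at 2000 tok/s
    params = 10e9            # 10B parameters
    bytes_per_param = 1      # assumption: 8-bit quantization
    tokens_per_sec = 2000

    required_bw = params * bytes_per_param * tokens_per_sec  # bytes/sec
    print(f"{required_bw / 1e12:.0f} TB/s")  # -> 20 TB/s, vs ~1 TB/s on Apple silicon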


That’s why MoE is needed.


It's not even a dense 20B model. It's a 20B MoE with 3.6B active params.
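
Under the same assumed 8-bit numbers, only the active experts' weights are streamed per token, so the 20B MoE reads like a 3.6B dense model:

    # Same back-of-the-envelope, for a 20B MoE with 3.6B active params
    total_params = 20e9
    active_params = 3.6e9
    bytes_per_param = 1      # assumption: 8-bit quantization
    tokens_per_sec = 2000

    dense_bw = total_params * bytes_per_param * tokens_per_sec
    moe_bw = active_params * bytes_per_param * tokens_per_sec
    print(f"dense: {dense_bw/1e12:.0f} TB/s, MoE: {moe_bw/1e12:.1f} TB/s")
    # -> dense: 40 TB/s, MoE: 7.2 TB/s (~5.5x less, still well above ~1 TB/s)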

But it does not actually compete with o3's performance. Not even close. As usual, the benchmark metrics are bullshit. You don't know how good a model actually is until you grill it yourself.



