
Seeing a 20B model competing with o3's performance is mind-blowing. Just a year ago, most of us would've called this impossible - not just the intelligence leap, but getting this level of capability into such a compact size.

What excites me most is that we can train trillion-parameter giants and distill them down to just billions of parameters without losing the magic. Imagine coding with Claude 4 Opus-level intelligence packed into a 10B model running locally at 2000 tokens/sec - instant AI collaboration. That would fundamentally change how we develop software.



10B params * 1 byte/param (8-bit weights) * 2000 t/s = 20,000 GB/s of memory bandwidth. Apple hardware tops out around 1k GB/s.
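
A quick back-of-the-envelope in Python, assuming 8-bit weights (1 byte per parameter; fp16 would double these numbers): a dense decoder has to stream every weight once per generated token.

    # Bandwidth needed to run a dense 10B model at 2000 tok/s
    params = 10e9            # 10B parameters
    bytes_per_param = 1      # assumption: 8-bit quantization
    tokens_per_sec = 2000

    required_bw = params * bytes_per_param * tokens_per_sec  # bytes/sec
    print(f"{required_bw / 1e12:.0f} TB/s")  # -> 20 TB/s, vs ~1 TB/s on Apple silicon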


That’s why MoE is needed.


It's not even a dense 20B model. It's a 20B MoE with 3.6B active params.
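
Under the same assumed 8-bit numbers, only the active experts' weights are streamed per token, so the 20B MoE reads like a 3.6B dense model:

    # Same back-of-the-envelope, for a 20B MoE with 3.6B active params
    total_params = 20e9
    active_params = 3.6e9
    bytes_per_param = 1      # assumption: 8-bit quantization
    tokens_per_sec = 2000

    dense_bw = total_params * bytes_per_param * tokens_per_sec
    moe_bw = active_params * bytes_per_param * tokens_per_sec
    print(f"dense: {dense_bw/1e12:.0f} TB/s, MoE: {moe_bw/1e12:.1f} TB/s")
    # -> dense: 40 TB/s, MoE: 7.2 TB/s (~5.5x less, still well above ~1 TB/s)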

But it does not actually compete with o3's performance. Not even close. As usual, the benchmark metrics are bullshit. You don't know how good a model actually is until you grill it yourself.



