> we demonstrated running gpt-oss-120b on two RNGD chips [snip] at 5.8 ms per ou...

sanxiyn · 2026-01-15T02:59:14 1768445954

I think you are comparing latency with throughput. You can't take the inverse of latency to get throughput because concurrency is unknown. But then, RNGD result is probably with concurrency=1.

binary132 · 2026-01-15T02:28:13 1768444093

I thought they were saying it was more efficient, as in tokens per watt. I didn’t see a direct comparison on that metric but maybe I didn’t look well enough.

nl · 2026-01-15T02:50:47 1768445447

Probably. Companies sell on efficiency when they know they lose on performance.

tormeh · 2026-01-15T07:28:03 1768462083

If you have an efficient chip you can just have more of them and come out ahead. This isn't a CPU where single core performance is all that important.

fennecfoxy · 2026-01-15T10:14:35 1768472075

Only if the price is right...

avereveard · 2026-01-15T08:50:17 1768467017

Eh if there's a human on the other side single stream performance is going to matter to them.

binary132 · 2026-01-15T05:31:20 1768455080

Right, but datacenters also very much operate on electrical cost so it’s not without merit.