
> we demonstrated running gpt-oss-120b on two RNGD chips [snip] at 5.8 ms per output token

That's 86 tokens/second/chip (1 / 5.8 ms ≈ 172 tokens/second across the pair).

By comparison, an H100 will do 2,390 tokens/second/GPU [1].

Am I comparing the wrong things somehow?

[1] https://inferencemax.semianalysis.com/
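For the arithmetic, a quick sketch in Python (assuming the 5.8 ms figure is per-output-token decode latency for the whole two-chip setup):

    # Assumes 5.8 ms is per-output-token latency for the two-chip setup.
    latency_s = 5.8e-3   # seconds per output token, 2x RNGD
    chips = 2

    total_tps = 1 / latency_s         # ~172 tokens/second for the pair
    per_chip_tps = total_tps / chips  # ~86 tokens/second/chip

    print(f"{per_chip_tps:.0f} tokens/second/chip")  # -> 86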

I think you're comparing latency with throughput. You can't take the inverse of latency to get throughput because the concurrency is unknown. That said, the RNGD result is probably with concurrency = 1.
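To make that concrete, a minimal sketch with illustrative numbers (not measurements): with batching, aggregate throughput scales roughly as concurrency divided by per-token latency, and on real hardware per-token latency itself grows with batch size, which is why you can't just invert a single-stream latency figure.

    # Illustrative only: real per-token latency rises with batch size,
    # so large-concurrency numbers here overstate actual throughput.
    per_token_latency_s = 5.8e-3  # single-stream figure from the post

    for concurrency in (1, 8, 64):
        aggregate_tps = concurrency / per_token_latency_s
        print(f"concurrency={concurrency:3d} -> ~{aggregate_tps:,.0f} tokens/sec")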

I thought they were saying it was more efficient, as in tokens per watt. I didn't see a direct comparison on that metric, but maybe I didn't look hard enough.

Probably. Companies sell on efficiency when they know they lose on performance.

If you have an efficient chip, you can just deploy more of them and come out ahead. This isn't a CPU, where single-core performance is all that important.
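A toy version of that argument (every power and throughput figure below is a made-up placeholder, not a vendor spec for either chip):

    # Placeholder numbers, purely to illustrate the tokens-per-joule
    # framing; neither line reflects a real H100 or RNGD measurement.
    def tokens_per_joule(tokens_per_sec: float, watts: float) -> float:
        return tokens_per_sec / watts

    fast_chip = tokens_per_joule(tokens_per_sec=1000, watts=500)      # 2.0
    efficient_chip = tokens_per_joule(tokens_per_sec=400, watts=100)  # 4.0

    # At an equal 500 W power budget, five efficient chips deliver
    # 5 * 400 = 2000 tokens/sec vs. 1000 for the single fast chip.
    print(f"fast:      {fast_chip:.1f} tokens/joule")
    print(f"efficient: {efficient_chip:.1f} tokens/joule")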

Only if the price is right...

Eh, if there's a human on the other side, single-stream performance is going to matter to them.

Right, but datacenters also very much operate on electricity costs, so it's not without merit.


