Hacker News
vkaufmann
3 months ago
on: Show HN: I taught GPT-OSS-120B to see using Google...
GPT-OSS-120B runs like hell on my DGX Spark
embedding-shape
3 months ago
The MXFP4 variant, I suppose? My setup (RTX Pro 6000) does ~140 tok/s with llama.cpp and ~160 tok/s with vLLM.
vkaufmann
3 months ago
Yep, MXFP4. Really fast :D