
GPT-OSS-120B runs like hell on my DGX Spark


The MXFP4 variant, I suppose? My setup (RTX Pro 6000) does around 140 tok/s with llama.cpp and around 160 tok/s with vLLM.


Yep, MXFP4. Really fast :D



