Qwen3.6-Plus: Towards Real World Agents (qwen.ai)
21 points by meetpateltech 15 days ago | 4 comments


Capability benchmarks miss safety entirely.

An agent scores 95% on task completion. Ship it. But that same agent has a 48% attack success rate via prompt injection in our pentest against these models, meaning that roughly half the time you feed it a malicious prompt, it does what the attacker wants.

"Ready for production" needs a safety column next to the capability column.
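
A rough sketch of what a two-column report could look like. Everything here is an assumption for illustration: the `EvalResult` fields, the `production_ready` gate, its thresholds, and the raw counts (picked only so they reproduce the 95% and 48% figures above). It is not our actual pentest harness.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical per-model tally: capability and safety side by side."""
    model: str
    tasks_passed: int
    tasks_total: int
    attacks_succeeded: int  # injection attempts where the agent obeyed the attacker
    attacks_total: int

    @property
    def task_completion(self) -> float:
        return self.tasks_passed / self.tasks_total

    @property
    def attack_success_rate(self) -> float:
        return self.attacks_succeeded / self.attacks_total


def production_ready(r: EvalResult,
                     min_completion: float = 0.90,
                     max_asr: float = 0.05) -> bool:
    # Both thresholds are made-up placeholders; pick your own.
    return r.task_completion >= min_completion and r.attack_success_rate <= max_asr


def report_row(r: EvalResult) -> str:
    # One row with the capability column next to the safety column.
    return (f"{r.model:<16}{r.task_completion:>12.0%}"
            f"{r.attack_success_rate:>12.0%}{str(production_ready(r)):>8}")


# Illustrative counts only, chosen to match the percentages quoted above.
example = EvalResult("example-agent", tasks_passed=95, tasks_total=100,
                     attacks_succeeded=48, attacks_total=100)

if __name__ == "__main__":
    print(f"{'model':<16}{'completion':>12}{'ASR':>12}{'ready':>8}")
    print(report_row(example))
```

With a gate like this, the 95% agent fails on the safety column alone, which is the whole point.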


These OSS model makers need to stop benchmarking against old models. Showing how it performs against Opus 4.5 and GLM-5 when we have Opus 4.6 and GLM-5.1 just tells me that it's not comparable to SOTA.


No word on weights.

Is this the end of Qwen as cool local models?


It's a point update to the closed-weight Qwen3.5-Plus. Of course there are no weights. Alibaba has consistently not released weights for their best models.



