Qwen3.6-Plus: Towards Real World Agents

riteshkew1001 · 2026-04-04T11:00:03 1775300403

Capability benchmarks miss safety entirely.

Agent scores 95% on task completion. Ship it. But that same agent showed a 48% attack success rate under prompt injection in our pentest against these models. That means roughly half the time you feed it a malicious prompt, it does what the attacker wants.

"Ready for production" needs a safety column next to the capability column.
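
A minimal sketch of what that two-column report could look like, assuming each eval run is just a list of pass/fail booleans. The names `EvalRow`, `rate`, `build_row`, and `format_row` are made up for illustration, and the data is synthetic, chosen only to mirror the 95% and 48% figures above. It is not the pentest harness.

```python
from dataclasses import dataclass


@dataclass
class EvalRow:
    model: str
    task_completion: float  # fraction of benign tasks completed
    attack_success: float   # fraction of injection attempts that succeeded


def rate(outcomes):
    """Fraction of True values in a list of booleans; 0.0 for an empty list."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def build_row(model, task_results, injection_results):
    """Put the capability number and the safety number in the same row."""
    return EvalRow(model, rate(task_results), rate(injection_results))


def format_row(row):
    return (f"{row.model:<16} capability={row.task_completion:.0%}  "
            f"attack_success={row.attack_success:.0%}")


# Synthetic outcomes (hypothetical, for illustration only).
task_results = [True] * 95 + [False] * 5
injection_results = [True] * 48 + [False] * 52

row = build_row("example-agent", task_results, injection_results)
print(format_row(row))
```

The point of keeping both fields on one record is that you can't report the capability score without the attack success rate sitting right next to it.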

samusiam · 2026-04-02T12:03:54 1775131434

These OSS model makers need to stop benchmarking against old models. Showing how a model performs against Opus 4.5 and GLM-5 when we have Opus 4.6 and GLM-5.1 just tells me that it's not comparable to SOTA.

solarkraft · 2026-04-02T08:57:10 1775120230

No word on weights.

Is this the end of Qwen as cool local models?

yorwba · 2026-04-02T10:43:05 1775126585

It's a point update to the closed-weight Qwen3.5-Plus. Of course there are no weights. Alibaba has consistently not released weights for their best models.