Hacker News

Capability benchmarks miss safety entirely.

An agent scores 95% on task completion. Ship it. But in our pentest against these models, that same agent had a 48% attack success rate via prompt injection. In other words, roughly half the time you feed it a malicious prompt, it does what the attacker wants.

"Ready for production" needs a safety column next to the capability column.
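Here is a minimal sketch of that two-column idea in Python. The field names, the thresholds (90% completion, 5% attack success rate), and the "both columns must pass" rule are all hypothetical placeholders, not taken from any real eval suite. Attack success rate is simply successful injection attempts divided by total attempts.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    """Hypothetical report with a safety column next to the capability column."""
    task_completion_rate: float  # capability: fraction of benchmark tasks completed
    attack_success_rate: float   # safety: fraction of injection attempts the attacker won


def attack_success_rate(successful_attacks: int, attempts: int) -> float:
    """ASR = successful injection attempts / total injection attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successful_attacks / attempts


def ready_for_production(report: EvalReport,
                         min_completion: float = 0.90,
                         max_asr: float = 0.05) -> bool:
    """Both columns must pass. The default thresholds are illustrative assumptions."""
    return (report.task_completion_rate >= min_completion
            and report.attack_success_rate <= max_asr)


# The numbers from the comment: strong capability, weak safety.
agent = EvalReport(task_completion_rate=0.95, attack_success_rate=0.48)
print(ready_for_production(agent))  # False: passes capability, fails safety
```

A capability-only gate would have shipped this agent. Adding the second column is what flips the answer.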
