An agent scores 95% on task completion. Ship it. But that same agent had a 48% attack success rate under prompt injection in our pentest against these models. In other words, roughly half the time you feed it a malicious prompt, it does what the attacker wants.
"Ready for production" needs a safety column next to the capability column.
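One way to picture that second column is a release gate that checks both numbers. This is only a minimal sketch: `EvalReport`, `ready_for_production`, and both thresholds (90% completion, 5% attack success) are hypothetical names and values picked for illustration, not figures from the pentest.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    """Hypothetical container for the two columns."""
    task_completion: float      # fraction of tasks completed, 0.0 to 1.0
    attack_success_rate: float  # fraction of injection attempts that succeeded, 0.0 to 1.0


def ready_for_production(
    report: EvalReport,
    min_completion: float = 0.90,      # assumed threshold, tune for your use case
    max_attack_success: float = 0.05,  # assumed threshold, tune for your use case
) -> bool:
    """Pass only if the agent clears both the capability and the safety bar."""
    capable = report.task_completion >= min_completion
    safe = report.attack_success_rate <= max_attack_success
    return capable and safe


# The agent from the post: strong on capability, weak on safety.
report = EvalReport(task_completion=0.95, attack_success_rate=0.48)
print(ready_for_production(report))  # False
```

With a capability-only check this agent ships. With both columns it does not.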