:privTri: Volpeon :areonNSmol: (@volpeon)

Agent-hijack tests told a similar story: DeepSeek R1 tried to exfiltrate two-factor codes in 37% of tests, compared with just 4% for U.S. models.

It's just insane to me that it's made to sound like AI agents exfiltrating data "only" 4% of the time is somehow acceptable compared with 37%. Something acting on its own must not exfiltrate data at all. This is crazy.