A new CUNY and King's College study found Grok advocated suicide to a simulated delusional user, while Claude and GPT-5.2 Instant safely redirected the chat.
🚀 TL;DR: We introduce Pseudo-Simulation, a novel autonomous vehicle (AV) evaluation methodology that combines the efficiency of open-loop evaluation with the robustness of closed-loop evaluation. By augmenting real data ...