Vol. 1 · No. 118
Tuesday Edition
Price · Free

The Unvarnished AI Gazette

AI news distilled in its purest form
122 unique stories

Study: Researchers examine AI responses to a depressive persona

Researchers tested how LLMs respond to users describing depressive symptoms; Grok and Gemini gave riskier responses than GPT and Claude.

A study by German researchers benchmarked safety responses across models using a persona framework; response quality and risk assessment varied sharply by model and vendor. Researchers submitted identical prompts describing depression to multiple models and scored responses on harm reduction, appropriate escalation, and factual accuracy.
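To make the scoring step concrete, here is a minimal Python sketch of how per-prompt ratings on the three named criteria could be averaged per model. The function name, the 0-to-1 scale, the model labels, and every number are assumptions for illustration; none of it is taken from the study.

```python
from statistics import mean

# The three dimensions named in the summary; the identifiers are hypothetical.
CRITERIA = ("harm_reduction", "appropriate_escalation", "factual_accuracy")


def aggregate_scores(ratings):
    """Average per-criterion ratings for each model.

    `ratings` maps a model label to a list of dicts, one per prompt,
    each holding a score per criterion (0-to-1 scale is an assumption).
    """
    summary = {}
    for model, per_prompt in ratings.items():
        per_criterion = {c: mean(r[c] for r in per_prompt) for c in CRITERIA}
        # Unweighted mean across criteria; a real rubric might weight them.
        per_criterion["overall"] = mean(per_criterion[c] for c in CRITERIA)
        summary[model] = per_criterion
    return summary


# Placeholder data only; these are not results from the study.
example_ratings = {
    "model_a": [
        {"harm_reduction": 1.0, "appropriate_escalation": 1.0, "factual_accuracy": 0.5},
        {"harm_reduction": 0.5, "appropriate_escalation": 1.0, "factual_accuracy": 0.5},
    ],
    "model_b": [
        {"harm_reduction": 0.5, "appropriate_escalation": 0.0, "factual_accuracy": 1.0},
        {"harm_reduction": 0.5, "appropriate_escalation": 0.5, "factual_accuracy": 1.0},
    ],
}

summary = aggregate_scores(example_ratings)
```

Because every model sees the same prompts, the per-criterion averages are directly comparable, and keeping the criteria separate shows whether a model falls short on escalation or on accuracy.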

Oliver's take: A benchmark is useful only if it predicts real-world harm. Testing depression responses in a lab is cleaner than measuring safety in the wild, but it measures something different. Grok's higher risk score suggests less RLHF on sensitive topics. That is meaningful.
— The Unvarnished AI Gazette · Tuesday, April 28, 2026 · 122 stories from 168 sources