1 article with this tag
New research reveals that common LLM safety interventions fail under realistic data mixing, leading to conditional misalignment that standard evaluations miss.