# LLM Safety
3 articles with this tag
AI Research
Conditional Misalignment: A New AI Risk
New research reveals that common LLM safety interventions fail under realistic data mixing, leading to conditional misalignment that standard evaluations miss.
6 days ago
AI Research
LLMs Plan, But Do They Plan Safely?
A new LLM robotic safety benchmark, DESPITE, finds that scale boosts planning ability but not safety. Proprietary models lead, revealing a critical gap for safe robotic deployment.
14 days ago
AI Research
Enhancing LLM Trust via Instruction Hierarchy
A new dataset, IH-Challenge, substantially improves LLM instruction-hierarchy robustness, boosting safety and reducing adversarial vulnerabilities.
about 2 months ago