1 article with this tag
Researchers introduce activation steering, a novel runtime defense for LLM alignment, with projection-aware methods showing significant improvements in both safety and general capabilities.