1 article with this tag
A new benchmark, LABSHIELD, reveals a 32.0% safety performance gap in multimodal large language models (MLLMs) for autonomous labs, highlighting the urgent need for safety-focused AI reasoning.