#Model Evaluation
3 articles with this tag
Artificial Intelligence
OpenAI Simulates AI Deployments
OpenAI's new deployment simulation technique replays past conversations with candidate models to predict real-world behavior and mitigate risks before release.
4 days ago
AI Research
Unmasking LVLM Hallucinations
New research introduces the HalluScope benchmark, which identifies textual priors as the main driver of hallucinations in large vision-language models (LVLMs). A new framework, HalluVL-DPO, uses preference optimization to improve visual grounding.
about 2 months ago
AI Research
LLM Fragility Under Lexical Constraints
LLMs collapse under simple lexical constraints, exposing both the fragility of instruction tuning and flaws in evaluation methods.
2 months ago