#Model Evaluation

2 articles with this tag

Unmasking LVLM Hallucinations

New research introduces the HalluScope benchmark, which identifies textual priors as the main driver of hallucinations in large vision-language models (LVLMs). A new framework, HalluVL-DPO, uses preference optimization to improve visual grounding.

11 days ago

AI Research

LLM Fragility Under Lexical Constraints

LLMs collapse under simple lexical constraints, exposing both the fragility of instruction tuning and flaws in current evaluation methods.

20 days ago