# Model Evaluation
2 articles with this tag
AI Research
Unmasking LVLM Hallucinations
New research introduces the HalluScope benchmark, which identifies textual priors as the main driver of hallucinations in large vision-language models (LVLMs). A companion framework, HalluVL-DPO, applies preference optimization to improve visual grounding.
11 days ago
AI Research
LLM Fragility Under Lexical Constraints
LLMs break down under simple lexical constraints, exposing fragility in instruction tuning and shortcomings in current evaluation methods.
20 days ago