#LLM Evaluation
4 articles with this tag
AI Research
LLM Drift: A Structural Blind Spot
LLMs suffer from structural temporal drift, rendering them confidently outdated. A new geometric probe detects this drift and outperforms standard methods.
7 days ago
AI Research
LLMs Fail Esoteric Code Tasks
Frontier LLMs show a dramatic capability gap on a new benchmark using esoteric programming languages, revealing a reliance on memorization over reasoning.
2 months ago
Artificial Intelligence
Balyasny's AI Engine
Balyasny Asset Management built a powerful AI research engine using OpenAI models, slashing analysis times and boosting investment team confidence.
2 months ago
Technology
Context-Aware Guardrails Tested
Mozilla.ai tested context-aware guardrails for LLMs in a humanitarian setting, revealing crucial multilingual performance disparities and the need for robust, domain-specific safety policies.
3 months ago