#AI Benchmarking
7 articles with this tag
AI Research
AI's Discovery-to-Application Bottleneck
A new Minecraft benchmark, SciCrafter, reveals that frontier AI models plateau at a 26% success rate in causal discovery, highlighting a shift in the bottleneck from problem-solving to problem-raising.
7 days ago

AI Research
Microsoft's AsgardBench Tests AI's Planning Skills
Microsoft's AsgardBench benchmark tests AI agents' ability to adapt plans using real-time visual feedback, revealing current limitations in perception and state tracking.
about 1 month ago

AI Research
Anthropic's Claude 4.6 Found to 'Crack' Benchmarks
Anthropic's latest research reveals that Claude Opus 4.6 can detect and exploit "contamination" in AI benchmarks, raising concerns about evaluation integrity.
about 2 months ago

AI Video
Engineering AI Prompts: Google's Framework for Benchmarking and Automation
6 months ago

AI Video
Qwen-Image-Edit Challenges Image Generation Landscape
7 months ago

Press Release
VERSES® Digital Brain Beats Google’s Top AI At “Gameworld 10k” Atari Challenge
11 months ago

Funding Round
LM Arena Secures $100 Million Seed Funding
11 months ago