#Benchmark
11 articles with this tag

AI Inference Demand 'Won't Stop Anytime Soon,' Says Benchmark's Vishria
Benchmark partner Eric Vishria discusses the booming demand for AI inference and the current venture capital funding climate for AI startups.
Workflow Agents Lag Behind Demand
New Claw-Eval-Live benchmark reveals LLM agents struggle with dynamic workflows and verifiable execution, with top models failing over a third of tasks.
LLMs Plan, But Do They Plan Safely?
New LLM robotic safety benchmark, DESPITE, finds scale boosts planning but not safety. Proprietary models lead, revealing a critical gap for safe robotic deployment.

Gumloop Secures $50M Series B
Gumloop secures $50M Series B led by Benchmark to enhance its AI automation and agent platform for enterprises.

Gemini Deep Research Unlocks Advanced AI for Devs
Clarifai Hits Fastest GPT-OSS-120B Inference and Narrows the GPU, ASIC Gap
Clarifai’s latest benchmark on OpenAI’s GPT-OSS-120B model points to a quiet but important shift in AI infrastructure.

Salesforce Agentic AI Gets Real-World Performance Benchmark
Applied Compute's Agent Workforce Targets Niche AI with $80M
A stealthy startup from ex-OpenAI researchers, Applied Compute, has emerged with $80 million in funding to argue that general-purpose AI is just the beginnin...

Exa raises $85M to build a search engine for AIs
