#Benchmark

10 articles with this tag

Workflow Agents Lag Behind Demand
AI Research

New Claw-Eval-Live benchmark reveals LLM agents struggle with dynamic workflows and verifiable execution, with top models failing over a third of tasks.

3 days ago
LLMs Plan, But Do They Plan Safely?
AI Research

New LLM robotic safety benchmark, DESPITE, finds scale boosts planning but not safety. Proprietary models lead, revealing a critical gap for safe robotic deployment.

13 days ago
Gumloop Secures $50M Series B
Artificial Intelligence

Gumloop secures $50M Series B led by Benchmark to enhance its AI automation and agent platform for enterprises.

about 2 months ago
Gemini Deep Research Unlocks Advanced AI for Devs
AI Research

5 months ago
Startup News

Clarifai Hits Fastest GPT-OSS-120B Inference and Narrows the GPU–ASIC Gap

Clarifai’s latest benchmark on OpenAI’s GPT-OSS-120B model points to a quiet but important shift in AI infrastructure.

5 months ago
Clarifai Hits Fastest GPT-OSS-120B Inference and Narrows the GPU–ASIC Gap
Startup News

Clarifai’s latest benchmark on OpenAI’s GPT-OSS-120B model points to a quiet but important shift in AI infrastructure.

5 months ago
Salesforce Agentic AI Gets Real-World Performance Benchmark
AI Research

6 months ago
Funding Round

Applied Compute's Agent Workforce Targets Niche AI with $80M

A stealthy startup from ex-OpenAI researchers, Applied Compute, has emerged with $80 million in funding to argue that general-purpose AI is just the beginnin...

6 months ago
Exa raises $85M to build a search engine for AIs
Startup News

8 months ago
Darwinian Evolution and Silicon Valley's AI Imperative
AI Video

10 months ago