#AI Evaluation

12 articles with this tag

Kaggle Community Benchmarks Decentralize AI Evaluation

Kaggle Community Benchmarks provide a dynamic, transparent framework for evaluating LLMs on complex, real-world tasks like code generation and tool use.

22 days ago

Funding Round

LMArena Series A lands $150M to standardize AI evaluation

30 days ago

AI Research

Salesforce Agentforce Metrics Evolve for AI Service Insight

about 2 months ago

AI Video

Terminal-Bench 2.0 and Harbor Reset the Bar for AI Agent Evaluation

3 months ago

AI Research

Agentforce Elevates AI Agent Evaluation Standards

3 months ago

Startup News

AI dubbing benchmark arrives to separate hype from reality

3 months ago

AI Video

Terminal Bench: The Quiet Ascent of a New AI Evaluation Standard

4 months ago

AI Video

Agent Evaluation: The Crucial Difference in AI System Performance

4 months ago

AI Video

Unmasking the Biases of AI Judges: A Critical Look at LLM Fairness

5 months ago

AI Video

AI Judging AI: IBM's watsonx Scales LLM Evaluation

5 months ago

AI Video

Unpacking AI's Invisible Rules: A Frog's Perspective

5 months ago

AI Video

Generative AI's Blind Spot: Evaluating Human Perception

6 months ago