# AI Evaluation
12 articles with this tag

AI Research
Kaggle Community Benchmarks Decentralize AI Evaluation
Kaggle Community Benchmarks provide a dynamic, transparent framework for evaluating LLMs on complex, real-world tasks like code generation and tool use.
22 days ago

Funding Round
LMArena Series A lands $150M to standardize AI evaluation
30 days ago

AI Research
Salesforce Agentforce Metrics Evolve for AI Service Insight
2 months ago

AI Video
Terminal-Bench 2.0 and Harbor Reset the Bar for AI Agent Evaluation
3 months ago

AI Research
Agentforce Elevates AI Agent Evaluation Standards
3 months ago

Startup News
AI dubbing benchmark arrives to separate hype from reality
3 months ago

AI Video
Terminal Bench: The Quiet Ascent of a New AI Evaluation Standard
4 months ago

AI Video
Agent Evaluation: The Crucial Difference in AI System Performance
4 months ago

AI Video
Unmasking the Biases of AI Judges: A Critical Look at LLM Fairness
5 months ago

AI Video
AI Judging AI: IBM's watsonx Scales LLM Evaluation
5 months ago

AI Video
Unpacking AI's Invisible Rules: A Frog's Perspective
5 months ago

AI Video
Generative AI's Blind Spot: Evaluating Human Perception
6 months ago