#Evaluation

10 articles with this tag

Dat Ngo on Arize: LLM Observability Platform
Artificial Intelligence

Dat Ngo from Arize AI explains their LLM observability, evaluation, and experimentation platform, crucial for building robust GenAI applications.

13 days ago
LLM Evaluators: Beyond Naive Judgments
Artificial Intelligence

Mahmoud Malaeb of Argenta discusses the limitations of naive LLM judges and introduces GEPA, an optimization framework for building more accurate LLM evaluators using a data flywheel approach.

2 months ago
The Hidden Cost of Autonomy: AI Agent Evaluation
AI Research

5 months ago
OpenAI Says Business AI Evaluation Is the Key to ROI
AI Research

7 months ago
Evals Reimagined: Braintrust's Engineering Approach to AI Development
AI Video

10 months ago
Building Reliable AI: The Imperative of Application-Layer Evals
AI Video

11 months ago
DeepMind Proposes Radical Shift in AI Intelligence Benchmarking
Startup News

Google DeepMind has unveiled a significant new initiative aimed at fundamentally rethinking how artificial intelligence capabilities are measured. In an announcement on its blog, the leading AI research institution detailed a comprehensive framework designed to...

11 months ago
The Unseen Challenge of Reliable AI
AI Video

11 months ago
The State of AI Engineering: Insights from Amplify's 2025 Report with Barr Yaron
AI Video

"Evaluation/evals" stands as the single most painful aspect of AI Engineering today, a stark revelation from Amplify Partners' recent 2025 AI Engineering...

11 months ago