#AI Benchmarks

7 articles with this tag

OpenAI Unveils LifeSciBench

OpenAI's LifeSciBench is a new benchmark designed to test AI's real-world applicability in complex life science research, moving beyond basic question answering.

7 days ago

AI Research

Task Fidelity Scaling Laws: Kobie Crawford on AI Data Quality

Kobie Crawford of Snorkel discusses 'Task Fidelity Scaling Laws,' emphasizing how data quality impacts AI model performance and outlining Snorkel's approach to creating verifiable datasets.

22 days ago

Artificial Intelligence

Exa Unveils New Code Search Benchmarks

Exa.ai releases 'WebCode', a new benchmark suite for evaluating search performance in coding agents, addressing limitations in existing tools.

3 months ago

Technology

Poetiq's Small Team Sets New Together AI Benchmark Records

Poetiq's small team is setting new Together AI benchmark records by using recursively self-improving meta-systems to optimize existing LLMs.

4 months ago

Artificial Intelligence

Claude Sonnet 4.6 Ups the AI Ante

Anthropic's Claude Sonnet 4.6 launches with major upgrades in coding, reasoning, and computer use, plus a 1M token context window.

4 months ago

AI Research

Step 3.5 Flash: AI's New Efficiency Standard

The Step 3.5 Flash model targets efficiency with 196B total parameters, of which 11B are active, offering competitive performance with lower latency.

4 months ago

Artificial Intelligence

Claude Opus 4.6: Smarter, Faster, and Longer Context

Anthropic's Claude Opus 4.6 launches with a 1M token context window, enhanced coding, and state-of-the-art benchmark performance.

5 months ago