#Coding Agents

13 articles with this tag

Evaluating Coding Agents: Lessons from SWE-rebench

Ibragim Badertdinov from Nebius shares key lessons from evaluating coding agents using the SWE-rebench benchmark, highlighting the importance of real-world tasks, reliable verification, and cost-effectiveness.

27 days ago

Artificial Intelligence

Devin's 80% Moment: AI Coding Agents Evolve

Walden Yan and Cole Murray discuss Devin's '80% moment' in AI coding, highlighting background agents, multiple PRs, and the end of hand-held coding.

about 1 month ago

Artificial Intelligence

Hugging Face's Ben Burtenshaw on AI System Engineering

Ben Burtenshaw from Hugging Face discusses how AI coding agents can be used for AI system engineering, kernel optimization, and building multi-agent autoresearch labs.

about 1 month ago

Technology

Coding Agent Inference Benchmark Revealed

Together AI unveils a new benchmark for coding agent inference, highlighting performance under real-world load and significant cost advantages.

about 1 month ago

Technology

Marlene Mhangami: Playwright for Functionality Testing

Marlene Mhangami from Microsoft and GitHub discusses leveraging Playwright and AI agents for effective functionality testing, emphasizing clean code and behavior-driven development.

about 2 months ago

Artificial Intelligence

OpenAI's "Parameter Golf" Reveals AI's Role

OpenAI's "Parameter Golf" competition revealed how AI coding agents are transforming machine learning research, pushing innovation under tight constraints.

about 2 months ago

Technology

VIBE✓ adds friction to AI coding agents

Mozilla.ai's VIBE✓ framework introduces deliberate friction to coding agent workflows, mitigating automation bias and ensuring human oversight.

about 2 months ago

Artificial Intelligence

Embedding OpenClaw Coding Agent in Your Product

Matthias Luebken from Tavon.ai discusses embedding the OpenClaw coding agent, Pi, into products, highlighting its utility for developers and the future of AI in software systems.

about 2 months ago

Artificial Intelligence

OpenAI's Safety Playbook for Codex

OpenAI details the safety measures for its Codex coding agent, emphasizing sandboxing, network controls, and detailed telemetry for secure deployment.

about 2 months ago

Technology

Databricks Tames Coding AI Chaos

Databricks introduces Unity AI Gateway to manage AI coding agents, offering centralized governance, cost controls, and observability for enterprises.

3 months ago

Technology

Databricks Centralizes Coding AI

Databricks launches AI Gateway to centralize governance, security, and cost controls for the growing number of AI coding agents used by enterprises.

3 months ago

Artificial Intelligence

Exa Unveils New Code Search Benchmarks

Exa.ai releases 'WebCode', a new benchmark suite for evaluating search performance in coding agents, addressing limitations in existing tools.

3 months ago

Artificial Intelligence

AI Agents Leveled Up by Harness Engineering

LangChain's harness engineering approach dramatically improved an AI coding agent's performance by refining its surrounding system, not the core model.

4 months ago