#Coding Agents
13 articles with this tag

Evaluating Coding Agents: Lessons from SWE-rebench
Ibragim Badertdinov from Nebius shares key lessons from evaluating coding agents using the SWE-rebench benchmark, highlighting the importance of real-world tasks, reliable verification, and cost-effectiveness.

Devin's 80% Moment: AI Coding Agents Evolve
Walden Yan and Cole Murray discuss Devin's '80% moment' in AI coding, highlighting background agents, multiple PRs, and the end of hand-held coding.

Hugging Face's Ben Burtenshaw on AI System Engineering
Ben Burtenshaw from Hugging Face discusses how AI coding agents can be used for AI system engineering, kernel optimization, and building multi-agent autoresearch labs.

Coding Agent Inference Benchmark Revealed
Together AI unveils a new benchmark for coding agent inference, highlighting performance under real-world load and significant cost advantages.

Marlene Mhangami: Playwright for Functionality Testing
Marlene Mhangami from Microsoft and GitHub discusses leveraging Playwright and AI agents for effective functionality testing, emphasizing clean code and behavior-driven development.

OpenAI's "Parameter Golf" Reveals AI's Role
OpenAI's "Parameter Golf" competition revealed how AI coding agents are transforming machine learning research, pushing innovation under tight constraints.

VIBE✓ adds friction to AI coding agents
Mozilla.ai's VIBE✓ framework introduces deliberate friction to coding agent workflows, mitigating automation bias and ensuring human oversight.

Embedding OpenClaw Coding Agent in Your Product
Matthias Luebken from Tavon.ai discusses embedding the OpenClaw coding agent, Pi, into products, highlighting its utility for developers and the future of AI in software systems.

OpenAI's Safety Playbook for Codex
OpenAI details its robust safety measures for its Codex AI coding agent, emphasizing sandboxing, network controls, and detailed telemetry for secure deployment.

Databricks Tames Coding AI Chaos
Databricks introduces Unity AI Gateway to manage AI coding agents, offering centralized governance, cost controls, and observability for enterprises.

Databricks Centralizes Coding AI
Databricks launches AI Gateway to centralize governance, security, and cost controls for the growing number of AI coding agents used by enterprises.

Exa Unveils New Code Search Benchmarks
Exa.ai releases 'WebCode', a new benchmark suite for evaluating search performance in coding agents, addressing limitations in existing tools.

AI Agents Leveled Up by Harness Engineering
LangChain's harness engineering approach dramatically improved an AI coding agent's performance by refining its surrounding system, not the core model.