#Software Engineering
39 articles with this tag

Evaluating Coding Agents: Lessons from SWE-rebench
Ibragim Badertdinov from Nebius shares key lessons from evaluating coding agents using the SWE-rebench benchmark, highlighting the importance of real-world tasks, reliable verification, and cost-effectiveness.

Nvidia's Huang: AI Job Fears Are 'Nonsense'
Nvidia CEO Jensen Huang dismisses AI job loss fears as 'nonsense,' arguing AI actually drives demand for more software engineers.

Can LLMs Generate Enterprise-Quality Code?
Prasenjit Sarkar of Sonar discusses whether LLMs can generate enterprise-quality code, highlighting challenges and Sonar's AC/DC framework for agentic development.

Sakana AI: Finance Agents Take Shape
Sakana AI is deploying AI agents to revolutionize financial operations, with engineers focusing on practical integration and enterprise-grade reliability.

Google DeepMind Explains AI Agent Building Struggles
Philipp Schmid from Google DeepMind explains the core challenges senior engineers face when building AI agents, contrasting traditional engineering with agentic development.

Braintrust Cedes Coding to Codex
Braintrust is dramatically speeding up its development cycle by integrating OpenAI's Codex, turning customer requests into code previews in minutes.

Cursor's RL Infrastructure for Training Composer
Cursor details its distributed infrastructure for training its AI coding model, Composer, using reinforcement learning on 'Fireworks'.

DeepMind's Scale: How Agents Run at Google
Google DeepMind's KP Sawhney and Ian Ballantyne reveal how they run AI agents at scale, discussing the architecture, tools, and challenges involved in managing complex automated tasks.

LinkedIn Engineer Builds Community
LinkedIn engineer Rishika builds community through mentorship and online content, extending her impact beyond her core role.

AI Agents Build Better AI
LinkedIn Engineering details how AI agents are revolutionizing model development through automated, iterative refinement loops.

LinkedIn Unifies Hiring Data
LinkedIn's new unified integrations platform standardizes hiring data, slashing onboarding times and powering AI recruitment tools.

Lawrence Jones on Fighting AI with AI
Lawrence Jones of incident.io discusses how AI can be used to debug and manage complex AI systems, highlighting the importance of structured data and automated analysis pipelines.

Viverra: Verifying AI-Generated Code
Viverra tackles the trust deficit in AI-generated code by automatically producing formally verified annotations, enhancing developer comprehension and productivity.

Mike Spitz on Post-Engineer Engineering Org
Mike Spitz discusses how AI agents are transforming engineering by boosting productivity and changing workflows, advocating for a phased approach to adoption.

Sea Bets Big on AI Coding with OpenAI Codex
Sea Limited is deploying OpenAI Codex across its developer organization, aiming to transform software development in Southeast Asia through AI-native workflows and agentic collaboration.

LLMs Tame Software Requirements
VERIMED leverages LLMs and SMT solvers to formally audit natural-language software requirements, turning ambiguity into testable signals and boosting verified accuracy.

Beyond Model Capability: The Harness for SE Agents
The reliability of autonomous software engineering agents hinges on a novel 'AI Harness' system, not just model capability, enabling verifiably correct changes.

Building an AI Chess Coach: Take Take Take
Anant Dole and Asbjorn Steinskog discuss building an AI chess coach, the limitations of LLMs in chess, and their eval framework.

Sakana AI's Defense Push
Sakana AI is building AI for defense, with engineers developing critical command and control systems for national security.

Matt Pocock: Engineering Fundamentals Still Crucial in AI
Matt Pocock, author of 'AI Hero', emphasizes that engineering fundamentals are more crucial than ever for building robust AI systems.

Coding Agents' Stealth Vulnerabilities Unmasked
A new benchmark, MOSAIC-Bench, reveals that production coding agents can be tricked into shipping exploitable code through a sequence of innocuous-looking tasks, bypassing current safety reviews.

Cursor's AI Agents Get Worktree Boost
David Gomes of Cursor details the integration of Git worktrees into AI agents, enabling isolated task execution and reducing code complexity.

OpenAI's Ryan Lopopolo on Harnessing AI for Software Engineering
OpenAI's Ryan Lopopolo discusses how AI agents are reshaping software engineering, emphasizing the shift towards human oversight and strategic prompt design.

Anthropic's Claude Opus 4.7 Arrives, Sharper Than Ever
Anthropic unveils Claude Opus 4.7, boosting AI's coding prowess, multimodal input, and safety features for enterprise use.

Cursor's Agents Get Visual
Cursor agents now generate interactive visualizations, enhancing data exploration and collaboration beyond text-based reports.

IBM's Jeff Crume on AI Tech Debt
Jeff Crume of IBM explains how AI systems can accrue technical debt, the risks involved, and how to mitigate it through strategic planning and discipline.

Copilot's Agentic Leap
GitHub Copilot's evolution into agent-driven development automates complex analysis, freeing developers for creative tasks through effective AI collaboration.

Externalizing Agent Harnesses with Language
Researchers introduce Natural-Language Agent Harnesses (NLAHs) and an Intelligent Harness Runtime (IHR) to externalize agent control logic, enabling greater transferability and scientific study.

Devin AI: The Future of Software Engineering?
Scott Wu and Russell Kaplan of Cognition AI discuss Devin, their AI software engineer, and its potential to revolutionize the tech industry.

Bridging the AI Code Quality Gap
A new benchmark, c-CRAB, reveals current AI code review agents only solve ~40% of tasks, highlighting gaps and potential for human-AI collaboration in code quality assurance.

Anthropic's Claude Masters Autonomous Coding
Anthropic details a new multi-agent system that enables Claude to autonomously generate complex full-stack applications, moving beyond previous limitations in AI coding.

MiniMax M2.7 Hints at AI Self-Evolution
MiniMax's M2.7 model showcases early signs of AI self-evolution, excelling in software engineering and professional tasks while driving organizational AI transformation.

GitHub Grapples With Recent Outages
GitHub details recent availability issues, citing rapid growth and architectural flaws, and outlines plans for enhanced resilience.

Potpie AI Secures $2.2M for Engineering Agents
Potpie AI secured $2.2 million in pre-seed funding to integrate AI agents into complex engineering systems by unifying context across codebases.

AI Product Development Shifts to Execution
AI product development has shifted from experimentation to execution, focusing on application-layer innovation and economic viability.

Beyond Snippets: The Evolving Landscape of AI Code Evaluation

Beyond Vibe Coding: The Architect's Blueprint for AI-Driven Software

The AI Engineer: A Full-Stack Architect of Tomorrow
