#Software Engineering

39 articles with this tag

Evaluating Coding Agents: Lessons from SWE-rebench
AI Research

Ibragim Badertdinov from Nebius shares key lessons from evaluating coding agents using the SWE-rebench benchmark, highlighting the importance of real-world tasks, reliable verification, and cost-effectiveness.

about 2 hours ago
Nvidia's Huang: AI Job Fears Are 'Nonsense'
Artificial Intelligence

Nvidia CEO Jensen Huang dismisses AI job loss fears as 'nonsense,' arguing AI actually drives demand for more software engineers.

3 days ago
Can LLMs Generate Enterprise-Quality Code?
Artificial Intelligence

Prasenjit Sarkar of Sonar discusses whether LLMs can generate enterprise-quality code, highlighting challenges and Sonar's AC/DC framework for agentic development.

4 days ago
Sakana AI: Finance Agents Take Shape
Technology

Sakana AI is deploying AI agents to revolutionize financial operations, with engineers focusing on practical integration and enterprise-grade reliability.

4 days ago
Google DeepMind Explains AI Agent Building Struggles
AI Research

Philipp Schmid from Google DeepMind explains the core challenges senior engineers face when building AI agents, contrasting traditional engineering with agentic development.

5 days ago
Braintrust Cedes Coding to Codex
Artificial Intelligence

Braintrust is dramatically speeding up its development cycle by integrating OpenAI's Codex, turning customer requests into code previews in minutes.

6 days ago
Cursor's RL Infrastructure for Training Composer
Artificial Intelligence

Cursor details its distributed infrastructure for training its AI coding model, Composer, using reinforcement learning on 'Fireworks'.

9 days ago
DeepMind's Scale: How Agents Run at Google
AI Research

Google DeepMind's KP Sawhney and Ian Ballantyne reveal how they run AI agents at scale, discussing the architecture, tools, and challenges involved in managing complex automated tasks.

11 days ago
LinkedIn Engineer Builds Community
tech

LinkedIn engineer Rishika builds community through mentorship and online content, extending her impact beyond her core role.

14 days ago
AI Agents Build Better AI
tech

LinkedIn Engineering details how AI agents are revolutionizing model development through automated, iterative refinement loops.

14 days ago
LinkedIn Unifies Hiring Data
tech

LinkedIn's new unified integrations platform standardizes hiring data, slashing onboarding times and powering AI recruitment tools.

14 days ago
Lawrence Jones on Fighting AI with AI
Artificial Intelligence

Lawrence Jones of incident.io discusses how AI can be used to debug and manage complex AI systems, highlighting the importance of structured data and automated analysis pipelines.

18 days ago
Viverra: Verifying AI-Generated Code
AI Research

Viverra tackles the trust deficit in AI-generated code by automatically producing formally verified annotations, enhancing developer comprehension and productivity.

20 days ago
Mike Spitz on Post-Engineer Engineering Org
Artificial Intelligence

Mike Spitz discusses how AI agents are transforming engineering by boosting productivity and changing workflows, advocating for a phased approach to adoption.

20 days ago
Sea Bets Big on AI Coding with OpenAI Codex
Artificial Intelligence

Sea Limited is deploying OpenAI Codex across its developer organization, aiming to transform software development in Southeast Asia through AI-native workflows and agentic collaboration.

21 days ago
LLMs Tame Software Requirements
AI Research

VERIMED leverages LLMs and SMT solvers to formally audit natural-language software requirements, turning ambiguity into testable signals and boosting verified accuracy.

21 days ago
Beyond Model Capability: The Harness for SE Agents
AI Research

Autonomous software engineering agents' reliability hinges on a novel 'AI Harness' system, not just model capability, enabling verifiably correct changes.

21 days ago
Building an AI Chess Coach: Take Take Take
Artificial Intelligence

Anant Dole and Asbjorn Steinskog discuss building an AI chess coach, the limitations of LLMs in chess, and their eval framework.

22 days ago
Sakana AI's Defense Push
Technology

Sakana AI is building AI for defense, with engineers developing critical command and control systems for national security.

25 days ago
Matt Pocock: Engineering Fundamentals Still Crucial in AI
Artificial Intelligence

Matt Pocock, author of 'AI Hero', emphasizes that engineering fundamentals are more crucial than ever for building robust AI systems.

28 days ago
Coding Agents' Stealth Vulnerabilities Unmasked
AI Research

New benchmark MOSAIC-Bench reveals production coding agents can be tricked into shipping exploitable code via sequenced, innocuous tasks, bypassing current safety reviews.

29 days ago
Cursor's AI Agents Get Worktree Boost
Artificial Intelligence

David Gomes of Cursor detailed the integration of Git worktrees into AI agents, enabling isolated task execution and reducing code complexity.

about 1 month ago
OpenAI's Ryan Lopopolo on Harnessing AI for Software Engineering
Artificial Intelligence

OpenAI's Ryan Lopopolo discusses how AI agents are reshaping software engineering, emphasizing the shift towards human oversight and strategic prompt design.

about 2 months ago
Anthropic's Claude Opus 4.7 Arrives, Sharper Than Ever
Artificial Intelligence

Anthropic unveils Claude Opus 4.7, boosting AI's coding prowess, multimodal input, and safety features for enterprise use.

about 2 months ago
Cursor's Agents Get Visual
Technology

Cursor agents now generate interactive visualizations, enhancing data exploration and collaboration beyond text-based reports.

about 2 months ago
IBM's Jeff Crume on AI Tech Debt
Artificial Intelligence

Jeff Crume of IBM explains how AI systems can accrue technical debt, the risks involved, and how to mitigate it through strategic planning and discipline.

about 2 months ago
Copilot's Agentic Leap
Technology

GitHub Copilot's evolution into agent-driven development automates complex analysis, freeing developers for creative tasks through effective AI collaboration.

2 months ago
Externalizing Agent Harnesses with Language
AI Research

Researchers introduce Natural-Language Agent Harnesses (NLAHs) and an Intelligent Harness Runtime (IHR) to externalize agent control logic, enabling greater transferability and scientific study.

2 months ago
Devin AI: The Future of Software Engineering?
Artificial Intelligence

Scott Wu and Russell Kaplan of Cognition AI discuss Devin, their AI software engineer, and its potential to revolutionize the tech industry.

2 months ago
Bridging the AI Code Quality Gap
AI Research

A new benchmark, c-CRAB, reveals current AI code review agents only solve ~40% of tasks, highlighting gaps and potential for human-AI collaboration in code quality assurance.

2 months ago
Anthropic's Claude Masters Autonomous Coding
Artificial Intelligence

Anthropic details a new multi-agent system that enables Claude to autonomously generate complex full-stack applications, moving beyond previous limitations in AI coding.

2 months ago
MiniMax M2.7 Hints at AI Self-Evolution
Artificial Intelligence

MiniMax's M2.7 model showcases early signs of AI self-evolution, excelling in software engineering and professional tasks while driving organizational AI transformation.

3 months ago
GitHub Grapples With Recent Outages
Technology

GitHub details recent availability issues, citing rapid growth and architectural flaws, and outlines plans for enhanced resilience.

3 months ago
Potpie AI Secures $2.2M for Engineering Agents
Funding Round

Potpie AI secured $2.2 million in pre-seed funding to integrate AI agents into complex engineering systems by unifying context across codebases.

3 months ago
AI Product Development Shifts to Execution
Investor News

AI product development has shifted from experimentation to execution, focusing on application-layer innovation and economic viability.

4 months ago
Beyond Snippets: The Evolving Landscape of AI Code Evaluation
AI Video

6 months ago
Beyond Vibe Coding: The Architect's Blueprint for AI-Driven Software
AI Video

6 months ago
The AI Engineer: A Full-Stack Architect of Tomorrow
AI Video

10 months ago
AI's Code: More Artifact, Less Architecture
AI Video

10 months ago
#Software Engineering Articles | StartupHub.ai