#Software Engineering

50 articles with this tag

AI Agents as Supply Chain Actors: Patch Pilot's Security Model

Moritz Johner of Form3 discusses the limitations of automated dependency patching tools and the security considerations of giving AI agents access to production code, introducing Form3's 'Patch Pilot' system.

about 16 hours ago

Artificial Intelligence

Anthropic's Claude Code Creator on AI's Future

Boris Cherny, Head of Claude Code at Anthropic, discusses the evolution of AI models, the future of software engineering, and Anthropic's focus on AI safety.

1 day ago

Artificial Intelligence

Bala Ramdoss on Generative UI for Agentic CX

Bala Ramdoss of Amazon discusses generative UI, the critical layer between LLM output and product experience, emphasizing rendering contracts, streaming, and BFF patterns for agentic CX.

1 day ago

Artificial Intelligence

AI Agents Need Feature Flags for Safety, Says Engineer

Backend engineer Sachin Gupta argues AI agents need specialized feature flags beyond traditional tools to manage their complex behaviors and mitigate risks.

3 days ago

AI Research

Google DeepMind VP on AI's Role in Coding's Future

Google DeepMind's Benoit Schillings discusses how AI is reshaping software engineering, from code generation to scientific discovery.

4 days ago

Artificial Intelligence

AI's Reckoning Hits Software Engineers

Bloomberg Businessweek's Mark Milian discusses how AI is transforming software engineering, shifting roles from coding to AI direction and impacting the talent pipeline.

4 days ago

Artificial Intelligence

Addy Osmani: Own Your Verdict in the Age of AI Agents

Addy Osmani, former Google Cloud AI Director, discusses the evolving role of engineers in the age of AI agents, emphasizing judgment, accountability, and ownership.

7 days ago

Artificial Intelligence

OpenAI Flags Major Flaws in SWE-Bench Pro

OpenAI's audit reveals approximately 30% of SWE-Bench Pro's coding tasks are flawed, prompting the company to retract its recommendation for the benchmark.

10 days ago

Technology

GitHub Fixes Repo Ownership

GitHub implemented a durable ownership system for all active repositories, archiving thousands of unmanaged ones and mandating ownership for new creations.

12 days ago

Technology

Octonous Streamlines AI Safety Work

Mozilla.ai's AI Safety Engineer leverages Octonous to automate policy creation, monitor libraries, and aggregate research, boosting efficiency.

12 days ago

Artificial Intelligence

Duolingo's Angel Lee on AI Discernment vs. Approval

Angel Ortmann Lee from Duolingo discusses building AI systems for discernment, not approval, and the dangers of automation bias.

13 days ago

AI Research

SWE-Marathon: Evaluating AI Coding Agents at Scale

Rishi Desai from Abundant AI introduces SWE-Marathon, a benchmark evaluating AI coding agents on billion-token scale tasks, revealing current limitations and the need for robust verification.

14 days ago

Artificial Intelligence

Wandero AI's Kalandadze on the 'Missing Layer' Post-Launch

Wandero AI's CTO, Raphael Kalandadze, discusses the critical 'missing layer' of post-launch operations for AI agents, emphasizing the need for continuous monitoring and improvement loops.

16 days ago

AI Research

Soheil Feizi on Continual Learning for AI Agents

Soheil Feizi of RELAI explains the challenges and principles behind continual learning for AI agents, focusing on replayable, holistic, lifelong, and efficient improvements.

16 days ago

Artificial Intelligence

OpenAI's Bug Hunt: 18-Year-Old Flaw Found

OpenAI uncovered two hidden bugs, including an 18-year-old software flaw, by analyzing crash data like an epidemiologist.

21 days ago

Technology

Cursor Brings AI Coding to iOS

Cursor's new iOS app allows developers to manage AI coding agents from their phones, enabling development on the go.

22 days ago

Artificial Intelligence

Dominik Tornow: The Prompt is the Platform in AI

Dominik Tornow of Resonate argues that AI development is shifting towards prompt engineering, making 'The Prompt is the Platform' a reality by 2026.

22 days ago

Artificial Intelligence

Microsoft Experts on Debugging Non-Deterministic AI Agents

Microsoft experts Tisha Chawla and Susheem Koul discuss the challenges of debugging AI agents in production and introduce strategies for ensuring replayability and observability.

22 days ago

Artificial Intelligence

Angie Jones on Building Autonomous Engineering Orgs

Angie Jones of the Agentic AI Foundation discusses building autonomous engineering organizations, emphasizing AI as a collaborator and the importance of tailored integration.

23 days ago

Artificial Intelligence

Agents Building Agents: Nearform's AI Approach

Alfonso Graziano from Nearform explores how AI agents can build and improve other AI agents, detailing the 'Harness Engineering' methodology for reliable AI development.

23 days ago

Artificial Intelligence

OpenGov's Gabe De Mesa on Scaling AI Agents in Production

Gabe De Mesa of OpenGov details how the company built and scaled its OG Assist AI agent, highlighting the use of Effect, the A2A protocol, sandboxing, and developer velocity tools.

25 days ago

Technology

GitHub Copilot Harness Efficiency

GitHub reveals its agentic harness matches model performance with superior token efficiency, supporting over 20 LLMs.

25 days ago

Artificial Intelligence

The Dawn of AI Agents: Building the First

Experts discuss the evolution of AI agents, from early experimental tools to indispensable collaborators in software engineering.

26 days ago

Artificial Intelligence

Anthropic's Fiona Fung on Building AI-Pilled Engineering Teams

Anthropic's Fiona Fung discusses how AI is transforming engineering teams, emphasizing initiative, growth mindset, and collaboration.

30 days ago

Artificial Intelligence

Nextdoor engineers build faster with Codex

Nextdoor engineers are using OpenAI's Codex to accelerate development, enabling end-to-end feature building and faster debugging.

about 1 month ago

Artificial Intelligence

Anthropic Unleashes Claude Fable 5, Mythos 5

Anthropic launches Claude Fable 5 for general use and Mythos 5 for specialized cybersecurity, showcasing advanced capabilities with new safety measures and competitive pricing.

about 1 month ago

AI Research

Code2LoRA: Repository Context without Overhead

Code2LoRA generates dynamic LoRA adapters for code LLMs, offering repository context without inference overhead and adapting to evolving codebases.

about 2 months ago

Artificial Intelligence

OpenClaw's Vincent Koc on 'Dark Factories' and AI Speed

Vincent Koc of OpenClaw discusses the rapid acceleration of AI development, comparing it to the Industrial Revolution and highlighting OpenClaw's efficient 'dark factory' approach.

about 2 months ago

AI Research

Evaluating Coding Agents: Lessons from SWE-rebench

Ibragim Badertdinov from Nebius shares key lessons from evaluating coding agents using the SWE-rebench benchmark, highlighting the importance of real-world tasks, reliable verification, and cost-effectiveness.

about 2 months ago

Artificial Intelligence

Nvidia's Huang: AI Job Fears Are 'Nonsense'

Nvidia CEO Jensen Huang dismisses AI job loss fears as 'nonsense,' arguing AI actually drives demand for more software engineers.

about 2 months ago

Artificial Intelligence

Can LLMs Generate Enterprise-Quality Code?

Prasenjit Sarkar of Sonar discusses whether LLMs can generate enterprise-quality code, highlighting challenges and Sonar's AC/DC framework for agentic development.

about 2 months ago

Technology

Sakana AI: Finance Agents Take Shape

Sakana AI is deploying AI agents to revolutionize financial operations, with engineers focusing on practical integration and enterprise-grade reliability.

about 2 months ago

AI Research

Google DeepMind Explains AI Agent Building Struggles

Philipp Schmid from Google DeepMind explains the core challenges senior engineers face when building AI agents, contrasting traditional engineering with agentic development.

about 2 months ago

Artificial Intelligence

Braintrust Cedes Coding to Codex

Braintrust is dramatically speeding up its development cycle by integrating OpenAI's Codex, turning customer requests into code previews in minutes.

about 2 months ago

Artificial Intelligence

Cursor's RL Infrastructure for Training Composer

Cursor details its distributed infrastructure for training its AI coding model, Composer, using reinforcement learning on 'Fireworks'.

about 2 months ago

AI Research

DeepMind's Scale: How Agents Run at Google

Google DeepMind's KP Sawhney and Ian Ballantyne reveal how they run AI agents at scale, discussing the architecture, tools, and challenges involved in managing complex automated tasks.

about 2 months ago

tech

LinkedIn Engineer Builds Community

LinkedIn engineer Rishika builds community through mentorship and online content, extending her impact beyond her core role.

2 months ago

tech

AI Agents Build Better AI

LinkedIn Engineering details how AI agents are revolutionizing model development through automated, iterative refinement loops.

2 months ago

tech

LinkedIn Unifies Hiring Data

LinkedIn's new unified integrations platform standardizes hiring data, slashing onboarding times and powering AI recruitment tools.

2 months ago

Artificial Intelligence

Lawrence Jones on Fighting AI with AI

Lawrence Jones of incident.io discusses how AI can be used to debug and manage complex AI systems, highlighting the importance of structured data and automated analysis pipelines.

2 months ago

AI Research

Viverra: Verifying AI-Generated Code

Viverra tackles the trust deficit in AI-generated code by automatically producing formally verified annotations, enhancing developer comprehension and productivity.

2 months ago

Artificial Intelligence

Mike Spitz on the Post-Engineer Engineering Org

Mike Spitz discusses how AI agents are transforming engineering by boosting productivity and changing workflows, advocating for a phased approach to adoption.

2 months ago

Artificial Intelligence

Sea Bets Big on AI Coding with OpenAI Codex

Sea Limited is deploying OpenAI Codex across its developer organization, aiming to transform software development in Southeast Asia through AI-native workflows and agentic collaboration.

2 months ago

AI Research

LLMs Tame Software Requirements

VERIMED leverages LLMs and SMT solvers to formally audit natural-language software requirements, turning ambiguity into testable signals and boosting verified accuracy.

2 months ago

AI Research

Beyond Model Capability: The Harness for SE Agents

The reliability of autonomous software engineering agents hinges on a novel 'AI Harness' system, not just on model capability; the harness is what enables verifiably correct changes.

2 months ago

Artificial Intelligence

Building an AI Chess Coach: Take Take Take

Anant Dole and Asbjorn Steinskog discuss building an AI chess coach, the limitations of LLMs in chess, and their eval framework.

2 months ago

Technology

Sakana AI's Defense Push

Sakana AI is building AI for defense, with engineers developing critical command and control systems for national security.

2 months ago

Artificial Intelligence

Matt Pocock: Engineering Fundamentals Still Crucial in AI

Matt Pocock, author of 'AI Hero', emphasizes that engineering fundamentals are more crucial than ever for building robust AI systems.

2 months ago

AI Research

Coding Agents' Stealth Vulnerabilities Unmasked

New benchmark MOSAIC-Bench reveals production coding agents can be tricked into shipping exploitable code via sequenced, innocuous tasks, bypassing current safety reviews.

3 months ago

Artificial Intelligence

Cursor's AI Agents Get Worktree Boost

David Gomes of Cursor detailed the integration of Git worktrees into AI agents, enabling isolated task execution and reducing code complexity.

3 months ago