Can LLMs Generate Enterprise-Quality Code?

Prasenjit Sarkar of Sonar discusses whether LLMs can generate enterprise-quality code, highlighting challenges and Sonar's AC/DC framework for agentic development.

May 31 at 7:02 PM · 7 min read

Image: Prasenjit Sarkar presenting on LLMs and enterprise code quality at an AI Engineer event. Credit: AI Engineer

Visual TL;DR. The talk moves through seven points, each leading to the next:

AI shifts software engineering: developers now instruct AI agents, review output
LLM code quality questioned: enterprise needs more than functional correctness
Benchmarks insufficient: HumanEval/MBPP miss enterprise quality factors
Training data challenges: LLM nature and data limitations impact code
Sonar's LLM evaluation: framework to assess AI-generated code quality
AC/DC Framework: Sonar's solution for agentic development
Enterprise-ready code: goal for LLM-generated software

Prasenjit Sarkar, representing Sonar, delivered a presentation titled "Can LLMs Generate Enterprise Quality Code?" exploring the capabilities and limitations of Large Language Models (LLMs) in producing production-ready software.

The Shifting Landscape of Software Engineering

Sarkar began by highlighting how AI has fundamentally altered software engineering. He quoted Andrej Karpathy, who noted that the way developers write code has changed significantly, shifting from direct coding in an editor to instructing AI agents. In this new paradigm, developers describe tasks in natural language and review the output of several agents working in parallel.

Evaluating LLM Code: Beyond Benchmarks

The presentation questioned the trustworthiness of LLM-generated code, particularly in enterprise environments. Sarkar emphasized that standard benchmarks, such as HumanEval and MBPP, primarily measure functional correctness and algorithm implementation. However, they fail to account for crucial enterprise quality aspects like security, real-world reliability, engineering discipline, code maintainability, and context awareness.

He warned that high scores on these benchmarks can mask code that is dangerous, unmaintainable, and riddled with technical debt. The core issue he identified is that LLMs are probabilistic by nature: they can produce different results for the same prompt, work with limited context, do not understand the entire codebase, and are not easily explainable or diagnosable.
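To make the gap between functional correctness and security concrete, here is a minimal Python sketch. It is not taken from the talk: the table, the function names, and the test data are all hypothetical, chosen only for illustration. Both lookup functions pass the same benchmark-style functional check, yet one of them is open to SQL injection.

```python
import sqlite3


def make_db():
    """Build a small in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_email_unsafe(conn, name):
    # User input is interpolated straight into the SQL text.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_email_safe(conn, name):
    # A bound parameter keeps the input out of the SQL text.
    query = "SELECT email FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()

# A functional test of the kind a benchmark runs: both versions pass.
assert find_email_unsafe(conn, "alice") == ["alice@example.com"]
assert find_email_safe(conn, "alice") == ["alice@example.com"]

# A crafted input separates them: the unsafe version leaks every row.
payload = "x' OR '1'='1"
leaked = sorted(find_email_unsafe(conn, payload))
blocked = find_email_safe(conn, payload)
print(leaked)   # both email addresses
print(blocked)  # empty list
```

A pass/fail score on the first two assertions would rate both functions equally, which is the blind spot described above. Only a check that looks at how the query is built can tell them apart.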

Sonar's LLM Evaluation Framework

To address these challenges, Sonar has developed a framework to assess the true quality of LLM-generated code. The framework analyzes over 50 leading LLMs on a dataset of 4,000+ distinct Java programming assignments, using SonarQube Enterprise to detect complex bugs, vulnerabilities, and code smells.

The evaluation revealed that models with higher functional performance often produce code that is more verbose and complex. For instance, Gemini 3.1 Pro High, despite having the highest pass rate and accuracy, also showed a high issue density. Similarly, other top-performing models like Opus 4.5 Thinking and GPT-4.5 Turbo demonstrated trade-offs between performance and code quality metrics.
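The trade-off between pass rate and issue density can be illustrated with a small Python sketch. The article does not say how Sonar defines issue density, so this assumes issues per thousand lines of code; the model names and all numbers below are hypothetical and are not results from the evaluation.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Aggregate results for one model (all values hypothetical)."""
    name: str
    tasks_passed: int
    tasks_total: int
    issues_found: int
    lines_of_code: int

    @property
    def pass_rate(self) -> float:
        # Fraction of assignments whose tests passed.
        return self.tasks_passed / self.tasks_total

    @property
    def issues_per_kloc(self) -> float:
        # Assumed definition: static-analysis issues per 1,000 lines.
        return 1000 * self.issues_found / self.lines_of_code


results = [
    ModelResult("model-a", 900, 1000, 600, 200_000),
    ModelResult("model-b", 800, 1000, 200, 100_000),
]

best_by_pass_rate = max(results, key=lambda r: r.pass_rate)
cleanest = min(results, key=lambda r: r.issues_per_kloc)

for r in results:
    print(f"{r.name}: pass rate {r.pass_rate:.0%}, "
          f"{r.issues_per_kloc:.1f} issues/KLOC, {r.lines_of_code} lines")
print("highest pass rate:", best_by_pass_rate.name)
print("lowest issue density:", cleanest.name)
```

In this made-up data the model with the higher pass rate also writes twice as much code and has the higher issue density, so ranking by a single metric picks a different winner than ranking by the other.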

Challenges Rooted in Training Data and LLM Nature

Sarkar elaborated on the challenges contributing to these quality issues. These include mixed-quality code in training sets, where models can learn from outdated or bad patterns alongside good ones. Additionally, built-in security flaws within training data can lead LLMs to generate unsafe code. Subtle logic errors can also slip into the training pool, causing models to produce code that fails or misbehaves in production.

The inherent nature of LLMs as probabilistic systems also poses a challenge. They produce varying results for the same prompt and lack a deep understanding of the overall codebase, making their output difficult to diagnose and improve.

Sonar's Solution: The AC/DC Framework

Sonar offers a comprehensive solution for agentic development through its AC/DC framework: Guide, Verify, and Solve.

Guide: This phase involves Sonar Context Augmentation and SonarSweep, which aim to provide the LLM with the necessary context of the codebase.
Verify: SonarQube, in its server, cloud, and CLI forms, analyzes the generated code for bugs, vulnerabilities, and code smells, ensuring it meets enterprise standards.
Solve: The SonarQube Remediation agent then assists in fixing identified issues.

Sonar's approach allows developers to integrate LLM-generated code into their workflows more reliably. The process involves guiding the LLM with contextual data, verifying the output through SonarQube analysis, and then solving any identified issues before committing the code.
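The guide, verify, and solve process can be sketched as a simple control loop. This is only an illustration under stated assumptions: it does not use any Sonar API, and `guide_verify_solve`, `generate`, `analyze`, `fix`, and the stub functions are hypothetical names standing in for an LLM call, a static analyzer, and a remediation step.

```python
from typing import Callable, List


def guide_verify_solve(
    task: str,
    context: str,
    generate: Callable[[str], str],
    analyze: Callable[[str], List[str]],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> str:
    """Return code that passes analysis, or raise if issues remain."""
    # Guide: give the generator codebase context along with the task.
    code = generate(f"{context}\n\n{task}")
    for _ in range(max_rounds):
        # Verify: run the analyzer on the current candidate.
        issues = analyze(code)
        if not issues:
            return code
        # Solve: attempt to fix the reported issues, then re-verify.
        code = fix(code, issues)
    if analyze(code):
        raise RuntimeError("issues remain after max_rounds")
    return code


# Hypothetical stand-ins so the sketch runs without external services.
def fake_generate(prompt: str) -> str:
    return "value = eval(user_input)"


def fake_analyze(code: str) -> List[str]:
    return ["use of eval on user input"] if "eval(" in code else []


def fake_fix(code: str, issues: List[str]) -> str:
    return code.replace("eval(", "int(")


result = guide_verify_solve(
    task="Parse the number the user typed.",
    context="Project convention: never evaluate user input.",
    generate=fake_generate,
    analyze=fake_analyze,
    fix=fake_fix,
)
print(result)
```

The loop only returns code that the analyzer reports as clean, which mirrors the idea of resolving issues before committing. Bounding the number of rounds is a design choice made here so that a fix step which never converges fails loudly instead of looping forever.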

Sarkar concluded by inviting attendees to visit Sonar's booth for more information and a demo, where they could also enter a raffle for AirPods Pro and grab some Sonar swag.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
