Can LLMs Generate Enterprise-Quality Code?

Prasenjit Sarkar of Sonar discusses whether LLMs can generate enterprise-quality code, highlighting challenges and Sonar's AC/DC framework for agentic development.

Prasenjit Sarkar presenting on LLMs and enterprise code quality at an AI Engineer event.
AI Engineer

Prasenjit Sarkar, representing Sonar, delivered a presentation titled "Can LLMs Generate Enterprise Quality Code?" exploring the capabilities and limitations of Large Language Models (LLMs) in producing production-ready software.


Visual TL;DR. AI is shifting software engineering, which raises questions about LLM code quality. Existing benchmarks are insufficient to answer them, and training data challenges compound the problem. This leads to Sonar's LLM evaluation and its AC/DC Framework, which aims to enable enterprise-ready code.

  1. AI shifts software engineering: developers now instruct AI agents, review output
  2. LLM code quality questioned: enterprise needs more than functional correctness
  3. Benchmarks insufficient: HumanEval/MBPP miss enterprise quality factors
  4. Training data challenges: LLM nature and data limitations impact code
  5. Sonar's LLM evaluation: framework to assess AI-generated code quality
  6. AC/DC Framework: Sonar's solution for agentic development
  7. Enterprise-ready code: goal for LLM-generated software

The Shifting Landscape of Software Engineering

Sarkar began by highlighting how AI has fundamentally altered software engineering. He quoted Andrej Karpathy, who noted that the way developers write code has changed significantly, shifting from direct coding in an editor to instructing AI agents. In this new paradigm, developers describe tasks in natural language and review the output of several agents working in parallel.


Evaluating LLM Code: Beyond Benchmarks

The presentation questioned the trustworthiness of LLM-generated code, particularly in enterprise environments. Sarkar emphasized that standard benchmarks, such as HumanEval and MBPP, primarily measure functional correctness and algorithm implementation. However, they fail to account for crucial enterprise quality aspects like security, real-world reliability, engineering discipline, code maintainability, and context awareness.

He warned that high scores on these benchmarks can mask code that is dangerous, unmaintainable, and riddled with technical debt. The core issue he identified is that LLMs are probabilistic by nature: they can produce different results for the same prompt, work within a limited context, do not understand the entire codebase, and are hard to explain or diagnose.
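As a minimal sketch of how a functional test can pass while a security flaw goes unnoticed, consider the Python example below. It is an illustration only, not taken from the talk or from HumanEval or MBPP; the table, the function names, and the malicious input are all hypothetical.

```python
import sqlite3


def make_db():
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_role_unsafe(conn, name):
    # Builds the SQL text by string interpolation. It returns the right
    # answer for ordinary names, but the input can rewrite the query.
    query = f"SELECT role FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_role_safe(conn, name):
    # Uses a bound parameter, so the input is always treated as data.
    query = "SELECT role FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


conn = make_db()

# A benchmark-style functional check: both versions pass.
assert find_role_unsafe(conn, "bob") == ["viewer"]
assert find_role_safe(conn, "bob") == ["viewer"]

# A hostile input that the functional check never exercises.
malicious = "x' OR '1'='1"
leaked = find_role_unsafe(conn, malicious)   # matches every row
blocked = find_role_safe(conn, malicious)    # matches no row
print(sorted(leaked), blocked)
```

Both functions would score identically on a correctness-only benchmark; only a check aimed at security, such as static analysis or an adversarial test, separates them.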

Sonar's LLM Evaluation Framework

To address these challenges, Sonar has developed a framework to assess the true quality of LLM-generated code. It analyzes the output of over 50 leading LLMs on a dataset of 4,000+ distinct Java programming assignments, using SonarQube Enterprise to detect complex bugs, vulnerabilities, and code smells.

The evaluation revealed that models with higher functional performance often produce code that is more verbose and complex. For instance, Gemini 3.1 Pro High, despite having the highest pass rate and accuracy, also showed a high issue density. Similarly, other top-performing models like Opus 4.5 Thinking and GPT-4.5 Turbo demonstrated trade-offs between performance and code quality metrics.
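To make the trade-off concrete, here is a small sketch of how pass rate and issue density could be compared side by side. The article does not define issue density, so the code assumes it means issues per thousand lines of code; the model names and all figures are invented for illustration and are not Sonar's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRun:
    """Aggregate results for one model (all fields are hypothetical)."""
    name: str
    tasks_passed: int
    tasks_total: int
    issues_found: int
    lines_of_code: int

    @property
    def pass_rate(self) -> float:
        # Share of assignments whose generated code passed its tests.
        return self.tasks_passed / self.tasks_total

    @property
    def issues_per_kloc(self) -> float:
        # Assumed definition: static-analysis issues per 1,000 lines.
        return 1000 * self.issues_found / self.lines_of_code


# Invented numbers: model_a passes more tasks but writes more,
# and more problematic, code than model_b.
runs = [
    ModelRun("model_a", 3400, 4000, 9000, 600_000),
    ModelRun("model_b", 3000, 4000, 4000, 400_000),
]

best_by_pass_rate = max(runs, key=lambda r: r.pass_rate)
best_by_density = min(runs, key=lambda r: r.issues_per_kloc)

for run in runs:
    print(f"{run.name}: pass rate {run.pass_rate:.0%}, "
          f"{run.issues_per_kloc:.1f} issues/KLOC")
```

With these made-up figures the two rankings disagree, which is the pattern the evaluation describes: the model that leads on functional performance is not the one with the cleanest code.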

Challenges Rooted in Training Data and LLM Nature

Sarkar elaborated on the challenges contributing to these quality issues. These include mixed-quality code in training sets, where models can learn from outdated or bad patterns alongside good ones. Additionally, built-in security flaws within training data can lead LLMs to generate unsafe code. Subtle logic errors can also slip into the training pool, causing models to produce code that fails or misbehaves in production.

The inherent nature of LLMs as probabilistic systems also poses a challenge. They produce varying results for the same prompt and lack a deep understanding of the overall codebase, making their output difficult to diagnose and improve.
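The variability described above comes from sampling. The sketch below is a deliberately simplified illustration: real models sample one token at a time from a large vocabulary, whereas this toy picks among three whole candidate lines. The candidate strings, their scores, and the function names are hypothetical.

```python
import math
import random


def sample_candidate(scores, temperature, rng):
    """Pick one candidate using a softmax over scores scaled by temperature."""
    scaled = [score / temperature for score in scores.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(list(scores), weights=weights, k=1)[0]


def greedy_candidate(scores):
    """Always pick the highest-scoring candidate (no randomness)."""
    return max(scores, key=scores.get)


# Hypothetical model scores for three completions of the same prompt.
scores = {
    "return a + b": 2.0,
    "return sum([a, b])": 1.6,
    "return a - b": 0.5,
}

rng = random.Random()
sampled = {sample_candidate(scores, 1.0, rng) for _ in range(200)}
print("distinct completions at temperature 1.0:", len(sampled))
print("greedy completion:", greedy_candidate(scores))
```

Repeated sampling at a moderate temperature returns more than one completion for the same input, including the incorrect one some of the time, while greedy selection is repeatable but does nothing to fix a wrong top choice.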

Sonar's Solution: The AC/DC Framework

For agentic development, Sonar proposes its AC/DC framework, which is organized into three phases: Guide, Verify, and Solve.

  • Guide: This phase involves Sonar Context Augmentation and SonarSweep, which aim to provide the LLM with the necessary context of the codebase.
  • Verify: SonarQube, in its server, cloud, and CLI forms, analyzes the generated code for bugs, vulnerabilities, and code smells, ensuring it meets enterprise standards.
  • Solve: The SonarQube Remediation agent then assists in fixing identified issues.

With this approach, Sonar aims to let developers integrate LLM-generated code into their workflows more reliably, with issues found and fixed before the code is committed.
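The guide, verify, and solve phases can be pictured as a loop. The sketch below shows only the control flow under stated assumptions: it does not call SonarQube or any Sonar product, and every function here (`guide_verify_solve` and the three `toy_*` stand-ins) is hypothetical.

```python
def guide_verify_solve(task, context, generate, analyze, remediate, max_rounds=3):
    """Run a generate / analyze / fix loop and return (code, remaining_issues)."""
    code = generate(task, context)            # Guide: generate with context
    for _ in range(max_rounds):
        issues = analyze(code)                # Verify: check the output
        if not issues:
            return code, []
        code = remediate(code, issues)        # Solve: fix what was found
    return code, analyze(code)                # report anything still open


# Toy stand-ins so the loop can run; a real setup would plug in an LLM,
# a static analyzer, and a remediation step here.
def toy_generate(task, context):
    return "password = 'hunter2'\nprint('connecting')\n"


def toy_analyze(code):
    return ["hard-coded credential"] if "password = '" in code else []


def toy_remediate(code, issues):
    return code.replace(
        "password = 'hunter2'",
        "password = os.environ['DB_PASSWORD']",
    )


final_code, remaining = guide_verify_solve(
    task="connect to the database",
    context="credentials come from environment variables",
    generate=toy_generate,
    analyze=toy_analyze,
    remediate=toy_remediate,
)
print(remaining)
print(final_code)
```

Passing the three steps in as callables keeps the loop independent of any particular tool, and the `max_rounds` cap prevents endless retries when an issue cannot be fixed automatically.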

Sarkar concluded by inviting attendees to visit Sonar's booth for more information and a demo, where they could also enter a raffle for AirPods Pro and grab some Sonar swag.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.