Can you prove AI ROI in Software Engineering? (120k Devs Study) – Yegor Denisov-Blanch, Stanford

Dec 12, 2025 at 12:47 AM | 4 min read

The Quest for Measurable AI ROI in Software Engineering

"Can you prove AI ROI in Software Engineering?" This question, posed by Yegor Denisov-Blanch, a researcher from Stanford, cuts to the heart of a critical challenge facing enterprises today. As companies pour millions into AI tools for software development, the ability to demonstrate tangible returns on this investment remains elusive for many. Denisov-Blanch's presentation at the AI Engineer Code Summit aimed to demystify this complex issue, offering data-driven insights and a practical playbook for measuring AI's true impact.

Denisov-Blanch began by highlighting a common pitfall: the overreliance on activity metrics. While metrics like pull request (PR) counts or DORA scores can indicate increased activity, they often fail to prove actual improvement. "Benchmarks show models can write code, but in enterprise deployments ROI is hard to measure, easy to bias, and often distorted by activity metrics (PR counts, DORA) that say 'more' without proving 'better'," he stated. This disconnect between activity and genuine value creation is a key reason why many AI initiatives fall short of expectations.

To address this, Denisov-Blanch's research team analyzed data from over 120,000 software engineers across more than 600 companies. Their approach combined time-series analysis of historical Git data with cross-sectional analysis across companies. Crucially, they developed a machine learning model that replicates the evaluations of an expert panel for every code commit. The model scores each commit on factors such as implementation time, quality, maintainability, and complexity, and its scores showed "exceptional correlation" with the expert judgments, with an R-squared value of 0.85.
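To make the R-squared figure concrete, here is a minimal sketch of how agreement between a model's per-commit scores and an expert panel's scores could be computed. The function name and the sample scores are hypothetical and are not taken from the study; the formula is the standard coefficient of determination.

```python
def r_squared(expert_scores, model_scores):
    """Coefficient of determination: share of the variance in the expert
    scores that the model's scores account for (1.0 means a perfect match)."""
    mean_expert = sum(expert_scores) / len(expert_scores)
    # Residual sum of squares: how far the model is from the experts.
    ss_res = sum((e - m) ** 2 for e, m in zip(expert_scores, model_scores))
    # Total sum of squares: how much the expert scores vary on their own.
    ss_tot = sum((e - mean_expert) ** 2 for e in expert_scores)
    return 1 - ss_res / ss_tot


# Hypothetical per-commit scores, for illustration only.
expert = [2.0, 4.0, 6.0, 8.0]
model = [2.5, 3.5, 6.5, 7.5]
print(round(r_squared(expert, model), 2))
```

A value of 0.85 would mean the model's scores explain about 85% of the variance in the expert panel's scores.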

One of the first insights was that "the rich get richer." The data suggested a widening gap between early AI adopters who master its application and those who struggle. Denisov-Blanch illustrated this with a projection titled "Illustrative AI Productivity Impact: Accelerating Divergence Through 2030." In it, top-performing teams that use AI effectively see their productivity gains accelerate, potentially creating a "10x gap" with laggard teams by 2030. This underscores the urgency for organizations to not only adopt AI but to do so strategically.

Further analysis revealed that simple metrics like "token spend" are a weak predictor of AI productivity gains. The research indicated a complex relationship, with a "sweet spot" for token usage. "AI usage quality matters more than usage volume," Denisov-Blanch emphasized, pointing out that "token spend tells you who is using AI, not who is getting benefit." This suggests that focusing solely on the quantity of AI interaction can be misleading.
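One way to look for such a "sweet spot" is to group engineers into token-spend bands and compare the average productivity change in each band. The sketch below assumes hypothetical records and band edges; the numbers are made up for illustration and are not results from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_gain_by_bucket(records, edges):
    """Average productivity change per token-spend band.

    records: (token_spend, productivity_change) pairs.
    edges: ascending upper bounds separating the bands, so two edges
           produce three bands indexed 0, 1, 2.
    """
    buckets = defaultdict(list)
    for spend, gain in records:
        # The band index is the number of edges the spend exceeds.
        index = sum(spend > edge for edge in edges)
        buckets[index].append(gain)
    return {index: mean(gains) for index, gains in sorted(buckets.items())}


# Made-up data: low, medium, and high token spend.
records = [(10, 0.00), (20, 0.02), (150, 0.10),
           (180, 0.14), (900, 0.04), (1200, 0.02)]
by_bucket = mean_gain_by_bucket(records, edges=[100, 500])
best_band = max(by_bucket, key=by_bucket.get)
print(by_bucket, best_band)
```

If the middle band shows the largest average gain, as in this invented data, volume alone would be a poor ranking signal: the heaviest users are not the ones benefiting most.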

The environment in which AI is deployed also plays a crucial role. Denisov-Blanch presented a "Task Composition by AI Involvement vs. Environment Cleanliness Index" chart, illustrating how clean engineering environments amplify AI's benefits. He noted that "clean code amplifies AI gains," allowing AI to "complete a larger share of sprint tasks." Conversely, "AI use degrades cleanliness" in messy codebases, leading to negative outcomes. This highlights the importance of investing in code hygiene and good engineering practices as a foundation for successful AI adoption.

Measuring ROI effectively requires moving beyond simplistic metrics and focusing on engineering outcomes. Denisov-Blanch proposed a framework that ties AI usage to concrete engineering achievements. The primary metric should be "engineering output," which is measured not just by lines of code or PRs, but by the quality and impact of the work. He stressed the importance of "guardrail metrics" to ensure that AI adoption doesn't negatively affect critical aspects like code quality or introduce excessive rework. "Keep guardrail metrics healthy... while increasing the primary metric (engineering output)," he advised.
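The primary-metric-plus-guardrails idea can be sketched as a simple before/after check. The field names, tolerances, and verdict strings below are assumptions chosen for illustration; the talk as summarized here does not specify concrete thresholds or metric definitions.

```python
from dataclasses import dataclass


@dataclass
class PeriodMetrics:
    engineering_output: float  # primary metric, in the team's own unit
    rework_rate: float         # guardrail: share of recent code rewritten (lower is better)
    quality_score: float       # guardrail: code quality rating (higher is better)


def assess(before, after, max_rework_increase=0.02, max_quality_drop=0.05):
    """Compare two periods: the primary metric should rise while
    guardrail metrics stay within the (hypothetical) tolerances."""
    guardrails_ok = (
        after.rework_rate - before.rework_rate <= max_rework_increase
        and before.quality_score - after.quality_score <= max_quality_drop
    )
    output_up = after.engineering_output > before.engineering_output
    if output_up and guardrails_ok:
        return "improved"
    if output_up:
        return "output up, guardrail breached"
    return "no gain"


# Hypothetical periods before and after an AI rollout.
before = PeriodMetrics(engineering_output=100, rework_rate=0.10, quality_score=0.80)
after = PeriodMetrics(engineering_output=120, rework_rate=0.11, quality_score=0.79)
print(assess(before, after))
```

The point of the separate "guardrail breached" verdict is that a rise in output accompanied by a jump in rework is exactly the "more without better" pattern the talk warns about.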

The research also identified different patterns of AI adoption. From "no observable AI use" (Level 0) to "orchestrated agentic workflows" (Level 4), the maturity of AI integration varied significantly. The data suggested that companies achieving the most substantial gains were those that moved beyond basic "systematized prompting" or "agent-backed development" to more sophisticated integrations where AI actively participates in complex workflows.

Ultimately, Denisov-Blanch's findings underscore a critical truth: AI adoption is not a one-size-fits-all solution. The ROI of AI in software engineering is heavily dependent on how it's implemented, measured, and integrated into existing workflows and company culture. By focusing on clean code, thoughtful measurement of engineering outcomes, and strategic adoption patterns, organizations can move beyond the hype and unlock the true potential of AI to drive meaningful productivity gains.

© 2025 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
