Cielara Code Outperforms Rivals

Cielara Code, from Causal Dynamics Lab, significantly improves AI coding agent performance by mapping production software, outperforming rivals in key benchmarks.

[Figure: Cielara Code's 6-layer causal graph for software environments, which gives AI agents context.]

AI coding tools are outpacing human review capabilities, creating a critical bottleneck. New research from Causal Dynamics Lab (CDL) pinpoints the issue: AI agents spend disproportionate time searching for files rather than making edits. CDL's new product, Cielara Code, addresses this directly.

In independent tests, Cielara Code outperformed both Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) in code localization accuracy.

Agents Get Lost in the Code

CDL analyzed thousands of coding sessions, finding that AI agents dedicate 56.8% of their actions to reading files and 24.2% to using grep. Actual code edits accounted for less than 1%.
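As a rough illustration of how such a breakdown could be computed, the sketch below tallies the share of each action type in a session log. The action names and the log format are hypothetical; the article does not describe how CDL instrumented its sessions.

```python
from collections import Counter


def action_shares(actions):
    """Return each action type's share of all actions, as a fraction of 1.0."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Hypothetical session log: one label per agent action, in order.
session = ["read", "read", "grep", "read", "grep", "read", "read", "edit"]
for name, share in sorted(action_shares(session).items()):
    print(f"{name}: {share:.1%}")
```

Run over many sessions, a tally like this would show how little of an agent's activity is spent on edits compared with reading and searching.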

The problem intensifies with complexity; tasks involving more than six files saw a significant drop in recall and a fourfold increase in compute cost for failed attempts. This mirrors the "dynamic verification debt" highlighted in the 2025 DORA report, which noted a 7.2% drop in deployment stability with AI coding tools.

"Every coding agent out there today uses grep, which is like a surgeon operating without imaging," stated Hasibul Haque, CEO at Causal Dynamics Lab.

Cielara Code aims to provide AI agents with a clear, contextual map of the production environment. This allows them to locate and modify code faster and more accurately.

Cielara Code's Causal Graph Approach

Cielara Code employs a 6-layer causal graph to model a customer's production environment. This graph details code function, origin, ownership, limitations, deployment, and runtime behavior.

This structured representation, called a Code Dependency Causal Graph, tracks four relationship types, enabling agents to navigate code contextually rather than through brute-force file searching. Failures can be traced back to specific changes, approvers, and reasons.
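To make the idea of tracing a failure through typed relationships concrete, here is a minimal sketch of a labeled graph with a breadth-first trace. It is not CDL's implementation: the class, the node names, and the relation labels ("caused_by", "approved_by", "modifies") are all hypothetical, since the article does not name the four relationship types.

```python
from collections import defaultdict, deque


class CausalGraph:
    """Toy directed graph whose edges carry a relation label (hypothetical)."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_edge(self, src, relation, dst):
        self._edges[src].append((relation, dst))

    def trace(self, start, max_hops=3):
        """Breadth-first walk from `start`, returning the edges followed."""
        seen = {start}
        queue = deque([(start, 0)])
        followed = []
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            # .get avoids creating empty entries for leaf nodes.
            for relation, dst in self._edges.get(node, []):
                if dst not in seen:
                    seen.add(dst)
                    followed.append((node, relation, dst))
                    queue.append((dst, depth + 1))
        return followed


# Hypothetical example: a failure linked to a change, its approver, and a file.
graph = CausalGraph()
graph.add_edge("checkout_failure", "caused_by", "commit:abc123")
graph.add_edge("commit:abc123", "approved_by", "reviewer:alice")
graph.add_edge("commit:abc123", "modifies", "payments/charge.py")

for src, relation, dst in graph.trace("checkout_failure"):
    print(f"{src} --{relation}--> {dst}")
```

The point of the sketch is the contrast with grep: instead of scanning file contents, the agent follows a handful of labeled edges from the symptom to the files and people involved.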

Benchmark Dominance

On MULocBench, Cielara Code achieved 0.752 recall@5, surpassing Claude Code's 0.727 and Codex's 0.707. Task completion time fell from 141.84 seconds to 128.62 seconds, a reduction of roughly 9%.
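For readers unfamiliar with the metric, recall@5 is commonly defined as the fraction of truly relevant files that appear among a tool's top five ranked candidates. The sketch below uses that common definition, which may differ in detail from MULocBench's exact scoring; the file names are hypothetical. It also works out the relative time saving implied by the two timings quoted above.

```python
def recall_at_k(ranked, relevant, k=5):
    """Fraction of relevant items that appear in the top-k of a ranked list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    top_k = set(ranked[:k])
    return len(top_k & relevant) / len(relevant)


def relative_reduction(before, after):
    """Relative decrease from `before` to `after`, as a fraction of `before`."""
    return (before - after) / before


# Hypothetical task: two files need changes; the tool ranks six candidates.
ranked_files = ["api.py", "charge.py", "utils.py", "db.py", "cli.py", "tax.py"]
files_to_edit = {"charge.py", "tax.py"}
print(recall_at_k(ranked_files, files_to_edit))  # tax.py is ranked sixth, so it is missed

# Timings quoted in the article, in seconds.
print(f"{relative_reduction(141.84, 128.62):.1%}")
```

A benchmark score such as 0.752 would be this per-task value averaged over all tasks, under the assumption that the benchmark aggregates by simple mean.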

The improved localization resulted in fewer incorrect edits, fewer failed runs, and a 30-40% reduction in compute cost per task.

REASONARA: Scalable Causal Memory

Underpinning Cielara Code is REASONARA, a graph-structured causal memory layer. It manages over 125 million tokens of context, retrieving only relevant information, drastically reducing token usage compared to full-context methods.

REASONARA achieved high scores on benchmarks like UltraDomain (94%) and LoCoMo (92%), running 5-8x faster than Codex's high-reasoning mode.

Cielara Code acts as a safety layer for AI coding agents, improving the reliability of their output without replacing them. It is currently used by 11 Fortune 100 companies and more than 40 Fortune 500 companies.

"Enterprises need solutions to problems they cannot solve with people alone," commented Phillip Miller, VP, Global Chief Information Security Officer at H&R Block.

The CDL team includes former Uber platform engineering lead Hasibul Haque, ex-Uber engineer Ryan Turner, and researchers Dr. Xuchao Zhang (Microsoft Research) and Dr. Liang Zhao (Emory University).

The Future of AI-Driven Development

CDL plans to expand its Production World Model to fully simulate code, infrastructure, and policy changes. This will create a permanent reasoning layer for enterprise AI agents.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.