Cielara Code Outperforms Rivals

AI coding tools now produce code faster than humans can review it, creating a critical bottleneck. New research from Causal Dynamics Lab (CDL) points to one underlying cause: AI agents spend a disproportionate share of their time searching for files rather than making edits. CDL's new product, Cielara Code, addresses this directly.

In independent tests, Cielara Code outperformed both Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) in code localization accuracy.

Agents Get Lost in the Code

CDL analyzed thousands of coding sessions, finding that AI agents dedicate 56.8% of their actions to reading files and 24.2% to using grep. Actual code edits accounted for less than 1%.
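A breakdown like this can be computed by counting action types in a session trace. The following is a minimal sketch, not CDL's methodology: the function name `action_shares`, the action labels, and the sample trace are all hypothetical and chosen only to roughly echo the proportions quoted above.

```python
from collections import Counter


def action_shares(actions):
    """Return each action type's share of all actions, as a percentage."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical trace of 100 agent actions (illustrative, not CDL data).
trace = ["read"] * 57 + ["grep"] * 24 + ["other"] * 18 + ["edit"] * 1
shares = action_shares(trace)

for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.1f}%")
```

In this made-up trace, reading and grep together account for 81 of 100 actions, while edits account for one.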

The problem intensifies with complexity; tasks involving more than six files saw a significant drop in recall and a fourfold increase in compute cost for failed attempts. This mirrors the "dynamic verification debt" highlighted in the 2025 DORA report, which noted a 7.2% drop in deployment stability with AI coding tools.

"Every coding agent out there today uses grep, which is like a surgeon operating without imaging," said Hasibul Haque, CEO of Causal Dynamics Lab.

Cielara Code aims to provide AI agents with a clear, contextual map of the production environment. This allows them to locate and modify code faster and more accurately.

Cielara Code's Causal Graph Approach

Cielara Code employs a 6-layer causal graph to model a customer's production environment. This graph details code function, origin, ownership, limitations, deployment, and runtime behavior.

This structured representation, called a Code Dependency Causal Graph, tracks four relationship types, enabling agents to navigate code contextually rather than through brute-force file searching. Failures can be traced back to specific changes, approvers, and reasons.
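To illustrate how a typed graph lets a failure be traced back to a change, its approvers, and its stated reason, here is a minimal sketch. It is not Cielara Code's implementation. The article does not name the four relationship types, so `depends_on`, `changed_by`, `approved_by`, and `motivated_by` are placeholder assumptions, as are the class `CausalGraph`, the function `trace_failure`, and all node names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CausalGraph:
    # Maps a node to a list of (relation, target) pairs.
    edges: dict = field(default_factory=dict)

    def add_edge(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

    def neighbors(self, node, relation):
        return [t for r, t in self.edges.get(node, []) if r == relation]


def trace_failure(graph, failing_node):
    """Walk 'depends_on' edges breadth-first from the failing node and
    report every change that touched a visited node."""
    seen, queue, findings = {failing_node}, deque([failing_node]), []
    while queue:
        node = queue.popleft()
        for change in graph.neighbors(node, "changed_by"):
            findings.append({
                "node": node,
                "change": change,
                "approvers": graph.neighbors(change, "approved_by"),
                "reasons": graph.neighbors(change, "motivated_by"),
            })
        for dep in graph.neighbors(node, "depends_on"):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return findings


# Hypothetical example data.
g = CausalGraph()
g.add_edge("checkout.submit_order", "depends_on", "pricing.apply_discount")
g.add_edge("pricing.apply_discount", "changed_by", "change-001")
g.add_edge("change-001", "approved_by", "reviewer-a")
g.add_edge("change-001", "motivated_by", "ticket: rounding fix")

findings = trace_failure(g, "checkout.submit_order")
print(findings)
```

The point of the sketch is that the agent follows labeled edges from the failing function to its dependency and then to the change record, instead of searching file contents.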

Benchmark Dominance

On MULocBench, Cielara Code achieved 0.752 recall@5, surpassing Claude Code's 0.727 and Codex's 0.707. Task completion time fell from 141.84 seconds to 128.62 seconds, a reduction of about 9%.

The more accurate localization resulted in fewer incorrect edits, fewer failed runs, and a 30-40% reduction in compute cost per task.
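For readers unfamiliar with the metric, recall@5 is commonly defined as the fraction of the truly relevant files that appear among a tool's top five ranked candidates. The sketch below uses that common definition; MULocBench's exact scoring rules may differ, and the function name `recall_at_k` and the file names are hypothetical.

```python
def recall_at_k(ranked_files, relevant_files, k=5):
    """Fraction of relevant files found in the top-k ranked candidates."""
    relevant = set(relevant_files)
    if not relevant:
        return 0.0
    top_k = set(ranked_files[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical localization result for one task.
ranked = ["a.py", "b.py", "c.py", "d.py", "e.py", "f.py"]
relevant = {"b.py", "e.py", "f.py", "z.py"}

score = recall_at_k(ranked, relevant, k=5)
print(score)
```

Here two of the four relevant files (`b.py` and `e.py`) fall within the top five, so the score for this task is 0.5. A benchmark figure such as 0.752 would be an aggregate of per-task scores like this one.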

REASONARA: Scalable Causal Memory

Underpinning Cielara Code is REASONARA, a graph-structured causal memory layer. It manages over 125 million tokens of context but retrieves only the relevant information, which drastically reduces token usage compared to full-context methods.
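The general idea of retrieving from a graph-structured memory instead of loading the full context can be sketched as a budgeted graph walk. This is an illustrative assumption, not REASONARA's actual algorithm: the function `retrieve`, the memory layout, and the token counts are all hypothetical.

```python
from collections import deque


def retrieve(memory, links, seeds, token_budget):
    """Collect linked memory nodes breadth-first from the seed nodes,
    skipping any node that would push the total past the token budget.

    memory: node id -> (text, token_count)
    links:  node id -> list of linked node ids
    """
    selected, used = [], 0
    seen, queue = set(seeds), deque(seeds)
    while queue:
        node = queue.popleft()
        _, cost = memory[node]
        if used + cost > token_budget:
            continue  # too expensive; do not expand its neighbors either
        selected.append(node)
        used += cost
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return selected, used


# Hypothetical memory store: three small related notes and one large one.
memory = {
    "incident": ("payment timeout report", 40),
    "deploy": ("deployment record for the payment service", 60),
    "owner": ("owning team and on-call contact", 30),
    "archive": ("full historical log dump", 500),
}
links = {"incident": ["deploy", "owner"], "deploy": ["archive"]}

selected, used = retrieve(memory, links, ["incident"], token_budget=150)
print(selected, used)
```

In this toy store, the walk returns the three related notes for 130 tokens and leaves out the 500-token archive, whereas a full-context approach would load all 630 tokens.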

REASONARA achieved high scores on benchmarks such as UltraDomain (94%) and LoCoMo (92%) while running 5-8x faster than Codex's high-reasoning mode.

Cielara Code acts as a safety layer for AI coding agents, enhancing the reliability of their output without replacing them. It is currently used by 11 Fortune 100 companies and more than 40 Fortune 500 companies.

"Enterprises need solutions to problems they cannot solve with people alone," commented Phillip Miller, VP, Global Chief Information Security Officer at H&R Block.

The CDL team includes former Uber platform engineering lead Hasibul Haque, ex-Uber engineer Ryan Turner, and researchers Dr. Xuchao Zhang (Microsoft Research) and Dr. Liang Zhao (Emory University).

The Future of AI-Driven Development

CDL plans to expand its Production World Model to fully simulate code, infrastructure, and policy changes. This will create a permanent reasoning layer for enterprise AI agents.