Mistral AI's Leanstral Tackles Trust in Code Generation

Mistral AI introduces Leanstral, an open-source code agent for Lean 4, aiming to bring formal verification to AI-generated code and boost engineering efficiency.

3 min read
Mistral AI logo with abstract code elements.
Image credit: mistral.ai

Mistral AI has unveiled Leanstral, an open-source code agent specifically built for Lean 4. This development marks a significant step towards AI systems capable of not only generating code but also formally proving its correctness against strict specifications. The move aims to accelerate engineering velocity by reducing the reliance on manual human review for critical applications.

The core challenge addressed by Leanstral is the scaling bottleneck in AI-assisted code generation for domains like advanced mathematics and mission-critical software. Current AI agents excel at producing code, but verifying that output, especially in high-stakes scenarios, demands significant human expertise and time. Leanstral is designed to mitigate this by providing outputs that are inherently verifiable.

A New Agent for Formal Verification

Leanstral is positioned as the first open-source code agent for Lean 4, a powerful proof assistant used for expressing complex mathematical objects and software specifications. Unlike generalist models or those focused on single mathematical problems, Leanstral is optimized for formal repositories and boasts 6 billion active parameters. This focus on efficiency and specialized training is key to its utility.

Mistral AI is making Leanstral accessible through multiple avenues: weights are released under an Apache 2.0 license, it's integrated into Mistral Vibe, and available via a free API endpoint. The company also plans to release a technical report on its training methodology and a new evaluation suite, FLTEval, designed to move beyond competition math benchmarks.

Efficiency and Performance Gains

The architecture of Leanstral is highly sparse and optimized for proof engineering tasks. By leveraging parallel inference and Lean's verification capabilities, the agent achieves strong performance and cost-efficiency compared to closed-source alternatives. This approach allows it to be both performant and economical.

In evaluations using the FLTEval suite, Leanstral demonstrated a significant efficiency advantage over larger open-source models. For instance, it outperformed models like GLM5-744B-A40B and Kimi-K2.5-1T-32B in a single pass, a feat requiring multiple passes from competitors. Even when compared to Qwen3.5-397B-A17B, Leanstral achieved a superior score with fewer passes and scaled linearly.

When pitted against proprietary models like Anthropic's Claude family, Leanstral presented a compelling value proposition. Its pass@2 score of 26.3 surpassed Sonnet by 2.6 points at a cost of $36, compared to Sonnet's $549. While Claude Opus 4.6 remains the performance leader, its cost is prohibitively higher.

Real-World Applications

Leanstral has shown practical utility in use cases such as assisting with complex code migrations. In one example, it successfully diagnosed and proposed a fix for a breaking change in a Lean 4 script by identifying issues with definitional equality and suggesting the use of `abbrev` over `def`.

The agent also proved capable of translating code from other languages, like Rocq, into Lean and even generating formal proofs for program properties. This demonstrates its potential in bridging different formal systems and accelerating the development of verifiable software.

Leanstral is available now for users to explore. It can be accessed zero-setup within Mistral Vibe using the command `/leanstall`, via the labs API endpoint `labs-leanstral-2603`, or by downloading the model weights directly for self-hosting.