MiniMax is pushing the boundaries of AI development with its latest model, the MiniMax M2.7. More than an incremental upgrade, this release represents an early step toward AI models actively participating in their own evolution, a concept the company calls "early echoes of self-evolution." The approach leverages user feedback to accelerate development cycles.
The M2.7 model is designed to build complex agent harnesses and execute intricate productivity tasks. It utilizes advanced capabilities like Agent Teams, sophisticated Skills, and dynamic tool search. In a significant development, MiniMax allowed M2.7 to update its own memory and construct numerous complex skills for reinforcement learning experiments, then refine its learning process based on those results, initiating a cycle of self-improvement.
Performance and Capabilities
In real-world software engineering, M2.7 shows impressive results. It handles end-to-end project delivery, log analysis, bug troubleshooting, and code security. On the SWE-Pro benchmark, it scored 56.22%, nearing top-tier performance. Its capabilities extend to full project delivery scenarios (VIBE-Pro 55.6%) and understanding complex engineering systems on Terminal Bench 2 (57.0%).
The model also excels in professional office software domains. Its ELO score on GDPval-AA is 1495, the highest among open-source models, showcasing enhanced expertise and task completion. M2.7 demonstrates significant improvements in complex editing tasks across Excel, PPT, and Word, handling multi-round revisions with high fidelity. It maintains a 97% skill adherence rate while working with over 40 complex skills, each exceeding 2,000 tokens.
Furthermore, M2.7 exhibits strong character consistency and emotional intelligence, opening new avenues for product innovation. These advancements are accelerating MiniMax's own transformation into an AI-native organization.
Building an Agent for Model Self-Evolution
MiniMax has developed an internal workflow where M2-series models can self-evolve. This process explores the limits of the model's agentic capabilities, using complex skills, memory, and external modules for adaptability. Agents are tasked with creating research agent harnesses that interact with different project groups, supporting data pipelines, training environments, and collaboration.
An example workflow involves the RL team, where an agent assists researchers from literature review to experiment monitoring. The agent handles tasks like log reading, debugging, and code fixes, reducing the need for extensive human intervention. M2.7 autonomously handles 30%-50% of this workflow.
The model’s ability to recursively evolve its own harness is key. The internal harness autonomously collects feedback, builds evaluation sets, and iterates on its architecture, skills, and memory mechanisms. In one instance, M2.7 optimized a model's programming performance through an iterative loop of analyzing failures, planning changes, modifying code, and evaluating results, achieving a 30% performance improvement.
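The iterative loop described above (analyze failures, plan changes, modify code, evaluate results) can be sketched in a few lines. The code below is a toy illustration only: the function names, the configuration dictionary, and the scoring scheme are all invented here, since MiniMax has not published its internal harness code.

```python
# Minimal sketch of an analyze -> plan/modify -> evaluate self-improvement loop.
# All names (evaluate, analyze_failures, improve) and the toy "feature score"
# configuration are hypothetical, not MiniMax's actual harness API.

def evaluate(config):
    # Stand-in scorer: mean of per-area quality scores in [0, 1].
    return sum(config.values()) / len(config)

def analyze_failures(config):
    # "Analyze failures": identify the weakest area to target next.
    return min(config, key=config.get)

def improve(config, target_key):
    # "Plan + modify": apply a bounded improvement to the weakest area.
    updated = dict(config)
    updated[target_key] = min(1.0, updated[target_key] + 0.3)
    return updated

def self_improve(config, rounds=5):
    best, best_score = config, evaluate(config)
    for _ in range(rounds):
        weakest = analyze_failures(best)    # analyze failures
        candidate = improve(best, weakest)  # plan and modify
        score = evaluate(candidate)         # evaluate results
        if score > best_score:              # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score
```

For example, `self_improve({"tests": 0.4, "lint": 0.7, "types": 0.5})` repeatedly lifts the weakest area until all scores saturate. The real harness presumably replaces the toy scorer with benchmark evaluations and the `improve` step with actual code edits.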
MiniMax is exploring fully autonomous AI self-evolution, coordinating data construction, training, inference, and evaluation. Preliminary tests in low-resource scenarios involved M2.7 participating in 22 machine learning competitions. Using a harness with short-term memory, self-feedback, and self-optimization, M2.7 achieved a 66.6% medal rate, placing it among top-tier models.
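A harness combining short-term memory, self-feedback, and self-optimization could look roughly like the following sketch. The class, the approach names, and the scoring interface are assumptions made for illustration; they are not MiniMax's actual competition harness.

```python
from collections import deque

# Hypothetical sketch of a competition harness: a bounded short-term memory
# records recent attempts, self-feedback comes from observing each score,
# and self-optimization keeps the best-performing approach.

class ShortTermMemory:
    def __init__(self, capacity=5):
        # deque(maxlen=...) silently drops the oldest entries: "short-term".
        self.entries = deque(maxlen=capacity)

    def record(self, approach, score):
        self.entries.append((approach, score))

    def tried(self):
        return {approach for approach, _ in self.entries}

def run_competition(approaches, score_fn, memory):
    best = (None, float("-inf"))
    for approach in approaches:
        if approach in memory.tried():   # don't repeat recent attempts
            continue
        score = score_fn(approach)       # self-feedback: observe own result
        memory.record(approach, score)
        if score > best[1]:              # self-optimization: keep the best
            best = (approach, score)
    return best
```

With a scoring table such as `{"baseline": 0.5, "gbdt": 0.7, "ensemble": 0.8}`, the harness tries each untested approach, records the feedback, and returns the strongest candidate.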
Professional Software Engineering Deep Dive
M2.7's software engineering capabilities extend to debugging live production environments. It correlates monitoring metrics with deployment timelines for causal reasoning, analyzes trace sampling, and proposes hypotheses. The model can even connect to databases to verify root causes and suggest non-blocking index creation for immediate fixes before submitting code changes.
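Correlating monitoring anomalies with deployment timelines, the first step described above, can be sketched as a simple time-window filter. The data shapes and the function name below are invented for illustration; real monitoring systems expose richer signals.

```python
from datetime import datetime, timedelta

# Illustrative sketch: rank recent deployments as candidate causes of an
# anomaly by how closely they precede it. The dict shape {"service", "at"}
# is an assumption, not any particular monitoring API.

def likely_causes(anomaly_at, deployments, window_minutes=30):
    """Return deployments that landed shortly before the anomaly,
    most recent (most suspicious) first."""
    window = timedelta(minutes=window_minutes)
    suspects = [
        d for d in deployments
        if timedelta(0) <= anomaly_at - d["at"] <= window
    ]
    return sorted(suspects, key=lambda d: d["at"], reverse=True)
```

A deployment five minutes before the anomaly ranks ahead of one twenty-five minutes before it, while anything outside the window is excluded; hypothesis generation and database verification would then proceed from this shortlist.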
This model goes beyond code generation, demonstrating a deep understanding of production systems, from observability analysis to SRE-level decision-making. This has, in multiple instances, reduced recovery time for live incidents to under three minutes.
On benchmarks, M2.7 achieved 56.22% on SWE-Pro, matching GPT-5.3-Codex. It also performed strongly on SWE Multilingual (76.5%) and Multi SWE Bench (52.7%). For end-to-end project delivery, M2.7 scored 55.6% on VIBE-Pro, comparable to Opus 4.6. Its understanding of complex engineering systems is evident on Terminal Bench 2 (57.0%) and NL2Repo (39.8%).
Native Agent Teams enable multi-agent collaboration and are crucial for development efficiency. They require the model to internalize role boundaries, adversarial reasoning, and protocol adherence. An internal prototype development team already uses Agent Teams to build product prototypes.
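One way to think about role boundaries and protocol adherence in such a team is a coordinator-side check over the message transcript. The two roles, their allowed actions, and the transcript shape below are invented for illustration; MiniMax has not published its Agent Teams protocol.

```python
# Hypothetical sketch of a two-role Agent Team protocol: a builder proposes
# and revises, a reviewer approves or requests changes, and a coordinator
# verifies that no message crosses its sender's role boundary.

ALLOWED_ACTIONS = {
    "builder": {"propose", "revise"},
    "reviewer": {"approve", "request_changes"},
}

def check_protocol(transcript):
    """Return (index, role, action) for every message that violates
    its sender's role boundary; empty list means full adherence."""
    violations = []
    for i, (role, action) in enumerate(transcript):
        if action not in ALLOWED_ACTIONS.get(role, set()):
            violations.append((i, role, action))
    return violations
```

A well-formed exchange such as propose, request_changes, revise, approve passes cleanly, while a reviewer attempting to `revise` directly would be flagged, which is the kind of boundary a model must internalize rather than have enforced externally.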
Professional Work and Entertainment
Beyond coding, M2.7 shows promise in office scenarios, driven by domain expertise and reliable task delivery. Its ELO score of 1495 on GDPval-AA places it among the top models, behind only Opus 4.6, Sonnet 4.6, and GPT-5.4. The model excels at high-fidelity editing of Word, Excel, and PPT documents, handling multiple rounds of revision.
M2.7 demonstrates robust interaction with complex environments, achieving 46.3% accuracy on Toolathon. It maintains a 97% skill adherence rate across more than 40 complex skills in MM Claw testing. In finance, it can autonomously read reports, build revenue forecast models, and produce analysis reports, acting much like a junior analyst.
The rise of personal agents like OpenClaw highlights the demand for AI with high emotional intelligence and character consistency. M2.7's capabilities in these areas are crucial for developing more engaging and human-like AI companions.
