OpenAI's unreleased models are demonstrating capabilities well beyond current public systems, signaling a major shift in AI proficiency. As AI commentator Matthew Berman put it, these advances feel akin to the moment AI surpassed human champions at chess, a sign of rapid acceleration in the field.
Berman's recent video showcased two pivotal developments from OpenAI's experimental work. The first, dubbed o3-Alpha, is an alleged variant of the existing o3 model with extraordinary coding ability. It recently secured second place in the AtCoder World Tour Finals 2025 Heuristic Contest in Tokyo, a notoriously challenging coding competition, performing nearly on par with the top human competitors. In separate demonstrations, it generated complex interactive web applications, including a fully functional Space Invaders game and a 3D Pokédex, from simple prompts.
The second revelation concerns an experimental reasoning LLM that achieved gold medal-level performance at the 2025 International Math Olympiad (IMO). This is a monumental feat, as the model operated under the same stringent rules as human contestants, including timed sessions with no external tools or internet access, and the requirement to produce natural language proofs. Alexander Wei from OpenAI confirmed this achievement, stating he was "excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition—the International Math Olympiad (IMO)."
This progress underscores a core insight from AI research known as "The Bitter Lesson," articulated by Richard Sutton. The principle holds that general learning methods that scale with computation ultimately outperform approaches built on hand-coded human knowledge or heuristics. As Berman put it, "Time and time again, the best achievements in artificial intelligence have come from just scaling up and taking humans out of the loop." This pattern, seen in chess AI's transition from hand-coded tactics to self-play, and in Tesla's shift to end-to-end neural networks for self-driving, appears to be the path forward for advanced intelligence.
The IMO model's success, which required crafting intricate, watertight proofs at the level of human mathematicians, further validates this scaling paradigm. Importantly, Wei clarified that the math-solving LLM is an experimental research model, not the upcoming GPT-5, and that there are no immediate plans for a broad release. Nevertheless, these unreleased models point to a future where AI's capabilities in complex problem-solving and creative generation keep expanding rapidly, driven by relentless scaling and general-purpose learning.

