Foundation models like GPT-5 and Claude Sonnet 4 have redefined AI, but their broad capabilities often falter on practical, nuanced tasks. A new startup, Osmosis AI, is emerging from stealth with $6.3 million in seed funding, co-led by CRV and Audacious Ventures, to tackle this problem. The company claims its approach, centered on "reinforcement fine-tuning," can make AI agents "better, faster, and cheaper" than their foundation model counterparts.
Osmosis's own research highlights the shortcomings of generalized models. In a recent analysis of leading closed- and open-source models using Anthropic's Model Context Protocol (MCP) for tool integration, the company found significant "unforced errors": models frequently failed to use necessary tools or, conversely, invoked irrelevant ones, degrading performance. GPT-5, despite generally performing best, saw notable drops in success rate when presented with only relevant tools versus a broader set. The study even noted GPT-5's tendency to unnecessarily call `get_me` (a Slack tool) on general-knowledge queries, suggesting a post-training skew.
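The two failure modes described above, skipping a necessary tool and invoking an irrelevant one, can be captured with simple set arithmetic. The sketch below is purely illustrative: the function name, data, and scoring scheme are assumptions, not Osmosis's actual evaluation harness.

```python
# Hypothetical sketch of classifying tool-selection "unforced errors"
# in an MCP-style evaluation. All names and data here are illustrative.

def tool_selection_errors(invoked: set[str], relevant: set[str]) -> dict:
    """Compare the tools a model actually called against the ground-truth
    set of tools the query required."""
    return {
        "missed": relevant - invoked,    # necessary tools never called
        "spurious": invoked - relevant,  # irrelevant tools called anyway
        "correct": invoked & relevant,   # calls that matched the need
    }

# Example: a general-knowledge query needs no tools at all, but the
# model calls `get_me` anyway -- a spurious invocation.
report = tool_selection_errors(invoked={"get_me"}, relevant=set())
print(sorted(report["spurious"]))  # ['get_me']
```

A harness like this would run per query, then aggregate missed and spurious counts into the kind of success-rate comparison the study reports.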
