Microsoft Research has unveiled Fara-7B, an open-weight, compact agentic small language model (SLM) built for computer use. Rather than only generating text as a conventional chatbot does, the 7-billion-parameter model operates a computer through mouse and keyboard actions to carry out real-world tasks on a user's behalf. Its introduction signals a strategic push towards efficient, on-device agentic AI, building on Microsoft's earlier Phi models and Copilot+ PC deployments.
Fara-7B distinguishes itself by perceiving a webpage visually and acting on coordinates it predicts directly, without relying on accessibility trees or separate screen-parsing models. This human-like mode of interaction, combined with the compact 7B parameter count, lets Fara-7B achieve state-of-the-art performance within its size class and compete with larger, more resource-intensive agentic systems. According to the announcement, the model also completes tasks in significantly fewer steps than comparable models, which substantially reduces token consumption and therefore cost.
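To make the interaction pattern concrete, the following is a minimal sketch of a perceive-then-act loop of the kind described above. It is not Fara-7B's actual code or API: `Action`, `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names, the browser is a stand-in that only records calls, and the "model" is a fixed script rather than a real network. The point is the shape of the loop: pixels in, a coordinate-level action out, with no accessibility tree in between.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step chosen by the policy (hypothetical schema)."""
    kind: str                              # "click", "type", or "done"
    xy: Optional[Tuple[int, int]] = None   # pixel coordinates for a click
    text: Optional[str] = None             # text for a keyboard action


class FakeBrowser:
    """Stand-in for a real browser; it only records what was done to it."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        return b"fake-pixels"              # a real agent would get an image here

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x}, {y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


Policy = Callable[[str, bytes, List[Action]], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy,
              max_steps: int = 10) -> int:
    """Run the loop and return how many policy calls (steps) were used."""
    history: List[Action] = []
    for step in range(1, max_steps + 1):
        pixels = browser.screenshot()             # perceive: raw pixels only
        action = policy(task, pixels, history)    # decide: predict next action
        history.append(action)
        if action.kind == "done":
            return step
        if action.kind == "click":                # act at predicted coordinates
            browser.click(*action.xy)
        elif action.kind == "type":
            browser.type_text(action.text)
        else:
            raise ValueError(f"unknown action kind: {action.kind}")
    return max_steps


def scripted_policy(task: str, pixels: bytes, history: List[Action]) -> Action:
    """Placeholder for the model: replays a fixed script, ignoring the pixels."""
    script = [
        Action("click", xy=(412, 88)),     # assumed location of a search box
        Action("type", text=task),
        Action("done"),
    ]
    return script[len(history)]


browser = FakeBrowser()
steps = run_agent("weather in Seattle", browser, scripted_policy)
print(steps, browser.log)
```

Because each step in such a loop costs one model call, and each call processes a fresh screenshot plus the history so far, the step count returned by `run_agent` is a rough proxy for token consumption. That is why finishing a task in fewer steps translates into lower cost.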
