As large language models (LLMs) spread through the economy, measuring their effect on the labor market becomes increasingly important. New research introduces the Skill Automation Feasibility Index (SAFI), a framework for benchmarking frontier LLMs against granular occupational skills. The work, available on arXiv, offers empirical data for policymakers and investors assessing AI-driven automation.
Benchmarking LLMs for Skill Automation Feasibility
The study evaluates four leading LLMs (LLaMA 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash) on 263 text-based tasks covering all 35 skills defined by the U.S. Department of Labor's O*NET taxonomy. The resulting Skill Automation Feasibility Index (SAFI) shows that Mathematics (SAFI: 73.2) and Programming (71.8) are the skills most susceptible to automation. Conversely, Active Listening (42.2) and Reading Comprehension (45.5) receive the lowest feasibility scores, marking areas where human skills remain robust against current LLM capabilities. Performance converges across the evaluated models, with a narrow 3.6-point spread, which suggests that the potential for text-based automation depends more on the nature of the skill itself than on any specific model architecture.
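To make the comparison between skill-level and model-level variation concrete, the following minimal Python sketch ranks the four SAFI scores quoted above and compares their range with the reported 3.6-point model spread. It is an illustration only, not the study's method: the helper names `rank_skills` and `spread` are hypothetical, and treating "spread" as maximum minus minimum is an assumption, since the text does not say how the 3.6 figure was computed.

```python
# Illustrative sketch only; not the study's code.
# The four scores and the 3.6-point figure are copied from the text above.
# Only these four of the 35 skills are quoted, so the ranking is partial.
REPORTED_SAFI = {
    "Mathematics": 73.2,
    "Programming": 71.8,
    "Reading Comprehension": 45.5,
    "Active Listening": 42.2,
}
REPORTED_MODEL_SPREAD = 3.6  # points, as stated in the text


def rank_skills(scores):
    """Return (skill, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def spread(values):
    """Max minus min, rounded to one decimal (assumed definition of 'spread')."""
    values = list(values)
    return round(max(values) - min(values), 1)


if __name__ == "__main__":
    for skill, score in rank_skills(REPORTED_SAFI):
        print(f"{skill:<22} {score:5.1f}")

    skill_spread = spread(REPORTED_SAFI.values())
    print(f"Spread across quoted skills: {skill_spread} points")
    print(f"Reported spread across models: {REPORTED_MODEL_SPREAD} points")
```

Under these assumptions, the range across the quoted skills is many times wider than the range across models, which is the pattern behind the claim that feasibility tracks the skill more than the model.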