Quantifying LLM Impact on Labor Skills

New research introduces the Skill Automation Feasibility Index (SAFI), a benchmark of LLMs against occupational skills that reveals a capability-demand inversion. Most observed workplace AI use is augmentation rather than outright automation.

As Large Language Models (LLMs) spread through the economy, measuring their effect on the labor market matters more than speculating about it. New research introduces the Skill Automation Feasibility Index (SAFI), a framework for benchmarking frontier LLMs against granular occupational skills. The work, detailed on arXiv, gives policymakers and investors empirical data on AI-driven automation.

Benchmarking LLMs for Skill Automation Feasibility

The study evaluates four LLMs (LLaMA 3.3 70B, Mistral Large, Qwen 2.5 72B, and Gemini 2.5 Flash) on 263 text-based tasks covering all 35 skills defined by the U.S. Department of Labor's O*NET taxonomy. On the resulting Skill Automation Feasibility Index (SAFI), Mathematics (SAFI: 73.2) and Programming (71.8) score highest, making them the skills most susceptible to automation. Active Listening (42.2) and Reading Comprehension (45.5) score lowest, indicating areas where human skills remain robust against current LLM capabilities. The evaluated models converge, with only a 3.6-point spread between them, which suggests that the potential for text-based automation depends more on the nature of the skill than on the specific model architecture.
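The contrast between the two ranges can be made concrete with a little arithmetic. The sketch below is illustrative only: it uses just the four SAFI scores quoted above (the article does not list the other 31 skills or any per-model scores), and it assumes that "spread" means maximum minus minimum. The names REPORTED_SAFI, rank_by_safi, and spread are hypothetical and do not come from the study.

```python
# The four SAFI scores quoted in the article; the remaining 31 O*NET
# skills are not listed, so they are deliberately left out.
REPORTED_SAFI = {
    "Mathematics": 73.2,
    "Programming": 71.8,
    "Reading Comprehension": 45.5,
    "Active Listening": 42.2,
}


def rank_by_safi(scores):
    """Return (skill, score) pairs, highest automation feasibility first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def spread(values):
    """Max minus min, rounded to one decimal place.

    Assumption: this is the statistic behind the article's 3.6-point
    cross-model spread; the article does not define it explicitly.
    """
    values = list(values)
    return round(max(values) - min(values), 1)


ranked = rank_by_safi(REPORTED_SAFI)
skill_gap = spread(REPORTED_SAFI.values())

for skill, score in ranked:
    print(f"{skill:<22} {score:5.1f}")
print(f"Gap between highest and lowest quoted skills: {skill_gap} points")
```

Under that assumption, the gap between the highest and lowest quoted skills is several times larger than the 3.6-point spread between models, which is the pattern behind the article's conclusion that the skill matters more than the model.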

The Capability-Demand Inversion and AI Augmentation

A striking finding is the "capability-demand inversion": the skills most critical to AI-exposed jobs are the ones on which LLMs currently score lowest in the benchmark. This points to a gap in current model capability and an opportunity for human expertise. The research also cross-references the benchmark results with real-world AI adoption data to propose an AI Impact Matrix, which sorts skills into four categories: High Displacement Risk, Upskilling Required, AI-Augmented, and Lower Displacement Risk. The analysis indicates that 78.7% of observed AI interactions in the workplace are augmentation, enhancing human capabilities rather than replacing them outright. SAFI, which measures LLM performance on text-based representations of skills, offers a tool for understanding this interaction.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.