Frontier models score 25% on Polymath's Horizon-SWE benchmark. The gap between what today's best agents can do and what software teams actually need is the market Polymath is building for.