Cloudflare has revealed the inner workings of its proprietary AI engineering stack, built entirely on its own platform. In the last month, 93% of the company's R&D organization has leveraged AI coding tools powered by this internal infrastructure. This initiative, launched eleven months ago, aimed to deeply integrate AI into the engineering workflow, necessitating the creation of MCP servers, an access layer, and essential AI tooling.
A dedicated tiger team, dubbed iMARS (Internal MCP Agent/Server Rollout Squad), spearheaded the project, with the Dev Productivity team ultimately taking ownership. The numbers underscore the impact: 3,683 internal users actively engaged with AI coding tools, generating 47.95 million AI requests. More than 295 teams use these agentic AI tools, with AI Gateway processing 20.18 million requests and routing 241.37 billion tokens monthly. Workers AI alone processed 51.83 billion tokens.
The adoption of these tools has directly fueled developer velocity, evidenced by a dramatic increase in merge requests. The 4-week rolling average climbed from approximately 5,600 per week to over 8,700, peaking at 10,952 in late March, nearly doubling the Q4 baseline.
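The 4-week rolling average cited above smooths week-to-week noise in merge-request counts. A minimal sketch of that computation, using illustrative weekly figures rather than Cloudflare's actual data:

```typescript
// Compute a trailing rolling average over weekly merge-request counts.
// The input numbers below are invented for illustration.
function rollingAverage(weeklyCounts: number[], window = 4): number[] {
  const out: number[] = [];
  for (let i = window - 1; i < weeklyCounts.length; i++) {
    const sum = weeklyCounts
      .slice(i - window + 1, i + 1)
      .reduce((a, b) => a + b, 0);
    out.push(sum / window);
  }
  return out;
}

// Example: a ramp from the ~5,600 baseline toward the 10,952 peak.
const weekly = [5600, 5800, 6400, 7200, 8400, 9600, 10952];
console.log(rollingAverage(weekly).map((v) => Math.round(v)));
```

Each output point averages the current week with the three preceding weeks, which is why the smoothed curve lags the raw peak slightly.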
The architecture spans three core layers: platform, knowledge, and enforcement. The platform layer handles authentication, routing, and inference, leveraging Cloudflare Access for Zero Trust security, AI Gateway for centralized LLM control, and Workers AI for on-platform inference with open-weight models. The MCP Server Portal, built with Workers and Access, provides a single OAuth endpoint for multiple tools.
Platform Layer: Security and Developer Experience
Cloudflare Access ensures secure authentication and Zero Trust policy enforcement for over 3,600 internal users. All LLM requests are routed through AI Gateway, enabling centralized management of provider keys, cost tracking, and data retention policies. AI Gateway handles approximately 688,000 requests and 10.57 billion tokens daily, routing to four providers.
While frontier models handle complex tasks, Workers AI is becoming a significant component, accounting for 8.84% of internal requests. Workers AI, Cloudflare's serverless AI inference platform, offers substantial cost savings and reduced latency by keeping inference on the same network as other Cloudflare services.
The platform utilizes Workers AI for tasks like documentation review and generating context files, significantly reducing costs. For example, a security agent processing over 7 billion tokens daily on Kimi K2.5 via Workers AI costs 77% less than a proprietary model equivalent, as detailed in Cloudflare's AI Code Review Overhaul.
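The 77% figure can be sanity-checked with straightforward per-token cost arithmetic. The per-million-token prices below are placeholders chosen to reproduce the stated discount, not published rates:

```typescript
// Hypothetical per-million-token prices (USD); real rates vary by model
// and provider. Chosen so the open-weight price is 77% below the
// proprietary one, matching the article's claim.
const proprietaryPerMTok = 3.0;
const workersAIPerMTok = 0.69;

const tokensPerDay = 7e9; // ~7 billion tokens/day, per the article
const costProprietary = (tokensPerDay / 1e6) * proprietaryPerMTok;
const costWorkersAI = (tokensPerDay / 1e6) * workersAIPerMTok;
const savings = 1 - costWorkersAI / costProprietary;

console.log(
  `daily: $${costWorkersAI.toFixed(0)} vs $${costProprietary.toFixed(0)}, ` +
  `saving ${(savings * 100).toFixed(0)}%`
);
```

At this volume the absolute difference compounds quickly, which is why routing high-throughput background agents to open-weight models is attractive.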
A single proxy Worker centralizes control, enabling per-user attribution, model catalog management, and permission enforcement without client configuration changes. This pattern allows for seamless integration of future coding assistant tools.
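The proxy pattern described above can be sketched as a routing function: attribute the request to a user, check the model against a catalog, and attach the provider key server-side. The catalog entries, header names, and gateway URLs here are illustrative assumptions, not Cloudflare's actual configuration:

```typescript
// Sketch of a proxy Worker's routing logic. A real Worker would wrap this
// in a fetch handler sitting behind Cloudflare Access.
type Route =
  | { upstreamUrl: string; headers: Record<string, string> }
  | { error: string };

// Hypothetical model catalog: model name -> upstream gateway route.
const MODEL_CATALOG: Record<string, string> = {
  "claude-sonnet": "https://gateway.example.com/v1/anthropic",
  "kimi-k2": "https://gateway.example.com/v1/workers-ai",
};

function routeRequest(user: string, model: string, providerKey: string): Route {
  const upstreamUrl = MODEL_CATALOG[model];
  // Permission enforcement: unknown models are rejected at the proxy.
  if (!upstreamUrl) return { error: `model ${model} not permitted` };
  return {
    upstreamUrl,
    headers: {
      Authorization: `Bearer ${providerKey}`, // key stays server-side
      "X-User": user, // per-user attribution for cost tracking
    },
  };
}
```

Because clients only ever talk to the proxy, swapping providers or adding models is a server-side catalog change with no client reconfiguration.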
The setup is streamlined via a single command, opencode auth login, which configures providers, models, and permissions automatically. This eliminates manual configuration for users, ensuring API keys remain server-side.
MCP Server Portal: Unified Access and Code Mode
The MCP Server Portal aggregates 13 production MCP servers, exposing 182 tools across services like GitLab, Jira, and Sentry. This unified endpoint simplifies access management through a single Cloudflare Access flow.
MCP servers are built on the Agents SDK, workers-oauth-provider, and Cloudflare Access. Adding new servers involves adapting existing wrappers.
Code Mode addresses the token overhead associated with loading tool schemas upfront. Instead of exposing individual tool definitions, the portal collapses them into two central tools: portal_codemode_search and portal_codemode_execute. This significantly reduces context bloat and token costs, offering a cleaner architecture as more servers are integrated.
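The two-tool pattern can be sketched as a thin layer over a tool registry: the model searches for tools on demand instead of receiving all 182 schemas upfront. The tool names, shapes, and registry entries below are illustrative, not the portal's actual API:

```typescript
// Sketch of the Code Mode idea: expose only a search tool and an execute
// tool; full tool definitions stay server-side until requested.
type Tool = {
  name: string;
  description: string;
  run: (args: Record<string, unknown>) => unknown;
};

// Stand-in registry; the real portal aggregates 13 MCP servers.
const registry: Tool[] = [
  {
    name: "gitlab_get_mr",
    description: "Fetch a GitLab merge request",
    run: (a) => ({ mr: a.id }),
  },
  {
    name: "jira_search",
    description: "Search Jira issues",
    run: (a) => [a.query],
  },
];

// Analogous to portal_codemode_search: return only matching descriptions.
function search(query: string): { name: string; description: string }[] {
  const q = query.toLowerCase();
  return registry
    .filter((t) => t.name.includes(q) || t.description.toLowerCase().includes(q))
    .map(({ name, description }) => ({ name, description }));
}

// Analogous to portal_codemode_execute: invoke a named tool with arguments.
function execute(name: string, args: Record<string, unknown>): unknown {
  const tool = registry.find((t) => t.name === name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  return tool.run(args);
}
```

The context cost of the portal is then constant in the number of registered servers: only two schemas ever enter the prompt, regardless of how many tools sit behind them.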
Knowledge Layer: Understanding the Engineering Ecosystem
Backstage, the open-source internal developer portal, serves as Cloudflare's service catalog. It tracks 2,055 services, 167 libraries, and 122 packages, alongside APIs, systems, databases, and teams, with detailed ownership mappings and dependency graphs. This structured data enables agents to understand the broader engineering context beyond just code.
The Backstage MCP server provides agents access to service ownership, dependencies, and API specs directly within coding sessions. This comprehensive knowledge graph is crucial for effective AI agent operation, transforming individual repositories into a connected map of the organization.
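The kind of lookup such a server answers during a coding session can be sketched with a toy catalog. The entity shape loosely mirrors Backstage's catalog model, but the entries and helper functions are invented examples:

```typescript
// Sketch: ownership and reverse-dependency queries over a toy catalog,
// the sort of question an agent asks a Backstage MCP server.
type Entity = {
  name: string;
  kind: "service" | "library";
  owner: string;
  dependsOn: string[];
};

const catalog: Entity[] = [
  { name: "billing-api", kind: "service", owner: "team-payments", dependsOn: ["auth-lib"] },
  { name: "auth-lib", kind: "library", owner: "team-identity", dependsOn: [] },
];

// Who owns a given component?
function whoOwns(name: string): string | undefined {
  return catalog.find((e) => e.name === name)?.owner;
}

// Which components would be affected if this one changes?
function dependents(name: string): string[] {
  return catalog.filter((e) => e.dependsOn.includes(name)).map((e) => e.name);
}
```

Answers like these let an agent editing auth-lib know that billing-api, owned by a different team, sits downstream of the change.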
