Cloudflare Tames AI Crawlers

Cloudflare's new 'Redirects for AI Training' feature uses canonical tags to send AI crawlers to up-to-date content, improving data accuracy.

Image: Enabling the Redirects for AI Training toggle in the AI Crawl Control settings of the Cloudflare dashboard. (Credit: Cloudflare)

Cloudflare is rolling out a new feature designed to tackle a growing problem: AI training crawlers that ignore signals indicating outdated content. Dubbed "Redirects for AI Training," the system automatically converts existing canonical tags into HTTP 301 redirects specifically for verified AI training bots.

This addresses the issue where AI models ingest stale information from deprecated documentation or web pages, leading to inaccurate outputs. Cloudflare observed that bots categorized for AI training visited deprecated content on developers.cloudflare.com at the same rate as current content, despite deprecation banners and noindex tags.

Unlike human users or standard search engine crawlers, AI training bots feed content into training sets that are difficult to correct once the information is ingested. Blocking these crawlers outright is not a fix either: it creates a content void and offers the model no alternative learning path.

The solution leverages an existing web standard, the canonical tag (<link rel="canonical">), which already declares the authoritative version of a page. Cloudflare's AI Crawl Control service identifies verified AI training bots.
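To make the canonical tag concrete, here is a minimal Python sketch of how a page's declared canonical URL could be read from its HTML. It uses only the standard library; the names CanonicalFinder and find_canonical are hypothetical, and nothing here is claimed to reflect how Cloudflare parses pages.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; values may be None.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical href declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Called on a deprecated page whose head contains a canonical link to its replacement, find_canonical returns that replacement URL; on a page with no canonical tag it returns None.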

How It Works

When a verified AI training bot, such as OpenAI's GPTBot or Anthropic's ClaudeBot, requests a page with a non-self-referencing canonical tag, Cloudflare issues a 301 Moved Permanently redirect to the specified canonical URL. Once the feature is enabled, the redirection happens automatically. It is available on all paid Cloudflare plans.

Human traffic, general search indexing, and other automated bots remain unaffected. Cross-origin canonicals and self-referencing tags are excluded to prevent unintended consequences.
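The rules described above can be sketched as a small decision function. This is an illustration of the stated behavior, not Cloudflare's implementation: the function name redirect_target, the is_verified_ai_training_bot flag, and the URL normalization are all assumptions, and real bot verification is far more involved than a boolean.

```python
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit


def _normalize(url: str) -> Tuple[str, str, str, str]:
    """Reduce a URL to (scheme, host[:port], path, query), ignoring fragments."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query)


def redirect_target(
    request_url: str,
    canonical_href: Optional[str],
    is_verified_ai_training_bot: bool,  # hypothetical flag, assumed to be set upstream
) -> Optional[str]:
    """Return the URL to answer with a 301, or None to serve the page as usual."""
    # Humans, search crawlers and other bots are never redirected.
    if not is_verified_ai_training_bot or not canonical_href:
        return None
    # Resolve relative canonicals against the requested URL.
    canonical_url = urljoin(request_url, canonical_href)
    requested = _normalize(request_url)
    canonical = _normalize(canonical_url)
    # Cross-origin canonicals are excluded (simplified scheme + host comparison).
    if canonical[:2] != requested[:2]:
        return None
    # Self-referencing canonicals are excluded: the page is already authoritative.
    if canonical == requested:
        return None
    return canonical_url
```

Under these assumptions, a verified training bot requesting https://example.com/docs/old/ with a canonical of /docs/new/ would be sent to https://example.com/docs/new/, while the same request from any other client returns None and the page is served normally.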

This approach scales better than manual redirect rules, which require constant updates and are prone to falling out of sync with content changes.

Real-World Impact

Cloudflare's own developer documentation saw significant crawling of legacy content by major AI bots. After enabling the feature, 100% of AI training crawler requests to pages with canonical tags were successfully redirected in the first week.

The company also introduced Response status code analysis to its Radar AI Insights page. This feature visualizes how AI crawlers interact with websites, showing the distribution of successful, redirected, and error responses across the web.

This data provides a clearer picture of the web's current response to AI crawlers at scale.
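Site owners can compute a similar breakdown from their own logs. The following Python sketch groups a list of response status codes into success, redirect, and error classes and reports each class's share. The function names and the sample codes are illustrative assumptions, not Radar's methodology or data.

```python
from collections import Counter
from typing import Dict, Iterable


def status_class(code: int) -> str:
    """Map an HTTP status code to a coarse response class."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client_error"
    if 500 <= code < 600:
        return "server_error"
    return "other"


def status_distribution(codes: Iterable[int]) -> Dict[str, float]:
    """Return the fraction of responses that fall into each class."""
    counts = Counter(status_class(code) for code in codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Made-up sample: status codes served to AI crawlers in one log window.
sample = [200, 200, 301, 404]
print(status_distribution(sample))
```

A rising share of redirect responses after enabling the feature would be one rough signal that training crawlers are being steered away from deprecated pages.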

Redirects for AI Training aims to ensure AI models are trained on the most accurate and up-to-date information available.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.