Cloudflare is rolling out a new feature for its Browser Rendering service that allows developers to crawl entire websites with a single API call. The new /crawl endpoint, now in open beta, simplifies the process of gathering website data.
Users submit a starting URL, and the service automatically discovers linked pages, renders them in a headless browser, and returns the content in multiple formats, including HTML, Markdown, and structured JSON. This makes it useful for gathering training data for large language models, extracting content for retrieval-augmented generation (RAG) pipelines, and conducting site-wide research and monitoring. As described in Cloudflare's documentation, the service runs asynchronously: a submission returns a job ID, and users can check results while pages are still being processed.
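The asynchronous submit-then-poll flow can be sketched in Python with only the standard library. This is a minimal sketch, not Cloudflare's client. Several details are assumptions to check against the Browser Rendering documentation: the endpoint path (`.../accounts/{account_id}/browser-rendering/crawl`), the job ID arriving in the `result` field, and the `status` field with its `"running"` value. The `transport` parameter is a hypothetical hook that lets the polling logic run without network access.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed URL layout; confirm against the Browser Rendering docs.
CRAWL_URL = "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering/crawl"

# A transport takes a prepared request and returns the decoded JSON body.
Transport = Callable[[urllib.request.Request], dict]


def http_transport(req: urllib.request.Request) -> dict:
    """Send the request over the network and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def submit_crawl(account_id: str, token: str, start_url: str,
                 transport: Transport = http_transport) -> str:
    """Start a crawl job and return its ID (assumed to be in "result")."""
    req = urllib.request.Request(
        CRAWL_URL.format(account_id=account_id),
        data=json.dumps({"url": start_url}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    return transport(req)["result"]


def wait_for_crawl(account_id: str, token: str, job_id: str,
                   transport: Transport = http_transport,
                   interval: float = 5.0, max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll the job until its (assumed) status is no longer "running"."""
    url = CRAWL_URL.format(account_id=account_id) + f"/{job_id}"
    for _ in range(max_polls):
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"}, method="GET")
        result = transport(req)["result"]
        if result.get("status") != "running":
            return result  # finished or failed: the caller inspects it
        sleep(interval)
    raise TimeoutError(f"crawl {job_id} still running after {max_polls} polls")
```

Because results become available as pages are processed, a real client might read partial results on each poll. This sketch only waits for the job to leave the running state.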
Key Features for Web Scraping and AI
The Browser Rendering crawl endpoint offers several features aimed at AI-oriented web scraping and data collection. Pages can be returned as HTML, Markdown, or structured JSON, with the JSON output generated by Workers AI.
Users can control the crawl scope with options for depth, page limits, and URL pattern matching. The crawler discovers pages automatically from sitemaps and links, while incremental-crawling options such as modifiedSince and maxAge prevent redundant fetches, saving time and cost. A static mode fetches plain HTML without rendering it in a browser, which speeds up crawls of static sites.
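The scope and incremental-crawling options could be gathered into a request body along the following lines. Only `modifiedSince` and `maxAge` are named in the announcement. Every other field name (`depth`, `limit`, `formats`, `render`, `options.includePatterns`, `options.excludePatterns`) is a hypothetical placeholder, as are the units assumed for the two incremental fields. Check the real names and units against the Browser Rendering docs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The three output formats named in the announcement.
SUPPORTED_FORMATS = {"html", "markdown", "json"}


@dataclass
class CrawlOptions:
    url: str
    depth: Optional[int] = None   # hypothetical name: max link depth from the start URL
    limit: Optional[int] = None   # hypothetical name: max number of pages to crawl
    include_patterns: List[str] = field(default_factory=list)  # hypothetical name
    exclude_patterns: List[str] = field(default_factory=list)  # hypothetical name
    formats: List[str] = field(default_factory=lambda: ["markdown"])
    modified_since: Optional[int] = None  # sent as modifiedSince; Unix seconds assumed
    max_age: Optional[int] = None         # sent as maxAge; seconds assumed
    render: bool = True                   # False = static mode (hypothetical flag name)

    def to_payload(self) -> dict:
        """Validate the options and build the JSON body, omitting unset ones."""
        unknown = set(self.formats) - SUPPORTED_FORMATS
        if unknown:
            raise ValueError(f"unsupported formats: {sorted(unknown)}")
        if self.depth is not None and self.depth < 0:
            raise ValueError("depth must be non-negative")
        if self.limit is not None and self.limit < 1:
            raise ValueError("limit must be at least 1")

        payload: dict = {"url": self.url, "formats": list(self.formats)}
        if self.depth is not None:
            payload["depth"] = self.depth
        if self.limit is not None:
            payload["limit"] = self.limit
        patterns = {}
        if self.include_patterns:
            patterns["includePatterns"] = list(self.include_patterns)
        if self.exclude_patterns:
            patterns["excludePatterns"] = list(self.exclude_patterns)
        if patterns:
            payload["options"] = patterns
        if self.modified_since is not None:
            payload["modifiedSince"] = self.modified_since
        if self.max_age is not None:
            payload["maxAge"] = self.max_age
        if not self.render:
            payload["render"] = False  # static mode: fetch HTML without a browser
        return payload
```

Leaving unset fields out of the body keeps it minimal, so the service's own defaults apply to anything the caller does not specify.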
The service also acts as a well-behaved bot, honoring robots.txt directives, including crawl delays. The endpoint is available on both the Workers Free and Paid plans, giving developers a single API for site-wide data acquisition, whether for AI-assisted web scraping or for building RAG data-extraction pipelines.
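Site owners who want to preview how a robots.txt-respecting crawler will treat their rules can check the same directives locally with Python's standard `urllib.robotparser`. This only approximates the crawler's behavior, because Cloudflare's own parser may handle edge cases differently. The user-agent string `ExampleBot` below is a placeholder, not the crawler's actual identifier.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: every agent must wait 10 seconds between requests
# and stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/
"""


def check_url(robots_txt: str, agent: str, url: str):
    """Return (allowed, crawl_delay) for an agent under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


allowed, delay = check_url(ROBOTS_TXT, "ExampleBot", "https://example.com/private/report")
print(allowed, delay)
```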