Every AI agent, from coding assistants to customer support bots, fundamentally relies on search to access the right information at the right time. Building this capability from scratch means standing up complex infrastructure such as vector indexes and data pipelines. Cloudflare aims to simplify this with its new Cloudflare AI Search, a managed primitive that streamlines how developers equip their agents with search.
Previously known as AutoRAG, the service now offers a plug-and-play experience. Developers can dynamically create search instances, ingest data, and query them directly from Cloudflare Workers or the Agents SDK.
Hybrid Search and Built-in Indexing
A key feature is hybrid search, which combines semantic (vector) search with traditional keyword matching (BM25) in a single query. Vector search captures conceptual similarity but can miss exact terms such as product names or error codes, while keyword search matches those terms precisely but misses paraphrases; running both covers each approach's blind spot.
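The principle can be illustrated with a toy scorer (this is not Cloudflare's implementation): a vector score captures paraphrase, a keyword score captures exact terms, and a weighted sum combines the two signals.

```typescript
// Toy hybrid relevance: weighted sum of vector similarity and keyword overlap.
type Doc = { id: string; text: string; embedding: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Fraction of query terms that appear verbatim in the document.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const words = new Set(text.toLowerCase().split(/\s+/));
  return terms.filter((t) => words.has(t)).length / terms.length;
}

function hybridScore(queryEmbedding: number[], query: string, doc: Doc, alpha = 0.5): number {
  return alpha * cosine(queryEmbedding, doc.embedding) +
         (1 - alpha) * keywordScore(query, doc.text);
}
```

A document that only matches semantically still gets credit from the vector term, while an exact identifier match is rewarded by the keyword term even when embeddings disagree.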
Cloudflare AI Search also eliminates the need for separate infrastructure. New instances include built-in storage and vector indexes, powered by Cloudflare's R2 and Vectorize. Data can be uploaded directly via API, and indexing occurs automatically.
The new ai_search_namespaces binding allows search instances to be created and deleted at runtime. Developers can spin up a unique search context per agent, per customer, or per language without redeploying the Worker.
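A per-customer lifecycle might look like the sketch below. The binding's exact method names are not given in the article, so an in-memory stand-in with an assumed create/delete shape keeps the logic runnable; the naming convention is likewise hypothetical.

```typescript
// Assumed create/delete shape for runtime instance management; the
// in-memory class below stands in for the ai_search_namespaces binding.
interface SearchNamespaces {
  create(name: string): void;
  delete(name: string): void;
  has(name: string): boolean;
}

class InMemoryNamespaces implements SearchNamespaces {
  private names = new Set<string>();
  create(name: string) { this.names.add(name); }
  delete(name: string) { this.names.delete(name); }
  has(name: string) { return this.names.has(name); }
}

// Derive a stable, URL-safe instance name per customer (hypothetical
// naming convention, not a Cloudflare requirement).
function namespaceFor(customerId: string): string {
  return "customer-" + customerId.toLowerCase().replace(/[^a-z0-9-]/g, "-");
}

// Create the customer's search context on first contact; no Worker
// redeploy is involved, since this happens at request time.
function ensureCustomerNamespace(ns: SearchNamespaces, customerId: string): string {
  const name = namespaceFor(customerId);
  if (!ns.has(name)) ns.create(name);
  return name;
}
```

Deleting the instance on account closure follows the same pattern, which keeps per-customer data lifecycle management in application code rather than in deployment configuration.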
Dynamic Context for Customer Support
Consider a customer support agent. It needs access to both broad product documentation and specific customer interaction history. Product docs are too large for context windows, and customer histories grow over time.
AI Search allows for a shared instance for product knowledge, backed by an R2 bucket. Crucially, it enables the creation of per-customer instances that dynamically ingest summaries of past resolutions.
When a customer returns, the agent can query both the shared product docs and that customer's specific history in a single call. The extra context prevents the agent from re-suggesting solutions that have already failed and speeds up resolution.
The agent code leverages the Agents SDK, defining tools for searching knowledge bases and saving resolution summaries. The LLM then decides when to invoke these tools based on the conversation.
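The two tools described above can be sketched generically. The tool-registry shape and method names here are illustrative assumptions rather than the Agents SDK's actual API, an in-memory store stands in for AI Search, and the LLM's tool choice is simulated by direct dispatch.

```typescript
// Generic sketch of the agent's two tools; a Map stands in for AI Search.
type ToolFn = (args: Record<string, string>) => string;

const store = new Map<string, string[]>(); // instance name -> stored chunks

const tools: Record<string, ToolFn> = {
  // Search one knowledge base and return matching chunks.
  searchKnowledge: ({ instance, query }) => {
    const chunks = store.get(instance) ?? [];
    return chunks
      .filter((c) => c.toLowerCase().includes(query.toLowerCase()))
      .join("\n");
  },
  // Persist a resolution summary into the customer's own instance so it
  // is retrievable in future conversations.
  saveResolution: ({ instance, summary }) => {
    const chunks = store.get(instance) ?? [];
    chunks.push(summary);
    store.set(instance, chunks);
    return "saved";
  },
};

// In a real agent the LLM decides which tool to invoke and with which
// arguments; here we dispatch directly to show the data flow.
function invoke(name: string, args: Record<string, string>): string {
  return tools[name](args);
}
```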
This dynamic context is immediately searchable for future interactions.
Under the Hood: Configurable Retrieval
The retrieval pipeline within AI Search is highly configurable. Hybrid search fuses results from parallel vector and BM25 searches. Developers can fine-tune indexing options like tokenizers (e.g., 'porter' for natural language, 'trigram' for code) and retrieval settings like keyword match modes ('AND' or 'OR').
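The match modes reduce to a simple distinction, illustrated below with a naive whitespace tokenizer (not Cloudflare's BM25 engine): 'AND' requires every query term to appear in a document, 'OR' requires any term.

```typescript
// Minimal keyword match-mode illustration.
type MatchMode = "AND" | "OR";

function tokenize(text: string): string[] {
  // Naive whitespace tokenizer. A porter tokenizer would additionally
  // stem words ("running" -> "run"), while a trigram tokenizer splits
  // text into 3-character shingles, which suits identifiers in code.
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

function matches(query: string, doc: string, mode: MatchMode): boolean {
  const docTerms = new Set(tokenize(doc));
  const queryTerms = tokenize(query);
  return mode === "AND"
    ? queryTerms.every((t) => docTerms.has(t))
    : queryTerms.some((t) => docTerms.has(t));
}
```

'AND' favors precision (fewer, tighter matches), while 'OR' favors recall, which is often the better default when the fusion step downstream can demote weak hits.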
Fusion methods, such as Reciprocal Rank Fusion (RRF) or max fusion, combine results. Optional reranking using a cross-encoder can further refine results by evaluating query-document pairs.
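Reciprocal Rank Fusion has a standard formulation: each document scores the sum of 1/(k + rank) across the lists it appears in. The k = 60 default below is the value commonly used in the literature, not necessarily AI Search's setting.

```typescript
// Reciprocal Rank Fusion over ranked result lists.
function rrf(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because scores accumulate across lists, a document ranked well by both the vector and the keyword search tends to beat one ranked highly by only a single list, which is exactly the behavior hybrid search wants.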
Boosting allows developers to influence rankings based on document metadata, such as timestamps or custom fields. This helps surface more relevant information, like recent news articles over older ones.
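One way to picture metadata boosting is a recency multiplier on the base relevance score; the exponential-decay formula below is an assumption for illustration, not AI Search's documented behavior.

```typescript
// Recency boost: multiply the relevance score by a factor that halves
// every halfLifeDays, so newer documents rise in the ranking.
type Result = { id: string; score: number; publishedAt: Date };

function recencyBoost(result: Result, now: Date, halfLifeDays = 30): number {
  const ageDays = (now.getTime() - result.publishedAt.getTime()) / 86_400_000;
  const factor = Math.pow(0.5, ageDays / halfLifeDays); // 1.0 today, 0.5 after 30 days
  return result.score * factor;
}
```

With this shape, a fresh news article with a slightly lower base score can outrank an older one with a higher base score, matching the recent-over-older example above.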
Cross-instance search simplifies querying across different data sources. An agent can search product docs and customer history simultaneously through a single API call, with results merged and ranked automatically.
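The merge step can be sketched as follows (a sketch of the behavior described above, not AI Search's internal implementation): collect hits from each instance, tag them with their origin, and sort into one ranked list.

```typescript
// Merge per-instance result lists into one ranked list, preserving
// which instance each hit came from.
type Hit = { id: string; score: number };
type TaggedHit = Hit & { instance: string };

function mergeInstances(resultsByInstance: Record<string, Hit[]>, limit = 10): TaggedHit[] {
  const all: TaggedHit[] = [];
  for (const [instance, hits] of Object.entries(resultsByInstance)) {
    for (const hit of hits) all.push({ ...hit, instance });
  }
  return all.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Note that a plain score sort assumes scores are comparable across instances; that is one reason a shared fusion and reranking stage matters when sources differ in size and content.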
Simplified Instance Management
Previous versions required manual setup of R2 buckets and Vectorize indexes. New AI Search instances offer integrated storage and indexing, simplifying deployment significantly.
The uploadAndPoll API uploads a file and waits for indexing to complete, enabling immediate searching. Instances can also connect to external data sources like R2 buckets or websites on a sync schedule.
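The upload-then-poll shape reduces to a loop like the one below, written against an injected status function so it is runnable here; the real API's status values and parameters may differ, and production code would wait between attempts rather than polling in a tight loop.

```typescript
// Poll an indexing-status check until it leaves "pending" or we give up.
type IndexStatus = "pending" | "indexed" | "failed";

function pollUntilIndexed(status: () => IndexStatus, maxAttempts = 50): IndexStatus {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const s = status(); // in real code: fetch the file's indexing state, then sleep
    if (s !== "pending") return s;
  }
  return "pending"; // gave up; the caller decides whether to retry later
}
```

Once the loop returns "indexed", the uploaded content is immediately queryable, which is what makes the save-then-search pattern in the support-agent scenario work within a single session.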
The ai_search_namespaces binding replaces older APIs, providing a cleaner interface for runtime management of search instances, aligning with Cloudflare's broader strategy for AI agents.
