Cloudflare's AI Data Agent

Cloudflare unveils Town Lake, a unified data platform, and Skipper, its AI agent for natural language data querying, enhancing internal data access and governance.

Figure: Cloudflare's unified data platform architecture with Town Lake and Skipper, showing Cloudflare's approach to unifying data and enabling AI-driven insights. Image: Cloudflare.

Cloudflare, processing over a billion events per second across its global network, faced a significant data challenge. Information was scattered across dozens of databases, cloud buckets, and streaming platforms, making even simple queries a complex, knowledge-intensive task. This data sprawl hindered effective insight generation.

Visual TL;DR: hyper-growth leads to fragmentation, which produces the data sprawl problem. Town Lake and Skipper both respond to it and together provide unified data access, which leads to enhanced insight generation. Skipper also delivers auditable answers.

  1. Data Sprawl Problem: information scattered across dozens of databases and platforms
  2. Hyper-growth leads to fragmentation: too many disparate systems and lack of discoverability
  3. Town Lake Platform: unified data analytics platform with single SQL interface
  4. Skipper AI Agent: AI agent for natural language data querying
  5. Unified Data Access: enables employees to ask questions in plain English
  6. Auditable Answers: receive auditable answers rapidly
  7. Enhanced Insight Generation: overcomes the data fragmentation that hindered insight generation

To combat this, Cloudflare developed two internal tools: "Town Lake," a unified data analytics platform, and "Skipper," an AI data agent built on top of it. Town Lake provides a single SQL interface to all of Cloudflare's data, while Skipper enables employees to ask questions in plain English and receive auditable answers rapidly.

The Data Sprawl Problem

Hyper-growth often leads to data fragmentation, and Cloudflare was no exception. It had too many disparate systems and sampled data that was unsuitable for critical functions like billing. It relied on external vendors for internal reporting, and its data assets were hard to discover.

This situation fostered a culture where data infrastructure was viewed as a secondary function rather than critical technology.


The Vision for Unified Data

The goal was to create a centralized, secure platform where authorized users could access fresh, accurate data. This included handling both high-volume sampled data for dashboards and precise, unsampled data for billing or security investigations.
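
The sketch below shows why sampled data works for a dashboard but not for billing. It is a minimal illustration that assumes uniform, independent 1-in-100 sampling of request events. The byte counts, the sampling rate and the function names are all hypothetical. Scaling a sampled sum by the sampling rate gives a reasonable aggregate estimate, but it cannot reproduce the exact total.

```python
import random


def estimate_total(sampled_bytes: list[int], sample_rate: float) -> float:
    """Scale a sum over sampled events back up to an estimate of the full total.

    Assumes each event was kept independently with probability `sample_rate`.
    """
    return sum(sampled_bytes) / sample_rate


def exact_total(all_bytes: list[int]) -> int:
    """Sum over every event: the only kind of number fit for an invoice."""
    return sum(all_bytes)


# Hypothetical per-request byte counts for a single customer.
rng = random.Random(42)
events = [rng.randint(500, 5000) for _ in range(100_000)]

rate = 0.01  # keep roughly 1 in 100 events, as a dashboard pipeline might
sampled = [b for b in events if rng.random() < rate]

exact = exact_total(events)
estimate = estimate_total(sampled, rate)
error_pct = abs(estimate - exact) / exact * 100
print(f"exact={exact} estimate={estimate:.0f} error={error_pct:.2f}%")
```

An error of a few percent is invisible on a trend chart. On an invoice it is unacceptable, which is why the platform has to serve both paths.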

Key requirements included automated PII detection, robust security and governance, auditable access logs, and time-bound permissions. Crucially, the entire system was to be built using Cloudflare's own product suite, such as R2 for storage and Workers for compute.
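
Time-bound permissions and auditable access logs fit together naturally. In the minimal sketch below, every grant carries an expiry, and every access decision (including denials) is appended to a log. The `Grant` and `AccessChecker` names and the table name are hypothetical. The sketch does not describe how Cloudflare actually implements this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    table: str
    expires_at: datetime


@dataclass
class AccessChecker:
    """Hypothetical in-memory grant store with an append-only audit log."""
    grants: list[Grant] = field(default_factory=list)
    audit_log: list[dict] = field(default_factory=list)

    def grant(self, user: str, table: str, ttl: timedelta, now: datetime) -> Grant:
        g = Grant(user, table, now + ttl)
        self.grants.append(g)
        self.audit_log.append({"event": "grant", "user": user, "table": table,
                               "expires_at": g.expires_at.isoformat(),
                               "at": now.isoformat()})
        return g

    def can_read(self, user: str, table: str, now: datetime) -> bool:
        # A grant only counts while it has not yet expired.
        allowed = any(g.user == user and g.table == table and now < g.expires_at
                      for g in self.grants)
        # Log every decision, including denials.
        self.audit_log.append({"event": "read", "user": user, "table": table,
                               "allowed": allowed, "at": now.isoformat()})
        return allowed


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
checker = AccessChecker()
checker.grant("alice", "billing.invoices", timedelta(days=7), now=t0)

print(checker.can_read("alice", "billing.invoices", now=t0 + timedelta(days=1)))  # True
print(checker.can_read("alice", "billing.invoices", now=t0 + timedelta(days=8)))  # False
print(checker.can_read("bob", "billing.invoices", now=t0))                         # False
print(len(checker.audit_log))  # 4
```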

Ultimately, the vision was an interface that democratized data access, moving beyond SQL to empower anyone with a need to know.

Town Lake: The Data Platform

At its core, Town Lake employs a data lakehouse architecture. This combines a query engine with object storage and a metadata layer to present data as a unified database.
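
To make the three lakehouse pieces concrete, here is a toy model. A dict stands in for object storage, and a catalog maps each table to a history of snapshots. The "query engine" resolves a table through the catalog before reading files. This is a simplification under stated assumptions: real Iceberg tracks snapshots through metadata and manifest files, which this collapses into a single dict, and the paths and table name are made up. Picking an older snapshot is what makes time travel possible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: int               # epoch seconds, simplified
    data_files: tuple[str, ...]


# Stand-in for object storage: path -> rows (hypothetical paths).
object_store: dict[str, list[dict]] = {
    "r2://lake/usage/part-0": [{"account": "a1", "bytes": 100}],
    "r2://lake/usage/part-1": [{"account": "a2", "bytes": 250}],
}

# Stand-in for the metadata layer: table -> snapshot history, oldest first.
catalog: dict[str, list[Snapshot]] = {
    "usage": [
        Snapshot(1, 1_000, ("r2://lake/usage/part-0",)),
        Snapshot(2, 2_000, ("r2://lake/usage/part-0", "r2://lake/usage/part-1")),
    ]
}


def scan(table: str, as_of: Optional[int] = None) -> list[dict]:
    """Resolve a table through the catalog, then read its files from storage."""
    history = catalog[table]
    if as_of is None:
        snap = history[-1]  # latest committed snapshot
    else:
        candidates = [s for s in history if s.committed_at <= as_of]
        if not candidates:
            raise LookupError(f"{table} has no snapshot at or before {as_of}")
        snap = candidates[-1]
    return [row for path in snap.data_files for row in object_store[path]]


print(sum(r["bytes"] for r in scan("usage")))               # 350 (latest)
print(sum(r["bytes"] for r in scan("usage", as_of=1_500)))  # 100 (time travel)
```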

Key components include:

  • Query Engine: Trino, the open-source distributed SQL engine, queries data across various sources, including Postgres, ClickHouse, and Iceberg tables on R2, without materializing intermediate results.
  • Data Catalog: Cloudflare's managed Apache Iceberg service, R2 Data Catalog, stores data with features like schema evolution and time travel, optimizing storage costs based on data recency.
  • Metadata Catalog: DataHub centralizes metadata, including table schemas, ownership, lineage, and glossary terms, aiding data discovery.
  • Access Control: Lifeguard manages access rules, integrating with Cloudflare Access for authentication and providing dynamic JSON policies to the query engine.
  • PII Detection: Skimmer, a PII scanning service, uses Workers AI to classify columns for sensitive data, flagging findings for review.
  • Transformation Engine: Transformer, built on Workflows, orchestrates ELT processes using SQL transformations defined in YAML.
  • Ingestion: A dedicated orchestrator manages the extraction, transformation, and loading of data from operational systems into R2 as Iceberg tables.
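
As one illustration of the "dynamic JSON policies" Lifeguard hands to the query engine, the sketch below renders approved grants as table rules modeled on Trino's file-based access control. That format has per-table `privileges` and per-column `allow` flags. Several things here are assumptions: the article does not say which policy format Lifeguard emits, `TableGrant` is a hypothetical type, and the catalog, schema and table names are made up. The design relies on first-match rule evaluation, so a final catch-all rule that grants nothing keeps every unlisted table closed.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TableGrant:
    """Hypothetical approved grant: who may read which table, minus hidden columns."""
    user: str
    catalog: str
    schema: str
    table: str
    hidden_columns: tuple[str, ...] = ()


def build_trino_rules(grants: list[TableGrant]) -> dict:
    """Render grants as table rules modeled on Trino's file-based access control."""
    table_rules = [
        {
            "user": g.user,  # Trino treats this field as a regex
            "catalog": g.catalog,
            "schema": g.schema,
            "table": g.table,
            "privileges": ["SELECT"],
            # Columns listed with allow=false are not readable by this user.
            "columns": [{"name": c, "allow": False} for c in g.hidden_columns],
        }
        for g in grants
    ]
    # Catch-all rule that grants nothing: with first-match evaluation, any
    # table without an explicit grant above stays closed.
    table_rules.append({"privileges": []})
    return {"tables": table_rules}


rules = build_trino_rules([
    TableGrant("alice", "iceberg", "billing", "invoices", hidden_columns=("email",)),
])
print(json.dumps(rules, indent=2))
```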

Governance by Construction: Default-Closed

Town Lake adopts a default-closed security model. Tables are inaccessible until reviewed and approved, with automated scanning for PII.
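
A default-closed registry can be sketched in a few lines. A new table starts unapproved, and its columns are scanned at registration so that reviewers see the PII flags before approving anything. Skimmer classifies columns with Workers AI. This sketch swaps in simple regex heuristics purely for illustration. The pattern set, the 50% threshold and all names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Regex heuristics standing in for a model-based classifier (illustrative only).
PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ipv4": re.compile(r"^(\d{1,3}\.){3}\d{1,3}$"),
}


def scan_column(values: list[str], threshold: float = 0.5) -> Optional[str]:
    """Return a PII type if at least `threshold` of the sample values match it."""
    if not values:
        return None
    for kind, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return kind
    return None


@dataclass
class TableRecord:
    name: str
    columns: dict[str, list[str]]  # column name -> sample values
    approved: bool = False         # default-closed: nothing is readable yet
    flagged: dict[str, str] = field(default_factory=dict)


def register(table: TableRecord) -> TableRecord:
    """Scan every column on registration and record findings for review."""
    for col, sample in table.columns.items():
        kind = scan_column(sample)
        if kind:
            table.flagged[col] = kind
    return table


def is_queryable(table: TableRecord) -> bool:
    return table.approved


t = register(TableRecord("crm.contacts",
                         {"id": ["1", "2"], "contact": ["a@x.com", "b@y.org"]}))
print(is_queryable(t))  # False
print(t.flagged)        # {'contact': 'email'}
t.approved = True       # set only after a human review of the flags
print(is_queryable(t))  # True
```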

This process is streamlined through self-serve workflows, where users can easily request reviews for unapproved tables. Sensitive columns are hidden by default, with PII access granted per session and logged.
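
Hiding sensitive columns by default, with per-session PII access and logging, amounts to a projection step in front of query results. The minimal sketch below assumes the set of sensitive columns is already known from scanning. It drops those columns unless the current session holds a grant for them, and it logs each revealed column once per query. `Session`, `project` and `access_log` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    user: str
    pii_columns: set[str] = field(default_factory=set)  # granted for this session only


access_log: list[tuple[str, str]] = []


def project(rows: list[dict], sensitive: set[str], session: Session) -> list[dict]:
    """Strip sensitive columns unless this session was granted them."""
    revealed = sensitive & session.pii_columns
    for col in sorted(revealed):
        access_log.append((session.user, col))  # one log entry per revealed column
    hidden = sensitive - revealed
    return [{k: v for k, v in row.items() if k not in hidden} for row in rows]


rows = [{"account": "a1", "email": "a@x.com"}]
print(project(rows, {"email"}, Session("bob")))               # [{'account': 'a1'}]
print(project(rows, {"email"}, Session("alice", {"email"})))  # [{'account': 'a1', 'email': 'a@x.com'}]
print(access_log)                                             # [('alice', 'email')]
```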

Skipper: The AI Data Agent

Skipper acts as a conversational AI data agent, translating natural language questions into validated data queries. It leverages Town Lake's capabilities and Cloudflare's developer platform, including Workers AI.
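
What "validated" could mean for a generated query is illustrated below. The check admits a single read-only `SELECT` and asks the engine to compile it without running it. In this sketch, SQLite's `EXPLAIN` stands in for the real engine, and the table and function names are hypothetical. The semicolon check is deliberately conservative and would also reject a semicolon inside a string literal.

```python
import sqlite3


def validate_query(sql: str, conn: sqlite3.Connection) -> tuple[bool, str]:
    """Accept only a single read-only SELECT that compiles against the live schema."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    if not stripped.lower().startswith("select"):
        return False, "only read-only SELECT queries are allowed"
    try:
        # EXPLAIN compiles the statement (catching unknown tables or columns)
        # without executing the query itself.
        conn.execute(f"EXPLAIN {stripped}")
    except sqlite3.Error as exc:
        return False, f"does not compile: {exc}"
    return True, "ok"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r2_usage (account TEXT, storage_cost REAL)")
print(validate_query("SELECT account FROM r2_usage", conn))  # (True, 'ok')
print(validate_query("DROP TABLE r2_usage", conn))
print(validate_query("SELECT x FROM missing", conn)[0])      # False
```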

Users interact via a chat interface, posing questions like "Show me the top 10 customers by R2 storage cost." Skipper handles table discovery, query generation, execution, and result presentation, including charts and dashboards.
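
Table discovery is the first step in that pipeline. The toy version below ranks tables by word overlap between the question and each table's name and catalog description. It assumes a DataHub-like metadata dict. The table names and descriptions are invented, and a production agent would more likely use search or embeddings than raw token overlap.

```python
import re

# Hypothetical catalog metadata: table name -> description and tags.
METADATA = {
    "billing.r2_storage_costs": "Daily R2 storage cost per customer account, in USD. tags: r2 storage cost billing",
    "analytics.http_requests_sampled": "Sampled HTTP request events for dashboards. tags: http traffic sampled",
    "crm.customers": "Customer accounts with names and plan tier. tags: customer account",
}


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words; underscores and dots act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def discover_tables(question: str, metadata: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k tables whose name or description shares words with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(name + " " + desc)), name) for name, desc in metadata.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


print(discover_tables("Show me the top 10 customers by R2 storage cost", METADATA))
# ['billing.r2_storage_costs', 'crm.customers']
```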

The agent supports iterative refinement of queries and includes closed-loop reasoning to investigate and correct potential errors.
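
A closed loop of this kind can be as simple as running the query, feeding any error message back to the generator, and retrying a bounded number of times. In the sketch below, SQLite stands in for the query engine, and a stub function stands in for the LLM. The stub's "fix" is hard-coded, and `run_with_repair` and the table schema are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def run_with_repair(question: str,
                    generate: Callable[[str, Optional[str]], str],
                    conn: sqlite3.Connection,
                    max_attempts: int = 3) -> tuple[list, list[str]]:
    """Generate SQL, run it, and on failure retry with the error as feedback."""
    feedback: Optional[str] = None
    attempts: list[str] = []
    for _ in range(max_attempts):
        sql = generate(question, feedback)
        attempts.append(sql)
        try:
            return conn.execute(sql).fetchall(), attempts
        except sqlite3.Error as exc:
            feedback = f"Query failed: {exc}. Fix the SQL."
    raise RuntimeError(f"gave up after {max_attempts} attempts: {attempts}")


def fake_generator(question: str, feedback: Optional[str]) -> str:
    """Stub LLM: guesses a wrong column first, then corrects itself from the error."""
    if feedback and "no such column" in feedback:
        return "SELECT account, cost FROM r2_usage ORDER BY cost DESC LIMIT 10"
    return "SELECT account, storage_cost FROM r2_usage ORDER BY storage_cost DESC LIMIT 10"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r2_usage (account TEXT, cost REAL)")
conn.executemany("INSERT INTO r2_usage VALUES (?, ?)", [("a1", 12.5), ("a2", 40.0)])

rows, attempts = run_with_repair("top customers by R2 storage cost", fake_generator, conn)
print(rows)           # [('a2', 40.0), ('a1', 12.5)]
print(len(attempts))  # 2
```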

Context is King for LLMs

To mitigate LLM hallucinations and ensure accurate results, Skipper employs multiple layers of grounded context:

  • Schema and Usage Metadata: DataHub provides comprehensive schema information and historical query patterns.
  • Human Annotations: Descriptions and tags in DataHub offer human-curated context.
  • Code-Derived Knowledge: SQL transformation logic from the Transformer pipeline enriches understanding of data meaning.
  • Curated Data Models: Human-written documents guide users on how to interpret key data concepts.
  • Runtime Introspection: Live queries to Trino serve as a final safety net for context verification.
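
The runtime-introspection layer is easy to picture as a drift check. It compares the columns that the catalog documents against the columns the live table actually has, so the agent does not trust stale metadata. In this sketch, SQLite's `PRAGMA table_info` stands in for introspecting the live engine. The table and column names are hypothetical, and the table name is assumed to be a trusted identifier because it is interpolated into the statement.

```python
import sqlite3


def live_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Column names as the engine reports them right now.

    PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    `table` must be a trusted identifier since it is formatted into the SQL.
    """
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if not cols:
        # SQLite returns no rows (rather than an error) for a missing table.
        raise LookupError(f"table {table!r} not found")
    return cols


def check_schema_drift(conn: sqlite3.Connection, table: str, documented: list[str]) -> dict:
    live = live_columns(conn, table)
    return {
        "missing_in_live": sorted(set(documented) - live),  # catalog is stale
        "undocumented": sorted(live - set(documented)),     # catalog is incomplete
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r2_usage (account TEXT, cost REAL, region TEXT)")
print(check_schema_drift(conn, "r2_usage", ["account", "storage_cost"]))
# {'missing_in_live': ['storage_cost'], 'undocumented': ['cost', 'region']}
```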

Skipper's tools are accessible through Workers AI and an MCP server, offering flexibility for different user workflows.
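
MCP servers advertise tools through a `tools/list` request, where each tool has a `name`, a `description` and a JSON Schema `inputSchema`. Clients invoke a tool with `tools/call`. The sketch below shows only that request handling, with no transport or initialization handshake. The tool names, schemas and stub handlers are hypothetical and do not describe Skipper's actual tool set.

```python
import json

# Hypothetical tool registry; names, schemas, and stub handlers are illustrative.
TOOLS = {
    "discover_tables": {
        "description": "Find tables relevant to a natural-language question.",
        "inputSchema": {
            "type": "object",
            "properties": {"question": {"type": "string"}},
            "required": ["question"],
        },
        "handler": lambda args: ["billing.r2_storage_costs"],  # stub result
    },
    "run_query": {
        "description": "Run a validated read-only SQL query.",
        "inputSchema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
        "handler": lambda args: {"sql": args["sql"], "rows": []},  # stub result
    },
}


def handle(request: dict) -> dict:
    """Answer MCP-style `tools/list` and `tools/call` JSON-RPC requests."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}}
        output = tool["handler"](params.get("arguments", {}))
        return {"jsonrpc": "2.0", "id": req_id,
                "result": {"content": [{"type": "text", "text": json.dumps(output)}]}}
    return {"jsonrpc": "2.0", "id": req_id,
            "error": {"code": -32601, "message": "Method not found"}}


listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print([t["name"] for t in listing["result"]["tools"]])  # ['discover_tables', 'run_query']

call = handle({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
               "params": {"name": "discover_tables",
                          "arguments": {"question": "top customers by R2 cost"}}})
print(call["result"]["content"][0]["text"])  # ["billing.r2_storage_costs"]
```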

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.