Databricks Hits Petabyte-Scale Ingest

Databricks Zerobus Ingest achieves petabyte-scale data ingestion at 12 GB/s per table, eliminating infrastructure management.

Figure: Conceptual overview of the Databricks Zerobus Ingest architecture and data flow.

Databricks has launched Zerobus Ingest, a serverless streaming API designed to handle petabyte-scale data pipelines without requiring manual infrastructure setup. This new service promises to ingest massive volumes of time-series data from sources like IoT sensors and autonomous vehicles directly into Delta tables, governed by Unity Catalog.

Visual TL;DR

  1. Petabyte Scale Ingest: need for massive data ingestion without infrastructure management
  2. Databricks Zerobus Ingest: serverless streaming API for petabyte-scale data pipelines
  3. Eliminates Infrastructure: no manual setup or management of message queues like Kafka
  4. Push-Based API: accepts data from any producer directly into the lakehouse
  5. Autoscaling Mechanism: achieved through dynamic partitioning for efficient scaling
  6. 1 PB in < 24 Hours: ingest capability demonstrated during the cosmic data benchmarks
  7. 12 GB/s Throughput: stable ingest rate achieved on a single table
  8. Unity Catalog: governs time-series data ingested into Delta tables

The system removes the need for traditional message queues like Kafka, offering a push-based API that accepts data from any producer and writes it to the lakehouse. According to the Databricks blog post, Zerobus Ingest ingested one petabyte of data in under 24 hours during benchmarks, sustaining a stable throughput of 12 GB/s to a single table.
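The two headline figures can be checked against each other with simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the article does not say which convention the benchmark used.

```python
# Back-of-the-envelope check of the headline figures.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
GB = 10**9
PB = 10**15

throughput_bytes_per_s = 12 * GB   # 12 GB/s to a single table (from the article)
seconds_per_day = 24 * 60 * 60     # 86,400 seconds

bytes_in_24h = throughput_bytes_per_s * seconds_per_day
hours_for_one_pb = PB / throughput_bytes_per_s / 3600

print(f"Data in 24 h at 12 GB/s: {bytes_in_24h / PB:.4f} PB")  # 1.0368 PB
print(f"Time to ingest 1 PB:     {hours_for_one_pb:.2f} h")    # 23.15 h
```

At a sustained 12 GB/s, one petabyte takes a little over 23 hours, which is consistent with the "under 24 hours" claim.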


Architectural Innovations

At the core of Zerobus Ingest's capability is its autoscaling mechanism, achieved through dynamic partitioning. Unlike traditional streaming architectures that require pre-provisioning and managing static partitions, Zerobus shifts the unit of ordering from partitions to stream connections. This allows the system to dynamically scale compute resources up or down based on real-time demand.
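The idea of ordering per stream connection rather than per partition can be illustrated with a toy model. This is a minimal sketch, not Databricks' implementation: `ToyIngestRouter`, the hash-based pod assignment, and the 500 MB/s per-pod capacity are all hypothetical and exist only to show why rescaling does not disturb per-stream order.

```python
import math
import zlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyIngestRouter:
    """Hypothetical model: ordering is tracked per stream, not per partition."""

    pod_capacity_mb_s: float  # assumed per-pod capacity, not a published figure
    pods: int = 1
    next_offset: dict = field(default_factory=lambda: defaultdict(int))

    def route(self, stream_id: str) -> tuple[int, int]:
        """Return (per-stream offset, pod index) for the next record."""
        offset = self.next_offset[stream_id]
        self.next_offset[stream_id] += 1
        # Pod choice may change after a rescale; the offset sequence does not.
        pod = zlib.crc32(stream_id.encode()) % self.pods
        return offset, pod

    def rescale(self, demand_mb_s: float) -> None:
        """Resize the pod pool to match current demand (at least one pod)."""
        self.pods = max(1, math.ceil(demand_mb_s / self.pod_capacity_mb_s))


router = ToyIngestRouter(pod_capacity_mb_s=500.0)
first = router.route("sensor-1")    # single pod, offset 0
router.rescale(demand_mb_s=12_000)  # spike: grow to 24 pods
second = router.route("sensor-1")   # offset continues at 1 on whichever pod
print(first, router.pods, second[0])
```

Because the offset counter belongs to the stream, the pool can grow or shrink without repartitioning data or renumbering records.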

This approach ensures that pods can be added during ingestion spikes and removed when demand subsides, leading to efficient compute utilization. The system also incorporates a custom, zero-copy protobuf decoder called ZeroParser. This component parses data efficiently without unnecessary memory allocations, achieving high throughput even with dynamic schemas.
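ZeroParser's internals are not described in the article, so the following is only an illustration of what "zero-copy" means for the protobuf wire format. It walks the top-level fields of an encoded message and returns length-delimited values as `memoryview` slices into the original buffer instead of copying them. The function names are hypothetical, and group wire types are not handled.

```python
def read_varint(buf: memoryview, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: memoryview):
    """Yield (field_number, wire_type, value) without copying payload bytes."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: a view, not a copy
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Field 1 = varint 150, field 2 = the string "testing".
message = bytes.fromhex("089601" "120774657374696e67")
fields = list(iter_fields(memoryview(message)))
print(fields[0])                        # (1, 0, 150)
print(bytes(fields[1][2]))              # b'testing'
print(fields[1][2].obj is message)      # True: the slice still points at message
```

The payload is materialized as `bytes` only when the caller asks for it, which is the property that keeps allocations low on a hot ingest path.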

Furthermore, Zerobus Ingest implements a latency-optimized write-ahead log (WAL) to ensure data durability and enable quick message handoff. This WAL, combined with gRPC bidirectional streaming, allows clients to receive acknowledgments for committed data offsets, enabling them to safely clear their in-flight buffers.
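On the client side, acknowledged offsets are what make it safe to free memory. The sketch below models that bookkeeping without any network code. `InFlightBuffer` is a hypothetical class, and it assumes acknowledgments are cumulative (an ack for offset N covers every record up to and including N), which the article does not state explicitly.

```python
from collections import deque


class InFlightBuffer:
    """Hypothetical client-side buffer of records awaiting acknowledgment."""

    def __init__(self) -> None:
        self._pending = deque()  # (offset, record) pairs in send order
        self._next_offset = 0

    def send(self, record: bytes) -> int:
        """Buffer a record and return the offset assigned to it."""
        offset = self._next_offset
        self._pending.append((offset, record))
        self._next_offset += 1
        return offset

    def on_ack(self, committed_offset: int) -> int:
        """Drop records up to committed_offset; return how many were freed.

        Assumes cumulative acknowledgments.
        """
        freed = 0
        while self._pending and self._pending[0][0] <= committed_offset:
            self._pending.popleft()
            freed += 1
        return freed

    def unacked(self) -> list[bytes]:
        """Records that would need to be resent after a reconnect."""
        return [record for _, record in self._pending]


buffer = InFlightBuffer()
for payload in (b"a", b"b", b"c"):
    buffer.send(payload)

freed = buffer.on_ack(committed_offset=1)  # server committed offsets 0 and 1
print(freed, buffer.unacked())             # 2 [b'c']
```

Anything still in the buffer after a disconnect is exactly the set of records whose durability the client cannot yet assume.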

Benchmarking with Cosmic Data

To prove its capabilities, Databricks used NASA’s NEOWISE dataset, comprising 200 billion data points over 11 years, for its benchmarks. The test simulated a real-world fan-in pattern, using Locust to coordinate thousands of concurrent streams and stress the ingestion service at scale.
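A fan-in pattern means many independent producers converging on one sink. The benchmark used Locust for this; the sketch below swaps in Python's standard-library `asyncio` so it stays self-contained, and it runs 100 small in-process streams rather than thousands of networked ones. All names are hypothetical.

```python
import asyncio


async def producer(stream_id: int, n_records: int, sink: asyncio.Queue) -> None:
    """One simulated stream: emit records in sequence order."""
    for seq in range(n_records):
        await sink.put((stream_id, seq))
        await asyncio.sleep(0)  # yield so producers interleave


async def run_fan_in(n_streams: int, n_records: int) -> dict[int, list[int]]:
    """Run many producers against one consumer and record arrival order."""
    sink: asyncio.Queue = asyncio.Queue(maxsize=64)  # bounded: applies backpressure
    received: dict[int, list[int]] = {i: [] for i in range(n_streams)}

    async def consumer() -> None:
        while True:
            stream_id, seq = await sink.get()
            received[stream_id].append(seq)
            sink.task_done()

    consumer_task = asyncio.create_task(consumer())
    await asyncio.gather(*(producer(i, n_records, sink) for i in range(n_streams)))
    await sink.join()  # wait until every queued record is consumed
    consumer_task.cancel()
    await asyncio.gather(consumer_task, return_exceptions=True)
    return received


received = asyncio.run(run_fan_in(n_streams=100, n_records=10))
in_order = all(seqs == list(range(10)) for seqs in received.values())
print(len(received), in_order)  # 100 True
```

Records from different streams interleave at the sink, but each stream's own sequence arrives in order, which is the per-stream guarantee the architecture section describes.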

According to Databricks, the results show that Zerobus Ingest can sustain extreme data volumes and fluctuating ingestion patterns without manual partition management. For teams building high-throughput, petabyte-scale streaming pipelines, this removes the message-queue layer between data producers and Delta tables.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.