Databricks Streamlines Real-Time Data Apps

Databricks' Zerobus Ingest and Lakebase combine for streamlined IoT data ingestion and low-latency operational applications directly on the Lakehouse.

Mar 6 at 6:33 PM · 3 min read
Diagram illustrating the Databricks Zerobus Ingest and Lakebase architecture for real-time applications.

Databricks is simplifying the path from raw event data to real-time applications with its Zerobus Ingest and Lakebase offerings. Traditionally, ingesting high-velocity data from sources like IoT devices, clickstreams, and application telemetry required complex, multi-hop architectures involving message queues and separate processing jobs. This approach introduced latency, data duplication, and operational overhead. According to Databricks, the two products together aim to streamline the entire process.

Zerobus Ingest, part of Lakeflow Connect, provides APIs for pushing event data directly into the Databricks Lakehouse. It eliminates the need for a separate message bus layer, reducing infrastructure complexity and enabling near real-time ingestion at scale: latencies are reportedly as low as 5 seconds, and thousands of clients can write data concurrently.

Lakebase: Postgres Power on the Lakehouse

Complementing Zerobus Ingest, Databricks Lakebase offers a fully managed, serverless Postgres database embedded within the Databricks platform. This allows low-latency operational and transactional workloads to run directly on the same data powering analytics and AI. It bridges the gap between analytical data in the Lakehouse and the needs of real-time applications, removing the need for complex, custom reverse ETL pipelines. For operational workloads on Databricks, the application database and the analytical data now sit on one platform.

The combined solution addresses a critical challenge: serving curated analytical data for operational use cases. Building custom applications typically required provisioning and managing separate OLTP databases and handling reverse ETL processes. This diverted developer focus from core application development. The Databricks approach integrates these capabilities, allowing for faster app development.

Building a Real-Time Driver Monitoring App

As a practical example, Databricks outlines building a near real-time application for a food delivery company. The architecture involves drivers' phones sending GPS telemetry data via the Zerobus SDK directly to a Delta table in Databricks Unity Catalog. A continuous sync pipeline then pushes this data to a Lakebase Postgres instance.
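To make the ingestion step more concrete, here is a minimal Python sketch of what a single GPS telemetry record might look like before it is handed to an ingestion client. The field names, the DriverPing type, and the send callable are all hypothetical. In particular, send is only a stand-in and is not the Zerobus SDK's actual interface, which this article does not describe.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List


@dataclass(frozen=True)
class DriverPing:
    """One GPS reading from a driver's phone (hypothetical schema)."""
    driver_id: str
    latitude: float
    longitude: float
    recorded_at_ms: int  # client-side timestamp, milliseconds since epoch

    def validate(self) -> None:
        # Reject coordinates that cannot exist before they reach the table.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


def publish(ping: DriverPing, send: Callable[[str], None]) -> str:
    """Validate a ping, serialize it to JSON, and pass it to `send`.

    `send` is a placeholder for whatever call a real ingestion client
    exposes; here it is just any function that accepts a string.
    """
    ping.validate()
    payload = json.dumps(asdict(ping), sort_keys=True)
    send(payload)
    return payload


# Usage: collect payloads in a list instead of calling a real service.
outbox: List[str] = []
publish(DriverPing("driver-42", 37.7749, -122.4194, 1_700_000_000_000), outbox.append)
```

Validating on the client keeps obviously bad readings out of the Delta table, which matters more when nothing sits between the device and the table to filter them.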

A FastAPI backend, built using Databricks Apps, connects to Lakebase and streams these real-time updates to the front end over WebSockets. The front-end application, also built on Databricks Apps, visualizes live driver activity and order delivery status for management. This end-to-end flow demonstrates how the platform simplifies data ingestion and real-time application development, eliminating the complexity of traditional multi-hop streaming architectures.
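The backend's job of picking up fresh rows can be sketched without any web framework. The Python below is an illustration only: an in-memory SQLite table stands in for the Lakebase Postgres table, the table and column names are invented, and new_positions shows one simple way a backend could fetch rows that arrived since its last poll before forwarding them to WebSocket clients. The article does not say how Databricks' example detects new rows, so the cursor-based polling here is an assumption.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, float, float]

INSERT_SQL = (
    "INSERT INTO driver_positions (driver_id, latitude, longitude)"
    " VALUES (?, ?, ?)"
)


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table standing in for the synced Postgres table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE driver_positions ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " driver_id TEXT NOT NULL,"
        " latitude REAL NOT NULL,"
        " longitude REAL NOT NULL)"
    )
    return conn


def new_positions(
    conn: sqlite3.Connection, last_seen_id: int
) -> Tuple[List[Row], int]:
    """Return rows with id greater than last_seen_id, plus the new cursor."""
    rows = conn.execute(
        "SELECT id, driver_id, latitude, longitude"
        " FROM driver_positions WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    # Advance the cursor only when something new arrived.
    cursor = rows[-1][0] if rows else last_seen_id
    return rows, cursor


# Usage: two polls, each of which sees only the rows added since the last one.
conn = make_demo_db()
conn.execute(INSERT_SQL, ("driver-42", 37.7749, -122.4194))
first_batch, cursor = new_positions(conn, 0)
conn.execute(INSERT_SQL, ("driver-7", 40.7128, -74.0060))
second_batch, cursor = new_positions(conn, cursor)
```

In a real backend, each batch would be serialized and sent to every connected WebSocket client, with the poll repeated on a short interval.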

The platform's capabilities extend to integrating with Unity Catalog, Lakeflow Connect, and Spark Declarative Pipelines, providing a unified environment for data and AI initiatives. This approach accelerates time to value by enabling teams to apply analytics and AI tools directly on their data for use cases like fraud detection or predictive maintenance.

Databricks Lakebase, which recently received a boost on Azure, is now generally available on AWS, with Azure support in beta. Zerobus Ingest is available on both AWS and Azure. Databricks Apps is also part of this integrated offering, facilitating the creation of interactive applications directly on the Lakehouse.