Iceberg v3 Ushers In New Data Era

Apache Iceberg v3 advances data interoperability, with Snowflake integrating new capabilities to break down silos and empower AI.

Snowflake

The promise of the open lakehouse, a single governed copy of data that any engine can access, has long been hampered by "proprietary gravity." Apache Iceberg emerged as a solution for data interoperability, but an open format alone is insufficient in the AI age. Data silos and semantic fragmentation now tax innovation by forcing costly data movement and diluting the context AI requires. Snowflake is working toward full interoperability, so that users can act on data where it resides without compromising governance or semantic context.

Achieving this requires interoperability at every architectural layer, grounded in vendor-neutral, community-driven initiatives. Data interoperability, starting with a common table format, is paramount. Iceberg v3 represents a critical milestone, expanding interoperability to semi-structured data and change data capture (CDC). Snowflake will soon offer broader support for these v3 capabilities.

Iceberg v3 Enhancements

With Iceberg v3, more data becomes accessible from more engines. Snowflake will support use cases including:

  • VARIANT data type for semi-structured data within Iceberg tables.
  • Row lineage for tracking modifications across engines, powering row-level CDC.
  • Deletion vectors for more performant row-level deletes and simplified maintenance.
  • Nanosecond-precision timestamps for high-frequency data.
  • Native geospatial type support for geometric information.
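
To make two of these features concrete, here is a minimal Python sketch of how deletion vectors and row lineage can work together to yield row-level CDC. It is a toy model, not Iceberg's on-disk format or any engine's API: a Python set of row positions stands in for a deletion vector, and the `Row` fields `row_id` and `last_updated_seq` are stand-ins modelled on the lineage fields in the v3 spec. The helper names `live_rows` and `changes_since` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    row_id: int            # toy stand-in for a stable row identifier
    last_updated_seq: int  # toy stand-in for "sequence number of last change"
    value: str


def live_rows(data_file, deletion_vector):
    """Return rows whose position in the file is not marked as deleted."""
    return [row for pos, row in enumerate(data_file) if pos not in deletion_vector]


def changes_since(old_rows, new_rows, since_seq):
    """Diff two table states using row ids and last-updated sequence numbers."""
    old_ids = {r.row_id for r in old_rows}
    new_ids = {r.row_id for r in new_rows}
    return {
        "inserted": [r for r in new_rows if r.row_id not in old_ids],
        "updated": [
            r for r in new_rows
            if r.row_id in old_ids and r.last_updated_seq > since_seq
        ],
        "deleted": sorted(old_ids - new_ids),
    }


# State after commit 1: one data file, nothing deleted.
file_a = [Row(0, 1, "a"), Row(1, 1, "b"), Row(2, 1, "c")]
before = live_rows(file_a, set())

# Commit 2: delete row 1, rewrite row 2 (same row_id, new sequence), add row 3.
# The old positions are masked by a deletion vector instead of rewriting file_a.
file_b = [Row(2, 2, "c2"), Row(3, 2, "d")]
after = live_rows(file_a, {1, 2}) + live_rows(file_b, set())

changes = changes_since(before, after, since_seq=1)
```

In this sketch the first data file is never rewritten: the delete and the update only add positions to its deletion vector, and the change feed falls out of comparing row ids and sequence numbers rather than diffing row contents.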

Snowflake's commitment extends to breaking down transactional silos with pg_lake, an open source extension that makes PostgreSQL a working part of a data lakehouse. It lets Postgres query data directly from data lakes and natively manage Iceberg tables.

Governance portability is addressed by Apache Polaris, an open source Iceberg catalog. The goal is to enforce fine-grained access controls consistently across any engine, without vendor lock-in or performance penalties. This is achieved through standards for Policy Exchange, Governance Federation, and Read Restriction APIs.
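
The idea of a catalog-issued restriction that every engine applies the same way can be sketched in a few lines of Python. This is an illustrative model only and does not reflect the actual Polaris APIs: `ReadRestriction`, `apply_restriction`, and the sample columns are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReadRestriction:
    """Hypothetical policy a catalog could hand to any engine for one principal."""
    allowed_columns: frozenset
    row_filter: Callable[[dict], bool]


def apply_restriction(rows, restriction):
    """Filter rows on the full record first, then project the allowed columns."""
    return [
        {k: v for k, v in row.items() if k in restriction.allowed_columns}
        for row in rows
        if restriction.row_filter(row)
    ]


table = [
    {"region": "EU", "email": "a@example.com", "amount": 10},
    {"region": "US", "email": "b@example.com", "amount": 20},
    {"region": "EU", "email": "c@example.com", "amount": 30},
]

# One policy, defined once: EU rows only, and no access to the email column.
analyst_policy = ReadRestriction(
    allowed_columns=frozenset({"region", "amount"}),
    row_filter=lambda row: row["region"] == "EU",
)

# Two different "engines" reading the same table see the same governed view.
engine_a_view = apply_restriction(table, analyst_policy)
engine_b_view = apply_restriction(table, analyst_policy)
```

The design point is that the policy lives with the catalog rather than inside any one engine, so the governed result does not depend on which engine runs the query.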

Semantic context is being standardized with Open Semantic Interchange (OSI), a vendor-neutral specification for metrics, dimensions, and relationships. This aims to provide AI agents with a governed "map of truth," preventing wasted tokens and inaccurate interpretations. Snowflake customers can leverage semantic views in the Snowflake AI Data Cloud, built on constructs that OSI is standardizing.
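
As a rough illustration of what a shared definition of metrics and dimensions buys, the Python sketch below compiles one metric definition into SQL. It is not the OSI specification or Snowflake's semantic view syntax: the `Metric` and `Dimension` classes, the `compile_query` helper, and the `orders` table with its columns are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str    # business-facing name
    column: str  # physical column it maps to


@dataclass(frozen=True)
class Metric:
    name: str        # business-facing name
    expression: str  # aggregate expression over physical columns


def compile_query(table, metric, dimensions):
    """Render a metric grouped by the given dimensions as a SQL string."""
    select = [f"{d.column} AS {d.name}" for d in dimensions]
    select.append(f"{metric.expression} AS {metric.name}")
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(d.column for d in dimensions)
    return sql


total_revenue = Metric(name="total_revenue", expression="SUM(amount)")
region = Dimension(name="region", column="ship_region")

sql = compile_query("orders", total_revenue, [region])
```

Because the metric is defined once and compiled on demand, an AI agent or BI tool asking for "total revenue by region" resolves to the same expression instead of guessing at column meanings.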

Snowflake's engineering culture is shifting toward active community collaboration. Its engineers have contributed thousands of commits to open source projects, prioritizing operational transparency and seeking community consensus on proposals such as collations in Iceberg. Development is already underway on Iceberg v4, which focuses on metadata redesigns to reduce latency for streaming workloads.

True open data interoperability is a collective responsibility, moving beyond "proprietary gravity" to meet the demands of the AI age. No single vendor can solve data silos alone; it requires a diverse community working toward this common goal.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.