Databricks is rolling out a suite of new sketch functions, built on Apache DataSketches, to accelerate common analytical queries. The functions return approximate answers to expensive questions, enabling faster decision-making without the compute cost of exact calculations.
The core benefit is speed: compute-intensive tasks such as percentile calculations, distinct counts, and top-K rankings drop from minutes or hours to milliseconds. The functions achieve this with bounded-memory approximations, typically with a configurable relative error of 1-2%, a trade-off considered acceptable for many decision-support scenarios. In short, approximate query processing trades a small, bounded error for a large gain in analytics performance.
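
To make the idea of a bounded-memory approximation concrete, here is a minimal sketch of approximate distinct counting in plain Python. It uses a K-Minimum-Values (KMV) sketch chosen purely for illustration; it is not the Databricks API or the Apache DataSketches implementation, and the class name `KMVSketch` and the parameter `k` are assumptions made for this example. The sketch keeps only the `k` smallest hash values it has seen, so memory stays fixed no matter how many rows are processed, and the expected relative error is roughly `1 / sqrt(k - 2)` (about 1.6% for `k = 4096`).

```python
import hashlib
import heapq


class KMVSketch:
    """Illustrative K-Minimum-Values sketch for approximate distinct counts.

    Hypothetical example class; not part of Databricks or Apache DataSketches.
    """

    def __init__(self, k=4096):
        self.k = k
        self._heap = []        # negated hashes, so heap[0] is the largest kept hash
        self._members = set()  # hashes currently kept, to ignore duplicates

    @staticmethod
    def _hash01(item):
        # Map the item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def add(self, item):
        h = self._hash01(item)
        if h in self._members:
            return  # already tracked: duplicates never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            # Fewer than k distinct hashes seen: the count is exact.
            return float(len(self._heap))
        kth_smallest = -self._heap[0]
        # If n distinct hashes are uniform on [0, 1), the k-th smallest
        # sits near k / n, which gives the estimator (k - 1) / kth_smallest.
        return (self.k - 1) / kth_smallest


# Usage: 300,000 rows containing 100,000 distinct user IDs.
true_count = 100_000
sketch = KMVSketch(k=4096)
for i in range(300_000):
    sketch.add(f"user-{i % true_count}")

estimate = sketch.estimate()
print(f"estimate={estimate:.0f}, "
      f"relative error={abs(estimate - true_count) / true_count:.2%}")
```

Raising `k` tightens the error at the cost of more memory, which is the same kind of knob the configurable relative error in the text refers to. The sketch never holds more than `k` values, whereas an exact distinct count has to remember every unique value it has seen.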