Databricks is exploring the potential of Large Language Model (LLM) agents to solve one of the most persistent challenges in database performance: join order optimization. This research, detailed on the Databricks blog, applies frontier LLMs to a problem that has long vexed traditional query optimizers.
The core issue lies in the combinatorial explosion of possible execution plans as the number of tables in a SQL query increases. Traditional systems often rely on heuristics and cardinality estimators, which can misjudge subquery sizes, leading to inefficient query execution. This is where LLM agents aim to step in, acting as data-driven DBAs.
The Join Order Dilemma
Consider a query joining multiple tables like Actors, Movies, and Companies. The order in which these tables are joined significantly impacts performance. For instance, finding movies starring Scarlett Johansson first and then filtering for Sony productions might be faster or slower than the reverse, depending entirely on the data distribution.